I had three main objectives when scaling Pegleg to multiple instances:
I used these three criteria to evaluate the suitability of each solution for my purposes. Because I lacked the tools and time, I didn't load test any of these scenarios scientifically, so the decision on criterion 1 was made anecdotally (suitable enough for a small side-project like Pegleg). I'll circle back to load testing near the end.
Going from zero to hero took a few iterations (and a lot of research and trial-and-error). I ended up with three different deployment strategies for scaling Meteor:
After running on the manual AWS setup for a month, I ultimately went with the private VPS option for simplicity and cost reasons that I'll get into below.
The free hosting provided by Meteor at meteor.com is great for deploying prototypes and toys but if your traffic starts to ramp up, you'll eventually have to move on to your own infrastructure. Given that deploying to meteor.com is basically a black box, there's some background information you need to deploy a Meteor app to your own stack:
As of Meteor 0.6.3.1, the current compatible version of node is 0.8.x, so this is what you'll need to install on your server(s).
You'll need to back up the data from the DB on your meteor.com site
Meteor uses MongoHQ and unless you have some special reason to run your own MongoDB server, I highly recommend doing the same. The sandbox account gives you 512MB of storage for free and the prices are reasonable beyond that.
You'll need to bundle your app either before or during deploy using
meteor bundle (or
If you add packages using Meteorite locally then you'll want Meteorite on the server as well
Meteor expects certain environment variables to be set when the app is started:
The URL to your MongoDB instance using the mongodb:// protocol.
If you serve your site from a domain other than localhost, you'll need to set this so that URLs within your app point to the right place (Meteor.absoluteUrl depends on this variable being set).
The port the app server should run on. This will vary depending on your environment and setup as we'll discuss below.
Other packages may require specific environment variables (e.g. MAIL_URL).
When you're running more than one instance of your app, you'll need to spread the requests across them using a load balancer.
Some of the above is covered in the Meteor docs under "Running on your own infrastructure", which is a good thing to read before continuing.
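To make the environment variables listed above concrete, here is a minimal sketch of how a small launcher might build the environment for each app instance. It assumes the variable names used in Meteor's deployment docs (MONGO_URL, ROOT_URL, PORT). The `buildInstanceEnvs` helper, the connection string, the domain, and the port numbers are illustrative placeholders, not values from the actual Pegleg deploy.

```javascript
// Sketch: build one environment object per app instance.
// MONGO_URL, ROOT_URL and PORT are the variables Meteor reads at startup;
// every concrete value below is a made-up placeholder.
function buildInstanceEnvs(options) {
  var envs = [];
  for (var i = 0; i < options.instances; i++) {
    envs.push({
      MONGO_URL: options.mongoUrl,        // same database for every instance
      ROOT_URL: options.rootUrl,          // public URL, used by Meteor.absoluteUrl
      PORT: String(options.basePort + i)  // each instance gets its own port
    });
  }
  return envs;
}

var envs = buildInstanceEnvs({
  mongoUrl: 'mongodb://user:password@db.example.com:27017/myapp',
  rootUrl: 'http://www.example.com',
  basePort: 3000,
  instances: 3
});

envs.forEach(function (env) {
  // Print the equivalent shell command for starting one instance.
  console.log('MONGO_URL=' + env.MONGO_URL + ' ROOT_URL=' + env.ROOT_URL +
              ' PORT=' + env.PORT + ' node bundle/main.js');
});
```

Every instance shares the same database and root URL; only the port differs, which is what lets a load balancer tell the instances apart.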
After reading the docs and a lot of articles on using AWS, I set up an EC2 AMI with Node, NPM, Meteor and Forever installed on it (available publicly as
ami-d4f196bd). Then I went through the ordeal of installing the AWS EC2 Tools locally on my machine, which isn't fun since the documentation isn't the greatest. (It's doable, just not fun. I'm going to avoid that tangent because it's a post unto itself.) Finally I spun up a few instances of my AMI, hooked up the Load Balancer and was ready to deploy. For notes on deploying to AWS with a Load Balancer, check out this blog post about load balancing on AWS.
I started off with Meteor.sh and modified it to handle multiple instances. Again, this requires installing the AWS API Tools. You can find the modified script in the following fork:
Since the bash script ran the deploy in sequence rather than in parallel, it quickly became unacceptably long to deploy to many instances. I needed a parallelized and easily customizable deploy process so I turned to the tool I have the most experience with: Capistrano.
Capistrano is a Ruby-based deploy tool that is generally used with Rails. It parallelizes deploys to multiple servers and gives you fine-grained control over how the deploy is performed and what you can do both locally and on the server. Another benefit of this type of deployment is that the code is deployed straight from your Git repo, cloned right on the target server, so there are no files to copy and you'll never be unsure of which version is in production. Obviously this means you need to keep your code in Git and the repo has to be accessible from the server, but that's a pretty basic requirement for any development these days. You'll also need to install the railsless-deploy gem and add a
require 'railsless-deploy' to your Capfile.
You can find the deploy script I created to do this in the following Gist: Capistrano AWS EC2 manual deploy script
This actually worked pretty well. In my app, I used the Router package to add a
/ping endpoint that responds with
200 OK, which acts as the health check endpoint for the Load Balancer:
Meteor.Router.add('/ping', [200, "OK"]);
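For anyone not using the Router package, the same idea can be sketched with nothing but Node's built-in http module. This illustrates what a health check endpoint does rather than showing code from Pegleg; the `healthCheck` function name and port 3000 are assumptions.

```javascript
var http = require('http');

// Decide the response for a request path: 200 "OK" for /ping, 404 otherwise.
function healthCheck(path) {
  if (path === '/ping') {
    return { status: 200, body: 'OK' };
  }
  return { status: 404, body: 'Not Found' };
}

// Wrap the decision in a server. A load balancer typically only cares
// about the status code it gets back.
var server = http.createServer(function (req, res) {
  var result = healthCheck(req.url);
  res.writeHead(result.status, { 'Content-Type': 'text/plain' });
  res.end(result.body);
});

// To try it locally: call server.listen(3000), then request
// http://localhost:3000/ping
```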
This was a decent solution but had some points against it:
During my research into my manual AWS deploy setup I kept coming across mentions of Elastic Beanstalk. Because I was in the thick of trying to figure out basics about AWS and EC2 I only took a cursory look into it until I had everything up and running. Once that first stab was working, I realised that with the Micro instances I would need a way of auto-scaling the number of instances when load went up and down to save cost, and for each new instance I would need to deploy. Turns out that's exactly what Elastic Beanstalk is for. AWS had recently released Elastic Beanstalk for Node.js so the stars were aligned.
EB is intended to set up everything you need for a production application: instances, load balancer, monitoring, and auto-scaling, all configurable through their web interface. It's got its own set of AWS Tools that need to be installed (just as annoying as the EC2 Tools, but now I had experience). These tools hook into Git and allow you to deploy with one command on the command-line once they're all set up:
It's a bit more complicated than that, though. Getting your Meteor app up and running properly on each new instance requires a configuration file that you have to store in the
/.ebextensions subdir of your application. After parsing through the configuration docs and probably 50+ trial deploys, I finally got everything up and running with a green health check and a running app.
To save you some of the trouble, I've created a Gist with the working deploy script, which not only sets up Meteor and Meteorite, but does some custom tweaking to the built-in nginx server that serves static assets:
This option had potential but I didn't spend a lot of time with it once I got it working. It still depends on expensive EC2 instances and my mind was already on the next possible solution by the morning after I got this working.
A few people suggested checking out DigitalOcean and I even got a great $200 promo code to use so I decided to give them a try. The thinking was that one really beefy server could probably run several instances of a Meteor app and handle the load as well or better than many small ones. At the very least it wouldn't be as variable as trying to rely on AWS Micro instances and at $20/month, the cost was potentially way more manageable.
The setup I decided to pursue was nginx (for serving static assets) in front of both HAProxy (which can provide session affinity) and node/Meteor. Rather than distributing the load across server instances, I would be distributing the load across multiple app instances running on different ports on the same machine. This would all be deployed with Capistrano, with some slight tweaks to the script I used earlier for AWS.
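As a rough illustration of the load-balancing half of this setup, the sketch below generates an HAProxy backend section for several app instances on consecutive local ports. It is not the configuration linked from this post: the backend name `meteor_app`, the `SERVERID` cookie, the ports, and the choice of cookie-based persistence for session affinity are all assumptions, and the `/ping` check assumes the app exposes a health endpoint at that path.

```javascript
// Sketch: render an HAProxy backend for N app instances on one machine.
function haproxyBackend(name, basePort, instances) {
  var lines = [
    'backend ' + name,
    '    balance roundrobin',
    '    option httpchk GET /ping',                // health check per instance
    '    cookie SERVERID insert indirect nocache'  // sticky sessions via cookie
  ];
  for (var i = 0; i < instances; i++) {
    var id = 'app' + i;
    // One "server" line per app instance, each on its own local port.
    lines.push('    server ' + id + ' 127.0.0.1:' + (basePort + i) +
               ' cookie ' + id + ' check');
  }
  return lines.join('\n');
}

var backend = haproxyBackend('meteor_app', 3000, 3);
console.log(backend);
```

Each app instance would then be started with the matching PORT value so that it listens where HAProxy expects it.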
So I set up my new droplet with the bare minimum to run a Meteor project: Node, NPM, Meteor, Meteorite, Forever (just as on EC2, but now I was a stone-cold expert). Then I added nginx and HAProxy and tweaked their configurations to serve the static assets from nginx and the app data from the load-balanced app servers. You can take a look at my configuration files here:
I've been running this setup for a couple of months now with 3 app instances behind the load balancer and haven't had a single moment of downtime or slowness. Once everything is committed and pushed to my Git repo, deploying is dead simple. All it takes is:
cap deploy -s instances=3
The only possible annoyance is that if I want to run more instances on the same server, I'll need to update my HAProxy configuration. I could actually do that via the deploy script with a little
sed magic, but for now it's not a problem I have.
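The same rewrite can also be done without `sed`. The sketch below uses JavaScript instead: a hypothetical `setInstanceCount` helper drops the existing `server` lines from a config and appends one line per instance. It assumes the `server` lines all belong to a single backend at the end of the file and follow an `appN` naming scheme, neither of which comes from a real config.

```javascript
// Sketch: rewrite the "server" lines of an HAProxy config for a new instance count.
// Assumes all server lines belong to a single backend at the end of the file.
function setInstanceCount(configText, basePort, instances) {
  // Keep every line that is not a "server ..." line.
  var kept = configText.split('\n').filter(function (line) {
    return !/^\s*server\s/.test(line);
  });
  // Drop trailing blank lines so the new server lines attach to the backend.
  while (kept.length > 0 && kept[kept.length - 1].trim() === '') {
    kept.pop();
  }
  for (var i = 0; i < instances; i++) {
    kept.push('    server app' + i + ' 127.0.0.1:' + (basePort + i) + ' check');
  }
  return kept.join('\n') + '\n';
}

// Made-up example config with two instances.
var oldConfig = [
  'backend meteor_app',
  '    balance roundrobin',
  '    server app0 127.0.0.1:3000 check',
  '    server app1 127.0.0.1:3001 check',
  ''
].join('\n');

var newConfig = setInstanceCount(oldConfig, 3000, 4);
console.log(newConfig);
```

In a deploy script, the new text would be written back to the config file and HAProxy reloaded afterwards.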
There are many benefits to this setup:
This is a topic I don't know much about and from what I've read in the meteor-talk group, it's not a trivial task to figure out with Meteor applications. This whole thing would definitely be more scientific with some load numbers to back it up, but for a small app like mine, anecdotal results are good enough for me. In the end, the app works great with minimal latency and no unexpected behaviour at all, so I'm happy. It would be interesting to hear results from others who have done some load testing against Meteor apps / Node apps in general to perhaps improve performance based on their findings.
I built Pegleg back in January as a side-project to learn Meteor and to help friends and film-buffs find full-length movies on YouTube by working together. Initially I was using the generously provided and dead simple free meteor.com hosting, which is the perfect platform for getting your Meteor prototypes some real-world users on the web. With a few blog posts about it and word of mouth, it slowly gained a bit of popularity.
After being up and slowly improved for about six or eight weeks, the link to Pegleg was posted on Hacker News on a Friday morning and made it to the home page. After an initial bout of euphoria about having hit on the holy grail of an HN home page mention, I was quickly hit with the reality of what that means. The free hosting provided by Meteor wasn't designed to handle that kind of load and the site slowed to an unusable crawl.
After trying to resolve things with the gracious and patient help of Kara and David at Meteor, one thing became certain: I needed to scale off of the Meteor free hosting to meet the needs of the new influx of users. This was easier said than done because there wasn't much in the way of documentation on that front. David pointed me to this blog post about load balancing on AWS to get me started and that's where we'll begin this journey.
As a caveat, I'm certainly not a sys-ops master, and I often find jockeying servers and worrying about deployments to be a tedious necessary evil to get my sites online. This situation forced me to learn a bunch of the stuff I'd been wilfully ignoring for as long as I could. If you are great at this stuff I'd love to hear feedback on better ways to approach this.