Scaling Meteor

My experiences scaling Pegleg (pegleg.it), a Meteor project, beyond a single instance

It's Good To Have Goals

I had three main objectives when scaling Pegleg to multiple instances:

  1. Site handles the load and works correctly
  2. Simple deployment
  3. Low cost

I used these three criteria to evaluate the suitability of each solution for my purposes. For lack of tools and time, I didn't load test any of these scenarios scientifically, so the verdict on criterion 1 is anecdotal (good enough for a small side-project like Pegleg). I'll circle back to load testing near the end.

Back to the Future (aka TL;DR)

Going from zero to hero took a few iterations (and a lot of research and trial-and-error). I ended up with three different deployment strategies for scaling Meteor:

  1. Manual deployment to AWS
  2. AWS Elastic Beanstalk deployment
  3. Custom deployment on private VPS

After running on the manual AWS setup for a month, I ultimately ended up going with the private VPS option for simplicity and cost reasons that I'll get into below.

Understanding The Problem

The free hosting provided by Meteor at meteor.com is great for deploying prototypes and toys but if your traffic starts to ramp up, you'll eventually have to move on to your own infrastructure. Given that deploying to meteor.com is basically a black box, there's some background information you need to deploy a Meteor app to your own stack:

Node

As of Meteor 0.6.3.1, the compatible version of Node is 0.8.x, so that's what you'll need to install on your server(s).
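
If you manage Node versions with something like nvm, pinning the server to 0.8.x is a one-time step (a sketch; any install method that gets you a 0.8.x Node works just as well):

# After installing nvm (see its README), install and pin a 0.8.x Node
nvm install 0.8
nvm alias default 0.8
node --version   # should print v0.8.x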

MongoDB

Backing up your data

You'll need to back up the data from the database behind your meteor.com site.
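
If you deployed with meteor deploy, you can ask Meteor for a temporary connection URL to that database and dump it with the standard MongoDB tools. A rough sketch (yourapp.meteor.com and the credentials are placeholders):

# Prints a short-lived mongodb:// URL for the deployed app's database
meteor mongo yourapp.meteor.com --url

# Plug the pieces of that URL into mongodump to pull the data down locally
mongodump -h host:port -u client -p password -d yourapp_meteor_com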

Hosting your data

Meteor uses MongoHQ, and unless you have some special reason to run your own MongoDB server, I highly recommend doing the same. The sandbox account gives you 512MB of storage for free, and the prices are reasonable beyond that.
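
Loading your backup into MongoHQ is then just the reverse operation (the host, port, credentials, and database name below are placeholders from your MongoHQ dashboard):

# Restore the dump taken above into your hosted database
mongorestore -h your.mongohq.host:10047 -u user -p password -d yourdb dump/yourapp_meteor_com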

Meteor (and Meteorite)

Bundling your app

You'll need to bundle your app, either before or during deploy, using meteor bundle (or mrt bundle if you use Meteorite).
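
The bundle is a tarball containing a plain Node app. Roughly, the flow looks like this (the fibers rebuild step varies a little between Meteor versions, so treat this as a sketch):

# Locally (or during deploy): create the bundle
mrt bundle myapp.tar.gz

# On the server: unpack, then rebuild fibers for the server's architecture
tar xzf myapp.tar.gz
cd bundle/server/node_modules
rm -rf fibers
npm install fibers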

Packages

If you add packages with Meteorite locally, then you'll want Meteorite on the server as well.
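
Meteorite itself is an npm package, so the one-time server setup is short:

# Installs the mrt command globally
sudo npm install -g meteorite

# From the app directory: fetch the smart packages listed in smart.json
mrt install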

Environment Variables

Meteor expects certain environment variables to be set when the app is started:

MONGO_URL
The URL to your MongoDB instance using the mongodb:// protocol.

ROOT_URL
If you serve your site from a domain other than localhost, you'll need to set this so that URLs within your app point to the right place (Meteor.absoluteUrl depends on this variable being set).

PORT
The port the app server should run on. This will vary depending on your environment and setup as we'll discuss below.

Other packages may require specific environment variables (e.g. MAIL_URL).
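
Putting those together, starting a bundled app by hand looks roughly like this (all of the URLs and the port are placeholders for your own values):

export MONGO_URL='mongodb://user:password@host:port/dbname'
export ROOT_URL='http://pegleg.it'
export PORT=3000
# Only needed if your app sends email
export MAIL_URL='smtp://user:password@smtp.example.com:587'

node bundle/main.js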

Load Balancing & Session Affinity

When you're running more than one instance of your app, you'll need to spread requests across them using a load balancer. Because Meteor clients connect over SockJS and may fall back to long polling, requests from a given client need to keep hitting the same app instance, so the load balancer should also provide session affinity (sticky sessions).

Some of the above is covered in the Meteor docs under "Running on your own infrastructure", which is a good thing to read before continuing.

Manual Deploy to AWS

After reading the docs and a lot of articles on using AWS, I set up an EC2 AMI with Node, NPM, Meteor, and Forever installed on it (available publicly as ami-d4f196bd). Then I went through the ordeal of installing the AWS EC2 Tools locally on my machine, which isn't fun since the documentation isn't the greatest (it's doable, just not fun; the details are a post unto themselves). Finally, I spun up a few instances of my AMI, hooked up the Load Balancer, and was ready to deploy. For notes on deploying to AWS with a Load Balancer, check out this blog post about load balancing on AWS.
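
For reference, launching instances from that AMI and registering them with the Load Balancer via the command-line tools goes something like this (the key pair, Load Balancer name, and instance IDs are placeholders):

# Launch three Micro instances from the public AMI
ec2-run-instances ami-d4f196bd -n 3 -t t1.micro -k my-keypair

# Register them with an existing Elastic Load Balancer
elb-register-instances-with-lb my-load-balancer --instances i-aaaa1111,i-bbbb2222,i-cccc3333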

I started off with Meteor.sh and modified it to handle multiple instances. Again, this requires installing the AWS API Tools. You can find the modified script in the following fork:

Meteor.sh fork with support for multiple EC2 instances

Since the bash script ran the deploy in sequence rather than in parallel, deploying to many instances quickly became unacceptably slow. I needed a parallelized, easily customizable deploy process, so I turned to the tool I have the most experience with: Capistrano.

Capistrano is a Ruby-based deploy tool generally used with Rails. It parallelizes deploys to multiple servers and gives you fine-grained control over how the deploy is performed, both locally and on the server. Another benefit of this type of deployment is that the code is deployed straight from your Git repo, cloned right on the target server, so there are no files to copy and you'll never be unsure of which version is in production. Obviously this means you need to keep your code in Git and the repo has to be accessible from the server, but that's a pretty basic requirement for any development these days. You'll also need to install the railsless-deploy gem and add require 'railsless-deploy' to your Capfile.
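
The tooling itself is a couple of gems, and once the deploy script is in place the whole flow is two commands (a sketch, assuming a standard Capistrano 2 setup):

# One-time, on your local machine
gem install capistrano railsless-deploy

# First run creates the directory structure on the servers; after that, just deploy
cap deploy:setup
cap deploy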

You can find the deploy script I created to do this in the following Gist: Capistrano AWS EC2 manual deploy script

This actually worked pretty well. In my app, I used the Router package to add a /ping endpoint that responds with 200 OK, which acts as the health check endpoint for the Load Balancer:

Meteor.Router.add('/ping', [200, "OK"]);

This was a decent solution but had some points against it:

  1. Trying to deploy Meteor (or any Node app) to Micro instances is doable but not that reliable. Because of the way Micro instances work, they give you short bursts of power but are then throttled back down to almost nothing, leaving the instance effectively unresponsive for a while.
  2. Going above the Micro level starts to cost a lot more money, so it's up to you whether that's worth it. Ultimately, I spun up 9 Micro instances running above 50% capacity for a month, just to see what would happen, since I was on the "free tier" (and because I'm stubborn), and it cost me $160. I may have been better off using a couple of Small instances instead, but overall the cost was far too high for this to be the end of the story for me.

AWS Elastic Beanstalk Deployment

During my research into the manual AWS deploy setup I kept coming across mentions of Elastic Beanstalk. Because I was in the thick of figuring out the basics of AWS and EC2, I only took a cursory look at it until I had everything up and running. Once that first stab was working, I realised that with Micro instances I would need a way of auto-scaling the number of instances as load went up and down to save cost, and that each new instance would need its own deploy. It turns out that's exactly what Elastic Beanstalk is for. AWS had recently released Elastic Beanstalk for Node.js, so the stars were aligned.

EB is intended to set up everything you need for a production application: instances, load balancer, monitoring, and auto-scaling, all configurable through their web interface. It's got its own set of AWS Tools that need to be installed (just as annoying as the EC2 Tools, but by now I had experience). These tools hook into Git and, once they're all set up, allow you to deploy with one command on the command line:

git aws.push

It's a bit more complicated than that, though. Getting your Meteor app up and running properly on each new instance requires a configuration file stored in the .ebextensions subdirectory of your application. After parsing through the configuration docs and probably 50+ trial deploys, I finally got everything up and running with a green health check and a working app.

To save you some of the trouble, I've created a Gist with the working deploy script, which not only sets up Meteor and Meteorite but also does some custom tweaking to the built-in nginx server that serves static assets:

EB Configuration File

This option had potential but I didn't spend a lot of time with it once I got it working. It still depends on expensive EC2 instances and my mind was already on the next possible solution by the morning after I got this working.

Custom Deployment on Private VPS

A few people suggested checking out DigitalOcean, and I even got a great $200 promo code to use, so I decided to give them a try. The thinking was that one really beefy server could probably run several instances of a Meteor app and handle the load as well as or better than many small ones. At the very least it wouldn't be as variable as relying on AWS Micro instances, and at $20/month the cost was potentially far more manageable.

The setup I decided to pursue was nginx (for serving static assets) in front of both HAProxy (which can provide session affinity) and Node/Meteor. Rather than distributing load across server instances, I would distribute it across multiple app instances running on different ports on the same machine. This would all be deployed with Capistrano, with some slight tweaks to the script I used earlier for AWS.
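
To make the idea concrete, here's a sketch of launching three app instances on consecutive ports with Forever (ports, URLs, and paths are illustrative; HAProxy then balances across 3001-3003):

# Start 3 copies of the bundled app, one per port
for i in 1 2 3; do
  PORT=$((3000 + i)) \
  MONGO_URL='mongodb://user:password@host:port/dbname' \
  ROOT_URL='http://pegleg.it' \
  forever start bundle/main.js
done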

So I set up my new droplet with the bare minimum to run a Meteor project: Node, NPM, Meteor, Meteorite, and Forever (just as on EC2, but by now I was a stone-cold expert). Then I added nginx and HAProxy and tweaked their configurations to serve the static assets from nginx and the app data from the load-balanced app servers. You can take a look at my configuration files here:

  1. Nginx configuration
  2. HAProxy configuration
  3. Capistrano deploy script

I've been running this setup for a couple of months now with 3 app instances behind the load balancer and haven't had a single moment of downtime or slowness. Once everything is committed and pushed to my Git repo, deploying is dead simple; all it takes is:

cap deploy -s instances=3

The only possible annoyance is that if I want to run more instances on the same server, I'll need to update my HAProxy configuration. I could do that from the deploy script with a little sed magic, but for now it isn't a problem I have; a sketch of the idea follows.
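
A hypothetical sketch of that tweak, generating one backend entry per instance for splicing into haproxy.cfg (names and ports match the layout above):

# Emit one HAProxy 'server' line per app instance
for i in $(seq 1 $INSTANCES); do
  echo "    server app$i 127.0.0.1:$((3000 + i)) check"
done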

There are many benefits to this setup:

  1. It's much, much cheaper than AWS ($20/month)
  2. It's way more reliable and requires fewer instances for the same load (anecdotally)
  3. Server setup is a one-time operation since all of the app instances are running on the same machine
  4. You can clone images to new machines as things grow (DigitalOcean calls them 'droplets')

Load Testing & Conclusion

This is a topic I don't know much about, and from what I've read in the meteor-talk group, it's not a trivial task to figure out for Meteor applications. This whole write-up would definitely be more scientific with some load numbers to back it up, but for a small app like mine, anecdotal results are good enough. In the end, the app works great, with minimal latency and no unexpected behaviour at all, so I'm happy. It would be interesting to hear results from others who have done load testing against Meteor apps (or Node apps in general) to perhaps improve performance based on their findings.
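
As a trivial starting point, classic HTTP tools can at least exercise the static and health-check paths, though they say nothing about DDP/websocket traffic, which is where the interesting Meteor-specific load lives. For example, with ApacheBench:

# 1000 requests, 50 at a time, against the health-check endpoint from earlier
ab -n 1000 -c 50 http://pegleg.it/ping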


Background (for those interested)

Beginnings

I built Pegleg back in January as a side-project, both to learn Meteor and to help friends and film buffs work together to find full-length movies on YouTube. Initially I used the generously provided and dead-simple free meteor.com hosting, which is the perfect platform for getting your Meteor prototypes in front of real-world users. With a few blog posts about it and some word of mouth, it slowly gained a bit of popularity.

Blow-Up

After being up (and slowly improving) for about six or eight weeks, the link to Pegleg was posted on Hacker News on a Friday morning and made it to the home page. After an initial bout of euphoria at having hit the holy grail of an HN home-page mention, I was quickly hit with the reality of what that means. The free hosting provided by Meteor wasn't designed to handle that kind of load, and the site slowed to an unusable crawl.

Short-Circuit

After trying to resolve things with the gracious and patient help of Kara and David at Meteor, one thing became certain: I needed to move off the Meteor free hosting to meet the needs of the new influx of users. This was easier said than done, because there wasn't much in the way of documentation on that front. David pointed me to this blog post about load balancing on AWS to get me started, and that's where we'll begin this journey.

Batteries Not Included

As a caveat, I'm certainly not a sys-ops master, and I often find jockeying servers and worrying about deployments to be a tedious, necessary evil on the way to getting my sites online. This situation forced me to learn a bunch of the stuff I'd been wilfully ignoring for as long as I could. If you're great at this stuff, I'd love to hear feedback on better ways to approach it.