Skip to content

Speeding up composer on AWS OpsWorks

At EasyBib, we're heavy users of composer and AWS OpsWorks. Since we recently moved a lot of our applications to a continuous deployment model, the benefits of speeding up the deployment process (~4-5 minutes) became more obvious.

Composer install

Whenever we run composer install, there are a lot of rount-trips between the server, our satis and Github (or Amazon S3).

One of my first ideas was to to get around a continous reinstall by symlinking the vendor directory between releases. This doesn't work consistently for two reasons:

What's a release?

OpsWorks, or Chef in particular, calls deployed code releases.

A release is the checkout/clone/download of your application and lives in /srv/www:

srv/
└── www
    └── my_app
        ├── current -> /srv/www/my_app/releases/20131008134950
        ├── releases
        └── shared

The releases directory, contains your application code and the latest is always symlinked into place using current.

Atomic deploys

  1. When deploying code, deploys need to be atomic. We don't want to break whatever is currently online — not even for a second or a fraction of it.
  2. We have to be able to roll-back deployments.

Symlinking the vendor directory between releases doesn't work because it would break existing code (because who knows how long the composer install takes or a restart of the application server) and it would require an additional safety net in place to be able to rollback a failed deployed successfully.

Ruby & Chef to the rescue

Whenever a deployment is run, Chef allows us to hook into the process using deploy hooks. These hooks are documented for OpsWorks as well.

The available hooks are:

  • before migrate
  • before symlink (!)
  • before restart
  • after restart

In order to use them, create a deploy directory in your application and put a couple ruby files in there:

  • before_migrate.rb
  • before_symlink.rb
  • before_restart.rb
  • after_restart.rb

If you're a little in the know about Rails, these hooks will look familiar.

The migration hook is probably used to run database migrations — something we don't do and probably will never do. ;-) But be assured: at this point in time the checkout of your applications is complete: or in other words, the code is on the instance.

The symlink hook is what we use to run composer install to get the web app up to speed, we'll take a closer look in a second.

Before restart is a hook used to run commands before your application server reloads — for example something like purging cache directories, whatever you want to get in order before /etc/init.d/php-fpm reload is executed to revive APC.

And last but not least, after restart — used on our applications to send an annotation to NewRelic — that we successfully deployed a new release.

Before symlink

So up until now, the before_symlink.rb looked like this:

composer_command = "/usr/local/bin/php"
composer_command << " #{release_path}/composer.phar"
composer_command << " --no-dev"
composer_command << " --prefer-source"
composer_command << " --optimize-autoloader"
composer_command << " install"

run "cd #{release_path} && #{composer_command}"

Note: release_path is a variable automatically available/populated in the scope of this script. If you need more, your node attributes are available as well.

Anyway, after reading Scaling Symfony2 with AWS OpsWorks, it inspired me to attempt to copy my vendors around. But instead of doing it in a recipe, I decided to use one of the available deploy hooks for this:

app_current = ::File.expand_path("#{release_path}/../../current")
vendor_dir  = "#{app_current}/vendor"

deploy_user  = "www-data"
deploy_group = "www-data"

release_vendor = "#{release_path}/vendor"

::FileUtils.cp_r vendor_dir, release_vendor if ::File.exists?(vendor_dir)
::FileUtils.chown_R deploy_user, deploy_group, release_vendor if ::File.exists?(release_vendor)

composer_command = "/usr/local/bin/php"
composer_command << " #{release_path}/composer.phar"
composer_command << " --no-dev"
composer_command << " --prefer-source"
composer_command << " --optimize-autoloader"
composer_command << " install"

run "cd #{release_path} && #{composer_command}"

Step by step:

  • copy the current release's vendor to the new release (if it exists)
  • chown all files to the webserver (if the new vendor exists)

This allows the deploy hook to complete, even if we're on a fresh instance.

Benchmarks?

Effectively, this cut deployment from 4-5 minutes, to 2-3 minutes. With tailwind, a 50% improvement.

FIN

That's all. Happy deploying!

SQL MAX() and GROUP BY for CouchDB

While re-writing a couple SQL statements into CouchDB we got stuck when we wanted to do a SELECT MAX(...), id ... GROUP BY id in CouchDB.

MySQL

Imagine the following SQL table with data:

mysql> SHOW FIELDS FROM deploys;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| project     | varchar(10) | NO   |     | NULL    |       |
| deploy_time | datetime    | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
2 rows in set (0.01 sec)

In order to get the latest deploy for each project, I'd issue:

mysql> SELECT MAX(deploy_time), project FROM deploys GROUP BY project;
+---------------------+----------+
| MAX(deploy_time)    | project  |
+---------------------+----------+
| 2013-10-04 22:01:26 | project1 |
| 2013-10-04 22:02:17 | project2 |
| 2013-10-04 22:02:20 | project3 |
+---------------------+----------+
3 rows in set (0.00 sec)

Simple. But what do you do in CouchDB?

CouchDB

My documents look like this:

{
  "_id": "hash",
  "project": "github-user/repo/branch",
  "deploy_time": {
     "date": "2013-10-04 22:02:20",
     /* ... */
  },
  /* ... */
}

So, after more than a couple hours trying to wrap our heads around map-reduce in CouchDB, it's working.

Here's the view's map function:

function (doc) {
  if (doc.project) {
    emit(doc.project, doc.deploy_time.date);
  }
}

This produces nice key value pairs — in fact, multiple — for each project.

And because the map-function returns multiple, we need to reduce our set.

So here is the reduce:

function (doc, values, rereduce) {
  var max_time = 0, max_value, time_parsed;
  values.forEach(function(deploy_date) {
    time_parsed = Date.parse(deploy_date.replace(/ /gi,'T'));
    if (max_time < time_parsed) {
      max_time = time_parsed;
      max_value = deploy_date;
    }
  });
  return max_value;
}

The little .replace(/ /gi,'T') took especially long to figure out. Special thanks to Cloudant's señor JS-date-engineer Adam Kocoloski for helping out. ;-)

Step by step

  • iterate over values
  • fix each value (add the "T") to make Spidermonkey comply
  • parse, and compare
  • return the "latest" in the end

A note of advice: To save yourself some trouble, install a local copy of Spidermonkey and test your view code in there and not in your browser.

Open the view in your browser: http://localhost/db/_design/deploys/_view/last_deploys?group=true

{
  "rows":[
    {
      "key":"testowner/testname/testbranch",
      "value":"2013-09-13 11:41:03"
    },
    {
      "key":"testowner/testname/testbranch2",
      "value":"2013-09-12 16:48:39"
    }
  ]
}

Yay, works!

FIN

That's all for today.

Bento and VirtualBox

Last week I blogged some Vagrant tips and pretty much jinxed the run I had in the past months.

Here's how:

  • I decided to upgrade to Vagrant 1.1, which broke bento: the current bento master is incompatible with Vagrant 1.1. But selecting the right rbenv env and installing the latest available Vagrant gem (inside the rbenv environment) fixed it.

  • My base box build, but for some reason, the guest addition setup broke and while it worked on Mac OSX, it broke the image completely on Ubuntu. Don't ask me why.

Especially the last bit reminded me to share another small tip, or work-around.

Fixing up base boxes

So every once in a while something doesn't work as expected. In my case, the guest additions were installed but not loaded and also failed to load when I started a VM. This in itself wasn't so bad, but it broke the vboxfs shares and while there may be cases where you don't need that (e.g. for a database VM), a VM without your code mounted into it is pretty useless.

The fix wasn't too hard:

  • I created a blank Vagrantfile (no recipes, just a simple box definition).
  • vagrant up and wait for the box to fail.
  • vagrant ssh to enter the box, or start with the GUI option and login through it.
  • execute sudo /etc/init.d/vboxadd setup

Then, exit the VM and execute the following: vagrant package vm_name --output box_name.box.

Import the box again with vagrant box add etc., update your Vagrantfile and test — then distribute.

Thoughs

Let me use this to reiterate on how critical a fixed set of versions are.

VirtualBox 4.2.x and bento (or veewee) seem to be incompatible somewhere and even though an error doesn't surface while the box is build or validated, it's still happening. It's paramount that there's always a rollback of some kind before you end up breaking too many things and stall your team for long.

Fin

That would be all for today.