Speeding up composer on AWS OpsWorks

At EasyBib, we're heavy users of composer and AWS OpsWorks. Since we recently moved a lot of our applications to a continuous deployment model, the benefits of speeding up the deployment process (~4-5 minutes) became more obvious.

Composer install

Whenever we run composer install, there are a lot of round-trips between the server, our Satis instance, and GitHub (or Amazon S3).

One of my first ideas was to get around a continuous reinstall by symlinking the vendor directory between releases. This doesn't work consistently, for two reasons:

What's a release?

OpsWorks, or Chef in particular, calls deployed code releases.

A release is the checkout/clone/download of your application and lives in /srv/www:

srv/
└── www
    └── my_app
        ├── current -> /srv/www/my_app/releases/20131008134950
        ├── releases
        └── shared

The releases directory contains your application code, and the latest release is always symlinked into place as current.

Atomic deploys

  1. Deploys need to be atomic. We don't want to break whatever is currently online — not even for a second or a fraction of one.
  2. We have to be able to roll back deployments.

Symlinking the vendor directory between releases fails on both counts: it would break the code that is currently live (who knows how long the composer install or the application server restart takes), and it would require an additional safety net to roll back a failed deploy successfully.

Ruby & Chef to the rescue

Whenever a deployment is run, Chef allows us to hook into the process using deploy hooks. These hooks are documented for OpsWorks as well.

The available hooks are:

  • before migrate
  • before symlink (!)
  • before restart
  • after restart

In order to use them, create a deploy directory in your application and put a couple of Ruby files in there:

  • before_migrate.rb
  • before_symlink.rb
  • before_restart.rb
  • after_restart.rb
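
To illustrate, for a (hypothetical) application called my_app, the repository would look roughly like this:

my_app/
├── composer.json
├── deploy
│   ├── after_restart.rb
│   ├── before_migrate.rb
│   ├── before_restart.rb
│   └── before_symlink.rb
└── ...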

If you're a little in the know about Rails, these hooks will look familiar.

The migrate hook is typically used to run database migrations — something we don't do and probably never will. ;-) But rest assured: at this point the checkout of your application is complete; in other words, the code is on the instance.

The symlink hook is what we use to run composer install to get the web app up to speed; we'll take a closer look at it in a second.

Before restart is the hook to run commands before your application server reloads — for example purging cache directories, or whatever else you want in order before /etc/init.d/php-fpm reload is executed to revive APC.
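
As a rough sketch of such a hook (the cache path below is made up for illustration), a before_restart.rb could boil down to:

# before_restart.rb: purge a (hypothetical) cache directory before php-fpm reloads
cache_dir = "#{release_path}/app/cache"

::FileUtils.rm_rf(::Dir.glob("#{cache_dir}/*")) if ::File.directory?(cache_dir)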

And last but not least, after restart — our applications use it to send an annotation to NewRelic saying that we successfully deployed a new release.
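
For completeness, an after_restart.rb in that spirit could look roughly like this; the NewRelic deployments endpoint, the API key and the application name are assumptions you would replace with your own:

# after_restart.rb: record the deployment in NewRelic (key and app name are placeholders)
newrelic_command = "curl -s https://api.newrelic.com/deployments.xml"
newrelic_command << " -H 'x-api-key:YOUR_NEWRELIC_API_KEY'"
newrelic_command << " -d 'deployment[app_name]=my_app'"

run newrelic_command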

Before symlink

So up until now, the before_symlink.rb looked like this:

composer_command = "/usr/local/bin/php"
composer_command << " #{release_path}/composer.phar"
composer_command << " --no-dev"
composer_command << " --prefer-source"
composer_command << " --optimize-autoloader"
composer_command << " install"

run "cd #{release_path} && #{composer_command}"

Note: release_path is a variable automatically available/populated in the scope of this script. If you need more, your node attributes are available as well.
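
For example (my_app is a placeholder for the app's short name, and the exact keys depend on your stack's deploy JSON), attributes can be read like any other Chef attribute:

# my_app is a placeholder for the app's short name; keys depend on your deploy JSON
deploy_data = node[:deploy]["my_app"]
Chef::Log.info("deploy_to for my_app is #{deploy_data[:deploy_to]}, release is #{release_path}")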

Anyway, reading Scaling Symfony2 with AWS OpsWorks inspired me to copy my vendors around. But instead of doing it in a recipe, I decided to use one of the available deploy hooks for this:

app_current = ::File.expand_path("#{release_path}/../../current")
vendor_dir  = "#{app_current}/vendor"

deploy_user  = "www-data"
deploy_group = "www-data"

release_vendor = "#{release_path}/vendor"

::FileUtils.cp_r vendor_dir, release_vendor if ::File.exists?(vendor_dir)
::FileUtils.chown_R deploy_user, deploy_group, release_vendor if ::File.exists?(release_vendor)

composer_command = "/usr/local/bin/php"
composer_command << " #{release_path}/composer.phar"
composer_command << " --no-dev"
composer_command << " --prefer-source"
composer_command << " --optimize-autoloader"
composer_command << " install"

run "cd #{release_path} && #{composer_command}"

Step by step:

  • copy the current release's vendor to the new release (if it exists)
  • chown all files to the webserver (if the new vendor exists)

This allows the deploy hook to complete, even if we're on a fresh instance.

Benchmarks?

Effectively, this cut deployments from 4-5 minutes down to 2-3 minutes. With a tailwind, that's a 50% improvement.

FIN

That's all. Happy deploying!

Deploying PHP applications: PEAR and composer resources for chef

This is something experimental I have been working on for our chef deployments. The objective was (and is) to find a sane way to install PEAR packages and to install dependencies with composer.

execute in chef recipes

In chef recipes, almost everything is a resource. In case you're just getting started with Chef, a list of current resources is available on the Opscode Wiki. It's a link I put in my browser bar since I frequently work on chef recipes.

Some examples for resources are:

  • package (to install software)
  • cron (set up a crontab)
  • directory (create directories)
  • template (install customized configuration files)
  • user and group (to create users and groups)
  • mdadm (to set up a RAID)

The above list is just a sample — there are more resources. But if there isn't a designated resource, you can always use an execute block.

An example for an execute block is the following:

execute "discover a pear channel" do
  command "pear channel-discover easybib.github.com/pear"
end

This works pretty well, but it is also not very robust.

Fail hard

By default whenever a command fails, chef fails hard.

To illustrate what I'm talking about, let's test and execute the command from our execute block multiple times on the shell to see its exit status ($?):

till:~/ $ pear channel-discover easybib.github.com/pear
Adding Channel "easybib.github.com/pear" succeeded
Discovery of channel "easybib.github.com/pear" succeeded
till:~/ $ echo $?
0
till:~/ $ pear channel-discover easybib.github.com/pear
Channel "easybib.github.com/pear" is already initialized
till:~/ $ echo $?
1

So whenever a command returns a non-zero exit status, chef will bail.

One solution is to brute-force your way through these things with ignore_failure true in your execute block. But that's usually not a great idea either because it hides other issues from you (and me) when we need to debug this later on.

For example, if this PEAR channel is unavailable during your next chef-run, it would be much, much harder to find the root cause of why the install commands failed.

Another solution is using the not_if or only_if options with execute:

execute "discover a pear channel" do
  command "pear channel-discover easybib.github.com/pear"
  not_if do
    `pear channel-info easybib.github.com/pear`
  end
end

If the command passed to not_if succeeds (success meaning an exit status of 0), the execute block is skipped.
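
not_if (and only_if) also accept a Ruby block; in that case it is the truthiness of the block's return value that decides, so the exit status has to be checked explicitly, for example via Kernel#system:

execute "discover a pear channel" do
  command "pear channel-discover easybib.github.com/pear"
  not_if do
    # system returns true only when the command exits with 0
    system("pear channel-info easybib.github.com/pear")
  end
end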

Optimize?

Since I discovered not_if and only_if, I can write recipes which work in most cases: more robust code, which allows me to re-execute recipes on already running instances. For example, when I update a recipe, or a configuration file which is distributed through a recipe, I can re-run the entire recipe and it completes successfully instead of failing.

One problem remains with this approach: I end up writing the same checks again and again.
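
Which is ultimately what pushed me towards wrapping these checks into resources of our own. As a teaser, and purely hypothetical since the resource and package names below are made up, a recipe using such resources could boil down to:

# hypothetical resources: names are illustrative, not a published cookbook
easybib_pear_channel "easybib.github.com/pear" do
  action :discover
end

easybib_pear "SomePackage" do
  channel "easybib.github.com/pear"
  action  :install
end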

apt-repair-sources on Ubuntu

When I ran our setup on an instance the other day, I noticed that it failed with a "package not found" (or similar) error. After debugging this a bit, we discovered that Karmic had moved from "archive.ubuntu.com" to "old-releases.ubuntu.com" (probably disk space or something — but who knows? :-)). And because the sources pointed to the former, it broke the bootstrap process on new and existing EC2 instances and Vagrant VMs for us. A truly consistent experience!

Whenever apt-get update is run in a chef recipe and it exits with a non-zero status, the process is stopped. Of course there are ways to work around it (for example: ignore_failure true), but then again, most of these workarounds are hacks and not suitable for a production environment (IMHO, of course): we often discover new sources from Launchpad PPAs and so on, and it's paramount to know when discovery failed. You cannot assume that all went well.

Scalarium fixed their AMI already and updated the sources to point to "old-releases". Running instances are of course still broken.

Enter apt-repair-sources

apt-repair-sources is a small (opinionated) tool written in Ruby.

It offers:

  • --dry-run (-d), which is the default
  • --fix-it-for-me (-f), which attempts to correct all problems

The reason apt-repair-sources is written in Ruby is that I wanted a tool which runs with only the most basic setup (on Scalarium). Since Ruby comes installed by default, it was my weapon of choice (vs. Python or PHP). Another advantage was that I had an opportunity to check out more Ruby (aside from cooking with chef) and could use this project to learn more about testing in Ruby (using Test::Unit).

Dry run

A dry run can be used to essentially debug the sources on a system.

Here's the output of a dry-run, and all is well:

till@dev:~/apt-repair-sources/bin$ ./apt-repair-sources 
There are no errors in /etc/apt/sources.list
There are no errors in /etc/apt/sources.list.d/chris-lea-node.js-lucid.list
There are no errors in /etc/apt/sources.list.d/node.list
There are no errors in /etc/apt/sources.list.d/chris-lea-redis-server.list
There are no errors in /etc/apt/sources.list.d/silverline.list

Here's the output of a system, where sources are currently broken:

tillklampaeckel@ulic:~/apt-repair-sources/bin$ ./apt-repair-sources 
/etc/apt/sources.list: http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/karmic/main/binary-amd64/Packages.gz
/etc/apt/sources.list: http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/karmic/main/source/Sources.gz
/etc/apt/sources.list: http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/karmic-updates/main/binary-amd64/Packages.gz
/etc/apt/sources.list: http://us-east-1.ec2.archive.ubuntu.com/ubuntu/dists/karmic-updates/main/source/Sources.gz
/etc/apt/sources.list: http://security.ubuntu.com/ubuntu/dists/karmic-security/main/binary-amd64/Packages.gz
/etc/apt/sources.list: http://security.ubuntu.com/ubuntu/dists/karmic-security/main/source/Sources.gz
There are no errors in /etc/apt/sources.list.d/gearman-developers-ppa-karmic.list
/etc/apt/sources.list.d/karmic-multiverse.list: http://archive.ubuntu.com/ubuntu/dists/karmic/multiverse/binary-amd64/Packages.gz
/etc/apt/sources.list.d/karmic-multiverse.list: http://archive.ubuntu.com/ubuntu/dists/karmic/multiverse/source/Sources.gz
/etc/apt/sources.list.d/karmic-multiverse.list: http://archive.ubuntu.com/ubuntu/dists/karmic-updates/multiverse/binary-amd64/Packages.gz
/etc/apt/sources.list.d/karmic-multiverse.list: http://archive.ubuntu.com/ubuntu/dists/karmic-updates/multiverse/source/Sources.gz
/etc/apt/sources.list.d/karmic-multiverse.list: http://security.ubuntu.com/ubuntu/dists/karmic-security/multiverse/binary-amd64/Packages.gz
/etc/apt/sources.list.d/karmic-multiverse.list: http://security.ubuntu.com/ubuntu/dists/karmic-security/multiverse/source/Sources.gz

Problem?

Fix it for me

Fix it for me attempts to correct the sources like this:

  • sources with *.releases.ubuntu.com are moved to archive.ubuntu.com
  • sources with *.archive.ubuntu.com are moved to old-releases.ubuntu.com
  • sources with security.ubuntu.com are moved to old-releases.ubuntu.com
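
As an example (the exact entries are made up), a broken Karmic line would be rewritten roughly like this:

# before
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu karmic main
# after --fix-it-for-me
deb http://old-releases.ubuntu.com/ubuntu karmic main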

On top of these rewrites, it checks Launchpad and third-party PPAs as well; if an issue is found, it just disables the entry in the sources file (by commenting it out with #).

Future releases will probably re-check commented-out entries and also attempt some kind of sanity-checking of entries using the release name, etc. These things are hard, though, and it might be the wrong approach to be opinionated here because, e.g., Lucid packages sometimes also work on Karmic, and disabling them might break other things.

Here's a run:

tillklampaeckel@ulic:~/apt-repair-sources/bin$ sudo ./apt-repair-sources -f
tillklampaeckel@ulic:~/apt-repair-sources/bin$ echo $?
0
tillklampaeckel@ulic:~/apt-repair-sources/bin$ ./apt-repair-sources
There are no errors in /etc/apt/sources.list
There are no errors in /etc/apt/sources.list.d/gearman-developers-ppa-karmic.list
There are no errors in /etc/apt/sources.list.d/karmic-multiverse.list

Great success!

Automation

Both modes usually exit with zero (0), which makes it easy to include them in bootstrap processes, general troubleshooting or periodic cronjobs.
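
For the cronjob case, an entry in /etc/cron.d could look roughly like this (the install path is an assumption):

# /etc/cron.d/apt-repair-sources: nightly clean-up
30 4 * * * root /usr/local/bin/apt-repair-sources -f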

Reasons not to exit with 0:

  • an attempt to run apt-repair-sources on a distro other than Ubuntu
  • old-releases.ubuntu.com is down
  • you run with -d and -f (which of course makes no sense :-))
  • trollop (a rubygem I use for CLI option parsing) is not found

Setup

Gems!

# sudo gem install apt-repair-sources

Manually

  • install Ruby Enterprise Edition (steal Karmic here; this should be your default anyway)
  • sudo gem install trollop (don't use what is in apt)
  • clone my repo: git clone git://github.com/lagged/apt-repair-sources.git
  • cd ./apt-repair-sources/bin && ./apt-repair-sources

Todo

  • create a gem
  • add support for Debian
  • improve my Ruby

Fin

Sure hope it's useful for someone else out there.

The code is on github, and I take pull-requests: https://github.com/lagged/apt-repair-sources

Operating CouchDB

These are some random operational things I learned about CouchDB. While I realize that my primary use-case (a CouchDB install with currently 230+ million documents) may be oversized for many, these are still important things to know and consider. And I would have loved to know some of these before we grew that large.

I hope these findings are useful for others.

Compaction

CouchDB doesn't take great care of disk space — the assumption is that disk is cheap. To stay on top of it, you should run database and view compaction as often as you can. The good news is that these operations help you reclaim a lot of space (e.g. I've seen an uncompacted view of 200 GB trim down to ~30 GB).
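
For reference, both operations are triggered with a POST against the database; substitute server, database and design document names with your own:

# database compaction
curl -H "Content-Type: application/json" -X POST http://server/db/_compact
# view compaction, per design document (without the _design/ prefix)
curl -H "Content-Type: application/json" -X POST http://server/db/_compact/designname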

Cleanup

In case you changed the actual view code, make sure to run the clean-up command (curl -X POST http://server/db/_view_cleanup) to regain disk space.

Performance impact

Database and view compaction (especially on larger sets) will slow down reads and writes considerably. Schedule downtime, or do it in off-peak hours if you think the operation is fast enough.

This is especially hideous when you run in a cloud environment where disk i/o generally sucks (OHAI, EBS!!!).
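
To see whether a compaction (or any other background task) is still chugging along, ask _active_tasks:

curl http://server/_active_tasks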

To stop either of those background-tasks, just restart CouchDB.

(Just fyi, the third option is of course to throw resources (hardware) at it.)

Resuming view compaction?

HA, HA! [Note: sarcasm!] — view compaction is not resumable, unlike database compaction.

View files

I suggest you split views into several design documents — this will have the following benefit.

For each design document, CouchDB will create a .view file (by default these live in var/lib/couchdb/database-name/.database-name_design/). It's just faster to run compaction and cleanup operations on multiple (hopefully smaller) files than on one giant file.

In the end, you don't run the operation against the file directly, but against CouchDB; still, CouchDB deals with a smaller file, which makes the operation faster and generally shorter. I call this poor man's view partitioning.

Warming the cache

Cache warming means populating a cache ahead of time, so that neither the cache nor the server behind it is hit with too much traffic when a server starts up. Here is what you can do with CouchDB in this regard.

The basics are obvious — updates to a CouchDB view are performed when the view is requested. This has certain advantages and works well in most situations. Something I noticed is that, especially on smaller VPS servers where resources tend to be oversold and are scarce in general, generating view updates can slow your application down to a full stop.

As a matter of fact, CouchDB often does not respond at all during that operation once the disk is saturated (take into account that even a 2 GB database gets hard to work with if you only have 1 GB of RAM for CouchDB, the OS, and whatever else is running on the same server).

The options are:

  1. Get more traffic, so views are updated constantly and each individual update stays small.
  2. Make your application query views with ?stale=ok and instead update the views on a set interval, for example via a curl request in a cronjob.

Cache-warming for dummies, the CouchDB way.
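
For option 2, a cronjob along these lines would do; the database, design document and view names are placeholders:

# query the view without stale=ok every 5 minutes to trigger the index update
*/5 * * * * curl -s http://server/helloworld/_design/app/_view/by_date?limit=1 > /dev/null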

View data

For various reasons such as space management and performance, it doesn't hurt to put all views on their own dedicated partition.

In order to do this, add the following to your local.ini (in [couchdb]): view_index_dir = /new_device

And assuming your database is called "helloworld" and the view dir is /new_device, your .view-files will end up in /new_device/.helloworld_design.

Overshard

I've blogged on CouchDB and CouchDB-Lounge before. No matter if you use the Lounge or build sharding into your application — consider it. From what I learned, it's better to shard early (that is, overshard) than when it's too late. The faster your CouchDB grows, the more painful it will be to deal with all the data already stuck in it.

My suggestion is that when you rapidly approach 50,000,000 documents and see yourself exceeding (and maybe doubling) this number soon, take a deep breath and think about a sharding strategy.

Oversharding has the advantage that you can, for example, run 10 CouchDB instances on the same server and move each of them (or a couple) to dedicated hardware once they exceed the resources of the single box.

If sharding is not your cup of tea, just outsource to Cloudant — they do a great job.

CouchDB-Lounge

CouchDB-Lounge is Meebo's python-based sharding framework for CouchDB. The lounge is essentially an nginx-webserver and a twistd service which proxies your requests to different shards using a shards.conf. The number of shards and also the level of redundancy are all defined in it.

CouchDB-Lounge is a great albeit young project. The current shortcomings IMHO include the general stability of the twistd service and the absence of features such as _bulk_docs, which makes migrating a larger data set into CouchDB-Lounge a tedious operation. Nevertheless, this is something to keep an eye on.

Related to CouchDB-Lounge, there's also lode — a JavaScript- and node.js-based rewrite of the Python lounge.

Erlang-Lounge

What I call Erlang-Lounge is Cloudant's internal Erlang-based sharding framework for CouchDB. It's in production at Cloudant and to be released soon. From what I know, Cloudant will probably offer a free open-source version and support once they release it.

Disk, CPU or memory — which is it?

This one is hard to say. But despite how awesome Erlang is, even CouchDB depends on the system's available resources. ;-)

Disk

For starters, disk i/o is almost always the bottleneck. To verify whether this is the bottleneck in your particular case, run and analyze iostat during operations which appear to be slow in your context. For everyone on AWS, consider a RAID-0 setup; for everyone else, buy faster disks.

CPU

The more CPUs in a server, the more beam processes. CouchDB (or rather Erlang) seems to take great advantage of this resource. I haven't really figured out a connection between CPU and general performance, though, because in my case memory and disk were always the bottleneck.

Memory

... seems to be another underestimated bottleneck. For example, I noticed that replication can slow down to a state where it seems faster to copy-paste documents from one instance to another when CouchDB is unable to cache an entire b-tree in RAM.

We've been testing some things on a nifty high-memory 4XL AWS instance, and during a compact operation almost 90% of the RAM (70 GB) was used by the OS for caching. And don't make my mistake and rely on (h)top to verify this; cat /proc/meminfo instead.

Caching

Caching is trivial with CouchDB.

e-tags

Each document and view responds with an Etag header — here is an example:

curl -I http://foo:bar@server:5984/foobar/till-laptop_1273064525
HTTP/1.1 200 OK
Server: CouchDB/0.11.0a-1.0.7 (Erlang OTP/R13B)
Etag: "1-92b4825ffe4c61630d3bd9a456c7a9e0"
Date: Wed, 05 May 2010 13:20:12 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 1771
Cache-Control: must-revalidate

The Etag only changes when the data in the document changes. Hence it's trivial to avoid hitting the database when you don't have to. The request above is a very lightweight HEAD request which only returns the headers and does not pull the entire document.
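
A client can send that Etag back via If-None-Match; if the document hasn't changed, CouchDB answers with 304 Not Modified and no body:

curl -I -H 'If-None-Match: "1-92b4825ffe4c61630d3bd9a456c7a9e0"' http://server/foobar/till-laptop_1273064525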

_changes

_changes represents a live-update feed of your CouchDB database. It's located at http://server/dbname/_changes.

Whenever a data-changing operation completes, _changes will reflect that, which makes it easy for a developer to stay on top of things: for example, invalidating an application cache only when needed (and not, as is usually done, whenever the cache expires).
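
For example, to long-poll for anything that happened after update sequence 1000 (helloworld again being a placeholder database):

curl 'http://server/helloworld/_changes?since=1000&feed=longpoll'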

Logging

Logrotate

First off, a lot of people run CouchDB from source, which means that in 99% of all installs log rotation is not activated.

To fix this (on Ubuntu/Debian), do the following:

ln -s /usr/local/etc/logrotate.d/couch /etc/logrotate.d/couchdb

Make sure to familiarize yourself a little with logrotate because, depending on the disk space and busyness of your installation, you should adjust the config a little so you don't run out of disk space. If CouchDB is unable to log, it will crash.
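
A minimal sketch of such an adjustment; the log path and retention below are assumptions, adapt them to your install:

/usr/local/var/log/couchdb/*.log {
    weekly
    rotate 7
    compress
    copytruncate
    missingok
    notifempty
}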

Loglevel

In most cases it's more than alright to just run with a log level of error.

Add the following to your local.ini (in [log]): level = error

Log directory

Still running out of diskspace? Add the following to your local.ini (in [log]):

file = /path/to/more/diskspace/couch.log

... if you adjusted the above, you will need to correct the config for logrotate.d as well.

No logging?

Last but not least — if no logs are needed, just turn them off completely.

Fin

That's all kids.

A toolchain for CouchDB Lounge

One of our biggest issues with CouchDB is currently the lack of compaction of our database. By "lack of" I don't mean that CouchDB doesn't support it; I mean that we are unable to actually run it.

Compaction in a nutshell

Compaction in a nutshell is pretty cool.

As you know, CouchDB is not very space-efficient. For one, CouchDB saves revisions of all documents, which means that whenever you update a document, a new revision is saved. You can roll back any time, or expose this as a nifty feature in your application — regardless, those revisions are kept around until your database is compacted.

Think about it in terms of IMAP - emails are not deleted until you hit that magic "compact" button which 99% of all people who use IMAP don't know what it's for anyway.

Another thing is that whenever new documents are written to CouchDB and bulk mode is not used, it saves them in a way which is not very efficient either, in terms of actual storage and indexing (or so rumour has it).
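
Bulk mode means POSTing several documents in one request to _bulk_docs instead of issuing one PUT per document; roughly like this (database and documents are made up):

curl -X POST http://server/helloworld/_bulk_docs \
     -H 'Content-Type: application/json' \
     -d '{"docs": [{"_id": "doc-1", "type": "example"}, {"_id": "doc-2", "type": "example"}]}'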

Compaction woes

Since everything is simple with CouchDB, compaction is a simple process in CouchDB too. Yay!

When compaction is started, CouchDB will create a new database file where it stores the data in a very optimized way (I will not go into detail on this; go read up or google it if you are really interested!). When the compaction process finishes, CouchDB exchanges your old database file for the new one.

The woes start with the fact that when you have, e.g., 700 GB of uncompacted data, you will probably need another 400 GB of free space for compaction to finish, because it creates a second database file.

The second issue is that when your database sees constant writes, the compaction process will actually never finish. That kind of sucks, and for those people who aim to provide close to 100% availability, this is extremely painful to learn.