What's wrong with composer and your .travis.yml?

Sunday, August 24. 2014
Comments

I'm a huge advocate of CI and one service in particular called Travis-Ci.

Travis-CI runs a continuous integration platform for both open source and commercial products. In a nutshell: Travis-CI listens for a commit to a Github repository and runs your test suite. Simple as that, no Jenkins required.

At Imagine Easy we happily take advantage of both. :)

So what's wrong?

For some reason, every other open source project (and probably a lot of closed source projects), use Travis-CI wrong in a way, that it will eventually break your builds.

When exactly? Whenever Travis-CI promotes composer to be a first-class citizen on the platform and attempts to run composer install automatically for you.

There may be breakage, but there may also be slowdown because by then you may end up with not one, but two composer install runs before your tests actually run.

Here's what needs fixing

A lot of projects use composer like so:

language: php
before_script: composer install --dev
script: phpunit

Here's what you have to adjust

language: php
install: composer install --dev
script: phpunit

install vs. before_script

I had never seen the install target either. Not consciously at least. And since I don't do a lot of CI for Ruby projects, I wasn't exposed to it either. On a Ruby build, Travis-CI will automatically run bundler for you, using said install target.

order of execution

In a nutshell, here are the relevant targets and how the execute:

  1. before_install
  2. install
  3. before_script
  4. script

The future

The future is that Travis-CI will do the following:

  1. before_install will self-update the composer(.phar)
  2. install will run composer install
  3. There is also the rumour of a composer_opts (or similar) setting so you can provide something like --prefer-source to the install target, without having to add an install target

Fin

Is any of this your fault? I don't think so, since the documentation leaves a lot to be desired. Scanning it while writing this blog post, I can't find a mention of install target on the pages related to building PHP products.

Long story short: go update your configurations now! ;)

I've started with doctrine/cache and doctrine/dbal, and will make it a habit to send a PR each time I see a configuration which is not what it should be.

Speeding up composer on AWS OpsWorks

Tuesday, October 8. 2013
Comments

At EasyBib, we're heavy users of composer and AWS OpsWorks. Since we recently moved a lot of our applications to a continuous deployment model, the benefits of speeding up the deployment process (~4-5 minutes) became more obvious.

Composer install

Whenever we run composer install, there are a lot of rount-trips between the server, our satis and Github (or Amazon S3).

One of my first ideas was to to get around a continous reinstall by symlinking the vendor directory between releases. This doesn't work consistently for two reasons:

What's a release?

OpsWorks, or Chef in particular, calls deployed code releases.

A release is the checkout/clone/download of your application and lives in /srv/www:

srv/
└── www
    └── my_app
        ├── current -> /srv/www/my_app/releases/20131008134950
        ├── releases
        └── shared

The releases directory, contains your application code and the latest is always symlinked into place using current.

Atomic deploys

  1. When deploying code, deploys need to be atomic. We don't want to break whatever is currently online — not even for a second or a fraction of it.
  2. We have to be able to roll-back deployments.

Symlinking the vendor directory between releases doesn't work because it would break existing code (because who knows how long the composer install takes or a restart of the application server) and it would require an additional safety net in place to be able to rollback a failed deployed successfully.

Ruby & Chef to the rescue

Whenever a deployment is run, Chef allows us to hook into the process using deploy hooks. These hooks are documented for OpsWorks as well.

The available hooks are:

  • before migrate
  • before symlink (!)
  • before restart
  • after restart

In order to use them, create a deploy directory in your application and put a couple ruby files in there:

  • before_migrate.rb
  • before_symlink.rb
  • before_restart.rb
  • after_restart.rb

If you're a little in the know about Rails, these hooks will look familiar.

The migration hook is probably used to run database migrations — something we don't do and probably will never do. ;-) But be assured: at this point in time the checkout of your applications is complete: or in other words, the code is on the instance.

The symlink hook is what we use to run composer install to get the web app up to speed, we'll take a closer look in a second.

Before restart is a hook used to run commands before your application server reloads — for example something like purging cache directories, whatever you want to get in order before /etc/init.d/php-fpm reload is executed to revive APC.

And last but not least, after restart — used on our applications to send an annotation to NewRelic — that we successfully deployed a new release.

Before symlink

So up until now, the before_symlink.rb looked like this:

composer_command = "/usr/local/bin/php"
composer_command << " #{release_path}/composer.phar"
composer_command << " --no-dev"
composer_command << " --prefer-source"
composer_command << " --optimize-autoloader"
composer_command << " install"

run "cd #{release_path} && #{composer_command}"

Note: release_path is a variable automatically available/populated in the scope of this script. If you need more, your node attributes are available as well.

Anyway, after reading Scaling Symfony2 with AWS OpsWorks, it inspired me to attempt to copy my vendors around. But instead of doing it in a recipe, I decided to use one of the available deploy hooks for this:

app_current = ::File.expand_path("#{release_path}/../../current")
vendor_dir  = "#{app_current}/vendor"

deploy_user  = "www-data"
deploy_group = "www-data"

release_vendor = "#{release_path}/vendor"

::FileUtils.cp_r vendor_dir, release_vendor if ::File.exists?(vendor_dir)
::FileUtils.chown_R deploy_user, deploy_group, release_vendor if ::File.exists?(release_vendor)

composer_command = "/usr/local/bin/php"
composer_command << " #{release_path}/composer.phar"
composer_command << " --no-dev"
composer_command << " --prefer-source"
composer_command << " --optimize-autoloader"
composer_command << " install"

run "cd #{release_path} && #{composer_command}"

Step by step:

  • copy the current release's vendor to the new release (if it exists)
  • chown all files to the webserver (if the new vendor exists)

This allows the deploy hook to complete, even if we're on a fresh instance.

Benchmarks?

Effectively, this cut deployment from 4-5 minutes, to 2-3 minutes. With tailwind, a 50% improvement.

FIN

That's all. Happy deploying!

SQL MAX() and GROUP BY for CouchDB

Friday, October 4. 2013
Comments

While re-writing a couple SQL statements into CouchDB we got stuck when we wanted to do a SELECT MAX(...), id ... GROUP BY id in CouchDB.

MySQL

Imagine the following SQL table with data:

mysql> SHOW FIELDS FROM deploys;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| project     | varchar(10) | NO   |     | NULL    |       |
| deploy_time | datetime    | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
2 rows in set (0.01 sec)

In order to get the latest deploy for each project, I'd issue:

mysql> SELECT MAX(deploy_time), project FROM deploys GROUP BY project;
+---------------------+----------+
| MAX(deploy_time)    | project  |
+---------------------+----------+
| 2013-10-04 22:01:26 | project1 |
| 2013-10-04 22:02:17 | project2 |
| 2013-10-04 22:02:20 | project3 |
+---------------------+----------+
3 rows in set (0.00 sec)

Simple. But what do you do in CouchDB?

CouchDB

My documents look like this:

{
  "_id": "hash",
  "project": "github-user/repo/branch",
  "deploy_time": {
     "date": "2013-10-04 22:02:20",
     /* ... */
  },
  /* ... */
}

So, after more than a couple hours trying to wrap our heads around map-reduce in CouchDB, it's working.

Here's the view's map function:

function (doc) {
  if (doc.project) {
    emit(doc.project, doc.deploy_time.date);
  }
}

This produces nice key value pairs — in fact, multiple — for each project.

And because the map-function returns multiple, we need to reduce our set.

So here is the reduce:

function (doc, values, rereduce) {
  var max_time = 0, max_value, time_parsed;
  values.forEach(function(deploy_date) {
    time_parsed = Date.parse(deploy_date.replace(/ /gi,'T'));
    if (max_time < time_parsed) {
      max_time = time_parsed;
      max_value = deploy_date;
    }
  });
  return max_value;
}

The little .replace(/ /gi,'T') took especially long to figure out. Special thanks to Cloudant's señor JS-date-engineer Adam Kocoloski for helping out. ;-)

Step by step

  • iterate over values
  • fix each value (add the "T") to make Spidermonkey comply
  • parse, and compare
  • return the "latest" in the end

A note of advice: To save yourself some trouble, install a local copy of Spidermonkey and test your view code in there and not in your browser.

Open the view in your browser: http://localhost/db/_design/deploys/_view/last_deploys?group=true

{
  "rows":[
    {
      "key":"testowner/testname/testbranch",
      "value":"2013-09-13 11:41:03"
    },
    {
      "key":"testowner/testname/testbranch2",
      "value":"2013-09-12 16:48:39"
    }
  ]
}

Yay, works!

FIN

That's all for today.

Bento and VirtualBox

Tuesday, April 23. 2013
Comments

Last week I blogged some Vagrant tips and pretty much jinxed the run I had in the past months.

Here's how:

  • I decided to upgrade to Vagrant 1.1, which broke bento: the current bento master is incompatible with Vagrant 1.1. But selecting the right rbenv env and installing the latest available Vagrant gem (inside the rbenv environment) fixed it.

  • My base box build, but for some reason, the guest addition setup broke and while it worked on Mac OSX, it broke the image completely on Ubuntu. Don't ask me why.

Especially the last bit reminded me to share another small tip, or work-around.

Fixing up base boxes

So every once in a while something doesn't work as expected. In my case, the guest additions were installed but not loaded and also failed to load when I started a VM. This in itself wasn't so bad, but it broke the vboxfs shares and while there may be cases where you don't need that (e.g. for a database VM), a VM without your code mounted into it is pretty useless.

The fix wasn't too hard:

  • I created a blank Vagrantfile (no recipes, just a simple box definition).
  • vagrant up and wait for the box to fail.
  • vagrant ssh to enter the box, or start with the GUI option and login through it.
  • execute sudo /etc/init.d/vboxadd setup

Then, exit the VM and execute the following: vagrant package vm_name --output box_name.box.

Import the box again with vagrant box add etc., update your Vagrantfile and test — then distribute.

Thoughs

Let me use this to reiterate on how critical a fixed set of versions are.

VirtualBox 4.2.x and bento (or veewee) seem to be incompatible somewhere and even though an error doesn't surface while the box is build or validated, it's still happening. It's paramount that there's always a rollback of some kind before you end up breaking too many things and stall your team for long.

Fin

That would be all for today.

Wanderlust

Tuesday, April 16. 2013
Comments

At the last meetup of the Berlin PHP Usergroup, Christoph gave a talk about Vagrant.

Good enough of a reason to write down or re-cap some things I've learned with or about Vagrant over the last two years.

Base boxes

There are lots of base boxes available, but don't be tempted to rely on them (e.g. via config.vm.box_url).

  1. Vagrantbox.es doesn't actually mirror images and that is a huge pain.
  2. Available base boxes tend to be outdated. (Think kernel, packages, etc.) Running updates each time you provision is painful.
  3. Available base boxes use U.S. mirrors only/mostly/always — because we all live in the U.S. of A..
  4. Your VirtualBox guest additions may not match with your system and this may create random issues.

Your best bet is to create your own base box and establish a workflow e.g. using veewee or bento.

At EasyBib, we use bento and we created a definition which replaces the sources with Ubuntu's nifty mirror syntax (since we're pretty distributed at times, everyone appreciates this) and upgrades the base system. Either one of these tools introduce more Ruby into your organization and you may think "WTF — why do I need this?!", but the clear advantage is that no one has to write down a lot of steps how to recreate these boxes and anyone can do it.

In bento's case, the requirement is Ruby 1.9.1+ (getting this installed is IMO the hardest) and bundler. bundle install in your bento-clone gets you everything needed and then the three commands require build, validate and export a box which is ready to use. Ensure to put whatever you need into the definition — for example in the update.sh. Avoid too many manual steps before you export because the next person will have to know and repeat them. Bento serves as documentation as well.

I version our boxes with like easybib-something-10.04.4_vbox-4.1.8_0.1.box and upload them to an S3 bucket. The first number is the Ubuntu release and the second is the version of the VirtualBox guest additions. Simple. The third version is our internal iteration — typically a base box isn't perfect from the beginning while e.g. the Ubuntu and VirtualBox part are settled, there might be other improvements. With an extra version you avoid conflicts and extra work like vagrant box delete etc. and ensure the latest box is always used.

Also — in case the software stack is very different across your projects, it also helps to to create different boxes which come with different software pre-installed.

Standardize on versions

Vagrant and VirtualBox have frequent releases. I suggest to standardize on one so members of your team don't have random issues at hand and fires to fight.

Even for a small team of up to ten developers this makes a lot of sense. Because people tend to add a lot of randomness anyway — different hardware, operating systems and so on. Fight only the battles you want to fight, and deploy otherwise.

Vagrant also recently went from being a rubygem to providing installers. I haven't had the time to roll this out yet, but I expect this to help as well since at least as far as ruby is concerned all the dependencies are bundled.

This of course still implies that testing is required so you and your team don't walk into a stupid little regression and waste away the day trying to figure out what went wrong. And of course even if Vagrant is smoother, it still leaves you with VirtualBox and tools like bento and plenty of potential breakage.

Chef Versions

On a side-note — Chef 10 and 11 may also introduce a lot of breakage in recipes. It helps to roll your base box with a specific version as well. With bento the work-around was pretty straight forward: I replaced the chef-client.sh and installed Chef 10 (instead of 11 — or whatever the latest is).

VirtualBox and guest additions

In theory, it's alright to run with different guest additions in a box than the version of VirtualBox you have installed on the host. It should at least match the main release — for example: 4.1.8 guest additions and 4.1.12 VirtualBox should do fine. That's a big should though, because it also may cause random issues like crashes and hangs.

If you don't want or cannot rebuild the base box for some reason, you can also use vbguest which is a Vagrant plugin to update the guest additions when you start the virtual machine. Keep in mind that this adds a couple minutes to the bootstrapping.

Learn some Ruby

There are little things where it helps to know a little Ruby. And by Ruby, I don't mean Rails. A Vagrantfile itself is Ruby code — this implies that it is fully customizable.

An example of something we as a team couldn't agree on is the location of where projects (and essentially cookbooks) are located on your local disk. Every other team member has a different preference:

case ENV['USER']
when 'till'
  local_cookbook_dir = "~/Documents/workspaces/easybib-cookbooks"
when 'someonelse'
  local_cookbook_dir = "~/dev/till/easybib-cookbooks"
else
  local_cookbook_dir = "~/Sites/easybib/cookbooks"
end

if not File.directory?(File.expand_path(local_cookbook_dir))
  raise "You need to checkout your cookbooks into #{local_cookbook_dir}"
end

# ...

web_config.vm.provision :chef_solo do |chef|
  chef.cookbooks_path = local_cookbook_dir
  chef.add_recipe "ohai"
  # ...
end

It's as simple as that.

Another example — setting VirtualBox options for everyone but a certain user:

web_config.vm.boot_mode = :gui unless ENV['USER'] == 'mr_I_dont_run_X'

Bonus tip: Once you made changes, make sure to at least re-provision. Commit and push after!

Learn Chef or Puppet

I often see projects where developers end up writing a lot of shell script to bootstrap VMs, but learning Chef or Puppet is not really that hard.

I find it harder to validate exit codes (again and again and again) in bash than using a DSL (which is what Chef and Puppet essentially are). The code in your cookbooks (Chef) or manifests (Puppet) is certainly not faster than a shell script but a lot easier to read and more maintainable in the end.

Bash-scripting is not hard either, but in order to produce a set of scripts which can be ran again and again (not just to bootstrap a fresh VM but e.g. also to run updates on one that is running), defensive coding is paramount. And while that is certainly not impossible, it's often a waste of time when frameworks like Chef or Puppet have that covered.

But let's skip on the benefits of using identical tools to bootstrap Vagrant, staging and production because I find them more than obvious.

Learn some Linux

Every once in a while you will run into weird issues with the VMs. These may include one of your VMs losing connectivity (sudo restart networking to the rescue) or weird behavior like assets not refreshing (sendfile off; in nginx). Take it as an opportunity to learn some about the system that is run in production.

In the end all required configuration changes will go back into your provisioning and make sure to share your experience with at least the people on your team.

Backup everything

Whatever you find and use — make a copy of it and put it on Amazon S3 or the local network. With larger teams even a local Ubuntu mirror (or whatever you use) can come in handy.

This includes base boxes, packages, etc.. Nothing is more annoying than waking up and not being able to bootstrap your VMs because someone decided to remove something in order to force you to upgrade.

Don't dumb it down!

Typically, PHP applications are developed on a single host — Apache, PHP and MySQL on localhost. With Vagrant it becomes surprisingly easy to mimic production.

Not to say that I have to run 20 virtual machines to copy my cluster of application servers, but it's perfectly acceptable to set up an environment with four VMs where one is a loadbalancer, two are application servers and then a database server.

Networking and port forwards

Unless you regulary let others use your VMs, don't add port forwards — or at least install a firewall.

For networking, I suggest you either use static IPs (and keep track of them in a sheet) or DHCP. I prefer static IPs though since that makes configuration (e.g. of an application to connect to the database) easier.

It also doesn't hurt to assign names, so you know which VM you're dealing with when GUI is enabled:

    db_config.vm.customize [
      "modifyvm", :id,
      "--name", "DB",
    ]

Hardware

It doesn't hurt to have lots of CPU and RAM, but also configure the VMs accordingly. I run up to four virtual machines on a Macbook Air — usually configured with 256 to 512 MB. I imagine this would go smoother with VMWare Fusion, but since our team contains Mac and Linux as well, we haven't moved on this.

Here's an example how to give 512 MB RAM to a virtual machine:

    db_config.vm.customize [
      "modifyvm", :id,
      "--memory", "512"
    ]

Fin

That's all I can think of right now. Happy development!