Wanderlust

At the last meetup of the Berlin PHP Usergroup, Christoph gave a talk about Vagrant.

Good enough a reason to write down and recap some of the things I've learned with or about Vagrant over the last two years.

Base boxes

There are lots of base boxes available, but don't be tempted to rely on them (e.g. via config.vm.box_url).

  1. Vagrantbox.es doesn't actually mirror the images, which is a huge pain.
  2. Available base boxes tend to be outdated. (Think kernel, packages, etc.) Running updates each time you provision is painful.
  3. Available base boxes use U.S. mirrors only/mostly/always — because we all live in the U.S. of A.
  4. The VirtualBox guest additions in the box may not match the VirtualBox version on your system, and this can create random issues.

Your best bet is to create your own base box and establish a workflow, e.g. using veewee or bento.

At EasyBib, we use bento, and we created a definition which replaces the APT sources with Ubuntu's nifty mirror syntax (since we're pretty distributed at times, everyone appreciates this) and upgrades the base system. Either of these tools introduces more Ruby into your organization, and you may think "WTF — why do I need this?!", but the clear advantage is that no one has to write down a long list of steps for how to recreate these boxes, and anyone can do it.
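
The mirror syntax in question goes into /etc/apt/sources.list; apt then picks a close mirror from the list behind that URL (release name assumed, adjust to your base box):

    deb mirror://mirrors.ubuntu.com/mirrors.txt lucid main restricted universe multiverse
    deb mirror://mirrors.ubuntu.com/mirrors.txt lucid-updates main restricted universe multiverse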

In bento's case, the requirements are Ruby 1.9.1+ (getting this installed is IMO the hardest part) and bundler. bundle install in your bento clone gets you everything needed, and then three commands build, validate and export a box which is ready to use (see below). Ensure to put whatever you need into the definition — for example in the update.sh. Avoid too many manual steps before you export, because the next person will have to know and repeat them. Bento serves as documentation as well.
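
Roughly like this, assuming bento's veewee-based workflow of that time (the definition name is made up):

$ bundle install
$ bundle exec veewee vbox build 'easybib-ubuntu-10.04'
$ bundle exec veewee vbox validate 'easybib-ubuntu-10.04'
$ bundle exec veewee vbox export 'easybib-ubuntu-10.04'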

I version our boxes like easybib-something-10.04.4_vbox-4.1.8_0.1.box and upload them to an S3 bucket. The first number is the Ubuntu release and the second is the version of the VirtualBox guest additions. Simple. The third number is our internal iteration — typically a base box isn't perfect from the beginning: while e.g. the Ubuntu and VirtualBox parts are settled, there might be other improvements. With an extra version you avoid conflicts and extra work (vagrant box delete etc.) and ensure the latest box is always used.
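
Pointing a Vagrantfile at such a box then looks like this (the bucket name is made up; the box is downloaded and added on the first vagrant up):

    config.vm.box     = "easybib-something-10.04.4_vbox-4.1.8_0.1"
    config.vm.box_url = "https://s3.amazonaws.com/our-bucket/easybib-something-10.04.4_vbox-4.1.8_0.1.box"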

Also — in case the software stack is very different across your projects, it helps to create different boxes which come with different software pre-installed.

Standardize on versions

Vagrant and VirtualBox have frequent releases. I suggest standardizing on one version of each so members of your team don't have random issues at hand and fires to fight.

Even for a small team of up to ten developers this makes a lot of sense, because people tend to add a lot of randomness anyway: different hardware, operating systems and so on. Fight only the battles you want to fight, and deploy otherwise.

Vagrant also recently went from being a rubygem to providing installers. I haven't had the time to roll this out yet, but I expect it to help as well, since, at least as far as Ruby is concerned, all the dependencies are bundled.

This of course still implies that testing is required, so you and your team don't walk into a stupid little regression and waste the day trying to figure out what went wrong. And even if Vagrant itself is smoother, it still leaves you with VirtualBox and tools like bento, and plenty of potential breakage.

Chef Versions

On a side note — Chef 10 and 11 may also introduce a lot of breakage in recipes, so it helps to roll your base box with a specific Chef version as well. With bento the workaround was pretty straightforward: I replaced the chef-client.sh and installed Chef 10 (instead of 11 — or whatever the latest is).
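
The replacement boils down to pinning the gem. A sketch (version spec and flags per that era's rubygems):

$ gem install chef --no-ri --no-rdoc --version '~> 10'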

VirtualBox and guest additions

In theory, it's alright if the guest additions in a box don't exactly match the version of VirtualBox you have installed on the host. They should at least match the main release — for example, 4.1.8 guest additions and a 4.1.12 VirtualBox should do fine. That's a big should though, because a mismatch may still cause random issues like crashes and hangs.

If you don't want to or cannot rebuild the base box for some reason, you can also use vbguest, a Vagrant plugin which updates the guest additions when you start the virtual machine. Keep in mind that this adds a couple of minutes to the bootstrapping.
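
Installation depends on how you run Vagrant: the first form is for the rubygem days, the second for the plugin system that ships with the installers.

$ vagrant gem install vagrant-vbguest
$ vagrant plugin install vagrant-vbguest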

Learn some Ruby

There are lots of little things where it helps to know a little Ruby. And by Ruby, I don't mean Rails. A Vagrantfile itself is Ruby code — this implies that it is fully customizable.

An example of something we as a team couldn't agree on is where projects (and essentially cookbooks) are located on the local disk. Every team member has a different preference:

case ENV['USER']
when 'till'
  local_cookbook_dir = "~/Documents/workspaces/easybib-cookbooks"
when 'someonelse'
  local_cookbook_dir = "~/dev/till/easybib-cookbooks"
else
  local_cookbook_dir = "~/Sites/easybib/cookbooks"
end

unless File.directory?(File.expand_path(local_cookbook_dir))
  raise "You need to checkout your cookbooks into #{local_cookbook_dir}"
end

# ...

web_config.vm.provision :chef_solo do |chef|
  chef.cookbooks_path = local_cookbook_dir
  chef.add_recipe "ohai"
  # ...
end

It's as simple as that.

Another example — setting VirtualBox options for everyone but a certain user:

web_config.vm.boot_mode = :gui unless ENV['USER'] == 'mr_I_dont_run_X'

Bonus tip: once you've made changes, make sure to at least re-provision. Commit and push after!

Learn Chef or Puppet

I often see projects where developers end up writing a lot of shell script to bootstrap VMs, but learning Chef or Puppet is really not that hard.

I find it harder to validate exit codes (again and again and again) in bash than to use a DSL (which is what Chef and Puppet essentially provide). The code in your cookbooks (Chef) or manifests (Puppet) is certainly not faster than a shell script, but it's a lot easier to read and more maintainable in the end.

Bash scripting is not hard either, but in order to produce a set of scripts which can be run again and again (not just to bootstrap a fresh VM, but e.g. also to run updates on one that is already running), defensive coding is paramount. And while that is certainly not impossible, it's often a waste of time when frameworks like Chef or Puppet have it covered.
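
To illustrate, a minimal Chef sketch (the package name is illustrative). Both resources are idempotent: running the recipe a second time is a no-op, without any exit-code checking or "is it installed yet?" guards on my part.

# install nginx and make sure it comes up on boot
package "nginx" do
  action :install
end

service "nginx" do
  action [:enable, :start]
end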

But let's skip the benefits of using identical tools to bootstrap Vagrant, staging and production, because I find them more than obvious.

Learn some Linux

Every once in a while you will run into weird issues with the VMs. These may include one of your VMs losing connectivity (sudo restart networking to the rescue) or weird behavior like assets not refreshing (sendfile off; in nginx). Take it as an opportunity to learn something about the system that runs in production.

In the end, all required configuration changes should go back into your provisioning. And make sure to share your experience with at least the people on your team.

Backup everything

Whatever you find and use — make a copy of it and put it on Amazon S3 or the local network. With larger teams, even a local Ubuntu mirror (or whatever distribution you use) can come in handy.

This includes base boxes, packages, etc. Nothing is more annoying than waking up and not being able to bootstrap your VMs because someone decided to remove something in order to force you to upgrade.

Don't dumb it down!

Typically, PHP applications are developed on a single host — Apache, PHP and MySQL on localhost. With Vagrant it becomes surprisingly easy to mimic production.

That's not to say I have to run 20 virtual machines to copy my cluster of application servers, but it's perfectly acceptable to set up an environment with four VMs where one is a loadbalancer, two are application servers and the fourth is a database server.
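
With Vagrant 1.0's multi-VM syntax, the skeleton of such a setup looks roughly like this (the names match the snippets below; the details are of course up to your project):

Vagrant::Config.run do |config|
  config.vm.define :lb do |lb_config|
    # loadbalancer setup goes here
  end

  config.vm.define :web1 do |web_config|
    # application server setup goes here
  end

  config.vm.define :db do |db_config|
    # database server setup goes here
  end
end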

Networking and port forwards

Unless you regularly let others use your VMs, don't add port forwards — or at least install a firewall.

For networking, I suggest you either use static IPs (and keep track of them in a sheet) or DHCP. I prefer static IPs though, since that makes configuration (e.g. of an application to connect to the database) easier.
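
With the multi-VM syntax from above, a host-only static IP is one line per VM (the addresses are of course up to you):

    web_config.vm.network :hostonly, "33.33.33.10"
    db_config.vm.network :hostonly, "33.33.33.11"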

It also doesn't hurt to assign names, so you know which VM you're dealing with when the GUI is enabled:

    db_config.vm.customize [
      "modifyvm", :id,
      "--name", "DB",
    ]

Hardware

It doesn't hurt to have lots of CPU and RAM, but configure the VMs accordingly. I run up to four virtual machines on a MacBook Air — usually configured with 256 to 512 MB each. I imagine this would go smoother with VMware Fusion, but since our team is on both Mac and Linux, we haven't moved on this.

Here's an example of how to give 512 MB of RAM to a virtual machine:

    db_config.vm.customize [
      "modifyvm", :id,
      "--memory", "512"
    ]

Fin

That's all I can think of right now. Happy development!

From Unfuddle (svn) to git

I've blogged about converting a Subversion repository to git a couple of times. While it was a tedious process at first, I've made my peace with it and can no longer count the repositories I have migrated successfully. The migration usually works, except when I deal with our old provider Unfuddle.

For some reason it sometimes didn't work right away, and I had to re-run git svn clone a couple of times to get it right.

git svn clone

Here's a snippet to make it work for you:

$ git svn clone --authors-file=./authors.txt --no-metadata \
--prefix=svn/company_repositoryname/ \
--tags=tags \
--trunk=trunk \
--branches=branches \
http://company.unfuddle.com/svn/company_repositoryname \
./new-git-repo

Make sure to replace the company and company_repositoryname parts in the above command.
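
The authors file maps Subversion usernames to git authors, one mapping per line (the names here are made up):

till = Till <till@example.org>
jdoe = Jane Doe <jane@example.org>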

For some reason, I was never able to get prefix and all that completely right from the start. That is, until now — needless to say, the above works. And for the sake of documenting it and not re-learning it each time (e.g. today, as I migrate another Subversion repository to git (GitHub :-)), here's my documentary blog post.

Tags and branches

Tags and branches are slightly different concepts in Subversion and git.

In Subversion, we usually ran pre-processing on tags before we deployed them (because doing this in a branch was a huge pita, due to the size of the repository and the overall joy of merging commits in Subversion). So in the end, a tag we created in Subversion is not a tag in git, because we modified the tag — which makes it a branch.

So as a follow-up to my prior snippet to convert a repository, I've used this script to convert the branches it created to proper tags in git:

#!/bin/bash

branches=(`git branch -r`)

for branch in "${branches[@]}"
do
    case $branch in
    *tag*)
        tag=${branch//svn\/company_repositoryname\/tags\//}
        remote="remotes/${branch}"
        echo "$tag from $branch, remote: $remote"
        git checkout -b "tag-$tag" $remote && git tag -a $tag -m "SVN tag: $tag"
        ;;
    *)
        echo "Skipping: $branch"
    esac
done

Again, you will have to adjust company_repositoryname in this script. :-)

Once the script completes, I verify the tags with git tag -l and delete the left-over branches with git branch -D foo.
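
Since the script prefixes all of its helper branches with tag-, a throwaway loop cleans them up in one go (a sketch; eyeball the list before deleting):

git checkout master
for branch in $(git branch | grep ' tag-'); do
    git branch -D "$branch"
done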

If all looks ok and the tests confirm this, I add an origin, push branches and also git push --tags.

Fin

That's all — happy migrating. Just in case: the code is BSD licensed, which means, you can do whatever you want with it.

Continuous Integration: Automated database setup with Doctrine on Travis-CI

Testing is important — most people understand that by now. A lot of people already write tests for their open source code, but in-house testing is still hard. For example, many of us have had an encounter with Jenkins: it runs well up to the point where maintaining the Jenkins setup becomes harder than writing the tests.

Another obstacle is test setup and environments: when I write and run tests, there is only so much I can do to mock and avoid actual calls to my storage backend. While I prefer to run my database tests against an in-memory SQLite database, there are edge cases where I work with multiple databases or write a direct query (and by-pass the ORM magic).

In these cases I need to have that database server available in my test environment!

The following blog post explains how to solve these things with Travis-CI. I will walk you through the setup on Travis-CI's business service, but most of this applies to their open source offering as well.

Step by step

I'll try to break it up into small steps.

Travis-CI

The first step is to log in at http://travis-ci.com and add the repository tests should be run for. To add the repositories of an organization, you have to be an owner of that organization. The easiest way to get access to the service right now is donating to these guys, or in case you have done that already: email them. ;-)

The second step is setting up a .travis.yml file.

Mine looks like this:

language: php
php:
  - 5.3
  - 5.4
before_script:
  - ./composer.phar -v install --dev
  - psql -c 'create database testdatabase;' -U postgres

Run-down:

  • run the tests against PHP 5.3 and 5.4
  • before_script defines your test setup (outside PHP)
  • I omitted the script stanza because the default (phpunit) works for me

Composer

I am using composer to manage my dependencies, and you should too. I don't want to go into detail here, but here's a short example of my composer.json:

{
    "name": "mycompany/project",
    "description": "A super cool project.",
    "require": {
        "doctrine/orm": "2.2.2"
    },
    "autoload": {
        "psr-0": {
            "MyCompany\\Project\\Test": "tests/",
            "MyCompany\\Project": "src/"
        }
    }
}

Side note: we also currently commit a composer.phar into each repository, for two reasons:

  1. To ensure a change in composer won't break our setup.
  2. Downtime of (or connectivity issues to) their website don't break our deployments and test runs.

Test framework setup

There is not a whole lot to set up, since Travis-CI installs phpunit already. Just make sure you have a phpunit.xml in the root of your repository and you are good to go.
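
In case you don't have one yet, a minimal sketch (the bootstrap assumes composer's autoloader; the tests directory matches the composer.json above):

<?xml version="1.0" encoding="UTF-8"?>
<phpunit bootstrap="vendor/autoload.php" colors="true">
  <testsuites>
    <testsuite name="unit">
      <directory>tests/</directory>
    </testsuite>
  </testsuites>
</phpunit>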

Database schema

The next step would be to generate your schema and check in some .sql, right? I'm not a huge fan of this, because I hate running through a lot of manual steps when I need to update something. Manual steps mean that they might be forgotten, or that people make mistakes. So the objective is to avoid manual labour as much as you can.

Instead of maintaining these files, I use Doctrine's SchemaTool. It takes care of this just fine, because I have annotated all my entities already.

To make use of this, I suggest adding the following to your test case:

<?php 
namespace MyCompany\Project\Test;

use Doctrine\ORM\Tools\Setup;
use Doctrine\ORM\EntityManager;
use Doctrine\Common\Persistence\PersistentObject;
use Doctrine\ORM\Tools\SchemaTool;

class MyTestCase extends \PHPUnit_Framework_TestCase
{
    protected $em, $tool;
    
    public function setUp()
    {
        $this->setUpDatabase(); // wrap this again
        $this->setUpSchema();
        /* more setup here */
    }

    public function tearDown()
    {
        $classes = array(
            $this->em->getClassMetadata('MyCompany\Project\Entity\SomethingImportant'),
        );
        $this->tool->dropSchema($classes);
        unset($this->tool);
        unset($this->em);
    }

    public function setUpDatabase()
    {
        $isDevMode      = true;
        $doctrineConfig = Setup::createAnnotationMetadataConfiguration(
            array('path/to/Entity'),
            $isDevMode
        );

        // database configuration parameters
        $dbConfig = array(
            'driver'   => 'pdo_pgsql',
            'host'     => '127.0.0.1',
            'user'     => 'postgres',
            'password' => '',
            'dbname'   => 'testdatabase',
        );

        $this->em = EntityManager::create($dbConfig, $doctrineConfig);
        PersistentObject::setObjectManager($this->em);
    }
    
    public function setUpSchema()
    {
        $this->tool = new SchemaTool($this->em);
        $classes = array(
            $this->em->getClassMetadata('MyCompany\Project\Entity\SomethingImportant'),
        );
        $this->tool->createSchema($classes);
    }
}

Assumptions made (aka, room for improvement):

  • your entities are in path/to/Entity
  • PostgreSQL is used and runs on 127.0.0.1
  • you're using a database called testdatabase

Tests

Once the setup is complete, I add the usual testFoo() methods into my test case class.

From within the testFoo() methods, I have Doctrine's EntityManager available via $this->em. The entity manager allows me to do whatever I need or want within a test run. For example:
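
A trivial (made-up) example: persist an entity, clear the identity map to force a real round-trip to the database, and read it back.

// inside MyTestCase
public function testPersistAndFind()
{
    $entity = new \MyCompany\Project\Entity\SomethingImportant();
    $this->em->persist($entity);
    $this->em->flush();
    $this->em->clear();

    $all = $this->em
        ->getRepository('MyCompany\Project\Entity\SomethingImportant')
        ->findAll();
    $this->assertCount(1, $all);
}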

After a test completes, the tearDown() method is invoked and drops the tables in your testdatabase; setUp() then re-creates them. This takes some time, but the side-effects of stale data are not to be neglected. Besides, your tests should not rely on the order they are executed in anyway.

Another benefit of this setup is that the SQL tables are updated each time a commit changes the annotations. No extra .sql files to maintain. :)

Fin

That's really all there is to running your test suite with Travis-CI, and while you did all of the above, you just added continuous integration to your toolkit, because these tests run each time a pull request is opened or commits are pushed. :-)

As I mentioned early on, these steps apply to the open source offering as well — all but the link to login.

If PostgreSQL is not your game, have a look at the list of currently supported databases on Travis-CI. And if your database server is not on the list of supported applications, you might as well install it with a before_script (as long as it runs on Ubuntu). ;-)

Happy testing!

Mind blown: svn:ignore

While migrating away from Subversion, I ran into this little issue today.

Since it's something which has popped up time and time again, and I never got around to figuring out why it happened in the first place, I decided to put it into a blog post.

Problem?

What I did was set up an svn:ignore property in the root of my Subversion repository, in order to ignore modules which are installed in the app/modules structure:

dev% svn propedit svn:ignore .

My editor opened, and I put in app/modules/foo as the path I want to ignore.

Turns out this does not work:

dev% svn st
?       app/modules/foo

After trying out various things like app/modules/f* and */foo, I finally figured it out:

dev% cd app/modules
dev% svn propedit svn:ignore .

Solution

The revelation of the day is: svn:ignore does not work recursively — like svn:externals would.
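
For the record, the same fix works without the editor round-trip, since svn propset takes the property value right on the command line:

dev% svn propset svn:ignore 'foo' app/modules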

Fin

And this is after almost eight years with Subversion. I am just glad to be moving on.

Apologies if this is too basic or written down somewhere already — I did try to read the page on svn:ignore in the documentation, but decided to spend my time on something else after the first couple of paragraphs.

Too much abstraction: Doctrine and PHP

Personally — I feel like I'm really, really late to get on the Doctrine train.

I tried Doctrine 1 a few years back and found it mostly too complicated for my needs. To this day, I find PEAR::DB and PEAR::MDB2 the perfect database abstraction layers (DBAL). Though of course they may or may not be totally up to date in terms of PHP etc. The overall concept of an ORM never stuck with me.

Of course Doctrine has a DBAL too, and it would be unfair to compare its ORM to another DBAL, but it seems like almost no one uses the Doctrine DBAL by itself. I haven't either.
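
For illustration, standalone usage looks roughly like this (a sketch against the DBAL 2.x API; the connection parameters are made up):

<?php
use Doctrine\DBAL\DriverManager;

$conn = DriverManager::getConnection(array(
    'driver'   => 'pdo_mysql',
    'host'     => '127.0.0.1',
    'user'     => 'someuser',
    'password' => '',
    'dbname'   => 'somedb',
));

// plain SQL, with the parameters bound for you
$users = $conn->fetchAll('SELECT * FROM users WHERE id = ?', array(1));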

The primary use-case for Doctrine seems to be ORM and here is what I think about it.

Little steps

RTFM is especially hard with Doctrine: Google usually points to older documentation, there are a lot of outdated links, and the classic book has not been written yet.

Where documentation exists, I compare the experience to RPM (the RedHat Package Manager):

  • I need to learn about X
  • But have to read about Y first.
  • To understand Y I need to dive into Z.

Sound familiar? Dependency resolution in documentation. However, of course there's always light at the end of the tunnel — it's when it all works: I have conquered a problem and defeated the proxies and caches. ;-)

Entities

Entities can be incredibly simple — take the persistent object:

<?php
use Doctrine\Common\Persistence\PersistentObject;

/**
 * @Entity
 * @Table(
 *   name="db.users"
 * )
 */
class Row extends PersistentObject
{
  /**
   * @Id
   * @Column(type="integer")
   * @GeneratedValue
   */ 
  protected $id;
}

So what does this look like:

  • database "db"
  • table: "users"
  • columns: "id" (most likely integer and auto_increment with MySQL)

I feel like this feature is a hidden gem, because I could not find much about it in the official documentation — but that's basically all you have to do.

Annotations are not my thing, and neither are XML or YAML — though I can live with any of the three. What does seem annoying is that people usually start off by writing half a dozen getFoo() and setFoo() methods (or more, depending on how many columns your table has). With the persistent object, this is not necessary.

Abstraction

Abstraction is Doctrine's biggest strength and its largest drawback, in my opinion. Abstraction is nice within reason, but I think Doctrine takes it way over the top.

My personal opinion on SQL is that SELECT * FROM table WHERE id = 1 never hurt anyone. That's not to suggest doing this in the wrong layer of your application, but actually understanding the tools I work with made me a better developer. A JOIN does not hurt either, because then you and I know how to write one and what it costs to execute the sucker. LEFT JOIN, STRAIGHT JOIN and INNER JOIN — most developers don't know the difference. Doctrine also likes cascades, but I bet not many people have ever watched a busy database server trying to process all of them while a couple thousand writes happen at the same time.

With Doctrine, SQL is so far away that developers really don't need to have any idea of what is going on, and I think that is particularly bad.

My problem is that most of the time the developer does not see their SQL at all, and also doesn't analyze what it is doing. Making the application log all statements produced is harder than it should be. Strictly speaking, the final SQL is never even assembled by Doctrine: with prepared statements, the values are bound on the server. But sure, prepared statements are pretty awesome, because we are all idiots and don't know how to escape data. I should leave that for another blog post.
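
The closest thing I'm aware of is the SQLLogger hook on the DBAL configuration, which logs statements and their parameters separately (a sketch; the bundled EchoSQLLogger simply prints them):

<?php
// on the Doctrine\DBAL\Configuration (the ORM configuration extends it)
$doctrineConfig->setSQLLogger(new \Doctrine\DBAL\Logging\EchoSQLLogger());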

About abstraction — we had a similar discussion at this month's PHP Usergroup in Berlin, and usually someone counters my argument by saying that the benefits of faster development outweigh all the potential screw-ups. And maybe they are right — for a while. And then they upgrade their database server to have more RAM, right? What if "for a while" is already past tense?

Example

I just found the following code in one of our projects and I think it's a good example of what I mean.

The basics (broken down):

  • SQL:
    • a user table
    • columns: id, email, password, site_id
    • indices: PRIMARY (on id), UNIQUE (on email and site_id)
  • an entity Doctrine\Model\User class describing the above
  • a repository Doctrine\Model\UserRepository class to add convenience calls to the entity

So here is what I found:

<?php
// in class UserRepository
public function getUserById($id, $site_id = 1)
{
    $qb = $this->_em->createQueryBuilder();
    $qb->select(array('u'))
        ->from('Doctrine\Model\User', 'u')
        ->where('u.id = :id')
        ->andWhere('u.site_id = :site_id')
        ->setParameter('id', $id)
        ->setParameter('site_id', $site_id);

        /* ... */
}

And to the defense of all the people who worked on this (including me), it takes a while to see. :-)

Question to my readers: taking the setup into account, do you see the issue? Leave a comment if you figure out what the problem is.

So despite having an entity which defines the table setup, it's still possible to do something like that, and there is no error or warning whatsoever to let the developer know what they are doing. Not at run-time, and not in the Doctrine CLI either. Given the level of abstraction Doctrine adds to hide SQL from the developer, I'd expect it to do that much analysis of what the developer is doing.

Or maybe the level of abstraction added here is actually counter-productive?

Fin

Food for thought. I hope it was not too ranty.

Disclaimer: no offense meant to the people contributing to or using Doctrine. I just had to raise my concerns somewhere.