Continuous Integration: Automated database setup with Doctrine on Travis-CI

Monday, August 13. 2012

Testing is important — most people understand that by now. A lot of people already write tests for their open source code, but in-house testing is still hard. For example, many of us have had an encounter with Jenkins: it runs well up to the point where maintaining Jenkins becomes more work than writing the tests.

Another obstacle is test setup and environments: when I write and run tests, there is only so much I can do to mock and avoid actual calls to my storage backend. While I prefer to run my database tests against an in-memory SQLite database, there are edge cases where I work with multiple databases or write a direct query (and by-pass the ORM-magic).

In these cases I need to have that database server available in my test environment!

The following blog post explains how to solve these problems with Travis-CI. I will walk you through the setup on Travis-CI's business service, but most of this applies to their open source offering as well.

Step by step

I'll try to break it up into small steps.

Travis-CI

The first step is to log in at http://travis-ci.com and add the repository tests should be run for. To be able to add repositories of an organization, you have to be the owner of the organization. The easiest way to get access to the service right now is donating to these guys — or, in case you have done that already: email them. ;-)

The second step is setting up a .travis.yml file.

Mine looks like this:

language: php
php:
  - 5.3
  - 5.4
before_script:
  - ./composer.phar -v install --dev
  - psql -c 'create database testdatabase;' -U postgres

Run-down:

  • run the tests against PHP 5.3 and 5.4
  • before_script defines your test setup (outside PHP)
  • I omitted the script stanza because the default (phpunit) works for me
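If you prefer being explicit, spelling out the default would look like the following one-liner — to my knowledge, phpunit is what Travis-CI runs for PHP projects when no script stanza is given:

```yaml
# .travis.yml — equivalent to relying on the default
script: phpunit
```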

Composer

I am using composer to manage my dependencies and you should, too. I don't want to go into details here, but a short example of my composer.json looks like this:

{
    "name": "mycompany/project",
    "description": "A super cool project.",
    "require": {
        "doctrine/orm": "2.2.2"
    },
    "autoload": {
        "psr-0": {
            "MyCompany\\Project\\Test": "tests/",
            "MyCompany\\Project": "src/"
        }
    }
}

Side-note: We also currently commit a composer.phar into each repository for two reasons:

  1. To ensure a change in composer won't break our setup.
  2. Downtime of (or connectivity issues to) their website don't break our deployments and test runs.

Test framework setup

There is not a whole lot to set up, since Travis-CI installs phpunit already. Just make sure you have a phpunit.xml in the root of your repository and you are good to go.
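For reference, a minimal phpunit.xml could look like the following sketch — the bootstrap path and the tests/ directory are assumptions, adjust them to your own layout:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<phpunit bootstrap="vendor/autoload.php" colors="true">
  <testsuites>
    <testsuite name="Project Test Suite">
      <directory>tests/</directory>
    </testsuite>
  </testsuites>
</phpunit>
```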

Database schema

The next step would be to generate your schema and check in some .sql, right? I'm not a huge fan of this, because I hate running through a lot of manual steps when I need to update something. Manual steps mean they might be forgotten, or that people make mistakes. So the objective is to avoid manual labour as much as you can.

Instead of maintaining these files, I use Doctrine's SchemaTool. It takes care of this just fine because I annotated all my entities already.

To make use of this, I suggest adding the following to your test case:

<?php 
namespace MyCompany\Project\Test;

use Doctrine\ORM\Tools\Setup;
use Doctrine\ORM\EntityManager;
use Doctrine\Common\Persistence\PersistentObject;
use Doctrine\ORM\Tools\SchemaTool;

class MyTestCase extends \PHPUnit_Framework_TestCase
{
    protected $em, $tool;

    public function setUp()
    {
        $this->setUpDatabase(); // wrap this again
        $this->setUpSchema();
        /* more setup here */
    }

    public function tearDown()
    {
        $classes = array(
            $this->em->getClassMetadata('MyCompany\Project\Entity\SomethingImportant'),
        );
        $this->tool->dropSchema($classes);
        unset($this->tool);
        unset($this->em);
    }

    public function setUpDatabase()
    {
        $isDevMode      = true;
        $doctrineConfig = Setup::createAnnotationMetadataConfiguration(
            array('path/to/Entity'),
            $isDevMode
        );

        // database configuration parameters
        $dbConfig = array(
            'host'     => '127.0.0.1',
            'user'     => 'postgres',
            'password' => '',
            'dbname'   => 'testdatabase',
        );

        $this->em = EntityManager::create($dbConfig, $doctrineConfig);
        PersistentObject::setObjectManager($this->em);
    }

    public function setUpSchema()
    {
        $this->tool = new SchemaTool($this->em);
        $classes = array(
            $this->em->getClassMetadata('MyCompany\Project\Entity\SomethingImportant'),
        );
        $this->tool->createSchema($classes);
    }
}

Assumptions made (aka, room for improvement):

  • your entities are in path/to/Entity
  • PostgreSQL is used and runs on 127.0.0.1
  • you're using a database called testdatabase

Tests

Once the setup is complete, I add the usual testFoo() methods into my test case class.

From within the testFoo() methods I have Doctrine's EntityManager available via $this->em. The entity manager allows me to do whatever I need or want within a test run.

After a test completes, the tearDown() method is invoked and drops the tables in your testdatabase; setUp() then re-creates them. This will take some time, but the side-effects of stale data are not to be neglected. Besides, your tests should not rely on the order they are executed in anyway.

Another benefit of this setup is updated SQL tables each time a commit changes the annotations. No extra .sql files to maintain. :)

Fin

That's really all there is to running your test suite with Travis-CI — and while you did all of the above, you just added continuous integration to your toolkit, because these tests run each time a pull request is opened or commits are pushed. :-)

As I mentioned early on, these steps apply to the open source offering as well — all but the link to login.

If PostgreSQL is not your game, have a look at the list of currently supported databases on Travis-CI. And if your database server is not on the list of supported applications, you might as well install it with a before_script (as long as it runs on Ubuntu). ;-)

Happy testing!

Too much abstraction: Doctrine and PHP

Friday, July 6. 2012

Personally — I feel like I'm really, really late to get on the Doctrine train.

I've tried Doctrine 1 a few years back and found it mostly too complicated for my needs. To this day, I still find PEAR::DB and PEAR::MDB2 the perfect database abstraction layers (DBAL) — though of course they may or may not be totally up to date in terms of PHP etc. The overall concept of an ORM never stuck with me.

Of course Doctrine has a DBAL too and it would be unfair to compare its ORM to another DBAL, but it seems like almost no one uses the Doctrine DBAL by itself. I haven't either.

The primary use-case for Doctrine seems to be ORM and here is what I think about it.

Little steps

RTFM is especially hard with Doctrine: Google usually points to older documentation, there are a lot of outdated links, and the classic book has not been written yet.

When there is documentation, I compare the experience to RPM (the Red Hat Package Manager):

  • I need to learn about X.
  • But I have to read about Y first.
  • To understand Y, I need to dive into Z.

Sound familiar? Dependency resolution in documentation. However, there is of course always light at the end of the tunnel — it's when it all works: I conquered a problem and defeated the proxies and caches. ;-)

Entities

Entities can be incredibly simple — take the persistent object:

<?php
use Doctrine\Common\Persistence\PersistentObject;

/**
 * @Entity
 * @Table(
 *   name="db.users"
 * )
 */
class Row extends PersistentObject
{
  /**
   * @Id
   * @Column(type="integer")
   * @GeneratedValue
   */ 
  protected $id;
}

So what does this look like:

  • database "db"
  • table: "users"
  • columns: "id" (most likely integer and auto_increment with MySQL)

I feel like that feature is a hidden gem, because I could not find much about it in the official documentation — but that's basically all you have to do.

Annotations are not my thing, and neither are XML or YAML, though I can live with any of the three. What seemed annoying is that people usually start off by writing half a dozen getFoo() and setFoo() methods, depending on how many columns the table has. With the persistent object, this is not necessary.

Abstraction

Abstraction is Doctrine's biggest strength and largest drawback in my opinion. Abstraction is nice within reason, but I think with Doctrine it's way over the top.

My personal opinion on SQL is that SELECT * FROM table WHERE id = 1 never hurt anyone. Not to suggest you do this in the wrong layer of your application, but actually understanding the tools I work with is what made me a better developer. A JOIN does not hurt either, because then you and I know how to write one and what it costs to execute the sucker. LEFT JOIN, STRAIGHT JOIN and INNER JOIN — most developers don't know the difference. Doctrine also likes cascades, but I bet not many people have ever watched a busy database server while it's trying to process all of them while a couple thousand writes happen at the same time.

With Doctrine, SQL is so far away that developers really don't need to have any idea about what is going on, and I think that is particularly bad.

My problem is that most of the time the developer does not see their SQL at all, and also doesn't analyze what it is doing. I have yet to figure out how to make the application log all statements produced. And yeah, it seems to be impossible, because with the use of prepared statements no complete SQL is actually produced in Doctrine — it all happens on the server. But sure, prepared statements are pretty awesome, because we are all idiots and don't know how to escape data. But I should leave this for another blog post.

About abstraction — we had a similar discussion at this month's PHP Usergroup in Berlin, and usually someone will counter my argument by saying that the benefits of faster development outweigh all the potential screw-ups. And maybe they are right — for a while. And then they upgrade their database server to have more RAM, right? What if "for a while" is already past tense?

Example

I just found the following code in one of our projects and I think it's a good example of what I mean.

The basics (broken down):

  • SQL:
    • a user table
    • columns: id, email, password, site_id
    • indices: PRIMARY (on id), UNIQUE (on email and site_id)
  • an entity Doctrine\Model\User class describing the above
  • a repository Doctrine\Model\UserRepository class to add convenience calls to the entity

So here is what I found:

<?php
// in class UserRepository
public function getUserById($id, $site_id = 1)
{
    $qb = $this->_em->createQueryBuilder();
    $qb->select(array('u'))
        ->from('Doctrine\Model\User', 'u')
        ->where('u.id = :id')
        ->andWhere('u.site_id = :site_id')
        ->setParameter('id', $id)
        ->setParameter('site_id', $site_id);

        /* ... */
}

And to the defense of all the people who worked on this (including me), it takes a while to see. :-)

Question to my reader: Take the setup into account, do you see the issue? Leave a comment if you figure out what the problem is.

So despite having an entity which defines the table setup, it's still possible to do something like that, and there is no error or warning whatsoever to let the developer know what they are doing — not at run-time, and not in the Doctrine CLI either. Given the level of abstraction Doctrine adds to hide SQL from the developer, I'd expect it to do at least that much analysis of what the developer is doing.

Or maybe the level of abstraction added here is actually counter-productive?

Fin

Food for thoughts. Hope it was not too ranty.

Disclaimer: This is with no offense meant to the people contributing to Doctrine or using Doctrine. I just had to raise my concerns somewhere.

Vagrant sans Ruby

Tuesday, June 5. 2012

Development, testing, staging and production — this is how most people divide up different environments for application development.

Maintenance and setup of these environments is often not trivial. Having worked with a couple of different code bases and setups over the last decade, I often noticed things like environment-specific hacks (if ($env == 'testing') { ... }) in application code and service configurations, and a lot of manual labour all around. It's still very common that code deployed to staging does not work — but it worked for you, right?

And it's also too common that members of a team do not know how something works because whoever set it up is currently out sick or on vacation.

My opinion is that the best setup currently available is something like chef(-solo) on the server and Vagrant on the desktop. Something like, because aside from chef there is also puppet, cfengine and a couple of others. Leaving specific projects aside, it really just boils down to automation (within reason).

Automation, in my opinion, is not just the easiest but the only viable way to set up development, testing, staging and production environments. Without some automation in place, every team member is required to know how all of it works, even when that may not yet be important — and the end goal is that all environments actually resemble each other.

Enter Vagrant

Vagrant is a very nifty toolkit to bootstrap Virtualbox images. Bootstrapping means installing your application stack into one or multiple virtual machines in order to resemble production a lot better.

So a lot of times when I rave about how useful Vagrant and chef are, I get the crazy eye from PHP developers:

You want me to learn Ruby to setup my local development environment?

My response:

  1. Do not fear the Ruby.
  2. You don't have to. (Well, not a whole lot!)

Real talk

First off, of course you need VirtualBox — grab a download from their website or use your package manager.

Then, it's not possible to avoid Ruby 100% — after all, Vagrant is written in Ruby.

When you're on Linux or a Mac, this is usually enough (you may need sudo unless you use RVM):

$ sudo gem install vagrant

For a sudo-less install, use RVM:

$ sudo gem install rvm
...
$ rvm install 1.8.7
...
$ rvm use 1.8.7
...

When set up, this is what you do:

$ gem install vagrant
...

RVM allows me to run multiple versions of Ruby side by side and also leverage local gem installs — per Ruby version. Think of it as a nifty way to run multiple PHP versions and have a PEAR install per version. Similar projects exist for PHP today.

To learn more about RVM, visit their website.

Once vagrant is installed, we can continue!

ShellProvisioner for fun and profit

So, in case you are more comfortable writing some shell script for the time being — Vagrant got you covered!

First off, create a new Vagrantfile for your project:

$ mkdir -p Documents/workspaces/blog-example/
$ cd Documents/workspaces/blog-example
$ vagrant init lucid64
A `Vagrantfile` has been placed in this directory. You are now
ready to `vagrant up` your first virtual environment! Please read
the comments in the Vagrantfile as well as documentation on
`vagrantup.com` for more information on using Vagrant.

lucid64 (Ubuntu Lucid 10.04, 64bit) is the name of one of my local box files for Vagrant. In case you haven't got a box yet, head over to vagrantbox.es — there are a couple of images (aka box files) you can download. To get the lucid64 box, use the following command:

$ vagrant box add lucid64 http://files.vagrantup.com/lucid64.box

Once you made it past vagrant init, you should have a Vagrantfile with a lot of stuff in there.

While it's not important for my tutorial, I recommend you review it some other time: the generated Vagrantfile contains examples for all the provisioners (puppet, chef and shell) and a couple of other configuration options, etc.

Let's skip over this and get to the ShellProvisioner.

This is all you need

First off, let's create a shell script setup.sh in the same directory and put something like this in it:

#!/bin/sh

apt-get update

apt-get install -y php5 php5-cli php-pear
hash -r

pear upgrade-all
pear install -f HTTP_Request2

What does it do?

  1. Update the local package sources.
  2. Install PHP 5, the CLI interpreter and the PEAR installer.
  3. Reload the environment.
  4. Upgrade all installed PEAR packages.
  5. Install PEAR's HTTP_Request2.

When you're done editing, make sure to chmod +x it.
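If you want a quick sanity check before handing the script to Vagrant, something like this works — it uses a stand-in setup.sh so the snippet is self-contained:

```shell
# stand-in setup.sh (a trivial placeholder, not the real script from above)
cat > setup.sh <<'EOF'
#!/bin/sh
echo "provisioning..."
EOF

chmod +x setup.sh

# sanity checks: does it parse, and is the executable bit set?
sh -n setup.sh && echo "syntax OK"
[ -x setup.sh ] && echo "executable"
```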

Simple as that.

Putting your shell script to use

Replace your Vagrantfile with the following:

Vagrant::Config.run do |config|

  config.vm.box = "lucid64"
  config.vm.provision :shell, :path => "./setup.sh"

end

Then, run vagrant up and watch it provision. :)

Enter your virtual machine with vagrant ssh and verify HTTP_Request2 is installed:

$ vagrant ssh
Linux lucid64 2.6.32-33-server #70-Ubuntu SMP Thu Jul 7 22:28:30 UTC 2011 x86_64 GNU/Linux
Ubuntu 10.04.3 LTS

Welcome to the Ubuntu Server!
 * Documentation:  http://www.ubuntu.com/server/doc
Last login: Thu Jul 21 14:08:15 2011 from 10.0.2.2
vagrant@lucid64:~$ pear list -c pear
Installed packages, channel pear.php.net:
=========================================
Package          Version State
Archive_Tar      1.3.10  stable
Console_Getopt   1.3.1   stable
HTTP_Request2    2.1.1   stable
Net_URL2         2.0.0   stable
PEAR             1.9.4   stable
Structures_Graph 1.0.4   stable
XML_Util         1.2.1   stable

Yay — your first steps into automation!

Fin

That's all for today — Vagrant with almost no Ruby. I hope this gets many more people started!

From Subversion to GIT (and beyond!)

Friday, December 9. 2011

Here's a more or less simple way to migrate from Subversion to GIT(hub), including mapping commits and tags and whatnot!

Authors

If multiple people contributed to your project, this is probably the toughest part. If you're migrating not from, let's say, Google Code but from PHP's Subversion repository, then it's really pretty simple indeed: the username is the email address.

I found a nifty bash script to get it done (and adjusted it a little bit):

#!/usr/bin/env bash
authors=$(svn log -q | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
  echo "${author} = ${author} <[email protected]>";
done
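To illustrate what that pipeline extracts, here is a self-contained dry run against a fake `svn log -q` dump — the revisions, usernames and the example.org domain are all placeholders:

```shell
# fake `svn log -q` output; revisions and usernames are made up
cat > svn-log.txt <<'EOF'
------------------------------------------------------------------------
r2 | cweiske | 2011-12-01 10:00:00 +0000 (Thu, 01 Dec 2011)
------------------------------------------------------------------------
r1 | till | 2011-11-30 09:00:00 +0000 (Wed, 30 Nov 2011)
------------------------------------------------------------------------
EOF

# same pipeline as in authors.sh, reading from the file instead of `svn log -q`
authors=$(grep -e '^r' svn-log.txt | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
  echo "${author} = ${author} <${author}@example.org>"
done
# prints:
# cweiske = cweiske <cweiske@example.org>
# till = till <till@example.org>
```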

Since I migrated my project already, I didn't have the Subversion tree handy. That's why I used another package I maintain to demo this.

This is how you run it (assumes you have chmod +x'd it before):

# ./authors.sh
cvs2svn = cvs2svn <...>
cweiske = cweiske <...>
danielc = danielc <...>
gwynne = gwynne <...>
kguest = kguest <...>
pmjones = pmjones <...>
rasmus = rasmus <...>
till = till <...>

If you redirect the output to authors.txt, you're done.

Note: In case people don't have the email address you used on their Github account, they can always add it later on. Github allows you to use multiple email addresses, which is pretty handy for stuff like open source and work-related repositories.

git clone

This part took me a long time to figure out — especially because of the semi-weird setup in Google Code. The wiki is in Subversion as well, so the repository root is not the root where the software lives. This is probably a non-issue if you want to migrate the wiki as well, but I don't see why you would clutter master with it. Instead, I'd suggest migrating the wiki content into a separate branch.

Without further ado, this works:

# git svn clone --authors-file=./authors.txt --no-metadata --prefix=svn/ \
--tags=Services_ProjectHoneyPot/tags \
--trunk=Services_ProjectHoneyPot/trunk \
--branches=Services_ProjectHoneyPot/branches \
http://services-projecthoneypot.googlecode.com/svn/ \
Services_ProjectHoneyPot/

The final steps are to add a new remote and push master to it. Done!

You can see the result on Github.

A shortcut with Google Code

I facepalmed when I saw that I could have converted my project on Google Code from Subversion to GIT.

This seems a lot easier since it would allow me to just push the branch to Github without cloning, etc. I'm not sure how it would spin off the wiki content and how author information is preserved, but I suggest you try it out in case you want to migrate your project off of Google Code.

Doing it the other way was not time wasted, since I had to figure out the steps regardless.

Summary

There seem to be a million ways to migrate (from Subversion) to GIT. I hope this example got you one step closer to your objective.

The biggest problem when migrating is that people in Subversion-land often screw up tags by committing to them (I'm guilty as well). I try to avoid that in open source, but as far as work is concerned, we sometimes need to hotfix a tag and just redeploy instead of running through the process of creating a new tag (which is sometimes super tedious when a large Subversion repository is involved).

When I migrated a repository the other day, I noticed that these tags became branches in GIT. The good news is that I won't be able to do this anymore with GIT (yay!), which is good because it forces me to create a more robust and streamlined process to get code into production. But how do I fix this problem during the migration?

Fix up your tags

If you happen to run into this problem, my suggestion is to migrate trunk only and then re-create the tags in GIT.

GIT allows me to create a tag based on a branch or based on a commit. Both options are simple and much better than installing a couple of Python and/or Ruby scripts to fix your tree, which either end up not working or require a PhD in math to understand.

To create a tag from a branch, I check out the branch and tag it. This may work best during a migration, and of course it depends on how many tags need to be (re-)created and whether you had tags at all. Creating a tag based on a commit comes in handy when you forgot to create a tag at all: for example, you fixed a bug for a new release and then ended up refactoring a whole lot more.
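The tag-from-a-commit case can be sketched in a throwaway repository — the commit message, identity and version number below are all made up:

```shell
# throwaway repository purely for demonstration
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.org"
git config user.name "Demo"
git commit -q --allow-empty -m "fix bug for 0.5.4 release"

# tag the commit via its abbreviated hash
short=$(git rev-parse --short HEAD)
git tag -a 0.5.4 -m "re-created tag 0.5.4" "$short"
git tag -l
# prints: 0.5.4
```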

In order to get the history of your GIT repository, try git log:

# git log --pretty=oneline
1d973dfe6f6e361e6f54953f374d60289bb0abea add AllTests
f53404579f5416058937941d0609df4720717cae  * update package.xml for 0.6.0 release
d5b42eef2035d26b1e1d119ff44a09efa418685e  * refactored Services_ProjectHoneyPot_Response:    * no static anymore    * type-hinting all around
82d7e8d229109565d42f98c6548354f85734583c make skip more robust
b9e77a427eb546bacce600f5bc41546e85c148d7 prep package.xml for 0.6.0
6cecfbc19c00f0bf06b800c297e23f00cee650ef  * remove response-format mambojambo
036a9d9509adb456114f601c60d188839c012004 make test more robust
713c4ec91c28e19fbb33d6bb853ced0bdeb3f321  * update harvester's IP (this fails always)
755a2bba8f8506525a9cd2a1f11b266b7d26bbe6 throw exception if not a boolean
2aa21913946e2b4b3db949233a118dbbe2e34bf4  * all set*() are a fluent interface now  * update from Net_DNS_Resolver to Net_DNS2_Resolver  * dumb down setResolver(): Net_DNS2_Resolver is created in __const
81f544d880fc7b7a6321be9420b911817b567bd1 update docblock
dbe74da67c5fa1f1209fe85d3050041ca2a2de6b  * update docblock  * fix cs, whitespace
...

If I wanted to create a tag based on a certain commit (e.g. see last revision in the previous listing), I'd run the following command:

# git tag -a 0.5.4 dbe74da67c5fa1f1209fe85d3050041ca2a2de6b

Pro-tip: GIT allows you to create tags based on part of the hash, too. Try "dbe74da", it should work as well.

That's all.

Things to learn

Moving from Subversion to GIT doesn't require relearning too much. Instead of just committing, you also push and pull. These commands will get you pretty far if you just care for the speed and not for the rest.

Since I'm hoping you want more, here are a couple of things to look into:

  • branching
  • merging
  • git add -p
  • git remote add

Branching and merging especially are almost painless with GIT. I highly, highly recommend you make heavy use of them.
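A branch-and-merge round trip really is just a handful of commands. Here is a sketch in a throwaway repository (all names are made up; the base branch is detected so it works with either master or main):

```shell
# throwaway repository to show a branch/merge round trip
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.org"
git config user.name "Demo"
git commit -q --allow-empty -m "initial commit"
base=$(git symbolic-ref --short HEAD)  # master or main, depending on your git

git checkout -q -b feature
echo "hello" > feature.txt
git add feature.txt
git commit -q -m "add feature.txt"

git checkout -q "$base"
git merge -q feature                   # fast-forwards in this simple case
ls
# prints: feature.txt
```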

While GIT is sometimes a brainf*ck (e.g. submodules, commit really stages, subtree, absence of switch), the many benefits usually outweigh the downsides. The one and only thing I truly miss currently is svn:externals. However, I'm hoping to master subtree one day and then I'll be a very happy camper.

Fin

That's all.

Tracking PHP errors

Saturday, November 27. 2010

track_errors provides the means to catch error messages emitted by PHP. It's something I like to use during the development of various applications, or to get a handle on legacy code. Here are a few examples why!

For example

Imagine the following remote HTTP call:

$response = file_get_contents('http://example.org/');

So whenever this call fails, it will return false and also emit an error message:

Warning: file_get_contents(http://example.org):
    failed to open stream: could not connect to host in /example.php on line 2

Some people use @ to suppress this error message — an absolute no-go for reasons such as:

  • it just became impossible to know why the call failed
  • @ at runtime is an expensive operation for the PHP parser

The advanced PHP web kiddo knows to always set display_errors = Off (e.g. in php.ini or through ini_set()) in order to shield the visitors of their application from these nasty messages. And maybe they even know how to log the error — somewhere.

But whenever an error is logged to a log file somewhere, it also means it's buried. Sometimes these error logs are too far away and often they get overlooked. If you happen to centralize and actually analyze your logfiles, I salute you!

So how do you use PHP's very useful and informative error message to debug this runtime error?

track_errors to the rescue.

track_errors is a PHP runtime setting.

To enable it:

; php.ini
track_errors = On 

Or:

ini_set('track_errors', 1);

And this allows you to do the following:

$response = file_get_contents('http://example.org');
if ($response === false) {
    throw new RuntimeException(
        "Could not open example.org: {$GLOBALS['php_errormsg']}"
    );
}

The last error message is always populated in the global variable $php_errormsg.

You want more?

I also recently used the same technique to implement error checking into a legacy application. I basically did the following:

// footer.php ;-)
if (isset($GLOBALS['php_errormsg'])
    && !empty($GLOBALS['php_errormsg'])
    && $user->isAdmin()) {

    echo $GLOBALS['php_errormsg'];
}

Trade-offs

As useful as this can be, there are a number of trade-offs. I suggest you use it wisely.

  • $php_errormsg is a global variable [yuck]
  • many extensions provide built-in ways to catch errors
  • ini_set() calls at runtime are expensive

Fin

That's all, quick and (very) dirty.