From Subversion to GIT (and beyond!)

Friday, December 9. 2011
Comments

Here's a more or less simple way to migrate from Subversion to GIT(hub), this includes mapping commits and tags and what not!

Authors

If multiple people congtributed to your project, this is probably the toughest part. If you're not migration from let's say Google Code but PHP's Subversion repository, then it's really pretty simple indeed: the username is the email address.

I found a nifty bash script to get it done (and adjusted it a little bit):

#!/usr/bin/env bash
authors=$(svn log -q | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
  echo "${author} = ${author} <${author}@php.net>";
done

Since I migrated my project already, I didn't have the Subversion tree handy. That's why I used another package I maintain to demo this.

This is how you run it (assumes you have chmod +x'd it before):

# ./authors.sh
cvs2svn = cvs2svn <...>
cweiske = cweiske <...>
danielc = danielc <...>
gwynne = gwynne <...>
kguest = kguest <...>
pmjones = pmjones <...>
rasmus = rasmus <...>
till = till <...>

If you redirect the output to authors.txt, you're done.

Note: In case people don't have the email address you used on their Github account, they can always add it later on. Github allows you to use multiple email addresses, which is pretty handy for stuff like open source and work-related repositories.

git clone

This part took me a long time to figure out — especially because of the semi-weird setup in Google Code. The wiki is in Subversion as well, so the repository root is not the root where the software lives. This is probably a non-issue if you want to migrate the wiki as well, but I don't see why you would cluter master with it. Instead, I'd suggest to migrate the wiki content into a seperate branch.

Without further ado, this works:

# git svn clone --authors-file=./authors.txt --no-metadata --prefix=svn/ \
--tags=Services_ProjectHoneyPot/tags \
--trunk=Services_ProjectHoneyPot/trunk \
--branches=Services_ProjectHoneyPot/branches \
http://services-projecthoneypot.googlecode.com/svn/ \
Services_ProjectHoneyPot/

The final steps are to add a new remote and push master to it. Done!

You can see the result on Github.

A shortcut with Google Code

I facepalm'd myself when I saw that I could have converted my project on Google Code from Subversion to GIT.

This seems a lot easier since it would allow me to just push the branch to Github without cloning etc.. I'm not sure how it would spin off the wiki-content and how author information is preserved, but I suggest you try it out in case you want to migrate your project off of Google Code.

Doing it the other way is not time wasted since I had to figure out the steps regardless.

Summary

There seem to be literally a million ways to migrate (from Subversion) to GIT. I hope this example got you one step closer to your objective.

The biggest problem migrating is, that often people in Subversion-land screw up tags by committing to them (I'm guilty as well). I try to avoid that in open source, but as far as work is concerned, we sometimes need to hotfix a tag and just redeploy instead of running through the process of recreating a new tag (which is sometimes super tedious when a large Subversion repository is involved).

When I migrated a repository the other day, I noticed that these tags became branches in GIT. The good news is, I won't be able to do this anymore with GIT (Yay!), which is good because it forces me to create a more robust and streamlined process to get code into production. But how do I fix this problem during the migration?

Fix up your tags

If you happen to run into this problem, my suggestion is to migrate trunk only and then re-create the tags in GIT.

GIT allows me to create a tag based on a branch or based on a commit. Both options are simple and much better than installing a couple Python- and/or Ruby-scripts to fix your tree, which either end up not working or require a PHD in Math to understand.

To create a tag from a branch, I check out the branch and tag it. This may work best during a migration and of course it depends on how many tags need to be (re-)created and if you had tags at all. Creating a tag based on a commit comes in handy, when you forgot to create a tag at all: for example, you fixed a bug for a new release and then ended up refactoring a whole lot more.

In order to get the history of your GIT repository, try git log:

# git log --pretty=oneline                                                                                                                             [16:03:13]
1d973dfe6f6e361e6f54953f374d60289bb0abea add AllTests
f53404579f5416058937941d0609df4720717cae  * update package.xml for 0.6.0 release
d5b42eef2035d26b1e1d119ff44a09efa418685e  * refactored Services_ProjectHoneyPot_Response:    * no static anymore    * type-hinting all around
82d7e8d229109565d42f98c6548354f85734583c make skip more robust
b9e77a427eb546bacce600f5bc41546e85c148d7 prep package.xml for 0.6.0
6cecfbc19c00f0bf06b800c297e23f00cee650ef  * remove response-format mambojambo
036a9d9509adb456114f601c60d188839c012004 make test more robust
713c4ec91c28e19fbb33d6bb853ced0bdeb3f321  * update harvester's IP (this fails always)
755a2bba8f8506525a9cd2a1f11b266b7d26bbe6 throw exception if not a boolean
2aa21913946e2b4b3db949233a118dbbe2e34bf4  * all set*() are a fluent interface now  * update from Net_DNS_Resolver to Net_DNS2_Resolver  * dumb down setResolver(): Net_DNS2_Resolver is created in __const
81f544d880fc7b7a6321be9420b911817b567bd1 update docblock
dbe74da67c5fa1f1209fe85d3050041ca2a2de6b  * update docblock  * fix cs, whitespace
...

If I wanted to create a tag based on a certain commit (e.g. see last revision in the previous listing), I'd run the following command:

# git tag -a 0.5.4 dbe74da67c5fa1f1209fe85d3050041ca2a2de6b

Pro-tip: GIT allows you to create tags based on part of the hash to. Try "dbe74da", it should work as well.

That's all.

Things to learn

Moving from Subversion to GIT doesn't require too much to relearn. Instead of just commit, you also push and pull. These commands will get you pretty far if you just care for the speed and not for the rest.

Since I'm hoping, you want more, here are a couple things to look into:

  • branching
  • merging
  • git add -p
  • git remote add

Especially branching and merging are almost painless with GIT. I highly, highly recommend you make heavy use of it.

While GIT is sometimes a brainf*ck (e.g. submodules, commit really stages, subtree, absense of switch), the many benefits usually outweigh the downside. The one and only thing I truely miss are svn:externals currently. However, I'm hoping to master subtree one day and then I'll be a very happy camper.

Fin

That's all.

Tracking PHP errors

Saturday, November 27. 2010
Comments

track_errors provides the means to catch an error message emitted from PHP. It's something I like to use during the development of various applications, or to get a handle on legacy code. Here are a few examples why!

For example

Imagine the following remote HTTP call:

$response = file_get_contents('http://example.org/');

So whenever this call fails, it will return false and also emit an error message:

Warning: file_get_contents(http://example.org):
    failed to open stream: could not connect to host in /example.php on line 2

Some people use @ to suppress this error message — an absolute no-go for reasons such as:

  • it just became impossible to know why the call failed
  • @ at runtime is an expensive operation for the PHP parser

The advanced PHP web kiddo knows to always display_errors = Off (e.g. in php.ini or through ini_set()) in order to shield the visitor of their application from these nasty messages. And maybe they even know how to log the error — somewhere.

But whenever an error is logged to a log file somewhere, it also means it's buried. Sometimes these error logs are too far away and often they get overlooked. If you happen to centralize and actually analyze your logfiles, I salute you!

So how do you use PHP's very useful and informative error message to debug this runtime error?

track_errors to the rescue.

track_errors is a PHP runtime setting.

To enable it:

; php.ini
track_errors = On 

Or:

ini_set('track_errors', 1);

And this allows you to do the following:

$response = file_get_contents('http://example.org');
if ($response === false) {
    throw new RuntimeException(
        "Could not open example.org: {$GLOBALS['php_errormsg']}"
    );
}

The last error message is always populated in the global variable $php_errormsg.

You want more?

I also recently used the same technique to implement error checking into a legacy application. I basically did the following:

// footer.php ;-)
if (isset($GLOBALS['php_errormsg'])
    && !empty($GLOBALS['php_errormsg'])
    && $user->isAdmin()) {

    echo $GLOBALS['php_errormsg'];
}

Trade-offs

As useful as this can be, there are a number of trade-offs. I suggest you use it wisely.

  • $php_errormsg is a global variable [yuck]
  • many extensions provide build in ways to catch errors
  • ini_set() calls at runtime are expensive

Fin

That's all, quick and (very) dirty.

PHP SDK for Amazon Web Services

Wednesday, September 29. 2010
Comments

Yesterday, Jeff Barr announced Amazon's own PHP SDK for their web services — own, because AWS hired CloudFusion's lead developer earlier this year (in March) and I guess after a while they decided it was time to incorporate his open source efforts into the company. The full story is on getcloudfusion.com.

So what?

What's more than just pretty interesting about all of this, is that not only is the AWS PHP SDK hosted on Github (bonus points for sure), but since it implements almost the entire API of all infrastructural services and is backed by the API provider, this library currently presents the most feasible way for PHP developers to work with AWS. And to add to that, the library is fully documented as well.

Having worked myself on various small wrappers for the EC2 and SNS web services, I'm really somewhat glad that I can stop working on them now and continue implementing the web services.

PEAR

Amazon's move is also another victory for PEAR (and of course Pirum) because it brings more acceptance to the distribution of PHP libraries using a PEAR Channel.

The SDK's channel is the following: http://pear.amazonwebservices.com/

Setup

pear channel-discover pear.amazonwebservices.com
pear install aws/sdk

The flipside

There are absolutely no unit tests included anywhere. But since I'm assuming that they exist indeed, I hope they will be open sourced before not too long. Or in case they don't (What's up with that?), that pull requests will be accepted so the community will be able to contribute some.

Google Chrome: useful extensions for developers

Wednesday, September 22. 2010
Comments

While Chrome likes to emphasize how speedy it is, it is also sometimes a pretty bare-metal browser. But all the speed comes at an expensive — Chrome is doing less out of the box, which some people will say means, "Chrome is focusing on the essentials".

So because I'm thankful for said speedyness and overall painlessness, I also realize how I little extra bells and whistles I really need for a great browsing experience.

And in the end, being speedy and painless clearly wins.

I've been using Chrome for over a year now (or longer?) and aside from a couple custom extensions I wrote for my own pleasure, so far there only three other extensions I installed.

JsonView

JsonView formats JSON into a pretty structure which I can expand and contract. It also adds some color. This extension is incredibly helpful to those dealing with web services which exchange data using JSON. A must-have if you recognize yourself in that group.

Link: https://chrome.google.com/extensions/detail/chklaanhfefbnpoihckbnefhakgolnmc

XML Tree

XML Tree is just like JsonView, but for XML. Yes, it's dead simple, but I also highly recommend it.

Link: https://chrome.google.com/extensions/detail/gbammbheopgpmaagmckhpjbfgdfkpadb

SpeedTracer

SpeedTracer could be called the equivalent to Firebug. But since Chrome already ships with a pretty comprehensive developer console (which gets you about 80-90% of what Firebug does), SpeedTracer really is the icing on the cake. Not a must have, but a nice to have.

Link: https://chrome.google.com/extensions/detail/ognampngfcbddbfemdapefohjiobgbdl

Debugging Zend_Test

Monday, September 20. 2010
Comments

Sometimes, I have to debug unit tests and usually this is a situation I'm trying to avoid.

If I have to spend too much time debugging a test it's usually a bad test. Which usually means that it's too complex. However, with Zend_Test_PHPUnit_ControllerTestCase, it's often not the actual test, but the framework. This is not just tedious for myself, it's also not the most supportive fact when I ask my developers to write tests.

An example

The unit test fails with something like:

Failed asserting last module used <"error"> was "default".

Translated, this means the following:

  • The obvious: an error occurred.
  • The error was caught by our ErrorController.
  • Things I need to find out:
    • What error actually occurred?
    • Why did it occur?
    • Where did the error occur?

The last three questions are especially tricky and drive me nuts on a regular basis because a unit test should never withhold these things from you. After all, we use these tests to catch bugs to begin with. Why make it harder for the developer fix them?

In my example an error occurred, but debugging Zend_Test also kicks in when things supposedly go according to plan. Follow me to the real life example.

Real life example

I have an Api_IndexController where requests to my API are validated in its preDispatch().

Whenever a request is not validated, I will issue "HTTP/1.1 401 Unauthorized". For the sake of this example, this is exactly what happens.

class ApiController extends Zend_Controller_Action
{
    protected $authorized = false;

    public function preDispatch()
    {
        // authorize the request
        // ...
    }
    public function profileAction()
    {
        if ($this->authorized === false) {
            $this->getResponse()->setRawHeader('HTTP/1.1 401 Unauthorized');
        }
        // ...
    }
}

Here's the relevant test case:

class Api_IndexControllerTest ...

    public function testUnAuthorizedHeader()
    {
        $this->dispatch('/api/profile'); // unauthorized
        $this->assertResponseCode(401);
    }
}

The result:

1) Api_IndexControllerTest::testUnAuthorizedHeader
Failed asserting response code "401"

/project/library/Zend/Test/PHPUnit/Constraint/ResponseHeader.php:230
/project/library/Zend/Test/PHPUnit/ControllerTestCase.php:773
/project/tests/controllers/api/IndexControllerTest.php:58

Not very useful, eh?

Debugging

Before you step through your application with print, echo and an occasional var_dump, here's a much better way of see what went wrong.

I'm using a custom Listener for PHPUnit, which works sort of like an observer. This allows me to see where I made a mistake without hacking around in Zend_Test.

Here is how it works

Discover my PEAR channel:

sudo pear channel-discover till.pearfarm.org

Install:

till@till-laptop:~/ sudo pear install till.pearfarm.org/Lagged_Test_PHPUnit_ControllerTestCase_Listener-alpha
downloading Lagged_Test_PHPUnit_ControllerTestCase_Listener-0.1.0.tgz ...
Starting to download Lagged_Test_PHPUnit_ControllerTestCase_Listener-0.1.0.tgz (2,493 bytes)
....done: 2,493 bytes
install ok: channel://till.pearfarm.org/Lagged_Test_PHPUnit_ControllerTestCase_Listener-0.1.0

If you happen to not like PEAR (What's wrong with you? ;-)), the code is also on github.

Usage

This is my phpunit.xml:

<?xml version="1.0" encoding="utf-8"?>
<phpunit bootstrap="./TestInit.php" colors="true" syntaxCheck="true">
    <testsuites>
    ...
    </testsuites>
    <listeners>
        <listener class="Lagged_Test_PHPUnit_ControllerTestCase_Listener" file="Lagged/Test/PHPUnit/ControllerTestCase/Listener.php" />
    </listeners>
</phpunit>

Output

Whenever I run my test suite and a test fails, it will add something like this to the output of PHPUnit:

PHPUnit 3.4.15 by Sebastian Bergmann.

..Test 'testUnAuthorizedHeader' failed.
RESPONSE

Status Code: 200

Headers:

     Cache-Control - public, max-age=120 (replace: 1)
     Content-Type - application/json (replace: 1)
     X-Ohai - WADDAP (replace: false)

Body:

{"status":"error","msg":"Not authorized"}

F..

Time: 5 seconds, Memory: 20.50Mb

There was 1 failure:

1) Api_IndexControllerTest::testUnAuthorizedHeader
Failed asserting response code "401"

/project/library/Zend/Test/PHPUnit/Constraint/ResponseHeader.php:230
/project/library/Zend/Test/PHPUnit/ControllerTestCase.php:773
/project/tests/controllers/api/IndexControllerTest.php:58

FAILURES!
Tests: 5, Assertions: 12, Failures: 1.

Analyze

Analyzing the output, I realize that my status code was never set. Even though I used a setRawHeader() call to set it. Turns out setRawHeader() is not parsed so the status code in Zend_Controller_Response_Abstract is not updated.

IMHO this is also a bug and a limitation of the framework or Zend_Test.

The quickfix is to do the following in my action:

$this->getResponse()->setHttpResponseCode(401);

Fin

That's all. Quick, but not so dirty. If you noticed, I got away without hacking Zend_Test or PHPUnit.

The listener pattern provides us with very powerful methods to hook into our test suite. If you see the source code it also contains methods for skipped tests, errors, test suite start and end.