From Subversion to GIT (and beyond!)

Here's a more or less simple way to migrate from Subversion to GIT(hub). This includes mapping commits and tags and whatnot!

Authors

If multiple people contributed to your project, this is probably the toughest part. If you're not migrating from, let's say, Google Code but from PHP's Subversion repository, then it's really pretty simple: the username doubles as the email address (username@php.net).

I found a nifty bash script to get it done (and adjusted it a little bit):

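#!/usr/bin/env bash
# Collect every unique committer from the Subversion log.
authors=$(svn log -q | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
# Unquoted on purpose: word splitting trims the spaces around each name.
for author in ${authors}; do
  # One "svn-user = Name <email>" line per author, as git svn expects.
  echo "${author} = ${author} <${author}@php.net>"
done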

Since I migrated my project already, I didn't have the Subversion tree handy. That's why I used another package I maintain to demo this.

This is how you run it (assumes you have chmod +x'd it before):

# ./authors.sh
cvs2svn = cvs2svn <...>
cweiske = cweiske <...>
danielc = danielc <...>
gwynne = gwynne <...>
kguest = kguest <...>
pmjones = pmjones <...>
rasmus = rasmus <...>
till = till <...>

If you redirect the output to authors.txt, you're done.
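As a rough sketch of that redirect step, the snippet below wraps the same pipeline in a function that reads `svn log -q` output from stdin, so you can try it without a repository at hand. The function name `make_authors`, the hand-written sample log and the `example.org` mail domain are all made up for illustration; swap in whatever applies to your project.

```shell
#!/usr/bin/env bash
# Sketch: turn `svn log -q` output into an authors file for git svn.
# make_authors and example.org are placeholders, not part of any tool.
make_authors() {
  local domain="$1"
  # Revision lines look like "r42 | till | 2011-05-01 ..."; field 2 is the user.
  grep -e '^r' \
    | awk -F'|' '{ gsub(/ /, "", $2); print $2 }' \
    | sort -u \
    | while read -r author; do
        echo "${author} = ${author} <${author}@${domain}>"
      done
}

# Hand-written sample standing in for real `svn log -q` output.
sample_log='------------------------------------------------------------------------
r3 | till | 2011-05-01 12:00:00 +0200 (Sun, 01 May 2011)
------------------------------------------------------------------------
r2 | cweiske | 2011-04-30 09:00:00 +0200 (Sat, 30 Apr 2011)
------------------------------------------------------------------------
r1 | till | 2011-04-29 08:00:00 +0200 (Fri, 29 Apr 2011)
------------------------------------------------------------------------'

# Redirect the generated mapping into a file, then show it.
authors_file="$(mktemp)"
printf '%s\n' "${sample_log}" | make_authors example.org > "${authors_file}"
cat "${authors_file}"
```

Inside a real working copy, the equivalent would be something like `svn log -q | make_authors php.net > authors.txt`.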

Note: In case people don't have the email address you used on their Github account, they can always add it later on. Github allows you to use multiple email addresses, which is pretty handy for stuff like open source and work-related repositories.

git clone

This part took me a long time to figure out, especially because of the semi-weird setup in Google Code. The wiki is in Subversion as well, so the repository root is not the root where the software lives. This is probably a non-issue if you want to migrate the wiki as well, but I don't see why you would clutter master with it. Instead, I'd suggest migrating the wiki content into a separate branch.

Without further ado, this works:

# git svn clone --authors-file=./authors.txt --no-metadata --prefix=svn/ \
--tags=Services_ProjectHoneyPot/tags \
--trunk=Services_ProjectHoneyPot/trunk \
--branches=Services_ProjectHoneyPot/branches \
http://services-projecthoneypot.googlecode.com/svn/ \
Services_ProjectHoneyPot/

The final steps are to add a new remote and push master to it. Done!
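Those final steps could look roughly like the sketch below. To keep it runnable anywhere, a local bare repository stands in for the empty Github repository and a freshly created repository stands in for the result of git svn clone; the paths, the remote name `origin` and the author identity are assumptions for the demo.

```shell
#!/usr/bin/env bash
set -e
# Sketch: add a remote to the converted repository and push master to it.
# A local bare repository stands in for the empty repository on Github.
sandbox="$(mktemp -d)"
git init -q --bare "${sandbox}/remote.git"

# Stand-in for the directory that `git svn clone` produced.
git init -q "${sandbox}/converted"
cd "${sandbox}/converted"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example Author"
git config user.email "author@example.org"
echo "hello" > README
git add README
git commit -q -m "initial import"

# The two final steps: register the remote, then push master to it.
git remote add origin "${sandbox}/remote.git"
git push -q origin master

# Show which branches the remote received.
git ls-remote --heads origin
```

With a real Github repository, only the URL passed to git remote add changes; use the one Github shows for your new repository.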

You can see the result on Github.

A shortcut with Google Code

I facepalm'd myself when I saw that I could have converted my project on Google Code from Subversion to GIT.

This seems a lot easier since it would allow me to just push the branch to Github without cloning etc. I'm not sure how it would spin off the wiki content or how author information is preserved, but I suggest you try it out in case you want to migrate your project off Google Code.

Doing it the other way is not time wasted since I had to figure out the steps regardless.

Summary

There seem to be literally a million ways to migrate (from Subversion) to GIT. I hope this example got you one step closer to your objective.

The biggest problem when migrating is that people in Subversion-land often screw up tags by committing to them (I'm guilty as well). I try to avoid that in open source, but as far as work is concerned, we sometimes need to hotfix a tag and just redeploy instead of running through the process of creating a new tag (which is sometimes super tedious when a large Subversion repository is involved).

When I migrated a repository the other day, I noticed that these tags became branches in GIT. The good news is that I won't be able to do this anymore with GIT (yay!), which forces me to create a more robust and streamlined process to get code into production. But how do I fix this problem during the migration?

Fix up your tags

If you happen to run into this problem, my suggestion is to migrate trunk only and then re-create the tags in GIT.

GIT allows me to create a tag based on a branch or based on a commit. Both options are simple and much better than installing a couple of Python and/or Ruby scripts to fix your tree, which either end up not working or require a PhD in math to understand.

To create a tag from a branch, I check out the branch and tag it. This may work best during a migration, and of course it depends on how many tags need to be (re-)created and whether you had tags at all. Creating a tag based on a commit comes in handy when you forgot to create a tag at all: for example, you fixed a bug for a new release and then ended up refactoring a whole lot more.
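Here is a minimal sketch of the first option, tagging from a branch, run in a scratch repository. The branch name `release-0.5`, the tag name `0.5.0` and the author identity are invented for the demo; the branch plays the role of a Subversion tag that somebody committed a hotfix to.

```shell
#!/usr/bin/env bash
set -e
# Sketch: check out a branch and tag whatever it points at.
repo="$(mktemp -d)"
git init -q "${repo}"
cd "${repo}"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.org"

echo "one" > file.txt
git add file.txt
git commit -q -m "first commit"

# A branch that plays the role of a migrated Subversion tag.
git checkout -q -b release-0.5
echo "hotfix" >> file.txt
git commit -q -a -m "hotfix committed to the old tag"

# Check out the branch and create an annotated tag on its tip.
git checkout -q release-0.5
git tag -a 0.5.0 -m "Release 0.5.0"

# Back on master, list the tags to confirm it exists.
git checkout -q master
git tag -l
```

Once the tag exists, the branch could be deleted with git branch -D if you no longer need it; the tag keeps the commit reachable.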

In order to get the history of your GIT repository, try git log:

# git log --pretty=oneline
1d973dfe6f6e361e6f54953f374d60289bb0abea add AllTests
f53404579f5416058937941d0609df4720717cae  * update package.xml for 0.6.0 release
d5b42eef2035d26b1e1d119ff44a09efa418685e  * refactored Services_ProjectHoneyPot_Response:    * no static anymore    * type-hinting all around
82d7e8d229109565d42f98c6548354f85734583c make skip more robust
b9e77a427eb546bacce600f5bc41546e85c148d7 prep package.xml for 0.6.0
6cecfbc19c00f0bf06b800c297e23f00cee650ef  * remove response-format mambojambo
036a9d9509adb456114f601c60d188839c012004 make test more robust
713c4ec91c28e19fbb33d6bb853ced0bdeb3f321  * update harvester's IP (this fails always)
755a2bba8f8506525a9cd2a1f11b266b7d26bbe6 throw exception if not a boolean
2aa21913946e2b4b3db949233a118dbbe2e34bf4  * all set*() are a fluent interface now  * update from Net_DNS_Resolver to Net_DNS2_Resolver  * dumb down setResolver(): Net_DNS2_Resolver is created in __const
81f544d880fc7b7a6321be9420b911817b567bd1 update docblock
dbe74da67c5fa1f1209fe85d3050041ca2a2de6b  * update docblock  * fix cs, whitespace
...

If I wanted to create a tag based on a certain commit (e.g. the last revision in the previous listing), I'd run the following command:

# git tag -a 0.5.4 dbe74da67c5fa1f1209fe85d3050041ca2a2de6b

Pro-tip: GIT allows you to create tags based on part of the hash too. Try "dbe74da", it should work as well.
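To see the abbreviated hash at work, the sketch below tags an older commit in a scratch repository using only the first seven characters of its hash, then resolves the tag back to the full hash. The file name, commit messages and author identity are made up; I also pass -m so the command runs without opening an editor.

```shell
#!/usr/bin/env bash
set -e
# Sketch: tag an older commit by an abbreviated hash.
repo="$(mktemp -d)"
git init -q "${repo}"
cd "${repo}"
git config user.name "Example Author"
git config user.email "author@example.org"

echo "bugfix" > file.txt
git add file.txt
git commit -q -m "fix bug for the release"
echo "refactor" >> file.txt
git commit -q -a -m "refactor a whole lot more"

# Full hash of the older commit, and its first seven characters.
full="$(git rev-parse HEAD~1)"
short="${full:0:7}"

# Create the annotated tag from the abbreviated hash.
git tag -a 0.5.4 -m "Release 0.5.4" "${short}"

# Resolve the tag to its commit; this prints the full hash again.
git rev-parse '0.5.4^{commit}'
```

Seven characters are plenty in a tiny repository like this one. In a large repository GIT may ask for a longer prefix if the short one is ambiguous.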

That's all.

Things to learn

Moving from Subversion to GIT doesn't require too much to relearn. Instead of just commit, you also push and pull. These commands will get you pretty far if you just care for the speed and not for the rest.
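A small sketch of that commit, push and pull cycle, assuming a local bare repository as the shared one and two clones. The names `central.git`, `alice` and `bob` are made up for the demo.

```shell
#!/usr/bin/env bash
set -e
# Sketch: the commit / push / pull cycle with two clones of one repository.
sandbox="$(mktemp -d)"
git init -q --bare "${sandbox}/central.git"
git --git-dir="${sandbox}/central.git" symbolic-ref HEAD refs/heads/master

# Alice clones the (still empty) repository, commits locally, then pushes.
git clone -q "${sandbox}/central.git" "${sandbox}/alice" 2>/dev/null
cd "${sandbox}/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice Example"
git config user.email "alice@example.org"
echo "first" > notes.txt
git add notes.txt
git commit -q -m "first commit"   # local only, unlike svn commit
git push -q origin master         # now everybody else can fetch it

# Bob clones, Alice pushes another commit, Bob pulls it in.
git clone -q "${sandbox}/central.git" "${sandbox}/bob"
echo "second" >> notes.txt
git commit -q -a -m "second commit"
git push -q origin master
git -C "${sandbox}/bob" pull -q --ff-only origin master

# Bob's working copy now contains both lines.
cat "${sandbox}/bob/notes.txt"
```

The main difference to Subversion is visible in the middle: a commit stays on your machine until you push it.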

Since I'm hoping you want more, here are a couple of things to look into:

  • branching
  • merging
  • git add -p
  • git remote add

Especially branching and merging are almost painless with GIT. I highly, highly recommend you make heavy use of it.
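To give an idea of how little ceremony is involved, here is a sketch of a branch and merge round trip in a scratch repository. The branch name `feature`, the file names and the author identity are invented; --no-ff is my choice here so that the merge shows up as its own commit in the log.

```shell
#!/usr/bin/env bash
set -e
# Sketch: create a branch, commit on it, merge it back into master.
repo="$(mktemp -d)"
git init -q "${repo}"
cd "${repo}"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.org"

echo "stable" > app.txt
git add app.txt
git commit -q -m "initial commit"

# Branching is cheap: create the branch and switch to it in one go.
git checkout -q -b feature
echo "new idea" > feature.txt
git add feature.txt
git commit -q -m "add feature"

# Back on master, merge the branch; --no-ff keeps a visible merge commit.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature

# Show the resulting history as a small graph.
git log --oneline --graph
```

Without --no-ff, GIT would simply fast-forward master in this case, since nothing happened on master in the meantime.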

While GIT is sometimes a brainf*ck (e.g. submodules, commit really stages, subtree, absence of switch), the many benefits usually outweigh the downsides. The one and only thing I truly miss at the moment is svn:externals. However, I'm hoping to master subtree one day and then I'll be a very happy camper.

Fin

That's all.
