PHP: So you'd like to migrate from MySQL to CouchDB? - Part I

If you enjoyed this article, please leave a comment, rss subscribe to my RSS feed and/or follow me on Twitter. Thank you very much!

Update (2009-10-13): I posted part II!

This is the first part of a series. I'll start off by introducing CouchDB — from a PHP side, then I'll demo a couple basic use cases and I later on, I'll dive into migrations from MySQL.

My idea is to introduce CouchDB to a world where database-driven development generally refers to MySQL. By no means, this is meant to be disrespectful to MySQL, or SQL-databases in general. However, I'm a firm believer in using the right tool for the job.

First things first!

First off, before using CouchDB and maybe eventually replacing MySQL with it, we need to ask ourself the "Why?"-question.

And in order to be capable of more than a well-educated guess we need to familiarize ourselves with the CouchDB basics.

Basics

  • Document-oriented and schema-less storage.
  • Erlang (for rock-solid-scaling-goodness).
  • RESTful HTTP-API (we'll get to that).
  • Everything is JSON - request data, storage, response!

Document-oriented

In a document-oriented as to opposed to a relational store, the data is not stored in table, where data is usually broken down into fields. In a document-oriented store each record is stored along side and can have its own characteristics — properties of any kind.

As an example, consider these two records:

Till Klampaeckel, Berlin
Till Klampaeckel, [email protected], Berlin, Germany

In a relational store, we would attempt to break down, or normalize, the data. Which means that we would probably create a table with the columns name, email, city and country.

Consider adding another record:

Till Klampaeckel, +49110, [email protected]

(Just fyi — this is not my real phone number!)

Looking for an intersection in the records, the name is the only thing this record has in common with the previous two. With a relational database, we would either have to add a column for phone number and chat, or we would start splitting off the data into multiple tables (e.g. a table called phone and one called chat) in order to get grip.

With a document-oriented database — such as CouchDB — this is not an issue.

We can store any data, constraints do not apply.

Erlang

Erlang was invented a while ago, by Ericsson, when it was still sans Sony. In a nutshell, Erlang's true strength is reliability and stability. It also manages to really utilize all the resources modern hardware has to offer since it's a master of parallelization.

CouchDB is written in Erlang, and also accepts view code written in Erlang. More on views later.

RESTful HTTP-API

For starters, a lot of HTTP-APIs claim to be RESTful, most of them are not. HTTP has so called request verbs (DELETE, GET, HEAD, POST, PUT among them) and a lot of APIs don't use them to the fullest extend, or rather not all.

Instead, most APIs are limited GET and maybe use a little POST. An example of such an API is the Flickr API.

Most of us are familiar with GET and POST already. For example, when you opened the web page to this blog entry, the browser made a GET-request. If you decide to post a comment later on — you guessed it, that's a POST-request.

Aside from its basic yet powerful nature, HTTP is interesting in particular because it is the least common multiple in many programming language. Whatever you use — C#, PHP, Python, Ruby — these languages know how to talk HTTP. And even better — most of them ship pretty comfortable wrappers.

JSON

JSON — it's godsend for those of us who never liked XML.

It's very lightweight, yet we able to represent lists and objects, integers, strings — most data types you would want to use. A clear disadvantage of JSON is that it lacks validation (think DTD), and of course comments — ha, ha!

Why, oh why?

So along with "Why?", we should consider the following:

  • Does it make sense?
  • Is CouchDB (really) the better fit for my application?
  • What is my #1 problem in MySQL, and how does CouchDB solve it?

And if we are still convinced to migrate all of our data, we'll need to decide on an access wrapper.

It's all HTTP, right?

By now, everyone has heard that CouchDB has a RESTful HTTP-API. But what does that imply?

It means, that we won't need to build a new extension in PHP to be able to use it. There's already either ext/socket or ext/curl — often both — in 99% of all PHP installs out there. Which means that PHP is more than ready to talk to CouchDB — right out of the box.

Since I mentioned JSON before — today ext/json is available in most PHP installs as well. If however we happen to be one of the few unfortunates who don't have and cannot get this extension, we should use Services_JSON instead.

Install it!

CouchDB installations are available in most Linux and Unix distributions. On MacOSX, get CouchDBXthe one-click CouchDB package, and there's a work in process for Windows as well. Especially interesting for those who run Ubuntu 9.10 (which has been released a few days ago), there's already a CouchDB install included.

Ubuntu/Debian:

apt-get install couchdb

FreeBSD:

cd /usr/ports/databases/couchdb && make install clean

Raw, or comfort?

In general, my next question would be how comfortable it is to interact with CouchDB.

What I mean is the following — consider this curl request from the shell:

shell% curl -X PUT http://localhost:5984/mydb/mydoc -d '{"foo":"bar"}'

This request creates a document with an id called mydoc in a database called mydb. Simple and straightforward.

If I translate the above into PHP (using curl), it will look similar to the following:

<?php
$data = array('foo' => 'bar');
$json = json_encode($data);

$file = tmpfile(); 
fwrite($file, $json); 
fseek($file, 0);

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "http://localhost:5984/mydb/mydoc");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_PUT, true);
curl_setopt($ch, CURLOPT_INFILE, $file); 
curl_setopt($ch, CURLOPT_INFILESIZE, strlen($json));

$result = curl_exec($ch);

fclose($file);
curl_close($ch);

echo $result;
?>

That's a lot of code, and also somewhat nasty looking? But that's because different limitations apply!

  1. A PUT request through PHP's curl extension needs a file handle to work with — there's no way to push a simple string into it.
  2. ext/curl itself is not exactly straightforward to use. (No offense, and strictly IMHO, of course!)

Alternatives?

Of course there are more than a few ways to comfortably access CouchDB from PHP.

HTTP_Request2 vs. Zend_Http_Client

Both solutions are full blown PHP5 and offer an object oriented approach. My personal favorite is HTTP_Request2, but Zend_Http_Client is a close second. Both components are easily structured and simple to use HTTP client libraries. Both are more than capable to access CouchDB.

The equivalent to the curl command with HTTP_Request2:

<?php
require_once 'HTTP/Request2.php';

$data = array('foo' => 'bar');
$json = json_encode($data);

$req  = new HTTP_Request2('http://localhost:5984/mydb/mydoc');
$resp = $req->setMethod(HTTP_Request2::METHOD_PUT)
    ->setBody($json)
    ->send();

echo $resp->getBody();
?>

phpillow

phpillow is Kore Nordmann's CouchDB wrapper. You can get it from his website, a basic example looks like this:

<?php
require 'phpillow/autoload.php';

$data = array('foo' => 'bar');
$json = json_encode($data);

phpillowConnection::createInstance('localhost', 5894);
phpillowConnection::setDatabase('mydb');
$db = phpillowConnection::getInstance();
$db->put(phpillowConnection::getDatabase() . 'mydoc', $json);
?>

phpillow adds another layer to what HTTP_Request2 and Zend_Http_Client provide. And it really excels when the access classes (phpillowDocument, phpillowView, etc.) are used. It also allows you to define model classes (e.g. by extending phpillowDocument) and defining a set of rules for document data validation, document id generation, etc..

Love it, or hate it, phpillow is probably not what you want to use for your first steps into CouchDB, but you want to check it out when you build a more complex application.

Quo vadis, wrapper?

One of the things I noticed about CouchDB is, that especially when you use one of the client libraries I mentioned earlier, you tend to re-build it in every project. To this date, I unfortunately know of no awesome PHP wrapper for CouchDB which I'd recommend to everyone in every situation.

Even my own approach is very much tailored to our own use case and in my opinion doesn't qualify to meet the needs of a wider audience.

Which leads to a small dilemma — the danger one runs into is that whenever we attempt to create a wrapper that's flexible enough to serve a greater audience, we may end up making it more complex to work with CouchDB through the wrapper than it really has to be.

But in the end — be that as it may — this is the extreme strength of CouchDB. With very little knowledge of HTTP, we are productive right away. No training and books necessary.

Conclusion

In this entry, I've highlighted the current PHP approach to CouchDB. I tried to keep things simple for now and will dive into basic usage and more examples in my next installment.

Stay tuned!

| More