Update (2009-10-13): I posted part II!

This is the first part of a series. I’ll start off by introducing CouchDB — from a PHP side, then I’ll demo a couple basic use cases and I later on, I’ll dive into migrations from MySQL.

My idea is to introduce CouchDB to a world where database-driven development generally refers to MySQL. By no means, this is meant to be disrespectful to MySQL, or SQL-databases in general. However, I’m a firm believer in using the right tool for the job.

First things first!

First off, before using CouchDB and maybe eventually replacing MySQL with it, we need to ask ourself the “Why?”-question.

And in order to be capable of more than a well-educated guess we need to familiarize ourselves with the CouchDB basics.

Basics

  • Document-oriented and schema-less storage.
  • Erlang (for rock-solid-scaling-goodness).
  • RESTful HTTP-API (we’ll get to that).
  • Everything is JSON - request data, storage, response!

Document-oriented

In a document-oriented as to opposed to a relational store, the data is not stored in table, where data is usually broken down into fields. In a document-oriented store each record is stored along side and can have its own characteristics — properties of any kind.

As an example, consider these two records:

Till Klampaeckel, Berlin
Till Klampaeckel, [email protected], Berlin, Germany

In a relational store, we would attempt to break down, or normalize, the data. Which means that we would probably create a table with the columns name, email, city and country.

Consider adding another record:

Till Klampaeckel, +49110, [email protected]

(Just fyi — this is not my real phone number!)

Looking for an intersection in the records, the name is the only thing this record has in common with the previous two. With a relational database, we would either have to add a column for phone number and chat, or we would start splitting off the data into multiple tables (e.g. a table called phone and one called chat) in order to get grip.

With a document-oriented database — such as CouchDB — this is not an issue.

We can store any data, constraints do not apply.

Erlang

Erlang was invented a while ago, by Ericsson, when it was still sans Sony. In a nutshell, Erlang’s true strength is reliability and stability. It also manages to really utilize all the resources modern hardware has to offer since it’s a master of parallelization.

CouchDB is written in Erlang, and also accepts view code written in Erlang. More on views later.

RESTful HTTP-API

For starters, a lot of HTTP-APIs claim to be RESTful, most of them are not. HTTP has so called request verbs (DELETE, GET, HEAD, POST, PUT among them) and a lot of APIs don’t use them to the fullest extend, or rather not all.

Instead, most APIs are limited GET and maybe use a little POST. An example of such an API is the Flickr API.

Most of us are familiar with GET and POST already. For example, when you opened the web page to this blog entry, the browser made a GET-request. If you decide to post a comment later on — you guessed it, that’s a POST-request.

Aside from its basic yet powerful nature, HTTP is interesting in particular because it is the least common multiple in many programming language. Whatever you use — C#, PHP, Python, Ruby — these languages know how to talk HTTP. And even better — most of them ship pretty comfortable wrappers.

JSON

JSON — it’s godsend for those of us who never liked XML.

It’s very lightweight, yet we able to represent lists and objects, integers, strings — most data types you would want to use. A clear disadvantage of JSON is that it lacks validation (think DTD), and of course comments — ha, ha!

Why, oh why?

So along with “Why?”, we should consider the following:

  • Does it make sense?
  • Is CouchDB (really) the better fit for my application?
  • What is my #1 problem in MySQL, and how does CouchDB solve it?

And if we are still convinced to migrate all of our data, we’ll need to decide on an access wrapper.

It’s all HTTP, right?

By now, everyone has heard that CouchDB has a RESTful HTTP-API. But what does that imply?

It means, that we won’t need to build a new extension in PHP to be able to use it. There’s already either ext/socket or ext/curl — often both — in 99% of all PHP installs out there. Which means that PHP is more than ready to talk to CouchDB — right out of the box.

Since I mentioned JSON before — today ext/json is available in most PHP installs as well. If however we happen to be one of the few unfortunates who don’t have and cannot get this extension, we should use Services_JSON instead.

Install it!

CouchDB installations are available in most Linux and Unix distributions. On MacOSX, get CouchDBXthe one-click CouchDB package, and there’s a work in process for Windows as well. Especially interesting for those who run Ubuntu 9.10 (which has been released a few days ago), there’s already a CouchDB install included.

Ubuntu/Debian:

apt-get install couchdb

FreeBSD:

cd /usr/ports/databases/couchdb && make install clean