A toolchain for CouchDB Lounge

One of our biggest issues with CouchDB right now is the lack of compaction on our database. By "lack of" I don't mean that CouchDB doesn't support it; I mean that we are unable to actually run it.

Compaction in a nutshell

Compaction is pretty cool; here's the short version.

As you know, CouchDB is not very space-efficient. For one, CouchDB keeps revisions of all documents: whenever you update a document, a new revision is saved. You can roll back any time, or expose it as a nifty feature in your application; regardless, those old revisions are kept around until your database is compacted.

Think of it in terms of IMAP: emails are not actually deleted until you hit that magic "compact" button, which 99% of all IMAP users don't know the purpose of anyway.
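
To make this concrete, here is roughly what the revision dance looks like over CouchDB's HTTP API. This is just a sketch; the database name and the revision values are placeholders.

    # A quick sketch against a plain, single-node CouchDB on port 5984;
    # the database name "mydb" is made up.
    curl -X PUT http://localhost:5984/mydb
    curl -X PUT http://localhost:5984/mydb/doc1 -d '{"state":"first"}'
    # -> {"ok":true,"id":"doc1","rev":"1-..."}
    curl -X PUT http://localhost:5984/mydb/doc1 -d '{"state":"second","_rev":"1-..."}'
    # The first revision stays on disk (and remains fetchable via ?rev=1-...)
    # until the database is compacted.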

Another thing is that whenever new documents are written to CouchDB without bulk mode, they are stored in a way that is not very efficient either, in terms of both actual storage and indexing (or so rumour has it).

Compaction woes

Since everything is simple with CouchDB, compaction is a simple process too. Yay!

When compaction is started, CouchDB creates a new database file where it stores the data in a very optimized way (I will not go into detail on this; google it if you are really interested!). When the compaction process finishes, CouchDB swaps your old database file for the new one.
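
In practice, triggering and watching a compaction is a couple of HTTP calls. A minimal sketch, assuming a local CouchDB on port 5984 and a made-up database name:

    # Kick off compaction and poll the database info for its status.
    curl -X POST -H 'Content-Type: application/json' http://localhost:5984/mydb/_compact
    # -> {"ok":true}
    curl -s http://localhost:5984/mydb | grep -o '"compact_running":[a-z]*'
    # reports "compact_running":true until the new file has been swapped in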

The woes start with the fact that, when you have e.g. 700 GB of uncompacted data, you will probably need on the order of another 400 GB of free space for compaction to finish, because it writes a second database file alongside the old one.

The second issue is that when there is constant writing to your database, the compaction process will never actually finish, because it can't catch up with the incoming writes. That kind of sucks, and for those of us who aim to provide close to 100% availability, it is extremely painful to learn.

Doing the math

Doing the math on your compaction process is especially hideous because there are so many factors. The two main ones are disk I/O and the actual size of the database.

Disk I/O is especially unpredictable when you run in the Amazon cloud (AWS) and (hopefully) use EBS storage. In my experience there is no constant throughput on it. Even though I've heard that EBS traffic travels on a separate network from regular network traffic, I still noticed a lot of randomness in performance when I benchmarked a couple of operations on my volumes.
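
For what it's worth, this is the kind of crude benchmark I mean; nothing scientific, just enough to see the variance. The mount point is an assumption, adjust it to wherever your EBS volume lives.

    # Rough sequential write throughput check; run it a few times and compare.
    dd if=/dev/zero of=/vol/ddtest bs=1M count=1024 oflag=direct
    rm /vol/ddtest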

If you're not in the cloud and run your own hardware, disk I/O is something you can address by buying the right hardware (e.g. SAS disks, RAID controllers with plenty of battery-backed cache, maybe a nifty Sun appliance with SSD read/write caches, etc.). There's a lot of money to be spent on disk I/O.

As for the second important factor in compaction, the size of the database: that's something you can take care of with a technique called oversharding.

(Over)sharding

(Database) sharding, or horizontal partitioning, generally refers to splitting up a dataset in order to ease read/write operations on the data. A lot of large websites such as Digg and Facebook employ sharding techniques to deal with all the (user-generated) content they have acquired over the years.

The advantage of splitting the data into shards is that each shard holds fewer rows per table, which means the indexes are smaller and the operations you run on them tend to be faster.

Most software architects will recommend sharding only when the need arises (something about premature optimization being the root of all evil); oversharding, on the other hand, is a preemptive measure.

Oversharding to the rescue

With oversharding, we aim to keep the dataset in manageable portions, the primary goal being to always be able to compact the data. While sharding generally means we have a lot of data and need multiple physical nodes, oversharding can happen on the same server; in our case, we run 50 CouchDB instances on localhost, each on a different port.
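
To give you an idea why this helps with compaction: 700 GB split across 50 shards is roughly 14 GB per shard, so compacting a single shard needs around 14 GB of scratch space and finishes in a fraction of the time. Below is a minimal sketch of what "lots of Couches on one box" looks like, assuming a hypothetical layout where shard i listens on port 5984+i; your actual ports come from your configuration.

    # Hypothetical port layout (5984..6033) -- adjust to however your shards are set up.
    for i in $(seq 0 49); do
      curl -s "http://localhost:$((5984 + i))/" > /dev/null && echo "shard $i is up"
    done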

Anyway, this brings me to CouchDB Lounge! ;)

Introducing the Lounge

CouchDB Lounge is a proxy-based partitioning/clustering framework for CouchDB. It was originally written by Kevin Ferguson, Vijay Ragunathan and Shaun Lindsay at Meebo.com and currently works with CouchDB 0.10.0.

There are two main pieces:

  • dumbproxy
  • smartproxy

... and a couple of smaller scripts in between. ;)

Dumbproxy

Dumbproxy is sort of a weird name; this part is often also referred to as nginx-lounge. It consists of the nginx source plus a custom module; once compiled and installed, it is ready to go. Something the project pages on Google Code (outdated) and GitHub won't tell you: it includes an entire nginx (0.7.x) install.
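
Building it boils down to a fairly standard nginx-plus-module compile. The paths below are placeholders; the actual locations are in the lounge source tree.

    # Sketch of the usual nginx + custom module build; directory names are made up.
    cd nginx-0.7.x
    ./configure --add-module=/path/to/the/lounge/nginx/module
    make
    sudo make install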

Dumbproxy generally runs on port 6984 and splits incoming writes across the various shards. It also contains the routing logic that makes sure a simple GET request reaches the shard the document was saved to. Pretty nifty.
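
The routing idea is simple: hash the document id and take it modulo the number of shards, so the same id always lands on the same shard. The snippet below illustrates that idea only; it is not the actual hash function dumbproxy uses.

    # Illustration of doc-id based routing -- not dumbproxy's real hash function.
    docid="user:12345"
    nshards=50
    shard=$(( $(printf '%s' "$docid" | cksum | cut -d' ' -f1) % nshards ))
    echo "doc $docid maps to shard $shard"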

Smartproxy

Smartproxy is a twistd (Python) daemon which handles the map and reduce phases for view requests across all shards of the cluster.
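
From the client's point of view nothing changes: you query a view through the lounge on port 6984 just like you would against a single CouchDB, and smartproxy fans the request out to the shards and merges (and, where needed, re-reduces) the results. The database, design document and view names below are made up.

    # Query a (made-up) view through the lounge; smartproxy handles the fan-out.
    curl 'http://localhost:6984/mydb/_design/app/_view/by_date?limit=10'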

replication_notifier.py

This is a Python script which gets added to CouchDB's default.ini; it handles distribution of the data if you run with redundancy, or so I believe.

Note: the replication notifier is the only part which my toolchain does not set up for you; see lounge-notes.txt for details, it's rather simple.
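
For reference, external scripts are hooked into CouchDB 0.10 via the [update_notification] section of its ini files. The key name and script path below are assumptions on my part, so check lounge-notes.txt for the exact lines.

    # Key name and path are assumptions -- lounge-notes.txt has the real ones.
    printf '\n[update_notification]\nreplicator = /usr/local/bin/replication_notifier.py\n' \
      | sudo tee -a /etc/couchdb/default.ini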

My toolchain

My toolchain is a set of scripts hosted in my ubuntu repository on GitHub.

As the repository name suggests, these scripts will work best on Ubuntu (and probably also Debian). I have tested them on 9.04 and 9.10 so far. The steps to get your CouchDB Lounge running are rather simple and are explained on the page (see README.md).

After you have completed all the steps, copy my start script (lounge-shard-init) to /etc/init.d. Then add the replication notifier to CouchDB's default.ini, start all the components one after another, and see if you can access the lounge through http://localhost:6984. It should greet you with the standard CouchDB greeting (of 0.10, of course).
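
The last mile looks roughly like this; the start command assumes the script follows the usual init.d start/stop convention, and the welcome line is what a 0.10 node answers with.

    sudo cp lounge-shard-init /etc/init.d/
    sudo /etc/init.d/lounge-shard-init start
    curl http://localhost:6984/
    # -> {"couchdb":"Welcome","version":"0.10.0"}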

Moving to the lounge, and what about compaction?

Moving to the lounge is a little more complex, since the lounge does not currently support replicating into it. I'm working on a replicator to get the job done. Another thing: compaction cannot be run on the lounge globally; instead you'll have to run the jobs on each node (see the sketch below).
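
Here is a sketch of what "run the jobs on each node" can look like, assuming the 50-instances-on-localhost layout from above (ports 5984 to 6033; adjust to your setup). The JSON parsing with tr is crude but good enough for plain database names.

    # Compact every database on every local shard instance.
    for port in $(seq 5984 6033); do
      for db in $(curl -s "http://localhost:$port/_all_dbs" | tr -d '[]"' | tr ',' ' '); do
        echo "compacting $db on port $port"
        curl -s -X POST -H 'Content-Type: application/json' "http://localhost:$port/$db/_compact"
        echo
      done
    done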

Fin

On my TODO list for this are enhancing my bash scripts, adding the replication notifier setup, and providing something like a configuration file for all the variables so people don't need to edit my scripts.

Currently, there are a few variables at the top of both lounge-fest.sh and lounge-shard-conf.sh, mostly paths for where you want your CouchDB install to live, plus separate entries for the database files and log directories. Edit them to your liking and you're good to go.
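
To give you an idea, this is the kind of thing you will find there. The names below are made up for illustration; the real ones live in the scripts themselves.

    # Hypothetical variable names -- see lounge-fest.sh / lounge-shard-conf.sh for the real ones.
    COUCH_PREFIX="/usr/local/lounge"   # where the CouchDB install lives
    COUCH_DATA="/var/lib/lounge"       # database files
    COUCH_LOG="/var/log/lounge"        # log directories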

As always, please comment with feedback and questions, and fork and send me pull requests in case there's anything I missed.
