Operating CouchDB

Saturday, May 8. 2010
Comments

These are some random operational things I learned about CouchDB. While I realize that my primary use-case (a CouchDB install with currently 230+ million documents) may be oversized for many, these are still things important things to know and to consider. And I would have loved to know of some of these before we grew that large.

I hope these findings are useful for others.

Compaction

CouchDB doesn't take great care of diskspace — the assumption is that disk is cheap. To get on top of it, you should run database and view compaction as often as you can. And the good news is, these operations help you to reclaim a lot of space (e.g. I've seen an uncompacted view of 200 GB trim down to ~30 GB).

Cleanup

In case you changed the actual view code, make sure to run the clean-up command (curl -X POST http://server/db/_view_cleanup) to regain disk space.

Performance impact

Database and view compaction (especially on larger sets) will slow down reads and writes considerably. Schedule downtime, or do it in off-peak hours if you think the operation is fast enough.

This is especially hideous when you run in a cloud environment where disk i/o generally sucks (OHAI, EBS!!!).

To stop either of those background-tasks, just restart CouchDB.

(Just fyi, the third option is of course to throw resources (hardware) at it.)

Resuming view compaction?

HA, HA! [Note, sarcasm!] — view compaction is not resumable, like database compaction.

View files

I suggest you split views into several design documents — this will have the following benefit.

For each design document, CouchDB will create a .view file (by default these are in var/lib/couchdb/database-name/.database-name_design/). It's just faster to run compact and cleanup operation on multiple (hopefully smaller files) versus one giant file.

In the end, you don't run the operation against the file directly, but against CouchDB — but CouchDB will deal with a smaller file which makes the operation faster and generally shorter — I call this poor man's view partitioning.

Warming the cache

Cache warming is when a cache is populated with items in order to avoid the cache and server being hit with too much traffic when a server starts up and here is what you can do with CouchDB in this regard.

The basics are obvious — updates to a CouchDB view are performed when the view is requested. This has certain advantages and works well in most situations. Something I noticed is that especially on smaller VPS servers where resources tend to be oversold and and are rare in general, generating view updates can slow your application down to a full stop.

As a matter of fact and CouchDB does often not respond during that operation when the disk was saturated (take into account that even a 2 GB database will get hard to work with if you only have 1 GB of RAM for CouchDB and the OS, and whatever else is running on the same server).

The options are:

  1. To get more traffic so views are constantly update and the updates performed are kept at a minimum.
  2. Make your application query views with ?stale=ok and instead update the views on a set interval, for example via a curl request in a cronjob.

Cache-warming for dummies, the CouchDB way.

View data

For various reasons such as space management and performance, it doesn't hurt to put all views on its own dedicated partition.

In order to do this, add the following to your local.ini (in [couchdb]): view_index_dir = /new_device

And assuming your database is called "helloworld" and the view dir is /new_device, your .view-files will end up in /new_device/.helloworld_design.

Overshard

I've blogged on CouchDB and CouchDB-Lounge before. No matter if you use the Lounge or build sharding into your application — consider it. From what I learned it's better to shard earlier (= overshard), than when it's too late. The faster your CouchDB grows, the more painful it will be to deal with all the data stuck in.

My suggestion is that when you rapidly approach 50,000,000 documents and see yourself exceeding (and maybe doubling) this number soon, take a deep breath and think about a sharding strategy.

Oversharding has the advantage that for example you run 10 CouchDB instances on the same server and move each of them (or a couple) to their own dedicated hardware once they exceed the resources of the single hardware.

If sharding is not your cup of tea, just outsource to Cloudant — they do a great job.

CouchDB-Lounge

CouchDB-Lounge is Meebo's python-based sharding framework for CouchDB. The lounge is essentially an nginx-webserver and a twistd service which proxies your requests to different shards using a shards.conf. The number of shards and also the level of redundancy are all defined in it.

CouchDB-Lounge is a great albeit young project. The current shortcomings IMHO include general stability of the twistd service and absence of features such as _bulk_docs which makes migrating a larger set into CouchDB-Lounge a tedious operation. Never the less, this is something to keep an eye on.

Related to CouchDB-Lounge, there's also lode — a JavaScript- and node.js-based rewrite of the Python lounge.

Erlang-Lounge

What I call Erlang-Lounge is Cloudant's internal erlang-based sharding framework for CouchDB. It's in production at Cloudant and to be released soon. From what I know Cloudant will probably offer a free opensource version and support once they released it.

Disk, CPU or memory — which is it?

This one is hard to say. But despite how awesome Erlang is, even CouchDB depends on the system's available resources. ;-)

Disk

For starters, disk i/o is almost always the bottleneck. To verify if this the bottleneck in your particular case, please run and analyze [iostat][] during certain operations which appear to be slow in your context. For everyone on AWS, consider a RAID-0 setup, for everyone else, buy faster disks.

CPU

The more CPU in a server, the more beam processes. CouchDB (or Erlang) seem to take great advantage of this resource. I haven't really figured out a connection between CPU and general performance though because in my case memory and disk were always the bottleneck.

Memory

... seems to be another underestimated bottleneck. For example, I noticed that replication can slow down to a state where it seems faster to copy-paste documents from one instance to another when CouchDB is unable to cache an entire b-tree in RAM.

We've been testing some things on a nifty high-memory 4XL AWS instance and during a compact operation, almost 90% of my ram (70 GB) was used by the OS to cache. And don't make my mistake and rely on (h)top to verify this, cat /proc/meminfo instead.

Caching

Caching is trivial with CouchDB.

e-tags

Each document and view responds with an Etag header — here is an example:

curl -I http://foo:bar@till.cloudant.com:5984/foobar/till-laptop_1273064525
HTTP/1.1 200 OK
Server: CouchDB/0.11.0a-1.0.7 (Erlang OTP/R13B)
Etag: "1-92b4825ffe4c61630d3bd9a456c7a9e0"
Date: Wed, 05 May 2010 13:20:12 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 1771
Cache-Control: must-revalidate

The Etag will only changes, when the data in the document change. Hence it's trivial to avoid hitting the database if you don't have to. The request above is a very lightweight HEAD request which only gathers the data and does not pull the entire document.

_changes

_changes represents a live-update feed of your CouchDB database. It's located at http://server/dbname/_changes.

Whenever a data changing operation is completed, _changes will reflect that, which makes it easy for a developer to stay on top to for example invalidate an application cache only when needed (and not like it's done usually when the cache expired).

Logging

Logrotate

First off, a lot of people run CouchDB from source which means that in 99% of all installs, the logrotation is not activated.

To fix this (on Ubuntu/Debian), do the following:

ln -s /usr/local/etc/logrotate.d/couch /etc/logrotate.d/couchdb

Make sure to familiarize yourself a little with logrotatet because depending on space and business of your installation, you should adjust the config a little to not run out of diskspace. If CouchDB is unable to log, it will crash.

Loglevel

In most cases it's more than alright to just run with a log level of error.

Add the following to your local.ini (in [log]): level = error

Log directory

Still running out of diskspace? Add the following to your local.ini (in [log]):

file = /path/to/more/diskspace/couch.log

... if you adjusted the above, you will need to correct the config for logrotate.d as well.

No logging?

Last but not least — if no logs are needed, just turn them off completely.

Fin

That's all kids.

A toolchain for CouchDB Lounge

Friday, February 26. 2010
Comments

One of our biggest issues with CouchDB is currently the lack of compaction of our database, and by lack of, I don't mean that CouchDB doesn't support it, I mean that we are unable to actually run it.

Compaction in a nutshell

Compaction in a nutshell is pretty cool.

As you know, CouchDB is not very space-efficient. For once, CouchDB saves revisions of all documents. Which means, whenever you update a document a new revision is saved. You can rollback any time, or expose it as a nifty feature in your application — regardless, those revisions are kept around until your database is compacted.

Think about it in terms of IMAP - emails are not deleted until you hit that magic "compact" button which 99% of all people who use IMAP don't know what it's for anyway.

Another thing is that whenever new documents are written to CouchDB and bulk mode is not used, it'll save them in a way which is not very efficient either. In terms of actual storage and indexing (so rumour has it).

Compaction woes

Since everything is simple with CouchDB, compaction is a simple process in CouchDB too. Yay!

When compaction is started, CouchDB will create a new database file where it stores the data in a very optimized way (I will not detail on this, go read a science book or google if you are really interested in this!). When the compaction process finished, CouchDB will exchange your old database file with the new database file.

The woes start with that e.g. when you have 700 GB uncompacted data, you will probably need another 400 GB for compaction to finish because it will create a second database file.

The second issue is that when you have constant writing on your database, the compaction process will actually never finish. It kind of sucks and for those people who aim to provide close to 100% availability, this is extremely painful to learn.


Continue reading "A toolchain for CouchDB Lounge"

Thoughts on RightScale

Tuesday, October 20. 2009
Comments

RightScale provides all kinds of things — from a pre-configured MySQL master-slave setup (with automatic EBS/s3 backups), to a full LAMP stack, Rails app servers, virtually all kinds of other pre-configured server templates to a nifty auto-scaling feature.

We decided to leverage RightScale when we planned our move to AWS a couple months ago in order to not have to build everything ourselves. I've been writing this blog entry for the past five weeks and here are some observations, thoughts and tips.

RightScale

First off, whatever you think, and do, or have done so far, let me assure you, there's always a RightScale way of doing things. For (maybe) general sanity (and definitely your own), I suggest you don't do it their way — always.

One example for the RightScale way is, that all the so-called RightScripts will attempt to start services on reboot for you, instead of registering them with the init system (e.g., on Ubuntu, update-rc.d foo defaults) when they are set up.

You may argue that RightScale's attempt will provide you with a maybe more detailed protocol of what happened during the boot sequence, but at the same time it provides more potential for errors and introduces another layer around what the operating system provides, and what generally works pretty well already.

PHP and RightScale

RightScale's sales team knows how to charm people, and when I say charm, I do not mean scam (just for clarity)! :-)

The demos are very impressive and the client show cases not any less. Where they really need to excel though are PHP-related demos because not everyone in the world runs Ruby on Rails yet. No, really — there's still us PHP people and also folks who run Python, Java and so on.

Coming from the sales pitch, I felt disappointed a little because a standard PHP setup on RightScale is as standard as you would think three years ago. mod_php, Apache2 and so on. The configuration itself is a downer as well, a lot of unnecessary settings and generally not so speedy choices. Then remember that neither CentOS nor Ubuntu are exactly up to date on packages and add another constraint to the mix — Ubuntu is on 8.04 which is one and half years in the past as I write this entry.

And even though I can relate to RighScale's position — in terms of that supporting customers with all kinds of different software is a burden and messy to say the least — I am also not a fan.

Scaling up

The largest advantage when you select a service provider such as RightScale is, that they turn raw EC2 instances into usable servers out of the box. So far example setting up a lamp stack yourself requires time, while it's still a trivial task for many. With RightScale, it's a matter of a couple clicks — select image, start, provide input variables and done.

Aside from enhanced AMIs RightScale's advantage is auto-scaling. Auto-scaling has been done a couple times before. There are more than one service provider which leverages EC2 and provides scalability on top. Then take a look at Scalr, which is open source, and then recently Amazon themselves added their own Elastic Load Balancer.

In general, I think auto-scaling is something everyone gets, and wants, but of course it's not that dead simple to implement. And especially when you move to a new platform, it's a perfect trade off to sacrifice a flexibility and money for a warm and fuzzy "works out of the box" feeling.


Continue reading "Thoughts on RightScale"

Managing software deployments of your PHP applications II

Wednesday, September 9. 2009
Comments

This is not (really a part) two of my series, but an Intermezzo (1) between Part I and Part III — because I have no time to finish Part III.

In Part I, I talked about my approach to deploying a website and I offered pear and subversion as solutions to the problem. To briefly elaborate on my subversion part, I want to share the following Capistrano recipe with you.

Capistrano, isn't that Ruby?

Yes it is! Capistrano is a nifty piece of software. Not because it's so super-duper complex, but because it's pretty robust and works really well.

I am sure everyone heard of the "the right tool for the job" theory and this is along those lines. Please don't be afraid of a little Ruby to ease the deployment process. (They don't deserve your fear anyway. ;-))

Installation

On FreeBSD:

cd /usr/ports/devel/rubygem-capistrano && make install clean

On Debilian/Ubuntu:

apt-get install rubygems
gem install capistrano

Go on and create a capfile in $HOME, and this is where we'll keep our tasks.


Continue reading "Managing software deployments of your PHP applications II"

Ubuntu: nginx+php-cgi on a socket

Friday, July 31. 2009
Comments

Moving our PHP application into the cloud, means for us that we are leaving FreeBSD for Linux. Not the best move (IMHO), but I shall elaborate on this in a future blog post.

Once we decided on Ubuntu as the Linux of our choice, I started by moving our development server to an instance on Slicehost. Point taken, Slicehost is not the cloud (as in Amazon EC2, Rackspace, Flexiscale or GoGrid) yet, but Linux on Slicehost and Linux on Amazon EC2 will be alike (or so I hope :-)) and a getting a small slice versus getting a small EC2 instance is an economical decision in the end.

Introduction

The following is the start script for my php-cgi processes, which I ported from FreeBSD (I previously blogged about it here).

The advantages of this script are:

  1. php-cgi runs on a unix domain socket — no need for tcp/ip on localhost.
  2. No need for the infamous spawn-fcgi script, which never worked for me anyway, and on Ubuntu requires you to install lighttpd (if you don't happen to be on Karmic Koala).
  3. You can setup different websites with different instances of php-cgi. This is great for virtual hosting, especially on a development server where the different workspaces may have different PHP settings and you want to run versions in parallel without sharing settings and therefore maybe affecting each other.
  4. Icing on the cake: we could even add a custom php.ini to the start call for each instance (-c option) to customize it even further.

Continue reading "Ubuntu: nginx+php-cgi on a socket"