Measuring CouchDB performance

The overall document-oriented approach of CouchDB and the free-form way of saving data are probably the things that appeal to most of us when we first read about this new database.

Most of the people who have been introduced to CouchDB so far quickly decided to use it in production despite the project's early, beta-ish state. We all hate normalization, we all want a faster, more responsive database, and some of us want multiple nodes and inter-node replication. CouchDB manages to sell all of these quite well. And of course, there are plenty of other reasons.

One of our reasons for using CouchDB is read speed. The environment where we will deploy CouchDB first is a massive database used to auto-complete form data for our customers. The database currently holds over 19,000,000 documents (and we still have around 3 to 4,000,000 records to import), and read speed has held up insanely well.

While the "query" is very simple (we go by document key, _id), I'm still impressed with it. I don't even want to cache it yet (something I'd never ever do if I deployed a MySQL database of this size — no offense meant), and if we ever needed a cache, we would plug Varnish or a shiney hardware component in the middle and be done with it — because that's how awesome a HTTP-API is.

Over the last few weeks, Jan has been talking about a module to generate statistics for the avid CouchDB user (me, me, me), and when he collected feedback he got me interested in measuring what actually happens in my database; so far I hadn't cared much and had settled for "absolutely f'ing fast" instead.

Measure write speed

So let's say I want to know how many documents per minute I am writing into CouchDB. Unix to the rescue:

cat /var/log/couchdb/couch.log|grep '\[Sat, 21 Feb 2009 22:51:'|grep -v 412|wc -l

With little optimization I easily manage 1,500 documents per minute, and it scales up with the number of workers I add. For example, adding another two workers gets me to over 3,000 documents per minute (excluding duplicates). By the way, these are single PUT requests, not bulk writes.

The above line selects all requests that happened today (Saturday, 21 February 2009) during the minute 22:51. Because I'm only writing, I filter out possible duplicates (HTTP status 412) and count the remaining lines in the logfile.
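
For context, each of those writes is a plain HTTP PUT of a single JSON document, which is also where the 412s for duplicates come from. A minimal sketch with a made-up database name, document _id and body:

curl -X PUT http://127.0.0.1:5984/mydb/some_document_id -d '{"type": "autocomplete", "value": "example"}'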

On a side note, the server said (while performing those 3,000 writes per minute):

7:30PM up 211 days, 3:28, 1 user, load averages: 0.94, 0.70, 0.55

CouchDB's beam process used a total of 7 MB of memory, which suggests we are nowhere near any limit; the machine also hosts a busy MySQL server.

Measure read speed

cat /var/log/couchdb/couch.log|grep '\[Sat, 21 Feb 2009 22:51:'|wc -l

Imagine you run a series of reads against your database and all the requested documents exist; that count is the number of reads in that minute.
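
If your couch.log lines also include the HTTP method and status code (the format can differ between CouchDB versions, so adjust the patterns to what your log actually prints), you can be a bit stricter and count only successful GETs in that minute:

cat /var/log/couchdb/couch.log|grep '\[Sat, 21 Feb 2009 22:51:'|grep 'GET'|grep ' 200'|wc -l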

Another way is to employ the infamous Apache Bench or Siege.
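
For example, a quick run against a single document, again with placeholder database and document names (for ab, -n is the total number of requests and -c the concurrency; for Siege, -c is the number of concurrent users and -r the repetitions per user):

ab -n 10000 -c 10 http://127.0.0.1:5984/mydb/some_document_id
siege -c 10 -r 1000 http://127.0.0.1:5984/mydb/some_document_id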

Don't trust benchmarks

The above is not a super-smart way to show off CouchDB's speed, because:

  • You don't know the size of my documents.
  • Duplicates obviously happen; in a perfect world they would not.
  • You don't know anything about my hardware (it's something quad-core with a lot of RAM).
  • You don't know my setup, e.g. whether I'm writing against localhost or from a remote host.

The ideas in this blog post are an attempt to give people the means to see what is going on right now, at least until the statistics module is in place.

In related news, I follow jchris on Twitter, where he reported 3,000 docs/second and even shared his script. And last but not least, there are related efforts on GitHub as well: a stress test suite.
