A toolchain for CouchDB Lounge

Friday, February 26. 2010

One of our biggest issues with CouchDB is currently the lack of compaction of our database. By "lack of" I don't mean that CouchDB doesn't support it; I mean that we are unable to actually run it.

Compaction in a nutshell

Compaction is pretty cool.

As you know, CouchDB is not very space-efficient. For one, CouchDB saves revisions of all documents, which means that whenever you update a document, a new revision is saved. You can roll back at any time, or expose it as a nifty feature in your application; regardless, those revisions are kept around until your database is compacted.

Think about it in terms of IMAP: emails are not deleted until you hit that magic "compact" button, and 99% of all people who use IMAP don't know what it's for anyway.

Another thing is that whenever new documents are written to CouchDB and bulk mode is not used, it'll save them in a way which is not very efficient either, in terms of both actual storage and indexing (so rumour has it).

Compaction woes

Since everything is simple with CouchDB, compaction is a simple process in CouchDB too. Yay!

When compaction is started, CouchDB will create a new database file where it stores the data in a very optimized way (I will not go into detail on this, go read a science book or google if you are really interested in this!). When the compaction process has finished, CouchDB will swap your old database file for the new database file.
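As a minimal sketch of how compaction gets started: CouchDB exposes it over HTTP as a POST to the database's `_compact` resource. The base URL `http://localhost:5984` and the database name below are assumptions for illustration, and the function only builds the request rather than sending it.

```python
import urllib.parse
import urllib.request


def build_compaction_request(base_url: str, db_name: str) -> urllib.request.Request:
    """Build (but do not send) the request that asks CouchDB to compact db_name."""
    # Database names may contain characters that need escaping in a URL path.
    quoted_name = urllib.parse.quote(db_name, safe="")
    url = f"{base_url.rstrip('/')}/{quoted_name}/_compact"
    return urllib.request.Request(
        url,
        data=b"",
        # Newer CouchDB releases expect a JSON content type on this call.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage; sending it requires a running CouchDB and admin rights:
# urllib.request.urlopen(build_compaction_request("http://localhost:5984", "mydb"))
```

The call returns immediately; the compaction itself runs in the background on the server.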

The woes start with disk space: when you have, say, 700 GB of uncompacted data, you will probably need another 400 GB for compaction to finish, because it creates a second database file.
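A rough pre-flight check for the disk space problem could look like the sketch below. It assumes the worst case, where the compacted file ends up as large as the original; the real requirement depends on how much there is to reclaim, and the `safety_factor` parameter is a made-up knob, not anything CouchDB provides.

```python
import os
import shutil

GB = 1024 ** 3


def has_room(db_size_bytes: int, free_bytes: int, safety_factor: float = 1.0) -> bool:
    """Worst-case check: is there room for a second file as large as the first?"""
    return free_bytes >= db_size_bytes * safety_factor


def has_room_for_compaction(db_file: str, safety_factor: float = 1.0) -> bool:
    """Apply has_room() to a database file and the filesystem it lives on."""
    db_size = os.path.getsize(db_file)
    directory = os.path.dirname(os.path.abspath(db_file))
    free = shutil.disk_usage(directory).free
    return has_room(db_size, free, safety_factor)
```

With the numbers from above, `has_room(700 * GB, 400 * GB)` is false under this pessimistic rule, even though 400 GB may turn out to be enough in practice.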

The second issue is that when you have constant writes to your database, the compaction process will actually never finish. It kind of sucks, and for those people who aim to provide close to 100% availability, this is extremely painful to learn.
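To notice a compaction that never finishes, one option is to poll the database info document (what `GET /<db>` returns), which carries a `compact_running` flag, and give up after a deadline. This is only a sketch: `fetch_db_info` is a hypothetical callable you would supply yourself, and the timeout values are arbitrary.

```python
import time
from typing import Callable


def wait_for_compaction(
    fetch_db_info: Callable[[], dict],
    timeout_s: float,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once compact_running is false, False if the deadline passes first.

    Note: a false flag only means no compaction is running right now;
    it does not prove that one ever started.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if not fetch_db_info().get("compact_running", False):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A `False` result is the signal that writes are outpacing the compactor and that something else (throttling writes, or moving to a fresh node) is needed.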



CouchDB on Ubuntu on AWS

Friday, August 28. 2009

Here's a little HowTo on how to set up CouchDB on an AWS EC2 instance. Outside of AWS (and EC2), this setup should also work on any other Ubuntu server, and I suppose on Debian as well.

Getting started

The following steps are a rough draft, or a sketch of how to get started. I suggest that you familiarize yourself with what all of these things do. If you want to skip the reading and just get started, this should work anyway.

  • you (obviously) need an AWS account (and to log into the AWS console).
  • you need a custom security group (make sure to open it up for HTTP traffic)
  • create an EBS volume (Take a deep breath and think about the size of the volume. Keep in mind that you don't want to run into space issues right away, and that allocated storage costs you money even when idle: e.g. 400 GB is roughly 40 USD per month, excluding I/O.)
  • create a keypair (It'll prompt you to download a foobar.pem; I placed mine on my local machine in ~/.ssh/ and ran chmod 400 on it.)
  • get an elastic IP
  • start the instance
    • select an AMI (I selected alestic's 64bit server Ubuntu 9.04 AMI.)
    • assign your own security group AND the default one
    • select your keypair
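To put a number on the EBS volume size before creating it, here is a tiny estimate. The rate of 0.10 USD per GB per month is simply derived from the "400 GB is roughly 40 USD" figure in the list above; treat it as an assumption and check current pricing, and note that I/O charges are left out.

```python
def monthly_storage_cost_usd(size_gb: float, usd_per_gb_month: float = 0.10) -> float:
    """Estimate the monthly cost of allocated storage, excluding I/O charges."""
    if size_gb < 0:
        raise ValueError("size_gb must not be negative")
    return size_gb * usd_per_gb_month


# Compare a few candidate volume sizes side by side.
for size in (100, 400, 1000):
    print(f"{size} GB: about {monthly_storage_cost_usd(size):.2f} USD per month")
```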

Woo! We made it that far.

The instance should boot, and once this is done (green indicates all went well), we want to attach the previously created EBS volume and associate the elastic IP with said instance.

Once these steps are complete, go to the instance screen, click on your running instance and then click on "Connect". It'll show you the ssh command to connect to your instance; it should be similar to this:

ssh -i .ssh/foobar.pem root@ec2-W-X-Y-Z.compute-1.amazonaws.com

In your case, the W-X-Y-Z part will most likely be your elastic IP, with dashes instead of dots.
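If you would rather derive the command than copy it from the console, the hostname pattern can be sketched as below. The `compute-1.amazonaws.com` suffix and the `root` user are taken from the example command; other regions and AMIs may differ, so both are parameters, and `203.0.113.10` is a documentation-only address used for illustration.

```python
import ipaddress
import os
from typing import List


def ec2_public_hostname(elastic_ip: str, domain: str = "compute-1.amazonaws.com") -> str:
    """Turn W.X.Y.Z into ec2-W-X-Y-Z.<domain>; raises ValueError on a malformed IP."""
    ip = ipaddress.IPv4Address(elastic_ip)
    return f"ec2-{str(ip).replace('.', '-')}.{domain}"


def ssh_command(elastic_ip: str, key_path: str = "~/.ssh/foobar.pem", user: str = "root") -> List[str]:
    """Build the ssh argument list; pass it to subprocess.run() to actually connect."""
    # Expand "~" ourselves because no shell is involved when using an argument list.
    return ["ssh", "-i", os.path.expanduser(key_path), f"{user}@{ec2_public_hostname(elastic_ip)}"]


print(" ".join(ssh_command("203.0.113.10")))
```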

This process is not very automated yet, but at least you have an instance up and running. The next step is to try to log in and see if the EBS volume was attached; if all went well, you should have /mnt.
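Once logged in, a quick way to confirm that something is really mounted at /mnt (rather than it being a plain directory on the root disk) is sketched here. It only checks the mount point and its size; it cannot tell you which device sits behind it.

```python
import os
import shutil


def volume_is_mounted(mount_point: str = "/mnt") -> bool:
    """Return True if mount_point exists and is a mount point."""
    return os.path.ismount(mount_point)


if volume_is_mounted("/mnt"):
    total_gb = shutil.disk_usage("/mnt").total / 1024 ** 3
    print(f"/mnt is mounted, about {total_gb:.0f} GB in total")
else:
    print("/mnt is not a mount point; the volume may not be attached or mounted")
```

If the reported size does not match the volume you created, you are probably looking at a different disk.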



Ubuntu: nginx+php-cgi on a socket

Friday, July 31. 2009

Moving our PHP application into the cloud means for us that we are leaving FreeBSD for Linux. Not the best move (IMHO), but I shall elaborate on this in a future blog post.

Once we decided on Ubuntu as the Linux of our choice, I started by moving our development server to an instance on Slicehost. Granted, Slicehost is not the cloud (as in Amazon EC2, Rackspace, Flexiscale or GoGrid) yet, but Linux on Slicehost and Linux on Amazon EC2 will be alike (or so I hope :-)), and getting a small slice versus getting a small EC2 instance is an economical decision in the end.

Introduction

The following is the start script for my php-cgi processes, which I ported from FreeBSD (I previously blogged about it here).

The advantages of this script are:

  1. php-cgi runs on a Unix domain socket, so there is no need for TCP/IP on localhost.
  2. No need for the infamous spawn-fcgi script, which never worked for me anyway, and on Ubuntu requires you to install lighttpd (if you don't happen to be on Karmic Koala).
  3. You can set up different websites with different instances of php-cgi. This is great for virtual hosting, especially on a development server where the different workspaces may have different PHP settings and you want to run versions in parallel without them sharing settings and possibly affecting each other.
  4. Icing on the cake: we could even add a custom php.ini to the start call for each instance (-c option) to customize it even further.
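To illustrate points 1, 3 and 4 (this is not the start script itself, just a sketch of the idea): php-cgi takes `-b` to bind to a socket path and `-c` to pick a php.ini, so one command per site is enough. The socket and ini paths below are hypothetical.

```python
from typing import List, Optional


def php_cgi_command(socket_path: str, php_ini: Optional[str] = None, binary: str = "php-cgi") -> List[str]:
    """Build the argument list for one php-cgi instance bound to a Unix domain socket."""
    cmd = [binary, "-b", socket_path]
    if php_ini is not None:
        # -c points this instance at its own php.ini.
        cmd += ["-c", php_ini]
    return cmd


# One instance per website, each with its own socket and optional settings.
sites = {
    "site-a": php_cgi_command("/tmp/php-site-a.sock"),
    "site-b": php_cgi_command("/tmp/php-site-b.sock", php_ini="/etc/php/site-b.ini"),
}
for name, cmd in sites.items():
    print(name, " ".join(cmd))
```

On the nginx side, each site's `fastcgi_pass` would then point at the matching path, e.g. `fastcgi_pass unix:/tmp/php-site-a.sock;`.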
