A toolchain for CouchDB Lounge

Friday, February 26. 2010

One of our biggest issues with CouchDB is currently the lack of compaction of our database, and by lack of, I don't mean that CouchDB doesn't support it, I mean that we are unable to actually run it.

Compaction in a nutshell

Compaction in a nutshell is pretty cool.

As you know, CouchDB is not very space-efficient. For once, CouchDB saves revisions of all documents. Which means, whenever you update a document a new revision is saved. You can rollback any time, or expose it as a nifty feature in your application — regardless, those revisions are kept around until your database is compacted.

Think about it in terms of IMAP - emails are not deleted until you hit that magic "compact" button which 99% of all people who use IMAP don't know what it's for anyway.

Another thing is that whenever new documents are written to CouchDB and bulk mode is not used, it'll save them in a way which is not very efficient either. In terms of actual storage and indexing (so rumour has it).

Compaction woes

Since everything is simple with CouchDB, compaction is a simple process in CouchDB too. Yay!

When compaction is started, CouchDB will create a new database file where it stores the data in a very optimized way (I will not detail on this, go read a science book or google if you are really interested in this!). When the compaction process finished, CouchDB will exchange your old database file with the new database file.

The woes start with that e.g. when you have 700 GB uncompacted data, you will probably need another 400 GB for compaction to finish because it will create a second database file.

The second issue is that when you have constant writing on your database, the compaction process will actually never finish. It kind of sucks and for those people who aim to provide close to 100% availability, this is extremely painful to learn.


Continue reading "A toolchain for CouchDB Lounge"

Thoughts on RightScale

Tuesday, October 20. 2009

RightScale provides all kinds of things — from a pre-configured MySQL master-slave setup (with automatic EBS/s3 backups), to a full LAMP stack, Rails app servers, virtually all kinds of other pre-configured server templates to a nifty auto-scaling feature.

We decided to leverage RightScale when we planned our move to AWS a couple months ago in order to not have to build everything ourselves. I've been writing this blog entry for the past five weeks and here are some observations, thoughts and tips.

RightScale

First off, whatever you think, and do, or have done so far, let me assure you, there's always a RightScale way of doing things. For (maybe) general sanity (and definitely your own), I suggest you don't do it their way — always.

One example for the RightScale way is, that all the so-called RightScripts will attempt to start services on reboot for you, instead of registering them with the init system (e.g., on Ubuntu, update-rc.d foo defaults) when they are set up.

You may argue that RightScale's attempt will provide you with a maybe more detailed protocol of what happened during the boot sequence, but at the same time it provides more potential for errors and introduces another layer around what the operating system provides, and what generally works pretty well already.

PHP and RightScale

RightScale's sales team knows how to charm people, and when I say charm, I do not mean scam (just for clarity)! :-)

The demos are very impressive and the client show cases not any less. Where they really need to excel though are PHP-related demos because not everyone in the world runs Ruby on Rails yet. No, really — there's still us PHP people and also folks who run Python, Java and so on.

Coming from the sales pitch, I felt disappointed a little because a standard PHP setup on RightScale is as standard as you would think three years ago. mod_php, Apache2 and so on. The configuration itself is a downer as well, a lot of unnecessary settings and generally not so speedy choices. Then remember that neither CentOS nor Ubuntu are exactly up to date on packages and add another constraint to the mix — Ubuntu is on 8.04 which is one and half years in the past as I write this entry.

And even though I can relate to RighScale's position — in terms of that supporting customers with all kinds of different software is a burden and messy to say the least — I am also not a fan.

Scaling up

The largest advantage when you select a service provider such as RightScale is, that they turn raw EC2 instances into usable servers out of the box. So far example setting up a lamp stack yourself requires time, while it's still a trivial task for many. With RightScale, it's a matter of a couple clicks — select image, start, provide input variables and done.

Aside from enhanced AMIs RightScale's advantage is auto-scaling. Auto-scaling has been done a couple times before. There are more than one service provider which leverages EC2 and provides scalability on top. Then take a look at Scalr, which is open source, and then recently Amazon themselves added their own Elastic Load Balancer.

In general, I think auto-scaling is something everyone gets, and wants, but of course it's not that dead simple to implement. And especially when you move to a new platform, it's a perfect trade off to sacrifice a flexibility and money for a warm and fuzzy "works out of the box" feeling.


Continue reading "Thoughts on RightScale"

Magento: moving a store to another server

Tuesday, June 16. 2009

Frequently, you do client work and if you are fortunate enough, you can setup a development environment on your own server or your laptop (or whatever), tinker with the files, and templates, and so on — until it's all done.

And whenever you are done, it's time to move files.

Sounds easy? It sort of is!

Checklist

Here's a small check list of things to keep in mind when you move an installation.

  1. You may need to fix up the configuration file in order to adjust all database related settings. It's located in app/etc/local.xml.

  2. Certain directories will need to be made writable. Writable in this case means that the webserver has to be able to write into it. Since most of the standard PHP setups are with mod_php, this will be the user www, apache or www-run (in most cases). If you (or your provider) runs php-cgi, chances are that this is not necessary.

    You may have to ask an administrator to set this up. If the administrator is not available, you can always chmod 777 these, but for obvious reasons this may be risky in a shared environment.

    The following is a list of directories (and contents) which need to be made writable:

    • app/etc/
    • var/
    • media/

    And in case you want to use MagentoConnect, there are additional directories to make writable. Please see this blog post for more info.

  3. URLs, URLs, URLs. You were working on localhost all this time and now you think you can move it over just like that? Wrong!!! ;-) But all you need to do is edit the following keys in the core_config_data table:

    • web/unsecure/base_url
    • web/secure/base_url

That's all folks!

If you have any additions, please leave a comment!

Defined tags for this entry: , , ,

PHP performance III -- Running nginx

Sunday, May 31. 2009

Since part one and two were uber-successful, here's an update on my Zend Framework PHP performance situation. I've also had this post sitting around since beginning of May and I figured if I don't post it now, I never will.

Disclaimer: All numbers (aka pseudo benchmarks) were not taken on a full moon and are (of course) very relative to our server hardware (e.g. DELL 1950, 8 GB RAM) and environment. The application we run is Zend Framework-based and currently handles between 150,000 and 200,000 visitors per day.

Why switch at all?

In January of this year (2009), we started investigating the 2.2 branch of the Apache webserver. Because we used Apache 1.3 for forever, we never had the need to upgrade to Apache 2.0, or 2.2. After all, you're probably familiar with the don't fix it, if it's not broken-approach.

Late last year we ran into a couple (maybe) rather FreeBSD-specific issues with PHP and its opcode cache APC. I am by no means an expert on the entire situation, but from reading mailing lists and investigating on the server, this seemed to be expected behavior — in a nutshell: Apache 1.3 and a large opcode cache on a newer versions of FreeBSD (7) were bound to fail with larger amounts of traffic.

We tried bumping up a few settings (pv entries), but we just ran into the same issue again and again.

Because the architecture of Apache 2.2 and 1.3 is so different from one another (and upgrading to 2.2 was the proposed solution), I went on to explore this upgrade to Apache 2.2. And once I completed the switch to Apache 2.2, my issues went away.

So far, so good!

Performance?

On the performance side we experienced rather mediocre results.

While we benched that a static file could be read at around 300 requests per second (that is a pretty standard Apache 2.2 install, sans a couple unnecessary modules), PHP (mod_php) performed at a fraction of that, averaging between 20 and 23 requests per second.

Myth: Hardware is cheap(, developer time is not)!

Before some people yell at me for trying to optimize my web server, one needs to take the costs of scaling (to a 100 requests per seconds) into account.

One of those servers currently runs at 2,600.00 USD. The price tag adds up to an additional 10,400.00 USD in order to scale to a 100 (lousy) requests per seconds. Chances are of course, that the hardware is slightly less expensive since DELL gives great rebates — but the 8 GB of (server) RAM and the SAS disks by themselves are melting budgets away.

And on top of all hardware costs, you need add setup, maintenance and running costs (rack space, electricity) for an additional four servers — suddenly, developer time is cheap. ;-)

So what do we do? Nginx to the rescue?!


Continue reading "PHP performance III -- Running nginx"

RFC: CouchDB on FreeBSD

Friday, May 15. 2009

Thanks to Wesley, we recently managed to update CouchDB's FreeBSD port to the official 0.9.0 release.

My current TODO for the port includes:

  • a super-cool rc-script (currently, there is none)
  • automatic user setup/creation (couchdb)
  • patching of the install/source to use BSD-style directories for the database (e.g. /var/db/couchdb).

In regard to the the rc-script, I continued on a work in progress and committed an idea on Github. This work in process (couchdb) works out of the box. Just download the file, put it in /usr/local/etc/rc.d/, chmod +x couchdb and done. I also updated the CouchDB wiki page explaining how CouchDB on FreeBSD is setup (e.g. the options of the rc-script) and updated the instructions on user creation — something that I plan to roll into the couchdb port ASAP.

But before I continue on the port, I'd like to ask for feedback from people (Probably, you!) who use CouchDB on FreeBSD. For example, are you happy with the current options, or do you need a different set, etc.. If this work in progress is what you need, then that's valuable feedback as well.

Thanks!

Defined tags for this entry: , , , ,