March is a pretty exciting month for PHP people in and around Berlin, so here is what we've got.
Berlin PHP user group
- Topic: Magento Commerce (webservices, integration, skinning, ...), and a little bit of Zend-MVC
- When: 8:30 PM, 4th March, 2009
- Where: Z-Bar, Bergstr. 9, Berlin-Mitte
... presented by Manuel Blechschmidt, and of course for the price of nothing. :-)
Berlin Flex user group
The Berlin Flex user group invited Adobe Platform Evangelist Mihai Corlan to speak about Flex development with a PHP backend, using Zend_Amf and probably Zend Studio.
... more details are on their homepage, please RSVP.
Over the last two weeks, I have been working on importing a raw text file of JSON data (~20 GB) into CouchDB.
Due to the fuzziness of the data, I decided not to use _bulk_docs for the import: if a single document inside a bulk request failed (e.g. because of a duplicate _id), I would have to go through the request one by one to figure out what went wrong.
So I concluded that while bulk writing offers speed, I would rather trade that speed for less complexity. That said, if you know your data well, I would always suggest using bulk writes.
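The trade-off above can be sketched roughly as follows. This is a minimal illustration, not the import script used here; it assumes the raw file holds one JSON document per line. `save_doc` is a hypothetical stand-in for whatever client call stores a single document, injected so the loop can run without a live CouchDB, and `bulk_payloads` shows the `{"docs": [...]}` body that `POST /{db}/_bulk_docs` expects, for the case where the data is known to be clean.

```python
import json
from typing import Callable, Iterable, Iterator, List, Tuple


def import_one_by_one(
    lines: Iterable[str],
    save_doc: Callable[[dict], None],
) -> Tuple[int, List[Tuple[int, str]]]:
    """Store every JSON line as its own document.

    Returns the number of stored documents and a list of
    (line_number, error_message) pairs for the lines that failed.
    """
    stored = 0
    failures: List[Tuple[int, str]] = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            doc = json.loads(line)
            save_doc(doc)  # one request per document: slower, but failures stay isolated
            stored += 1
        except Exception as exc:  # malformed JSON, duplicate _id, HTTP error, ...
            failures.append((line_no, str(exc)))
    return stored, failures


def bulk_payloads(docs: Iterable[dict], size: int) -> Iterator[str]:
    """Group documents into {"docs": [...]} request bodies for _bulk_docs."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield json.dumps({"docs": batch})
            batch = []
    if batch:  # flush the last, partially filled batch
        yield json.dumps({"docs": batch})


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for CouchDB that rejects duplicate _id values.
    seen = set()

    def fake_save(doc: dict) -> None:
        if doc.get("_id") in seen:
            raise RuntimeError(f"conflict: duplicate _id {doc['_id']!r}")
        seen.add(doc.get("_id"))

    raw = ['{"_id": "a"}', '{"_id": "b"}', '{"_id": "a"}', "not json"]
    stored, failures = import_one_by_one(raw, fake_save)
    print(stored, [n for n, _ in failures])  # 2 [3, 4]
```

With the one-by-one loop, each bad line is reported with its line number while the rest of the import carries on, which is exactly the bookkeeping a failed bulk request would otherwise force you to reconstruct.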
The import ran for a week and resulted in 20,125,604 documents and a database file of 79.4 GB.
Yesterday, I started _compact on the database, documented the numbers on Twitter, and double-wow'd at the result: the database had shrunk to only 22.8 GB.
All in all, the operation ran for roughly 23 hours, during which I was still able to read from my database and possibly also write to it, though heavy writing during a compaction should be avoided.
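Compaction is triggered with a plain `POST` to the database's `_compact` resource. The sketch below only builds that request with Python's standard `urllib.request`, without sending it; the server URL and the database name `import_db` are hypothetical, and the JSON `Content-Type` header is there because later CouchDB releases reject this `POST` without it.

```python
import urllib.request


def compact_request(base_url: str, db: str) -> urllib.request.Request:
    """Build (without sending) a request asking CouchDB to compact `db`."""
    url = f"{base_url.rstrip('/')}/{db}/_compact"
    # Later CouchDB releases require the POST to be declared as JSON.
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical server and database name.
    req = compact_request("http://localhost:5984/", "import_db")
    print(req.get_method(), req.full_url)
    # POST http://localhost:5984/import_db/_compact
```

Sending it (e.g. with `urllib.request.urlopen(req)`) only starts the compaction; CouchDB does the work in the background, which is why the database stays readable in the meantime, and `GET /_active_tasks` can be polled to follow its progress.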
I also found it remarkable that, after compaction, the database is almost the same size as the raw text file, nearly 1:1.
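As a rough check on these numbers, a few lines of arithmetic over the sizes quoted above show how much compaction reclaimed and how close to 1:1 the result really is. The ~20 GB raw size is an approximation, so the last ratio is only indicative.

```python
def compaction_stats(raw_gb: float, before_gb: float, after_gb: float) -> dict:
    """Derive simple ratios from the file sizes before and after _compact."""
    return {
        "reclaimed": 1 - after_gb / before_gb,  # share of the file freed by compaction
        "shrink_factor": before_gb / after_gb,  # how many times smaller the file got
        "vs_raw": after_gb / raw_gb,            # compacted database relative to the raw text
    }


if __name__ == "__main__":
    s = compaction_stats(raw_gb=20.0, before_gb=79.4, after_gb=22.8)
    print(f"{s['reclaimed']:.1%} reclaimed, {s['shrink_factor']:.1f}x smaller, "
          f"{s['vs_raw']:.2f}:1 vs. raw text")
    # 71.3% reclaimed, 3.5x smaller, 1.14:1 vs. raw text
```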
The overall document-oriented approach of CouchDB and the free-form way of saving data are probably the things that appeal to most of us when we first read about this new database.
Most of the people who have been introduced to CouchDB so far quickly decided to use it in production, despite the project's early, beta-ish state. We all hate normalization, we all want a faster, more responsive database, and some of us want multiple nodes and inter-node replication. CouchDB sells all of these quite well. And of course, there are plenty of other reasons.