SQL MAX() and GROUP BY for CouchDB
While re-writing a couple SQL statements into CouchDB we got stuck when we wanted to do a SELECT MAX(...), id ... GROUP BY id
in CouchDB.
MySQL
Imagine the following SQL table with data:
mysql> SHOW FIELDS FROM deploys; +-------------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------+-------------+------+-----+---------+-------+ | project | varchar(10) | NO | | NULL | | | deploy_time | datetime | NO | | NULL | | +-------------+-------------+------+-----+---------+-------+ 2 rows in set (0.01 sec)
In order to get the latest deploy for each project, I'd issue:
mysql> SELECT MAX(deploy_time), project FROM deploys GROUP BY project; +---------------------+----------+ | MAX(deploy_time) | project | +---------------------+----------+ | 2013-10-04 22:01:26 | project1 | | 2013-10-04 22:02:17 | project2 | | 2013-10-04 22:02:20 | project3 | +---------------------+----------+ 3 rows in set (0.00 sec)
Simple. But what do you do in CouchDB?
CouchDB
My documents look like this:
{ "_id": "hash", "project": "github-user/repo/branch", "deploy_time": { "date": "2013-10-04 22:02:20", /* ... */ }, /* ... */ }
So, after more than a couple hours trying to wrap our heads around map-reduce in CouchDB, it's working.
Here's the view's map function:
function (doc) { if (doc.project) { emit(doc.project, doc.deploy_time.date); } }
This produces nice key value pairs — in fact, multiple — for each project.
And because the map-function returns multiple, we need to reduce our set.
So here is the reduce:
function (doc, values, rereduce) { var max_time = 0, max_value, time_parsed; values.forEach(function(deploy_date) { time_parsed = Date.parse(deploy_date.replace(/ /gi,'T')); if (max_time < time_parsed) { max_time = time_parsed; max_value = deploy_date; } }); return max_value; }
The little .replace(/ /gi,'T')
took especially long to figure out. Special thanks to Cloudant's señor JS-date-engineer Adam Kocoloski for helping out. ;-)
Step by step
- iterate over values
- fix each value (add the "T") to make Spidermonkey comply
- parse, and compare
- return the "latest" in the end
A note of advice: To save yourself some trouble, install a local copy of Spidermonkey and test your view code in there and not in your browser.
Open the view in your browser: http://localhost/db/_design/deploys/_view/last_deploys?group=true
{ "rows":[ { "key":"testowner/testname/testbranch", "value":"2013-09-13 11:41:03" }, { "key":"testowner/testname/testbranch2", "value":"2013-09-12 16:48:39" } ] }
Yay, works!
FIN
That's all for today.
Trackbacks
The author does not allow comments to this entry
Comments