SQL MAX() and GROUP BY for CouchDB

If you enjoyed this article, please leave a comment, rss subscribe to my RSS feed and/or follow me on Twitter. Thank you very much!

While re-writing a couple SQL statements into CouchDB we got stuck when we wanted to do a SELECT MAX(...), id ... GROUP BY id in CouchDB.

MySQL

Imagine the following SQL table with data:

mysql> SHOW FIELDS FROM deploys;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| project     | varchar(10) | NO   |     | NULL    |       |
| deploy_time | datetime    | NO   |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
2 rows in set (0.01 sec)

In order to get the latest deploy for each project, I'd issue:

mysql> SELECT MAX(deploy_time), project FROM deploys GROUP BY project;
+---------------------+----------+
| MAX(deploy_time)    | project  |
+---------------------+----------+
| 2013-10-04 22:01:26 | project1 |
| 2013-10-04 22:02:17 | project2 |
| 2013-10-04 22:02:20 | project3 |
+---------------------+----------+
3 rows in set (0.00 sec)

Simple. But what do you do in CouchDB?

CouchDB

My documents look like this:

{
  "_id": "hash",
  "project": "github-user/repo/branch",
  "deploy_time": {
     "date": "2013-10-04 22:02:20",
     /* ... */
  },
  /* ... */
}

So, after more than a couple hours trying to wrap our heads around map-reduce in CouchDB, it's working.

Here's the view's map function:

function (doc) {
  if (doc.project) {
    emit(doc.project, doc.deploy_time.date);
  }
}

This produces nice key value pairs — in fact, multiple — for each project.

And because the map-function returns multiple, we need to reduce our set.

So here is the reduce:

function (doc, values, rereduce) {
  var max_time = 0, max_value, time_parsed;
  values.forEach(function(deploy_date) {
    time_parsed = Date.parse(deploy_date.replace(/ /gi,'T'));
    if (max_time < time_parsed) {
      max_time = time_parsed;
      max_value = deploy_date;
    }
  });
  return max_value;
}

The little .replace(/ /gi,'T') took especially long to figure out. Special thanks to Cloudant's seƱor JS-date-engineer Adam Kocoloski for helping out. ;-)

Step by step

  • iterate over values
  • fix each value (add the "T") to make Spidermonkey comply
  • parse, and compare
  • return the "latest" in the end

A note of advice: To save yourself some trouble, install a local copy of Spidermonkey and test your view code in there and not in your browser.

Open the view in your browser: http://localhost/db/_design/deploys/_view/last_deploys?group=true

{
  "rows":[
    {
      "key":"testowner/testname/testbranch",
      "value":"2013-09-13 11:41:03"
    },
    {
      "key":"testowner/testname/testbranch2",
      "value":"2013-09-12 16:48:39"
    }
  ]
}

Yay, works!

FIN

That's all for today.

| More