node.js & socket.io fun


I recently had the extreme pleasure of using node.js and socket.io on a project. Here are some insights.

Objective

The objective of the project was to read data from the _changes feed of our CouchDB cluster (hosted by Cloudant) and publish it to a widget that displays a constant stream of "what are people doing right now".

The core problem was not just taking this stream of data and feeding it onto a page. Since we'll deploy this widget to our homepage, we needed to make sure that no matter how many clients see it, the impact on the database cluster stays minimal: only a single client (or, down the road, up to three for failover) should actually read data from the cluster.
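The shape of that requirement is one upstream reader fanning out to any number of browsers. A minimal sketch of the pattern using Node's built-in `events` module; `ChangesHub`, `publish` and `subscribe` are hypothetical names, not part of the project, where socket.io's broadcast plays the hub's role.

```javascript
const { EventEmitter } = require('events');

// Hypothetical hub: exactly one upstream reader calls publish(), and every
// connected client subscribes, so the database sees a single reader no
// matter how many browsers are watching.
class ChangesHub extends EventEmitter {
  constructor() {
    super();
    this.setMaxListeners(0); // allow an unbounded number of subscribers
  }

  // Called by the single _changes reader for every change it sees.
  publish(change) {
    this.emit('change', change);
  }

  // Called once per client; returns a function that unsubscribes it.
  subscribe(onChange) {
    this.on('change', onChange);
    return () => this.off('change', onChange);
  }
}
```

Adding a subscriber costs a listener in memory, not another request against the cluster; failover readers would simply be additional publishers.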

After shopping around for a technology, we realized we needed some sort of abstraction, because the different techniques (e.g. Comet, WebSockets, AJAX long polling, ...) are supported differently across browsers. We decided to build this project on top of socket.io, pretty much for the same reasons most people reach for jQuery, Prototype or Dojo these days.

Code

Here are the relevant pieces of the code we ended up with.

server.js

/**
 * @desc load required modules
 */
var http = require('http'),
    url = require('url'),
    fs = require('fs'),
    io = require('./vendor/socket.io');

var server = http.createServer(...);
server.listen(8080);
var socket = io.listen(server);

// couchdb connection data
var db = {};
db.user = '';
db.pass = '';
db.host = '';

// request headers
var headers = {
    'Host': db.host, // set by hand for the old http.Client API; virtual hosts like Cloudant route on it
    'Content-Type': 'application/json'
};

// add HTTP Basic auth only when credentials are configured
if (db.user && db.pass) {
    headers['Authorization'] = 'Basic ' + new Buffer(db.user + ':' + db.pass).toString('base64');
}

var requestUri = ...;

// request changes
var client  = http.createClient(80, db.host);
var changes = client.request('GET', requestUri, headers);
changes.end();

// handle response, it'll be chunked
changes.on('response', function (response) {
    // bail hard on non-200, something must be wrong
    if (response.statusCode != 200) {
        throw new Error('response: ' + response.statusCode);
    }

    var json;
    response.setEncoding('utf8');
    response.on('data', function (chunk) {
        try {
            json = JSON.parse(chunk); // let's not get crazy here
            socket.broadcast({'foo': json.doc.foo});
        } catch (err) {
            // console.log('skip: ' + err.message);
            return; // skip this chunk and wait for the next one
        }
    });
});

Here's what happens:

  1. Create a (node.js-based) HTTP server for the client to connect to.
  2. Use the standard node.js HTTP client (v0.2.6 API) to read the _changes feed.
  3. Try to parse each chunk as JSON.
  4. Use socket.broadcast to send the result to every connected client.
  5. If a chunk doesn't parse as valid JSON, skip it and move on.

Note: I can afford to be this relaxed about chunk parsing because we have plenty of data coming in, and dropping a couple of entries doesn't matter.
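If those dropped entries did matter, the fix is to stop assuming one change per chunk. With `feed=continuous`, CouchDB writes one JSON object per line and sends blank lines as heartbeats, so buffering until a newline recovers entries that straddle two chunks. A minimal sketch; `createLineParser` is a hypothetical helper, and it assumes the elided `requestUri` asks for the continuous feed with documents included (as `json.doc.foo` in server.js implies).

```javascript
// Returns a function to call with each incoming chunk (a utf8 string).
// Complete lines are parsed as JSON and passed to onChange; a trailing
// partial line is kept until a later chunk completes it.
function createLineParser(onChange) {
  let buffer = '';
  return function (chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // incomplete last line, or '' if chunk ended on '\n'
    for (const line of lines) {
      if (line.trim() === '') continue; // heartbeat
      let change;
      try {
        change = JSON.parse(line);
      } catch (err) {
        continue; // skip malformed lines, as server.js does
      }
      onChange(change);
    }
  };
}
```

In server.js, the `data` handler would then become `response.on('data', createLineParser(function (change) { socket.broadcast({'foo': change.doc.foo}); }));`.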

client.html

$.getScript('/socket.io/socket.io.js', function(){
    var socket = new io.Socket(null, {port: 8080, rememberTransport: false});
    socket.connect();
    socket.on('message', function(json) {
        $('#container').prepend($('<li>').text(json.foo)); // .text() escapes any markup in the data
    });
});

The rundown:

  1. Load /socket.io/socket.io.js asynchronously. Don't ask why or how, but socket.io apparently intercepts requests to this path on the server it's attached to and serves the client library from there; the file it serves seems to be vendor/socket.io/support/socket.io-client/socket.io.js.
  2. Create a socket object; port must be the same port server.js listens on (8080).
  3. Whenever a message arrives, prepend it to the list (the element with id="container").

Note: The above code is example code only. It doesn't have any bells and whistles; a try/catch, for example, wouldn't hurt.
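Two cheap improvements for a widget that may stay open for hours are validating each message before rendering it and capping how many entries the list keeps. A minimal sketch of that logic kept free of the DOM; `applyMessage` and the `MAX_ITEMS` value are hypothetical.

```javascript
// Hypothetical cap on how many entries the widget keeps on screen.
const MAX_ITEMS = 20;

// Given the current entries (newest first) and an incoming message,
// return the updated entries. Messages without a `foo` field are
// ignored instead of being rendered as "undefined".
function applyMessage(items, message, maxItems = MAX_ITEMS) {
  if (message === null || typeof message !== 'object' || !('foo' in message)) {
    return items;
  }
  return [String(message.foo), ...items].slice(0, maxItems);
}
```

In client.html, the `message` handler would keep an `items` array, set `items = applyMessage(items, json)`, and rebuild `#container` from it, creating each `<li>` with jQuery's `.text()` so that feed data can't inject markup.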

Want more?

I'll spare you more of the official socket.io example code: there's not only an outdated website and GitHub repository, but already 30 other blogs out there discussing the same basic chat example. Basic as it is, the example does list the relevant events you might need if there's more interaction between the code on the server and the client.
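For reference, the server-side events that example revolves around are a `connection` event on the listener returned by `io.listen`, plus `message` and `disconnect` events on each connected client. A minimal sketch, assuming the socket.io 0.6-era API that server.js's `io.listen` and `socket.broadcast` come from; `wireEvents` and the `welcome` payload are hypothetical. Since it relies only on `.on()` and `.send()`, plain `EventEmitter` objects can stand in for the listener and clients.

```javascript
// Attach handlers to a socket.io 0.6-style listener. Assumed API: the
// listener emits 'connection' with a client object; each client emits
// 'message' and 'disconnect' and has a send() method.
function wireEvents(listener, log) {
  listener.on('connection', function (client) {
    log('client connected');
    client.send({ welcome: true }); // hypothetical greeting, to this client only

    client.on('message', function (message) {
      log('message from client: ' + JSON.stringify(message));
    });

    client.on('disconnect', function () {
      log('client disconnected');
    });
  });
}
```

In server.js this would be called as `wireEvents(socket, console.log)` right after `io.listen(server)`.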

I may seem ungrateful, but the misery we face is that while socket.io provides great value (an all-around abstraction for client- and server-side communication), it really does not excel at documentation.

In its current state, the node.js community is probably made up mostly of developers who get a kick out of writing cool apps. While QA, testing and documentation have become fairly standard in other programming communities, that has not happened in the node.js ecosystem yet.

As a developer myself, I don't feel well supported here. Diving into the 20 files and hundreds of lines of code, I'm hoping that beyond what we have now, we never run into any larger problems that require further debugging.

And if we do, we'll probably have to switch to something else. (I think that would be a slightly less cool implementation using Comet with an iframe, etc.)
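In the iframe flavour of Comet (often called a "forever iframe"), the server keeps one response open per iframe and writes a `<script>` tag per message, which the browser runs as it arrives. A minimal sketch of the server side; `iframeChunk`, `startIframeServer` and the `parent.onChange` callback are hypothetical, and the iframe must come from the same origin as the page so that `parent` is accessible.

```javascript
const http = require('http');

// Serialize one message as a <script> tag that calls a (hypothetical)
// onChange function defined in the parent page. '<' is escaped so a
// payload containing "</script>" can't close the tag early.
function iframeChunk(message) {
  const json = JSON.stringify(message).replace(/</g, '\\u003c');
  return '<script>parent.onChange(' + json + ');</script>\n';
}

// Keep each iframe's response open and write one chunk per message.
// Returns a broadcast function for the single _changes reader to call.
function startIframeServer(port) {
  const clients = new Set();
  const server = http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
    res.write('<html><body>\n');
    clients.add(res);
    res.on('close', function () { clients.delete(res); }); // browser went away
  });
  server.listen(port);
  return function broadcast(message) {
    const chunk = iframeChunk(message);
    for (const res of clients) res.write(chunk);
  };
}
```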

Takeaways

  1. If you're evaluating node.js for a customer, be aware that its APIs still change (sometimes fundamentally) and there are no guarantees of backward compatibility (BC). Since node.js libraries tend to follow the patterns of the core, it's best to keep a local copy of node.js and of every library involved.

  2. socket.io is extremely powerful but it clearly lacks documentation (and updated examples).

  3. Working with CouchDB and the node.js http client can be tedious: Basic Authorization by hand, plus the response and error handling, etc. For most cases I found Restler to be an excellent library (thanks again, Stefano!). Restler may very well be the best HTTP client library for node.js out there. The only thing it currently doesn't do well is work with _changes, or streams in general, because it always waits for a request to complete before handing you the data. Not suitable if the stream never ends. :-)

  4. All my messed-up JavaScript code now runs on the server. Booya!
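Of that by-hand plumbing, the Basic Authorization part is small enough to pull into a helper. server.js builds the header with `new Buffer(...)`, which later node.js releases deprecated in favour of `Buffer.from`. A minimal sketch for a current node.js; `buildCouchHeaders` is a hypothetical name, and the `db` object mirrors the one in server.js.

```javascript
// Build request headers for CouchDB/Cloudant: a JSON content type, plus
// HTTP Basic auth when both a user and a password are configured.
function buildCouchHeaders(db) {
  const headers = { 'Content-Type': 'application/json' };
  if (db.user && db.pass) {
    // Buffer.from replaces the deprecated `new Buffer(string)` constructor.
    const token = Buffer.from(db.user + ':' + db.pass, 'utf8').toString('base64');
    headers['Authorization'] = 'Basic ' + token;
  }
  return headers;
}

console.log(buildCouchHeaders({ user: 'user', pass: 'pass' }).Authorization); // prints "Basic dXNlcjpwYXNz"
```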

Fin

That's all for now. I'm currently working on some Chef recipes for the application servers that will run this code for us. I'll share more next time.
