Real time online activity monitor example with node.js and WebSocket

Makoto Inoue

This post was originally published on the New Bamboo blog, before New Bamboo joined thoughtbot in London.


Update 23rd March 2010: Today we’re announcing Pusher, a new realtime client push service. Sign up for the beta now.

Here at New Bamboo, we specialize in Ruby on Rails web development. However, we have been talking more and more about the exciting things happening around HTML5 and JavaScript during morning standup (where we talk about all the cool things we are working on or have discovered), at lunch time, and on our company hack day.

The latest hot topic is node.js. Some of us went to the Full Frontal JavaScript Conference and were very excited by the power and potential of node.js, which Simon Willison (Django core team) introduced there. Simon describes node.js as “A toolkit for writing extremely high performance non-blocking event driven network servers in JavaScript”. I highly recommend reading his blog post, “Node.js is genuinely exciting”.

After reading Simon’s blog, watching Ryan Dahl’s talk video, and witnessing “A cambrian explosion of lightweight web frameworks based on top of Node” (again quoting Simon), I started thinking about what would be an interesting app to write myself.

Many people have created either web frameworks or chat apps, but I wanted to create something you wouldn’t normally build as a web app, and this is what I came up with.

I’ve learned a lot while building this, and would like to share it step by step (or grab the code if you are too impatient to read the entire post).

Step 1: Simple Text Streaming

My original idea was to make a simple app that tails an error log and streams the output directly to the web in real time.

The code is something like this:

var sys = require('sys');
var filename = process.ARGV[2];

if (!filename)
  return sys.puts("Usage: node tail.js filename");

// Look at http://nodejs.org/api.html#_child_processes for detail.
var tail = process.createChildProcess("tail", ["-f", filename]);
sys.puts("start tailing");

tail.addListener("output", function (data) {
  sys.puts(data);
});

// From nodejs.org/jsconf.pdf slide 56
var http = require("http");
http.createServer(function(req,res){
  res.sendHeader(200,{"Content-Type": "text/plain"});
  tail.addListener("output", function (data) {
    res.sendBody(data);
  });
}).listen(8000);

I will explain the important bits.

var tail = process.createChildProcess("tail", ["-f", filename]);

Here, I create a child process that runs the tail command. This happens only once, when the node server starts.

tail.addListener("output", function (data) {
  res.sendBody(data);
});

Then, inside the request handler, I add a listener that writes the output of the tail command to the HTTP response body whenever a new message is written to the error log. Each HTTP request gets its own listener, and its connection is kept open.

Continuously watching the activity of a process is not an easy thing to do in a normal web app, but it’s almost effortless thanks to node.js’s non-blocking architecture and JavaScript’s event-driven pattern.

Try running the script, specifying the error log you want to watch, like below:

node tail.js development.log

Open http://localhost:8000. It will show you error messages as they are written to the log file.
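The calls in the listing above come from an early node.js release. As a rough sketch of the same idea on a current Node.js, assuming `child_process.spawn` in place of `process.createChildProcess` and `writeHead`/`write` in place of `sendHeader`/`sendBody`, it might look like the following. The helper names `forward` and `createTailServer` are mine and are not part of the original script.

```javascript
const { spawn } = require("child_process");
const http = require("http");

// Forward every chunk emitted by `source` to `res`, and stop forwarding
// once the client goes away so we never write to a closed response.
function forward(source, res) {
  const onData = (chunk) => res.write(chunk);
  source.on("data", onData);
  res.on("close", () => source.removeListener("data", onData));
}

// Hypothetical wrapper: one `tail -f` process shared by every HTTP request.
function createTailServer(filename) {
  const tail = spawn("tail", ["-f", filename]);
  return http.createServer((req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    forward(tail.stdout, res);
  });
}

// Usage (not executed here):
//   createTailServer("development.log").listen(8000);
```

Keeping the forwarding logic in its own function means it can be tried with any event emitter, not only a real `tail` process.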

However, I wanted to do more, such as drawing a usage graph. It doesn’t make sense to do all the drawing on the server side and send an image each time, so now it’s time to write some code on the client side.

Step 2: Ajax polling

Initially, I tried Ajax long polling, as you normally would with an online chat app, but it did not work as I expected. (You can check out my failed attempt here if you are curious.)

I think there are 3 problems with applying long polling to a scenario like my app.

  • The Ajax success callback fires only once the whole response has been received. It never fires if the response never finishes.
  • Even if you time out every 30 seconds, the Ajax success callback has to wait those 30 seconds, which is far from a real time update.
  • If you shorten the timeout to 1 second, you receive results constantly, but it is still not real time. In addition, the continuous requests hammer your server, and there is no guarantee that you get a result from the server every second. If there is network latency, you lose the data produced during that delay.
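To put a rough number on the last two points, here is a small sketch that works out how stale each event is by the time a fixed-interval poll picks it up. The function names `pollingDelays` and `requestCount` and the sample timestamps are made up for illustration; all times are in milliseconds.

```javascript
// For each event time, find the first poll at or after it and report
// how long the event waited on the server before being collected.
function pollingDelays(eventTimesMs, intervalMs) {
  return eventTimesMs.map((t) => {
    const nextPoll = Math.ceil(t / intervalMs) * intervalMs;
    return nextPoll - t;
  });
}

// Number of HTTP requests needed to cover a period, events or no events.
function requestCount(durationMs, intervalMs) {
  return Math.floor(durationMs / intervalMs);
}

console.log(pollingDelays([100, 500, 1200], 1000)); // [ 900, 500, 800 ]
console.log(requestCount(60000, 1000));             // 60
console.log(requestCount(60000, 30000));            // 2
```

With a 1 second interval the events still arrive up to a second late and the server answers 60 requests a minute; with 30 seconds the request count drops but the delay grows accordingly.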

Step 3: WebSocket

I needed a way to establish a connection between client and server and do something as data arrives at the client. Luckily, I had a chance to attend an HTML5 communication workshop, and was introduced to an HTML5 feature called “Web Sockets”.

Here is the explanation of HTML5 WebSocket from Kaazing, the company that ran the workshop.

The HTML 5 specification introduces the Web Socket interface, which defines a full-duplex communications channel that operates over a single socket and is exposed via a JavaScript interface in HTML 5 compliant browsers. The bi-directional capabilities of Comet and Ajax, unlike Web Sockets, are not native to the browser, and rely on maintaining two connections-one for upstream and one for downstream – in order to stream data to and from the browser. Note, that to support streaming over HTTP, Comet requires a long-lived connection, which is often severed by proxies and firewalls. In addition, few Comet solutions support streaming over HTTP, employing a less performant technique called “long-polling” instead.

Web Sockets account for network hazards such as proxies and firewalls, making streaming possible over any connection, and with the ability to support upstream and downstream communications over a single connection, Web Sockets place less burden on your servers, allowing existing machines to support more than twice the number of concurrent connections.

(What is an HTML5 WebSocket)

During the workshop I also learnt how drastically WebSocket reduces network traffic.

Once a WebSocket connection is established, client and server exchange data in frames that each carry only 2 bytes of overhead, compared to the HTTP headers (871 bytes in the example below) sent with every request when you poll continuously. Here is a comparison of the 2 scenarios:

Case 1: 10,000 clients polling every second:
* Network throughput is (871 x 10,000) = 8,710,000 bytes = 69,680,000 bits per second (66 Mbps)

Case 2: 10,000 clients receiving one frame every second:
* Network throughput is (2 x 10,000) = 20,000 bytes = 160,000 bits per second (156 Kbps)
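The arithmetic behind the two cases can be reproduced with a few lines. This is only a sketch: `throughput` is a hypothetical helper, and it follows the figures above in treating 1 Kbps as 1,024 bps and 1 Mbps as 1,048,576 bps.

```javascript
// Bits per second when every client transfers `bytesPerMessage`
// once per second.
function throughput(bytesPerMessage, clients) {
  const bytes = bytesPerMessage * clients;
  const bits = bytes * 8;
  return { bytes, bits, kbps: bits / 1024, mbps: bits / (1024 * 1024) };
}

const polling = throughput(871, 10000);
const framed = throughput(2, 10000);

console.log(polling.bits, Math.floor(polling.mbps)); // 69680000 66
console.log(framed.bits, Math.floor(framed.kbps));   // 160000 156
console.log(polling.bits / framed.bits);             // 435.5
```

The last line shows the ratio between the two cases: the polling overhead is about 435 times larger.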

Apparently this got Google interested:

Reducing kilobytes of data to 2 bytes…and reducing latency from 150ms to 50ms is far more than marginal. In fact, these two factors alone are enough to make WebSocket seriously interesting to Google.

(From Ian Hickson (Google, HTML5 spec lead))

In a browser that supports WebSocket (at the moment only Chromium and the OS X version of Chrome, but don’t go away yet: there is a solution for other browsers, which I will explain later), all you have to do is something like this (the example is from Kaazing’s website).

var myWebSocket = new WebSocket("ws://www.websocket.org");

myWebSocket.onopen = function(evt) { alert("Connection open ..."); };
myWebSocket.onmessage = function(evt) { alert( "Received Message:  "  +  evt.data); };
myWebSocket.onclose = function(evt) { alert("Connection closed."); };
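Because only a few browsers ship WebSocket so far, it can help to check for it before opening a connection. A minimal sketch, assuming a hypothetical helper `supportsWebSocket` that takes the global object (`window` in a browser) as its argument so that it can also be tried outside a browser:

```javascript
// True when the given global object exposes a WebSocket property.
function supportsWebSocket(globalObject) {
  return "WebSocket" in globalObject;
}

// In a page you would call supportsWebSocket(window) and fall back to
// another transport when it returns false.
console.log(supportsWebSocket({ WebSocket: function () {} })); // true
console.log(supportsWebSocket({}));                            // false
```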

Step 4: WebSocket meets node.js

I was curious whether I could serve WebSocket from node.js, and found that Alexander Teinum had already done the hard work for me with websocket-server-node.js.

With this, all you have to do on the server side is define what the server returns while the connection is established. Here is the code example from Alexander’s blog, which echoes whatever you send from the browser. You have to define it under the “resources” directory.

exports.handleData = function(connection, data) {
    connection.send('\u0000' + data + '\uffff');
}
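The `'\u0000'` and `'\uffff'` around the payload appear to mark the start and end of each message as this server sends it. A small sketch of that wrapping and its inverse, using hypothetical helpers `frame` and `unframe` that are not part of Alexander's library:

```javascript
const START = "\u0000";
const END = "\uffff";

// Wrap a text payload the same way the echo handler does.
function frame(payload) {
  return START + payload + END;
}

// Strip the delimiters again; returns null if the text is not a full frame.
function unframe(text) {
  if (text.length < 2 || text.charAt(0) !== START ||
      text.charAt(text.length - 1) !== END) {
    return null;
  }
  return text.slice(1, -1);
}

console.log(unframe(frame("hello"))); // hello
console.log(frame("hello").length);   // 7
console.log(unframe("hello"));        // null
```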

And here are the snippets of what I did for my app.

Server side

String.prototype.trim = function() {
  return this.replace(/^\s+|\s+$/g,"");
}

var sys = require('sys');
// Start iostat once; "-w 1" asks for a new report every second.
var child_process = process.createChildProcess("iostat", ["-w", "1"]);

exports.handleData = function(connection, data) {

  // Parse one line of iostat output and push it to the browser as JSON.
  var output = function (output_data) {
    sys.puts(output_data);
    var output_array = output_data.trim().split(/\s+/);
    for (var i = 0; i < output_array.length; i++) {
      output_array[i] = parseFloat(output_array[i]);
    }
    var output_hash = {
      date: new Date(),
      cpu: {
        us: output_array[3],
        sy: output_array[4],
        id: output_array[5]
      }
    };
    connection.send('\u0000' + JSON.stringify(output_hash) + '\uffff');
  };

  // Stop sending once the browser has closed the connection.
  connection.addListener('eof', function(data) {
    child_process.removeListener("output", output);
  });

  child_process.addListener("output", output);
};

In this snippet, I added a child process to watch the output of iostat (by that time I had changed my mind and decided to show iostat rather than tail). Then, I added a listener inside the “handleData” function that parses the iostat output, builds a hash from the result, and sends the response back in JSON format.
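One detail the snippet glosses over is that iostat also prints header lines, which `parseFloat` turns into `NaN`. Here is a sketch of the parsing step as a standalone function that skips such lines. `parseIostatLine` is a hypothetical name, the sample line is invented, and the column positions 3 to 5 are simply the ones used above, which depend on the platform and the number of disks.

```javascript
// Turn one line of iostat output into the structure sent to the browser.
// Returns null for header lines or anything else that is not numeric.
function parseIostatLine(line, now) {
  const fields = line.trim().split(/\s+/).map(parseFloat);
  const [us, sy, id] = fields.slice(3, 6);
  if ([us, sy, id].some((value) => !Number.isFinite(value))) {
    return null;
  }
  return { date: now || new Date(), cpu: { us, sy, id } };
}

// Invented sample line: three disk columns, then us, sy, id and load.
const sample = "   12.50   3  0.04   7  4 89  1.20 1.10 1.05";
const header = "    KB/t tps  MB/s  us sy id   1m   5m   15m";

console.log(parseIostatLine(sample).cpu); // { us: 7, sy: 4, id: 89 }
console.log(parseIostatLine(header));     // null
```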

One thing worth explaining is this line:

connection.addListener('eof', function(data) {
  child_process.removeListener("output", output)
})

I added logic to remove the listener when the client browser closes the connection (by closing the tab, or by calling “connection.close”).

Without this logic, node.js blows up every time a user closes their connection, with the following error message.

[websocket-server-node.js (master)]$ node server.js
~/work/sample/websocket-server-node.js/resources/loop.js:5
connection.send('\u0000' + counter + '\uffff');
^
Error: Socket is not open for writing
at Timer.<anonymous> (/work/sample/websocket-server-node.js/resources/loop.js:5:15)
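The failure and the fix can be reproduced without a browser. In this sketch a Node `EventEmitter` stands in for the iostat child process, and `FakeConnection` is a made-up stand-in for the WebSocket connection that throws when written to after closing. Neither is the real library object.

```javascript
const { EventEmitter } = require("events");

// Made-up connection: send() throws once close() has been called.
class FakeConnection extends EventEmitter {
  constructor() {
    super();
    this.open = true;
    this.sent = [];
  }
  send(text) {
    if (!this.open) throw new Error("Socket is not open for writing");
    this.sent.push(text);
  }
  close() {
    this.open = false;
    this.emit("eof");
  }
}

// Attach a connection to the data source; optionally clean up on "eof".
function attach(source, connection, cleanUp) {
  const output = (data) => connection.send(data);
  source.on("output", output);
  if (cleanUp) {
    connection.on("eof", () => source.removeListener("output", output));
  }
}

// Returns true if the source can keep emitting after the client has gone.
function survivesClose(cleanUp) {
  const source = new EventEmitter();
  const connection = new FakeConnection();
  attach(source, connection, cleanUp);
  source.emit("output", "first");
  connection.close();
  try {
    source.emit("output", "second"); // throws if the listener is still attached
    return true;
  } catch (err) {
    return false;
  }
}

console.log(survivesClose(false)); // false
console.log(survivesClose(true));  // true
```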

Client side

// `out` (the element used for status messages), `stats_array` (a fixed-length
// array of recent samples) and drawCharts() are assumed to be defined
// elsewhere in the page.
var webSocket = new WebSocket('ws://localhost:8000/iostat');

webSocket.onopen = function() {
  out.html('Connection opened.<br>');
};

webSocket.onmessage = function(event) {
  var data = JSON.parse(event.data);

  // Add the parsed result to the front of the array and drop the oldest one.
  stats_array.unshift(data);
  stats_array.pop();

  for (var i = 0; i < stats_array.length; i++) {
    if (stats_array[i]) {
      var cpu_total = stats_array[i].cpu.sy + stats_array[i].cpu.us;
      $('#date_' + i).html(stats_array[i].date);

      // More logic here to add the data into the table continues.
      // ...
    }
  }

  // Collect [user, system] CPU pairs for the charts; empty slots become [0, 0].
  var total_array = [];
  for (var j = 0; j < stats_array.length; j++) {
    if (stats_array[j]) {
      total_array[j] = [stats_array[j].cpu.us, stats_array[j].cpu.sy];
    } else {
      total_array[j] = [0, 0];
    }
  }
  // Draw charts, oldest sample first.
  drawCharts(total_array.reverse());
};

webSocket.onclose = function() {
  out.html('Connection closed.<br>');
};

There is nothing really special on the client side. I add the incoming data to an array, inject it into the result table, and draw charts based on the data.
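The `unshift` followed by `pop` keeps `stats_array` at a fixed length, with the newest sample at index 0. A sketch of that rolling window on its own, with hypothetical helper names `pushSample` and `toChartRows` and invented numbers:

```javascript
// Put the newest sample at the front and drop the oldest one,
// so the array length never changes.
function pushSample(samples, sample) {
  samples.unshift(sample);
  samples.pop();
  return samples;
}

// Build [user, system] pairs, oldest first, treating empty slots as [0, 0].
function toChartRows(samples) {
  return samples
    .map((s) => (s ? [s.cpu.us, s.cpu.sy] : [0, 0]))
    .reverse();
}

const samples = new Array(3).fill(null); // room for the last three samples
pushSample(samples, { cpu: { us: 5, sy: 2 } });
pushSample(samples, { cpu: { us: 9, sy: 3 } });

console.log(samples.length);       // 3
console.log(toChartRows(samples)); // [ [ 0, 0 ], [ 5, 2 ], [ 9, 3 ] ]
```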

The entire source is here.

Step 5: What else can we do with WebSocket?

In this example, I showed how to stream the output of iostat in real time, but you should be able to stream other things, such as XMPP (Extensible Messaging and Presence Protocol, used for Google Talk and Google Wave) and STOMP (Streaming Text Orientated Messaging Protocol, which ActiveMQ uses). To do this on node.js, you need to implement logic to handle these protocols on top of TCP. I haven’t seen any libraries yet, but hopefully someone will implement them. It’s actually possible to transfer binary data, such as video and audio, but it may not be practical, as JavaScript on the client side has to encode it.
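Since the messages in this post are sent as text, binary data would first have to be turned into text. One common choice is base64, though that is purely an assumption on my part, as nothing above depends on a particular encoding. A sketch using Node's `Buffer`, with hypothetical helper names:

```javascript
// Encode raw bytes as base64 text that fits in a text message.
function encodeBinary(bytes) {
  return Buffer.from(bytes).toString("base64");
}

// Decode the text back into the original bytes.
function decodeBinary(text) {
  return Array.from(Buffer.from(text, "base64"));
}

const encoded = encodeBinary([0, 255, 16, 32]);
console.log(encoded);               // AP8QIA==
console.log(decodeBinary(encoded)); // [ 0, 255, 16, 32 ]
```

Base64 makes the payload about a third larger, which is part of why this may not be practical for video or audio.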

Step 6: Is WebSocket a thing of the future?

Many features of HTML5 (Canvas, geolocation, offline storage, etc.) are already implemented in many existing browsers, but WebSocket is still cutting edge. You need to keep an eye on which browsers start implementing the feature, somewhere like StackOverflow.

Does this mean you can forget about WebSocket for another few years? If you are seriously considering WebSocket in an enterprise environment (but not with node.js), you might want to consider Kaazing Open Gateway. They do a pretty good job of automatically detecting your browser’s WebSocket support and switching to alternative solutions (Silverlight, Flash Socket and so on). They also support XMPP and STOMP, so if you are interested in displaying a real time stock feed, there is already a solution. In fact, their demo page has an interesting real time stock feed as well as resource monitoring, just like mine.

However, I am probably not enterprise enough for their license, and I am more interested in using WebSocket with node.js. I found an interesting solution by Hiroshi Ichikawa:

  • web-socket-js - HTML5 Web Socket implementation powered by Flash

He basically wrote a wrapper that implements the HTML5 WebSocket API on top of a Flash socket.

First, include his JavaScript library like below:

<script type="text/javascript" src="./swfobject.js"></script>
<script type="text/javascript" src="./FABridge.js"></script>
<script type="text/javascript" src="./web_socket.js"></script>

<script>
  // This is for websocket-js
  WebSocket.__swfLocation = "./WebSocketMain.swf";
</script>

Then add a Flash policy file on your machine, or tweak websocket-server-node.js to return the policy file as in the snippet below. (Alternatively, if you serve both the client side and server side code from the same domain and port, you don’t need to worry about cross domain policy at all.) With that, you should be able to use WebSocket in other browsers, as long as Flash is supported.

function doHandshake() {
  // some code to do handshake
  // ...
  // ...

  // Return flash policy file for web-socket-js
  // http://github.com/gimite/web-socket-js
  if(request[0].match(/policy-file-request/)){
    sys.puts('requesting flash policy file');

    var policy_xml =
    '<?xml version="1.0"?>' +
    '<!DOCTYPE cross-domain-policy SYSTEM ' +
    '"http://www.macromedia.com/xml/dtds/cross-domain-policy.dtd">' +
    '<cross-domain-policy>' +
    "<allow-access-from domain='*' to-ports='*'/>" +
    '</cross-domain-policy>';
    connection.send(policy_xml);
    connection.close();
  }
}
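Pulled out of the handshake, the policy handling comes down to two small pieces: recognising the request and building the XML. Here is a sketch with hypothetical helper names `isPolicyRequest` and `buildPolicy`. The wide-open `domain='*'` default mirrors the snippet above and is something you would probably want to narrow outside of a demo.

```javascript
// Flash asks for the policy by sending "<policy-file-request/>".
function isPolicyRequest(text) {
  return /policy-file-request/.test(text);
}

// Build a cross-domain policy; the defaults match the permissive demo values.
function buildPolicy(domain = "*", ports = "*") {
  return [
    '<?xml version="1.0"?>',
    '<!DOCTYPE cross-domain-policy SYSTEM ' +
      '"http://www.macromedia.com/xml/dtds/cross-domain-policy.dtd">',
    "<cross-domain-policy>",
    `<allow-access-from domain='${domain}' to-ports='${ports}'/>`,
    "</cross-domain-policy>",
  ].join("");
}

console.log(isPolicyRequest("<policy-file-request/>\0")); // true
console.log(isPolicyRequest("GET /iostat HTTP/1.1"));     // false
console.log(buildPolicy("localhost", "8000").includes("to-ports='8000'")); // true
```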

Here is the proof.

There are some other things going on apart from what I covered above.

There are new things coming up almost every day. The best way to stay tuned to node.js and WebSocket is to keep watching the Twitter search feed.

Summary

Here is a recap of what I learnt.

  • It is extremely easy to write streaming logic in node.js.
  • However, you need a solution on the client side to handle the incoming data in real time.
  • WebSocket enables continuous communication with significantly less network overhead than existing solutions.
  • WebSocket is not available in most browsers yet, but there are workarounds.

I hope I was able to share my experiment well, and I am looking forward to hearing about exciting apps, libraries, and projects related to both topics. Also, my understanding of them is less than perfect, so I welcome any feedback, corrections, or suggestions.

NOTE: There is more on the real time web here.