
NodeSchool comes to Australia

NodeSchool ultimately has its genesis at NodeConf 2013, where @substack introduced us to stream-adventure. I took the concept home to CampJS and wrote learnyounode for the introduction to Node.js workshop I was to run. As part of the process I extracted a package called workshopper to do the work of making a terminal workshop experience; most of the logic originally came from stream-adventure. A short time after CampJS, I created levelmeup for NodeConf.eu and suddenly we had a selection of what have come to be known as "workshoppers". At NodeConf.eu, @brianloveswords suggested the NodeSchool concept and registered the domain, @substack provided the artwork, and the ball was rolling.

Today, workshopper is depended on by 22 packages in npm, most of which are workshoppers that you can install and use to learn aspects of Node.js or JavaScript in general. The curated list of usable workshoppers is maintained at nodeschool.io.

learnyounode itself is now being downloaded at a rate of roughly 200 per day. That's at least 200 more people each day wanting to learn how to do Node.js.

[Chart: learnyounode downloads]

I can't recall exactly how "NodeSchool IRL" events started, but it was probably @maxogden, who has been responsible for a large number of these events. There have now been over 30 of them around the world and the momentum keeps building. The beauty of this format is that it's low-cost and low-effort to make happen: all you need is a venue where nerds can show up with their computers, and some basic guidance. There have even been a few events without experienced Node.js mentors, but that's no great barrier as the lessons are largely self-guided and work particularly well when pairs or groups of people work on solutions together.

NodeSchool comes to Australia

It's surprising, given all of the NodeSchool activity around the world, that we haven't yet had a single NodeSchool event in Australia. CampJS had learnyounode last year and introduced 3 brand new workshoppers this year, so it's the closest thing we've had.

Next weekend, on the 21st of June, we are attempting a coordinated Australian NodeSchool event. At the moment that coordination amounts to events hosted in Sydney, Melbourne and Hobart; unfortunately the timing has been difficult for Brisbane and we haven't managed to bring anyone else out of the woodwork. But we will be attempting to do this regularly, and we'd also like to encourage meet-up hosts to use the format now and again with their groups.

NodeSchool in Sydney

I'll be at NodeSchool in Sydney next weekend. It will be proudly hosted by NICTA, who have a space for up to 60 people. NICTA are currently doing some interesting work with WebRTC; catch up with @DamonOehlman if that's something you're interested in. Tabcorp will also be a major sponsor of the event. Tabcorp have been building a new digital team with the back-end almost entirely in Node.js, and they're doing a great job engaging with and contributing to existing and new open source projects. They're also hiring, so be sure to catch up with @romainprieto if you're doing PHP, Java or some other abomination and want to be doing Node!

Thanks to the sponsorship we'll be able to provide some catering for the event. Currently we're looking at providing lunch, but we may expand that to include some breakfast treats. We'll also be providing refreshments throughout the day for everyone attending.

Start time is 9.30am, end is 4pm. The plan is to spend the first half of the day doing introductory Node.js which will mainly mean working through learnyounode. The second half of the day will be less structured and we'll encourage attendees to work on other workshoppers that they find interesting. Thankfully we have some amazing Node.js programmers in Sydney and they'll be available as mentors.

We are currently selling tickets for $5; the money goes towards the event and there is no profit involved. We don't need to charge for the event, but given the generally dismal turnout for free tech meet-ups we feel that a small commitment barrier will help us make the most of the space we have available. If the money is a barrier for you, please contact us! We don't want anyone to miss out. We also have special "mentor" tickets available for experienced Node.js programmers who are able to assist; if you think you fit into this category, please contact us too.

You can sign up for Sydney NodeSchool at https://ti.to/nodeschool/sydney-june-2014/. If you are tempted, don't sit on the fence: spots are limited and, as of writing, the tickets are almost half gone.

NodeSchool in Melbourne

NodeSchool in Melbourne is being supported by ThoughtWorks who have been doing Node in Australia for a while now. If you're interested in their services or want to chat about employment opportunities you should catch up with Liauw Fendy.

@sidorares is putting a large amount of the legwork into Melbourne's event. He was a major contributor to the original learnyounode and is a huge asset to the Melbourne Node community. Along with Andrey, Melbourne has a large number of expert Node.js hackers, many of whom will be available as mentors. This will be a treat for Melbournians, so it's not something to miss if you're in town! Potential mentors should contact Andrey.

You can sign up for Melbourne NodeSchool at https://ti.to/nodeschool/melbourne-june-2014/.

NodeSchool in Hobart

Hobart is lucky to have @joshgillies, a local tech-community organiser responsible for many Tasmanian web and JavaScript events. The event is being hosted at The Typewriter Factory, a business workspace that Josh helps run. Sponsorship is being provided by ACS, who will help support the venue and also provide some catering.

You can sign up for Hobart NodeSchool at https://ti.to/nodeschool/hobart-june-2014/.

Why I don't use Node's core 'stream' module

This article was originally offered to nearForm for publishing and appeared for some time on their blog from early 2014 (at this URL: http://www.nearform.com/nodecrunch/dont-use-nodes-core-stream-module). It has since been deleted. I'd rather not speculate about the reasons for the deletion but I believe the article contains a very important core message so I'm now republishing it here.

TL;DR

The "readable-stream" package available in npm is a mirror of the Streams2 and Streams3 implementations in Node-core. You can guarantee a stable streams base, regardless of what version of Node you are using, if you only use "readable-stream".

The good ol' days

Prior to Node 0.10, implementing a stream meant extending the core Stream object. This object was simply an EventEmitter that added a special pipe() method to do the streaming magic.

Implementing a stream usually started with something like this:

var Stream = require('stream').Stream
var util = require('util')

function MyStream () {
  Stream.call(this)
}

util.inherits(MyStream, Stream)

// stream logic, implemented however you want

If you ever had to write a non-trivial stream implementation for pre-0.10 Node without using a helper library (such as through), you know what a nightmare the state-management can be. The actual implementation of a custom stream is a lot more than just the above code.
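
To give a flavour of what that meant, here's a rough sketch I've put together for illustration (not code from any particular library) of the kind of bookkeeping an old-style readable stream had to do itself: tracking paused state, buffering while paused, and emitting 'data' and 'end' at the right times.

var Stream = require('stream').Stream
var util = require('util')

function MyOldStream () {
  Stream.call(this)
  this.readable = true
  this._paused = false
  this._buffer = []
  this._ended = false
}

util.inherits(MyOldStream, Stream)

MyOldStream.prototype.pause = function () {
  this._paused = true
}

MyOldStream.prototype.resume = function () {
  this._paused = false
  // flush anything buffered while we were paused
  while (!this._paused && this._buffer.length)
    this.emit('data', this._buffer.shift())
  if (this._ended && !this._buffer.length && !this._paused)
    this.emit('end')
}

// called by whatever is producing the data
MyOldStream.prototype._handleData = function (chunk) {
  if (this._paused)
    this._buffer.push(chunk) // hold on to it until resume()
  else
    this.emit('data', chunk)
}

MyOldStream.prototype._handleEnd = function () {
  this._ended = true
  if (!this._paused && !this._buffer.length)
    this.emit('end')
}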

Welcome to Node 0.10

Thankfully, Streams2 came along with a brand new set of base Stream implementations that do a whole lot more than pipe(). The biggest win for stream implementers comes from the fact that state-management is almost entirely taken care of for you. You simply need to provide concrete implementations of some abstract methods to make a fully functional stream, even for non-trivial workloads.

Implementing a stream now looks something like this:

var Readable = require('stream').Readable
// `Stream` is still provided for backward-compatibility
// Use `Writable`, `Duplex` and `Transform` where required
var util = require('util')

function MyStream () {
  Readable.call(this, { /* options, maybe `objectMode:true` */ })
}

util.inherits(MyStream, Readable)

// stream logic, implemented mainly by providing concrete method implementations:

MyStream.prototype._read = function (size) {
  // ... 
}

State-management is handled by the base-object and you interact with internal methods, such as this.push(chunk) in the case of a Readable stream.
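
For example, here's a minimal (made-up) Readable that emits the numbers 1 to 5 and then ends; all of the buffering and flow-control is left to the base class:

var Readable = require('stream').Readable
var util = require('util')

function Counter () {
  Readable.call(this)
  this._n = 0
}

util.inherits(Counter, Readable)

Counter.prototype._read = function (size) {
  this._n += 1
  if (this._n > 5)
    return this.push(null) // signal end-of-stream
  this.push(this._n + '\n')
}

new Counter().pipe(process.stdout)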

While the internal streams implementations are an order-of-magnitude more complex than the previous core-streams implementation, most of it is there to make life an order-of-magnitude easier for those of us implementing custom streams. Yay!

Backward-compatibility

When every new major stable release of Node occurs, anyone releasing public packages in npm has to make a decision about which versions of Node they support. As a general rule, the authors of the most popular packages in npm will support the current stable version of Node and the previous stable release.

Streams2 was designed with backwards-compatibility in mind. Streams using require('stream').Stream as a base will still mostly work as you'd expect and they will also work when piped to streams that extend the other classes. Streams2 streams won't work like classic EventEmitter objects when you pipe them together, as old-style streams do. But when you pipe a Streams2 stream and an old-style EventEmitter-based stream together, Streams2 will fall-back to "compatibility-mode" and operate in a backward-compatible way.
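
As a rough illustration of that fall-back (a contrived sketch, not from any real library), piping a Streams2 Readable into a bare old-style stream, one that's nothing more than an EventEmitter with write() and end() methods, still works:

var Readable = require('stream').Readable
var Stream = require('stream').Stream

// an old-style writable: just a Stream with write()/end()
var oldWritable = new Stream()
oldWritable.writable = true
oldWritable.write = function (chunk) {
  process.stdout.write('got: ' + chunk)
  return true // ready for more data
}
oldWritable.end = function () {
  console.log('(old-style stream finished)')
}

var rs = new Readable()
rs._read = function () {} // all data is pushed up front
rs.push('hello\n')
rs.push(null)

rs.pipe(oldWritable) // Streams2 handles the old-style destination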

So Streams2 are great and mostly backward-compatible (aside from some tricky edge cases). But what about when you want to implement Streams2 and run on Node 0.8? And what about open source packages in npm that want to still offer Node 0.8 compatibility while embracing the new Streams2-goodness?

"readable-stream" to the rescue

During the 0.9 development phase, prior to the 0.10 release, Isaac developed the new Streams2 implementation in a package that was released in npm and usable on older versions of Node. The readable-stream package is essentially a mirror of the streams implementation of Node-core but is available in npm. This is a pattern we will hopefully be seeing more of as we march towards Node 1.0. Already there is a core-util-is package that makes available the shiny new is type-checking functions in the 0.11 core 'util' package.

readable-stream gives us the ability to use Streams2 on versions of Node that don't even have Streams2 in core. So a common pattern for supporting older versions of Node while still being able to hop on the Streams2-bandwagon starts off something like this, assuming you have "readable-stream" as a dependency:

var Readable = require('stream').Readable || require('readable-stream').Readable

This works because there is no Readable object in the core 'stream' module in 0.8 and prior, so if you are running on an older version of Node the expression falls through to the "readable-stream" package for the required implementation.

Streams3: a new flavour

The readable-stream package is still being used to track the changes to streams coming in 0.12. The upcoming Streams3 implementation is more of a tweak than a major change. It contains an attempt to make "compatibility mode" more of a first-class citizen of the API and also some improvements to pause/resume behaviour.

Like Streams2, the aim with Streams3 is for backward (and forward) compatibility but there are limits to what can be achieved on this front.

While this new streams implementation will likely be an improvement over the current Streams2 implementation, it is part of the unstable development branch of Node and is, so far, not without edge cases that can break code designed against the pure 0.10 versions of Streams2.

What is your base implementation?

Looking back at the code used to fetch the base Streams2 implementation for building custom streams, let's consider what we're actually getting with different versions of Node:

var Readable = require('stream').Readable || require('readable-stream').Readable
  • Node 0.8 and prior: we get whatever is provided by the readable-stream package in our dependencies.
  • Node 0.10: we get the particular version of Streams2 that comes with the version of Node we're using.
  • Node 0.11: we get the particular version of Streams3 that comes with the version of Node we're using.

This may not be interesting if you have full control over all deployments of your custom stream implementations and the version(s) of Node they will be used on. But it can cause problems for open source libraries distributed via npm, whose users may still be stuck on 0.8 (for some, the upgrade path is not an easy one for various reasons), on 0.10, or even trying out some of the new Node and V8 features available in 0.11.

What you end up with is a very unstable base upon which to build your streams implementation. This is particularly acute since the vast bulk of the code used to construct the stream logic comes from either Node-core or the readable-stream package. Any bugs fixed in later Node 0.10 releases will obviously still be present for people stuck on earlier 0.10 releases, even if the readable-stream dependency has the fixed version.

Then, when your streams code is run on Node 0.11, suddenly it's a Streams3 stream which has slightly different behaviour to what most of your users are experiencing.

One of the ways these subtle differences are exposed is in bug reports. Users may report a bug that only occurs on their particular combination of core-streams and readable-stream, and it may not be obvious that the problem is related to base-stream implementation edge-cases they are stumbling upon, wasting time for everyone.

And what about stability? The fragmentation introduced by all of the possible combinations means that your otherwise stable library is having instability foisted upon it from the outside. This is one of the costs of relying on a featureful standard-library (core) within a rapidly developing, pre-v1 platform. But we can do something about it by taking control of the exact version of the base streams objects we want to extend regardless of what is bundled in the version of Node being used. readable-stream to the rescue!

Taking control

To control exactly what code your streams implementation is building on, simply pin the version of readable-stream and use only it, avoiding require('stream') completely. Then you get to make the choice when to upgrade to Streams3, even if that's some time after Node 0.12.

readable-stream comes in two major versions, v1.0.x and v1.1.x. The former tracks the Streams2 implementation in Node 0.10, including bug-fixes and minor improvements as they are added. The latter tracks Streams3 as it develops in Node 0.11; we may see a v1.2.x branch for Node 0.12.

Any library worth using should be following the basics of semver minor and patch versions (the merits and finer points of major versioning are still something worth debating). readable-stream gives you proper patch-level versioning so if you pin to "~1.0.0" you'll get the latest Node 0.10 Streams2 implementation, including any fixes and minor non-breaking improvements. The patch-level version of 1.0.x and 1.1.x should mirror the patch-level versions of Node core releases as we proceed.

When you're ready to start using Streams3 you can pin to "~1.1.0", but you should hold off until much closer to Node 0.12, if not after its formal release.

Small core FTW!

Being able to control precisely the versions of dependencies your code uses reduces the scope for bugs introduced by version incompatibilities or new and unproven implementations.

When we rely on a bulky standard-library to build our libraries and applications, we're relying on a shifting sand that we have little control over. This is particularly a problem for open source libraries whose users have legitimate (and sometimes not-so-legitimate) reasons for using versions that you'd rather not have to support.

Streams2 is a powerful abstraction, but the implementation is far from simple. The Streams2 code is some of the most complex JavaScript you'll find in Node core. Unless you want to have a detailed understanding of how they work and be able to track the changes as they develop, you should pin your Streams2 dependency in the same way as you pin all your other dependencies. Opt for readable-stream over what Node-core offers:

{
  "name": "mystream",
  ...
  "dependencies": {
    "readable-stream": "~1.0.0"
  }
}

var Readable = require('readable-stream').Readable
var util = require('util')

function MyStream () {
  Readable.call(this)
}

util.inherits(MyStream, Readable)

MyStream.prototype._read = function (size) {
  // ... 
}

Addendum: "through2"

If the boilerplate of the Streams2 base objects ("classes") is too much for you or triggers some past-life Java PTSD, you can just opt for the "through2" package in npm to get the job done.

through2 is based on Dominic Tarr's through but is built for Streams2, whereas "through" is pure Streams1 style. The API isn't quite the same but the flexibility and simplicity are.

through2 gives you a DuplexStream as a base to implement any kind of stream you like, be it purely readable, purely writable or fully duplex. In fact, you can even use through2 to implement a PassThrough stream by not providing an implementation at all!
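
For instance (a trivial sketch of the pass-through case, assuming an ex.txt input file), calling through2() with no transform function gives you a stream that simply relays whatever it's given:

var through2 = require('through2')
var fs = require('fs')

fs.createReadStream('ex.txt')
  .pipe(through2()) // no implementation supplied: behaves as a PassThrough
  .pipe(process.stdout)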

From the examples:

var fs = require('fs')
var through2 = require('through2')

fs.createReadStream('ex.txt')
  .pipe(through2(function (chunk, enc, callback) {

    for (var i = 0; i < chunk.length; i++)
      if (chunk[i] == 97)
        chunk[i] = 122 // swap 'a' for 'z'

    this.push(chunk)

    callback()

   }))
  .pipe(fs.createWriteStream('out.txt'))

Or an object stream:

var csv2 = require('csv2')
var all = []

fs.createReadStream('data.csv')
  .pipe(csv2())
  .pipe(through2.obj(function (chunk, enc, callback) {

    var data = {
        name    : chunk[0]
      , address : chunk[3]
      , phone   : chunk[10]
    }

    this.push(data)

    callback()

  }))
  .on('data', function (data) {
    all.push(data)
  })
  .on('end', function () {
    doSomethingSpecial(all)
  })

Testing code against many Node versions with Docker

I hadn't found a reason to play with Docker until now, but I've finally come up with an excellent use-case.

NAN is a project that helps you build native Node.js add-ons while maintaining compatibility with Node and V8 from Node 0.8 onwards. V8 is currently undergoing major internal changes that are making add-on development very difficult; NAN's purpose is to abstract that pain. Instead of having to manage the difficulties of keeping your code compatible across Node/V8 versions, NAN does it for you. But this means we have to keep NAN itself tested and compatible with all of the versions it claims to support.

Travis can help a little with this. It's possible to use nvm to test across different versions of Node, and we've tried this with NAN (see the .travis.yml). Ideally you'd have a better choice of Node versions, but Travis have had some difficulty keeping up. npm bugs also make it difficult, with a high failure rate from npm install problems, like this and this, so I don't even publish the badge on the NAN README.

The other problem with Travis is that it's a CI solution, not a proper testing solution. Even if it worked well, it's not really that helpful during development, when you need rapid feedback that your code is working on your target platforms (this is one reason why I love back-end development more than front-end development!).

Enter Docker and DNT

DNT: Docker Node Tester

Docker is a tool that simplifies the use of Linux containers to create lightweight, isolated compute "instances". Solaris and its variants have had this functionality for years in the form of "zones" but it's a fairly new concept for Linux and Docker makes the whole process a lot more friendly.

DNT contains two tools that work with Docker and Node.js to set-up containers for testing and run your project's tests in those containers.

DNT includes a setup-dnt script that sets up the most basic Docker images required to run Node.js applications, nothing extra. It first creates an image called "dev_base" that uses the default Docker "ubuntu" image and adds the build tools required to compile and install Node.js.

Next it creates a "node_dev" image that contains a complete copy of the Node.js source repository. Finally, it creates the series of images that are required: for each Node version you want, an image with that version of Node installed and ready to use.

Setting up a project is a matter of creating a .dntrc file in the root directory of the project. This configuration file involves setting a NODE_VERSIONS variable with a list of all of the versions of Node you want to test against, and this can include "master" to test the latest code from the Node repository. You also set a TEST_CMD variable with a series of commands required to set up, compile and execute your tests. The setup-dnt command can be run against a .dntrc file to make sure that the appropriate Docker images are ready. The dnt command can then be used to execute the tests against all of the Node versions you specified.

Since Docker containers are completely isolated, DNT can run tests in parallel as long as the machine has the resources. The default is to use the number of cores on the computer as the concurrency level but this can be configured if not appropriate.

Currently DNT is designed to parse TAP test output by reading the final line as either "ok" or "not ok" to report test status back on the command-line. It is configurable but you need to supply a command that will transform test output to either an "ok" or "not ok" (sed to the rescue?).

How I'm using it

My primary use-case is for testing NAN. The test suite needs a lot of work so being able to test against all the different V8 and Node APIs while coding is super helpful; particularly when tests run so quickly! My NAN .dntrc file tests against master, all of the 0.11 releases since 0.11.4 (0.11.0 to 0.11.3 are explicitly not supported by NAN) and the last 5 releases of the 0.10 and 0.8 series. At the moment that's 17 versions of Node in all and on my computer the test suite takes approximately 20 seconds to complete across all of these releases.

The NAN .dntrc

NODE_VERSIONS="\
  master   \
  v0.11.9  \
  v0.11.8  \
  v0.11.7  \
  v0.11.6  \
  v0.11.5  \
  v0.11.4  \
  v0.10.22 \
  v0.10.21 \
  v0.10.20 \
  v0.10.19 \
  v0.10.18 \
  v0.8.26  \
  v0.8.25  \
  v0.8.24  \
  v0.8.23  \
  v0.8.22  \
"
OUTPUT_PREFIX="nan-"
TEST_CMD="\
  cd /dnt/test/ &&                                               \
  npm install &&                                                 \
  node_modules/.bin/node-gyp --nodedir /usr/src/node/ rebuild && \
  node_modules/.bin/tap js/*-test.js;                            \
"

Next I configured LevelDOWN for DNT. Its needs are much simpler: the tests just need to compile the add-on and run a lot of node-tap tests.

The LevelDOWN .dntrc

NODE_VERSIONS="\
  master   \
  v0.11.9  \
  v0.11.8  \
  v0.10.22 \
  v0.10.21 \
  v0.8.26  \
"
OUTPUT_PREFIX="leveldown-"
TEST_CMD="\
  cd /dnt/ &&                                                    \
  npm install &&                                                 \
  node_modules/.bin/node-gyp --nodedir /usr/src/node/ rebuild && \
  node_modules/.bin/tap test/*-test.js;                          \
"

Another native Node add-on that I've set up with DNT is my libssh bindings. This one is a little more complicated because you need some non-standard libraries installed before compiling. My .dntrc adds some extra apt-get sauce to fetch and install those packages. It means the tests take a little longer but it's not prohibitive. An alternative would be to configure the node_dev base-image to include these packages so that all of my versioned images have them too.

The node-libssh .dntrc

NODE_VERSIONS="master v0.11.9 v0.10.22"
OUTPUT_PREFIX="libssh-"
TEST_CMD="\
  apt-get install -y libkrb5-dev libssl-dev &&                           \
  cd /dnt/ &&                                                            \
  npm install &&                                                         \
  node_modules/.bin/node-gyp --nodedir /usr/src/node/ rebuild --debug && \
  node_modules/.bin/tap test/*-test.js --stderr;                         \
"

LevelUP isn't a native add-on but it does use LevelDOWN which requires compiling. For the DNT config I'm removing node_modules/leveldown/ prior to npm install so it gets rebuilt each time for each new version of Node.

The LevelUP .dntrc

NODE_VERSIONS="\
  master   \
  v0.11.9  \
  v0.11.8  \
  v0.10.22 \
  v0.10.21 \
  v0.8.26  \
"
OUTPUT_PREFIX="levelup-"
TEST_CMD="\
  cd /dnt/ &&                                                    \
  rm -rf node_modules/leveldown/ &&                              \
  npm install --nodedir=/usr/src/node &&                         \
  node_modules/.bin/tap test/*-test.js --stderr;                 \
#"

What's next?

I have no idea but I'd love to have helpers flesh this out a little more. It's not hard to imagine this forming the basis of a local CI system as well as a general testing tool. The speed even makes it tempting to run the tests on every git commit, or perhaps on every save.

If you'd like to contribute to development then please submit a pull request; I'd be happy to discuss anything you think would improve this project. I'm keen to share ownership with anyone making significant contributions, as I do with most of my open source projects.

See the DNT GitHub repo for installation and detailed usage instructions.

LevelDOWN v0.10 / managing GC in native V8 programming


Today we released version 0.10 of LevelDOWN. LevelDOWN is the package that directly binds LevelDB into Node-land. It's mainly C++ and is a fairly raw & direct interface to LevelDB. LevelUP is the package that we recommend most people use for LevelDB in Node as it takes LevelDOWN and makes it much more Node-friendly, including the addition of those lovely ReadStreams.
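
To make the relationship concrete, here's a rough sketch of the two APIs side by side (illustrative only; paths and keys are made up). LevelDOWN is the bare binding with an explicit open() and plain callbacks, while LevelUP layers niceties such as ReadStreams on top:

var leveldown = require('leveldown')
var levelup = require('levelup')

// raw LevelDOWN: explicit open, plain callbacks, Buffers by default
var raw = leveldown('/tmp/raw-example.db')
raw.open(function (err) {
  if (err) throw err
  raw.put('key', 'value', function (err) {
    if (err) throw err
    raw.get('key', { asBuffer: false }, function (err, value) {
      if (err) throw err
      console.log('raw get:', value)
    })
  })
})

// LevelUP on top of LevelDOWN: opens lazily and adds streams
var db = levelup('/tmp/nice-example.db')
db.put('key', 'value', function (err) {
  if (err) throw err
  db.createReadStream().on('data', function (data) {
    console.log('%s = %s', data.key, data.value)
  })
})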

Normally I wouldn't write a post about a minor release like this but this one seems significant because of a number of small changes that culminate in a relatively major release.

In this post:

  • V8 Persistent references
  • Persistent in LevelDOWN; some removed, some added
  • Leaks!
  • Snappy 1.1.1
  • Some embarrassing bugs
  • Domains
  • Summary
  • A final note on Node 0.11.9

V8 Persistent references

The main story of this release is v8::Persistent references. For the uninitiated, V8 internally has two different ways to track "handles", which are references to JavaScript objects and values currently active in a running program: Local references and Persistent references. Local references are the most common; they are the references you get when you create an object or pass it around within a function and do the normal work you do with an object. Persistent references are a special case that is all about garbage collection. An object with at least one active Persistent reference to it is not a candidate for garbage collection. Persistent references must be explicitly destroyed before they release the object and make it available to the garbage collector.

Prior to V8 3.2x.xx (I don't know the exact version, does it matter? It roughly corresponds to Node v0.11.3), these handles were both as easy as each other to create and interchange; you could swap one for the other whenever you needed. My guess is that the V8 team decided this was a little too easy and that a major cause of memory leaks in C++ V8 code was the ease with which you could swap a Local for a Persistent and then forget to destroy the Persistent. So they tweaked the "ease" equation and it's become quite difficult.

Persistent and Local no longer share the same type hierarchy and the way you instantiate and assign a Persistent has become quite awkward. You now have to go through enough gymnastics to create a Persistent that it makes you ask the question: "Do I really need this to be a Persistent?" Which I guess is a good thing for memory leaks. NAN to the rescue though! We've somewhat papered over those difficulties with the capabilities introduced in NAN; it's still not as easy as it once was, but it's no longer a total headache.

So, you understand v8::Persistent now? Great, so back to LevelDOWN.

Persistent in LevelDOWN; some removed, some added!

Some removed

Recently, Matteo noticed that when you're performing a Batch() operation in LevelDB, an explicit copy is made of the data you feed into that batch. When you construct a Batch operation in LevelDB you start off with a short string representing the batch and then build on that string as you add Put() and Del() operations. You end up with a long string containing all of your write data, keys and values. Then when you call Write() on the Batch, that string gets fed directly into the main LevelDB store as a single write, which is where the atomicity of Batch comes from.

Both the chained-form and array-form batch() operations work this way internally in LevelDOWN.

However, with almost all operations in LevelDOWN, we perform the actual writes and reads against LevelDB in libuv worker threads. So we have to create the "descriptor" for work in the main V8 Node thread and then hand that off to libuv to perform the work in a separate thread. Once the work is completed we get the results back in the main V8 Node thread from where we can trigger a callback. This is where Persistent references come in.

Before we hand off the work to libuv, we need to make Persistent references to any V8 object that we want to survive across the asynchronous operation. Obviously the main candidate for this is callback functions. Consider this code:

db.get('foo', function (err, value) {
  console.log('foo = %s', value)
})

What we've actually done is create an anonymous closure for our callback. It has nothing referencing it, so as far as V8 is concerned it's a candidate for garbage collection once the current thread of execution is completed. In Node however, we're doing asynchronous work with it and need it to survive until we actually call it. This is where Persistent references come in. We receive the callback function as a Local in our C++ but then assign it to a Persistent so GC doesn't touch it. Once we're done our async work we can call the function and destroy the Persistent, effectively turning it back in to a Local and freeing it up for GC.

Without the Persistent, the behaviour is indeterminate. It depends on the version of V8, the GC settings, the workload currently in the program and the amount of time the async work takes to complete. If the GC is aggressive enough and has a chance to run before our async work is complete, the callback will disappear and we'll end up trying to call a function that no longer exists. This can obviously lead to runtime errors and will most likely crash our program.

In LevelDOWN, if you're passing in String objects for keys and values then, to pull out the data and turn it into a form LevelDB can use, we have to do an explicit copy. Once we've copied the data from the String we don't need to care about the original object, and GC can get its hands on it as soon as it wants. So we can leave String objects as Local references while we're building the descriptor for our async work.

Buffer objects are a different matter all together. Because we have access to the raw character array of a Buffer, we can feed that data straight in to LevelDB and this will save us one copy operation (which can be a significant performance boost if the data is significantly large or you're doing lots of operations—so prefer Buffers where convenient if you need higher perf). When building the descriptor for the async work, we are just passing a character array to the LevelDB data structures that we're setting up. Because the data is shared with the original Buffer we have to make sure that GC doesn't clean up that Buffer before we have a chance to use the data. So we make a Persistent reference for it which we clean up after the async work is complete. So you can do this without worrying about GC:

db.put(
    new Buffer('foo')
  , require('crypto').randomBytes(1024)
  , function (err) {
      console.log('foo is now some random data!')
    }
)

This has been the case in LevelDOWN for all operations since pretty much the beginning. But back to Matteo's observation: if LevelDB's data structures perform an explicit copy of the data we feed them, then perhaps we don't need to keep the original data safe from GC? For a batch() call it turns out that we don't! When we're constructing the Batch descriptor, as we feed data into it with both Put() and Del(), it takes a copy of our data to create its internal representation. So even when we're using Buffer objects on the JavaScript side, we're done with them before the call down into LevelDOWN has completed, so there's no reason to save a Persistent reference! For other operations we're still doing some copying during the asynchronous cycle, but the removal of the overhead of creating and deleting Persistent references for batch() calls is fantastic news for those doing bulk data loading (like Max Ogden's dat project, which needs to bulk load a lot of data).
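
For reference, the array form of batch() through LevelUP looks something like this (a made-up example); this is the kind of call that benefits from not having to hold Persistent references while the write is in flight:

var db = require('level')('/tmp/bulk-example.db')

db.batch([
    { type: 'put', key: 'name:0', value: 'foo' }
  , { type: 'put', key: 'name:1', value: 'bar' }
  , { type: 'del', key: 'name:2' }
], function (err) {
  if (err) throw err
  console.log('bulk write done')
})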

Some added

Another gem from Matteo was a report of crashes during certain batch() operations. Difficult to reproduce and only occurring under very particular circumstances, it seems to be mostly triggered by the kinds of workloads generated by LevelGraph. Thanks to some simple C++ debugging we traced it to a dropped reference, obviously collected by GC. The code in question boiled down to something like this:

function doStuff () {
  var batch = db.batch()
  batch.put('foo', 'bar')
  batch.write(function (err) {
    console.log('done', err)
  })
}

In this code, the batch object is actually a LevelDOWN Batch object created in C++-land. During the write() operation, which is asynchronous, we end up with no hard references to batch in our code because the JS thread has yielded and moved on, and batch is contained within the scope of the doStuff() function. Because most of the asynchronous operations we perform are relatively quick, this normally doesn't matter. But for writes to LevelDB, if you have enough data in your write and enough data already in your data store, you can trigger a compaction upstream, which can delay the write, which gives V8's GC time to clean up references that might be important and for which you have no Persistent handles.

In this case, we weren't actually creating internal Persistent references for some of our objects. Batch in this case but also Iterator. Normally this isn't a problem because to use these objects you generally keep references to them yourself in your own code.

We managed to debug Matteo's crash by adjusting his test code to look something like this and watching it succeed without a crash:

function doStuff () {
  var batch = db.batch()
  batch.put('foo', 'bar')
  batch.write(function (err) {
    console.log('done', err)
    batch.foo = 'bar'
  })
}

By reusing batch inside our callback function, we're creating some work that V8 can't optimise away and therefore has to assume isn't a noop. Because the batch variable is now also referenced by the callback function, and we already have an internal Persistent for the callback, GC has to pass over batch until that Persistent is destroyed.

So the solution is simply to create a Persistent for the internal objects that need to survive across asynchronous operations and make no assumptions about how they'll be used in JavaScript-land. In our case we've gone for assigning a Persistent just prior to every asynchronous operation and destroying it after. The alternative would be to have a Persistent assigned upon the creation of objects we care about but sometimes we want GC to do its work:

function dontDoStuff () {
  var batch = db.batch()
  batch.put('foo', 'bar')
  // nothing else, wut?
}

I don't know why you would write that code but perhaps you have a use-case where you want the ability to start constructing a batch but then decide not to follow through with it. GC should be able to take care of your mess like it does with all of the other messes you create in your daily adventures with JavaScript.

So we only assign a Persistent when you do a write() on a chained-batch operation in LevelDOWN, since that's its only asynchronous operation. In dontDoStuff(), GC will come along and rid us of batch, 'foo' and 'bar' when it next has the opportunity, and our C++ code will have the appropriate destructors called to clean up any other objects we created along the way, like the internal LevelDB Batch with its copy of our data.

Leaks!

We've been having some trouble with leaks in LevelUP/LevelDOWN lately (LevelDOWN/#171, LevelGraph/#40). It turns out that these leaks aren't related to Persistent references, which shouldn't be a surprise since it's so easy to leak in non-GC code, particularly if you spend most of your day programming in a language with GC.

With the help of Valgrind we tracked the leak down to the omission of a delete in the destructor of the asynchronous work descriptor for array-form batch operations. The internal LevelDB representation of a Batch wasn't being cleaned up unless you were using the chained form of LevelDOWN's batch(). This one has been dogging us for a few releases and has been a headache particularly for people doing bulk-loading of data, so I hope we can finally put it behind us!

Snappy 1.1.1

Google released a new version of Snappy, version 1.1.1. I don't really understand how Google uses semver; we get very simple LevelDB releases with the minor version bumped and then we get versions of Snappy released with non-trivial changes with only the patch version bumped. I suspect that Google doesn't know how it uses semver either and there's no internal policy on it.

Anyway, Snappy 1.1.1 has some fixes and some minor speed and compression improvements, but most importantly it breaks compilation on Windows, so we had to figure out how to fix that for this release. Ugh. I also took the opportunity to clean up some of the compilation options for Snappy, and we may see some improvements in the way it works now... perhaps.

Some embarrassing bugs

Amine Mouafik is new to the LevelDOWN repository but has picked up some rather embarrassing bugs/omissions that are probably my fault. It's great to have more eyes on the C++ code; there aren't enough JavaScript programmers with the confidence to dig into messy C++-land.

Firstly, on our standard LevelDOWN releases, it turns out that we haven't actually been enabling the internal bloom filter. The bloom filter was introduced in LevelDB to speed up read operations to avoid having to scan through whole blocks to find the data a read is looking for. So that's now enabled for 0.10.

Then he discovered that we had been turning off compression by default! I believe this happened with the switch to NAN, at roughly Node version 0.11.4. The signature for reading boolean options from V8 objects changed from the internal LD_BOOLEAN_OPTION_VALUE and LD_BOOLEAN_OPTION_VALUE_DEFTRUE macros, which default to false and true respectively when the option isn't supplied, to NAN's unified NanBooleanOptionValue, which takes an optional defaultValue argument that can be used to make the default true.

Well, this code:

bool compression =
    NanBooleanOptionValue(optionsObj, NanSymbol("compression"));

is now this:

bool compression =
    NanBooleanOptionValue(optionsObj, NanSymbol("compression"), true);

so if you don't supply a "compression" boolean option in your db setup operation then it'll now actually be turned on!
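
From JavaScript-land that option is the one you pass (or, more usually, don't pass) when opening the database; something like this, via LevelUP (paths are made up for illustration):

var levelup = require('levelup')

var db = levelup('/tmp/compressed.db') // Snappy compression now on by default
var plain = levelup('/tmp/plain.db', { compression: false }) // explicitly off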

Domains

We've finally caught up with properly supporting Node's domains by switching all C++ callback invocations from the standard V8 callback->Call(...) to Node's own node::MakeCallback(callback, ...), which does the same thing but also a lot more, including accounting for domains. This change was also included in NAN version 0.5.0.

Summary

Go and upgrade!

leveldown@0.10.0 is packaged with the new levelup@0.18.0 and level@0.18.0, which have had their minor versions bumped purely for this LevelDOWN release.

Also released are the packages:

  • leveldown-hyper@0.10.0
  • leveldown-basho@0.10.0
  • rocksdb@0.10.0 (based on the same LevelDOWN code) (Linux only)
  • level-hyper@0.18.0 (levelup on leveldown-hyper)
  • level-basho@0.18.0 (levelup on leveldown-basho)
  • level-rocks@0.18.0 (levelup on rocksdb) (Linux only)

I'll write more about these packages in the future since they've gone largely under the radar for most people. If you're interested in catching up then please join ##leveldb on Freenode where there's a bunch of Node database people and also a few non-Node LevelDB people like Robert Escriva, author of HyperLevelDB and all-round LevelDB expert.

A final note on Node 0.11.9

There will be a LevelDOWN@0.10.1 very soon that will bump the NAN dependency to 0.6.0 when that's released. This new version of NAN specifically deals with Node 0.11.9 compatibility, where there are more breaking V8 changes that will cause compile errors for any add-on not taking them into account. So if you're living on the edge in Node then we should have a release soon enough for you!

All the levels!

When we completely separated LevelUP and LevelDOWN, so that installing LevelUP didn't automatically get you LevelDOWN, we set up a new package called Level that has them both as dependencies, so you just need to do var level = require('level') and everything is wired up for you.

But we now have more than just the vanilla (Google) LevelDB in LevelDOWN. We also have a HyperLevelDB version and a Basho fork. These are maintained on branches in the LevelDOWN repo and are now usually released every time a new LevelDOWN is released. They are called leveldown-hyper and leveldown-basho in npm, but you need to plug them into LevelUP yourself to make them work. We also have Node LMDB, which is LevelDOWN-compatible, and a few others.
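
"Plugging them in yourself" looks roughly like this: LevelUP accepts a db option naming the back-end to use in place of the default LevelDOWN (the path here is made up for illustration):

var levelup = require('levelup')
var hyperdown = require('leveldown-hyper')

// use the HyperLevelDB back-end instead of plain LevelDOWN
var db = levelup('/path/to/db', { db: hyperdown })
db.put('foo', 'bar', function (err) {
  if (err) throw err
  console.log('written via leveldown-hyper')
})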

So, as of today, we've released a new, small library called level-packager that does this bundling: you feed it a LevelDOWN instance and it returns a Level-type object that can be exported from a package like Level. It's meant to be used internally and it's now being used to support these new packages that are available in npm:

  • level-hyper bundles the HyperLevelDB version of LevelDOWN with LevelUP
  • level-basho bundles the Basho fork of LevelDB in LevelDOWN with LevelUP
  • level-lmdb bundles Node LMDB with LevelUP

The version numbers of these packages will track the version of LevelUP.
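
Internally, one of these bundles amounts to little more than this (a rough sketch based on the description above; the real packages may differ in detail):

// e.g. the main module of a package like level-hyper
var packager = require('level-packager')
var leveldownHyper = require('leveldown-hyper')

// level-packager takes a LevelDOWN back-end and returns a Level-style
// constructor, ready to be exported as the package's entry point
module.exports = packager(leveldownHyper)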

So you can now simply do:

var level = require('level-hyper')
var db = level('/path/to/db')
db.put('foo', 'woohoo!')

If you're already using Level then you can very easily switch it out with one of these alternatives to try them out.

Both HyperLevelDB and the Basho LevelDB fork are binary-compatible with Google's LevelDB, with one small caveat: with the latest release, LevelDB has switched to writing .ldb files instead of .sst files inside a data store directory, because of something about Windows backups (blah blah). Neither of the alternative forks knows anything about these new files yet, so you may run into trouble if you have .ldb files in your store (although I'm pretty sure you can simply rename them to .sst and it'll be fine with any version).

Also, LMDB is completely different to LevelDB so you won't be able to open an existing data store. But you should be able to do something like this:

require('level')('/path/to/level.db').createReadStream()
  .pipe(require('level-lmdb')('/path/to/lmdb.db').createWriteStream())

Whoa...

A note about HyperLevelDB

Lastly, I'd like to encourage you to try the HyperLevelDB version if you are pushing hard on LevelDB's performance. The HyperDex fork is tuned for multi-threaded access for reads and writes and is therefore particularly suited to how we use it in Node. The Basho fork doesn't show much of a performance difference, mainly because they are optimising for Riak running 16 separate instances on the same server, where multi-threaded access isn't as interesting. You should find significant performance gains with HyperLevelDB if you're doing very heavy writes in particular. Also, if you're interested in support for HyperLevelDB then pop into ##leveldb on Freenode and bother rescrv (Robert Escriva), author of HyperLevelDB and our resident LevelDB expert.

It's also worth noting that HyperDex are interested in offering commercial support for people using LevelDB, not just HyperLevelDB but also Google's LevelDB. This means that anyone using either of these packages in Node should be able to get solid support if they are doing heavy work in a commercial environment and need the surety of experts behind them to help pick up the pieces. I imagine this would cover things like LevelDB corruption and any LevelDB bugs you may run into (we're currently looking at a subtle batch-related LevelDB bug that came along with the 1.14.0 release, they do exist!). Talk to Robert if you want more information about commercial support.