r.va.gg

Primitives for JS Databases (an LXJS adventure)

I gave a talk yesterday at LXJS yesterday in the "Infrastructure.js" block and tried to talk about JavaScript Database Primitives; i.e. the basic building blocks we have landed on for building more complex database solutions in JavaScript.

The talk certainly wasn't as good or clear as I wanted it to be, it worked much better in my head! A huge venue with over 300 talented JavaScripters, an absolutely massive screen, bright lights and loud amplification got the better of me and I wasn't able to pull the material together how I wanted to. The introvert within me is telling me to become a recluse for a little while just to recover! My hope is that at least one or two people are inspired to give database hacking a go because it's really not that difficult once you get your head around the primitives.

Edit: I wasn't trying to elicit sympathy here, I genuinely think that I wasn't clear on what I was trying to communicate. It went so well in my head, as it usually does, but I fell far short of what I wanted to express. I'll attempt to rectify some of that with a writeup (see next para).

Thankfully though, a portion of the material will be able to serve as the basis for the, long overdue, third part in my three part DailyJS series on LevelDB & Node.

In summary, inspired by LevelDB, we've ended up with a core set of primitives in LevelUP that can be used to build feature-rich and advanced database functionality. Atomic batch and ReadStream are the two non-trivial primitives, open, close, get, put, del are all pretty easy to understand as primitives, although del is perhaps redundant but we're opting for explicitness.

My slides are online but hopefully I'll be able to get my DailyJS article sorted out soon and I'll be able to explain what I was trying to get at.

ReadStream as a primitive query mechanism is not too hard to understand once you get your head around key sorting and the implications for key structure. Batch is a little more subtle and relates to consistency and our ability to augment basic operations to create more complex functionality while keeping the data store in a consistent state.

I additionally raised "Buckets", or "Namespaces" as a primitive concept and discussed how sublevel has effectively become the standard for turning a one-dimensional data store into a multi-dimensional store able to encapsulate contain sophisticated functionality behind what is essentially just a key/value store.

Thanks to the LXJS team

It would be neglectful of me to not say how absolutely grateful I am to the LXJS team for putting so much effort into taking care of speakers; fantastic job.

LXJS is an amazing event, put on by a dedicated and very talented team of people committed to the JavaScript community and the JavaScript community in Portugal in particular. This conference sets a very high bar for community-driven conferences with the way it has managed to get so many locals (and internationals!) involved in running an event in their own time.

David Dias, Ana Hevesi, Pedro Teixeira, Luís Reis, Nuno Job, Tiago Rodrigues, Leo Xavier, Alexander Kustov, André Rodrigues and Bruno Coelho have managed to put on an amazing event and are some of the nicest and talented people I've met. Thank you to you all and everyone else who put on LXJS 2013, your hard work is appreciated and should be an inspiration to everyone involved in our local JavaScript communities, running events or considering running events like this.

NodeConf.eu

Wow, NodeConf.eu was certainly a once-in-a-lifetime event ... although there's talk of a repeat performance next year (don't miss the chance when it comes around!).

Raise that flag

Dominic Tarr, @substack and Julian Gruber raising the NodeConf.eu flag

NodeConf.eu was held in Waterford, Ireland, on an Island, in a Castle and was organised by the Node lovin' company, nearForm, in particular Cian O'Maidin and his amazing assistant Catherine Bradley. Of course Mikeal Rogers had a significant role in organising the event too.

Waterford Castle

Waterford Castle

Pig

The welcome banquet ... yep

Instead of describing the talks, I'll defer to the excellent four part series by Paul, Adam, Luke and Ben of Clock where you'll find a great summary of the talks and events of the conference.

For my part, I was deeply honoured to be involved in the "Node Databases" track of the conference. We started off the NodeConf.eu talks with a 3-part show. My talk was titled "A Real Database Rethink" and was followed by Dominic Tarr who talked more about the Level* ecosystem and the various pieces of the Node Databases puzzle that's being built. Julian Gruber then closed us off with some amazing live-coding of some browser/server streaming LevelUP/multilevel wizardry.

A Real Database Rethink

The slides of my talk are online. I attempted to break down the definition of the term "database" by looking at where the concept comes from historically. It's actually a difficult thing to define and I don't believe there is any one agreed upon meaning. What I came up with is:

A tool for interacting with structured data, externalised from the core of our application

  • Persistence
  • Performance
  • Simplify access to complex data

And sometimes...

  • Shared access
  • Scalability

But even that's pretty rough.

Taking that definition, we can apply Node philosophy of small-core and vibrant user-land, along with the culture of extreme modularity afforded us by npm, and build a new kind of database; or at least apply new thinking to the "database".

The bulk of my talk was taken up with talking about LevelUP and the basics of the Level ecosystem. There's a table on slide #7 that I'm going to try and refine over time to help describe what the Level / NodeBase world is all about.

Level Me Up Scotty!

One of the three workshops available at NodeConf.eu was all about Node Databases. I took the same approach as at CampJS recently where I built Learn You The Node.js For Much Win!, a tool that owes a debt to stream-adventure, a self-guided workshop-in-your-terminal application by @substack and Max Ogden written for NodeConf (US).

This time around, I received some great help from both @substack and Julian Gruber who helped write some exercises, I also received help from Eugene Ware who wasn't even at the conference but was assisting with development from Australia. Raynos was also a great help in getting the application working well.

We ended up with Level Me Up Scotty!, or just levelmeup.

levelmeup

Dominic Tarr, Thorsten Lorenz, Paolo Fragomeni, Matteo Collina, Magnus Skog, Max Ogden and other experienced Levelers helped on and off while the workshops were happening; so we had plenty of expertise at hand whenever there were questions.

Workshops were unstructured and the organisers of each workshop all ended up agreeing that we should just let people come and go as they pleased. This suited us as the workshop was open-ended and designed not to be finished by most people within the originally planned hour (I think that was the original plan).

levelmeup is installed from npm (npm install levelmeup -g) and is fully self-guided. You run the levelmeup application and it steps you through some exercises designed to:

  • introduce you to the format of the workshops with a simple "Hello World" style exercise
  • introduce you to LevelUP and its basic operations
  • help you understand ReadStream and the range-queries it makes possible
  • encourage creative thought regarding key structure
  • introduce sublevel
  • introduce multilevel

There's more planned for the future of this workshop application too, Matteo even has an a work-in-progress exercise that should be merged fairly soon.

nodeschool.io was hatched from NodeConf.eu and pulls together the three workshop applications currently available in npm. I believe this was an initiative of Brian J. Brennan and other Mozillans on the Open Badges project. workshopper is the engine that runs both learnyounode and levelmeup and we're trying to make it even easier for others to author their own workshop applications. There is already a Functional Javacript Workshop by Tim Oxley and there are more in development. Exciting times!

Level Me Up Workshoppers

Workshoppers stretching their brains with levelmeup

My experience with stream-adventure and learnyounode suggested that this format should prove to be relatively successful but ultimately I think we had most of the attendees come through at some point and sit down to have a crack at the workshop. This is particularly impressive given that Emily Rose, Elijah Insua and Matteo were running a NodeBots workshop which included Arduino and NodeCopter hacking (always popular!). And Max Bruning and TJ Fontaine were running a Manta / MDB / DTrace / SmartOS-magic workshop and their material was some of my favourite from NodeConf (US) so I'm sure people really enjoyed what they had to present.

Unfortunately I didn't get to attend these other workshops, I also missed out on some skeet!

Skeet

Karolina "don't mess with me" Szczur, photo by Matthew Bergman

But there was plenty of other experience to be had. It was also fantastic to meet so many people I only knew from IRC / Twitter / GitHub. For someone who lives in regional Australia and doesn't get a chance to socialise much with other nerds, this was a particularly special opportunity.

Shenanigans

Final night banquet shenanigans with Charlie McConnell and @substack ... the napkin hat thing is a story in itself, blame Jessica Lord, photo by Matthew Bergman

The Level* Gang

As an aside, NodeConf.eu had the largest concentration of LevelUP contributors and active Level* developers of any event that I'm aware of so far. So we took the opportunity to have our own little meeting. We even took minutes, of sorts.

There has been a long-standing plan to make a Level* / NodeBase website but being the disorganised rabble we are, it hasn't got off the ground. Karolina (and Jessica too I believe) are keen to help out on the design end but just need the content. So that's what we planned. There's a bunch of issues that form a TODO in the repo for this project. Hopefully we can all get on top of it sooner rather than later. We're also open to assistance from anyone else that would like to contribute.

Besides getting stuff done, it was just a pleasure to hang out with these people and talk shop.

A momentus event

The Level* Gang: Paolo, Dominic, @substack, Karolina, Magnus, Mikeal, Julian, Max, Matteo and Paul Fryzel. Raynos was around but missed this particular event, Thorsten was inside demoing his guitar-typing software.

Learn You The Node.js

CampJS has just finished, with a bigger crowd than last time around. It was lots of fun, and as usual, these events are more about meeting the people I collaborate, and socialise with online than anything else. There was a particularly large turn-out of the hackers on #polyhack, our Australian programmers channel on Freenode. Even @mwotton, our resident Haskell-troll was there! Lots of photos and news can be found on Storify. The next one will likely be near Melbourne in February some time and I highly recommend it if you can get there.

Learn You The Node.js For Much Win (presentation)

I was struck last CampJS how many JavaScript newbies were there, or at least people who deal with JavaScript as a secondary language and therefore only have a cursory understanding of it. And by extension, there were not many people who had much understanding of Node. So I wanted to present some intro-to-Node material this time.

I gave a 30 minute talk covering the very basics of what Node is, called Learn You The Node.js For Much Win. Obviously the title is inspired by Learn You a Haskell For Great Good and Learn You Some Erlang For Great Good. You can find my slides here (feel free to rip them off if you need to give a similar talk somewhere!). The video may be online at some point in the future.

Learn You The Node.js For Much Win (workshop)

The next morning, I gave a workshop on the same topic but it was much more hands-on. The inspiration for my workshop came from NodeConf, a couple of months earlier. @substack and @maxogden presented a workshop titled stream adventure which was a self-guided, interactive workshop for the terminal, built with Node. You can find it here and install it from npm with npm install stream-adventure -g, I highly recommend it.

NPM

I was so inspired that I stole their code and made my own workshop application! learnyounode. You can download and install it with npm install learnyounode -g.

NPM

The application itself is/was a series of 13 separate workshops. Starting off with a simple HELLO WORLD and ending with a JSON API HTTP server (contributed by the very clever @sidorares).

learnyounode

Nobody actually managed to finish the workshops in the allotted 60 minutes, although @alexdickson, an expert JavaScripter but Node-n00b was the first one I heard of finishing it not long after.

The workshops attempt to focus on some of the core concepts of Node. There's lots of console output because that's easiest to validate but it introduces filesystem I/O, both synchronous and asynchronous and moves straight on to networking because that's what Node is so good at. An HTTP CLIENT example, introduces HTTP and is expanded on in HTTP COLLECT which introduces streams. JUGGLING ASYNC builds on HTTP COLLECT to introduce the complexities of managing parallel asynchronous activities. From there, it switches from network clients to network servers, first a simple TCP server in TIME SERVER and then using streams to serve files in HTTP FILE SERVER and transforming data with HTTP UPPERCASERER. The final exercise presents you with a more complex, closer-to-real-world example, an HTTP API server with multiple end-points.

The entire workshop is designed to take longer than 1-hour, people ought to be able ot take it away and complete it later. It's also designed to be suitable for complete n00bs and also people with some experience, it ought to make a fun challenge for anyone already experienced with Node to see how quickly they can complete the examples (I believe I earned the honour of being the first person at NodeConf to finish stream-adventure in the allotted time!).

The Node-experts at CampJS were thankfully helping out during the workshop so there wasn't much competition going on there.

Many thanks to these expert Node nerds who hovered and helped people during the workshop and also did some test-driving of the workshop prior to the event:

(I really hope I haven't missed anyone out there; so many quality nerds at CampJS!)

Tim Oxley making a contribution during the workshop, along with Christopher Giffard (left) and Eugene Ware (right)

I had the solutions to the workshop ready on the big-screen and walked through some of the early solutions and talked through what was going on. I didn't expect many people to listen to those bits and the workshop was designed so you could totally zone-out and do it at your own pace if that suited.

If anyone wants to run a similar style workshop for their local meet-up, using the same content, I'd love to receive contributions to learnyounode. Alternatively, make your own! I extracted the core framework from learnyounode and it now lives separately as workshopper.

NPM

I would love feedback from anyone in attendance or anyone that uses this tool to run their own workshops! learnyounode is already listed in Max Ogden's excellent The Art of Node, so I'm looking forward to contributions to help turn this into a really useful teaching tool.

LevelDOWN Alternatives

Since LevelUP v0.9, LevelDOWN has been an optional dependency, allowing you to switch in alternative back-ends.

We have MemDOWN, a pure in-memory data-store, allowing you to run LevelUP against transient, and very fast storage.

We also have level.js which works against IndexedDB, allowing you to run LevelUP in the browser!

Since LevelUP just needs some basic primitives and a sorted bi-directional iterator, we can swap out the back-end with numerous alternatives.

The easy targets are the forks of LevelDB that purport to fix or improve LevelDB in some way. I have another post brewing on what I think about the claims made in this area and how we ought to approach them, but that can come later. For now I have some packages in npm for you to try for yourself!

Basho

First of all we have leveldown-basho which bundles the Basho LevelDB fork into LevelDOWN. See Matthew Von-Maszewski's slides from the recent Ricon East 2013 for more information on what they've tried to do with LevelDB.

In summary, Basho's aim is to optimise LevelDB "for the server", particularly for high write throughput. They use >1 compaction threads and relax the rules a little on overlapping keys for the lower levels. Plus a few other things that I won't get in to here.

$ npm install levelup leveldown-basho
var levelup = require('levelup')
  , leveldown = require('leveldown-basho')
  , db = levelup('/path/to/db', { db: leveldown })  
// go to work on `db`

Disclaimer: some of the LevelDOWN and LevelUP tests are failing on the current build for this release, although I don't believe they should impact on standard usage but your mileage may vary...

HyperDex

Next, we have leveldown-hyper, which bundles a fork by the people behind HyperDex, a key-value store. Again their aim is to optimise LevelDB for a server environment. You can see some of their claims about performance here. I don't know as much about this fork, I'll investigate further when I have time, but they are also using multiple compaction threads to do the background work.

$ npm install levelup leveldown-hyper
var levelup = require('levelup')
  , leveldown = require('leveldown-hyper')
  , db = levelup('/path/to/db', { db: leveldown })  
// go to work on `db`

Lies! Benchmarks!

OK, benchmarks kind of suck, particularly microbenchmarks. It's really hard to test something that's meaningful for everyone's use-case. But you can make pretty pictures with them and they can tell something of a story, even if it's just the first page of a novel.

So here we go. I've put together a simplistic benchmark that tries to test the kind of situation that these two forks are aiming to optimise for. In particular, high-throughput writes. There's a common claim that LevelDB has problems with writes because the compaction thread can hold up levels 0 and 1 while it's working on higher levels; and you really want to be flushing the new data as soon as possible so you can get more in. (Again, I have more to say on this & the claims about "fixing" the problem in a later post).

I have a sorted-write benchmark in the LevelDOWN repo that tries to push in 10M pre-sorted entries as fast as possible, fully utilising Node's worker-threads for the job. So this isn't your typical browser scenario. An important point here is that Node is a unique environment when looking at LevelDB performance. It's not going to be a straightforward mapping of benchmark results obtained with other LevelDB bindings onto what we can achieve in Node.

Because there are so many entries, instead of recording the time for individual writes, I've recorded average time for batches of 1000 writes. Below you can see what the write-times look like when plotted over time. There are a bunch of outliers that are above the maximum Y of 0.6ms, but not enough to warrant distracting from the interesting behaviour below 0.6ms so I chopped it off there.

It is important to note that I'm using the default options here and this is where I'll probably cop some flak. Basho in particular advocate a healthy amount of "tuning" to achieve appropriate performance. In particular the write-buffer defaults to only 4M and you can push data in faster (at the cost of compactions later on) by increasing this. I think the forks may even have additional tunables of their own that you can fiddle with. But, this whole tuning thing is a rabbit hole I don't dare go down right now!

I'm running this on an i7-2630QM CPU, plenty of RAM and an SSD.

You can see that we managed to push in the 10M entries in just over 95 seconds with the plain Google LevelDB (v1.10.0).


Next up we have the HyperDex fork. The main difference here is that we have it working slightly faster in total and the write-times have been trimmed down a bit to be more consistent. Not a bad effort with default settings, quite a nice picture.


Lastly we can see what Basho have done. They've been on this case for a lot longer than HyperDex have and their fork, internally at least, diverges quite a bit from Google's LevelDB.

We can see that the write-time has been considerably flattened; which is in line with what Basho claim and are aiming for, the consistency here is very impressive. Unfortunately we've ended up with a total time that is double what it took Google's LevelDB to get the 10M entries in!

No doubt this is probably something to do with the tunables, or perhaps I've messed something up, anything's possible!


So?

If you take anything away from this, here's what I think it should be: Do your own benchmarks if performance really is an issue for you. You're going to need some kind of benchmark suite that is tailored to your particular application. This will not only let you choose the appropriate storage system but it will give you something to work with when you start to get in to the mire that is "tunables".

It's likely I won't be able to leave this alone and will be posting more benchmarks with some tweaking and tuning. I'd love to have input from others on this too of course! The code for this is all in the LevelDOWN repo with both of these forks under appropriately named branches.

LevelUP v0.9 Released

LevelDB

As per my previous post, LevelUP v0.9 has been released!

I'm doing a quick post about this release because it's got more changes in it than we normally see, including some things worth explaining.

Relationship to LevelDOWN

The biggest change is the removal of LevelDOWN as a dependency, you should review what I've already said about this as this will impact you if you're currently using LevelUP. In short, you'll either need to explicitly npm install leveldown or switch to using the new Level package which bundles them both.

Along with this change, we also get better Browserify support. See level.js for more information on this.

Chained batch

The other major change is the introduction of a new chained batch syntax, additional to the existing batch syntax. This method of creating and writing batch operations is much closer to the way LevelDB does batches and under certain circumstances you may find improved performance from using this method.

If you call db.batch() with no arguments, you'll get a Batch object back which has the following operations: put(), del(), clear() and write(). The first three are chainable so you can call them one after the other to build your batch. write() is the only method that takes a callback because it submits the batch. Until you call write(), the batch is transient and can be discarded.

Example from the README:

db.batch()
  .del('father')
  .put('name', 'Yuri Irsenovich Kim')
  .put('dob', '16 February 1941')
  .put('spouse', 'Kim Young-sook')
  .put('occupation', 'Clown')
  .write(function () { console.log('Done!') })

Some love for WriteStream

WriteStream got some attention in this release. On the main createWriteStream() method and on individual write() calls, you can now pass some new options:

  • 'type' can switch from the default 'put' to 'del' so you can make a WriteStream that only deletes when you write({ key: 'foo' }), or you can make individual writes delete: write({ type: 'del', key: 'foo' }).
  • 'keyEncoding' and 'valueEncoding' will switch from default encodings for the current LevelUP instance. Again, you can specify them on the main createWriteStream() or on individual write() calls.

Other changes

  • A race condition was fixed that allowed a put() to write to the store before an iterator was obtained when calling `createReadStream().
  • ReadStream no longer emits a 'ready' event.
  • The db property on LevelUP instances can be used to get access to LevelDOWN or whatever LevelDOWN-substitute you are using (this was _db).
  • Some very LevelDB-specific methods have been deprecated on LevelUP and the documentation now recommends either directly using LevelDOWN or calling via the db property. Specifically:
    • db.db.approximateSize()
    • leveldown.repair()
    • leveldown.destroy()
  • LevelDOWN got a new LevelDB method: getProperty() that currently understands 3 properties:
    • db.db.getProperty('leveldb.num-files-at-levelN'): returns the number of files at level N, where N is an integer representing a valid level (e.g. "0")').
    • db.db.getProperty('leveldb.stats'): returns a multi-line string describing statistics about LevelDB's internal operation.
    • db.db.getProperty('leveldb.sstables'): returns a multi-line string describing all of the sstables that make up contents of the current database.
  • Significantly improved ReadStream performance improvements (up to 50% faster).
  • Some LevelDOWN memory leaks discovered and fixed.
  • LevelDOWN upgraded to LevelDB@1.10.0, details here.

Who you should thank

A lot of people put in work to this release. There's a team of people that can claim ownership of LevelUP, LevelDOWN and related projects and most of them have been involved in this release. You should follow these people on Twitter and GitHub!

And others, who you can find in this 0.9 WIP thread, plus additional users who reported & found issues.