r.va.gg

All the levels!

When we completely separated LevelUP and LevelDOWN so that installing LevelUP didn't automatically get you LevelDOWN, we set up a new package called Level that has them both as a dependency so you just need to do var level = require('level') and everything is done for you.

But, we now have more than just the vanilla (Google) LevelDB in LevelDOWN. We also have a HyperLevelDB version and a Basho fork. These are maintained on branches in the LevelDOWN repo and are usually released now every time a new LevelDOWN is released. They are called leveldown-hyper and leveldown-basho in npm but you need to plug them in to LevelUP yourself to make them work. We also have Node LMDB that's LevelDOWN compatible and a few others.
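As a rough sketch of what "plugging in yourself" looks like: this assumes LevelUP's `db` option, which accepts a LevelDOWN-compatible factory in the LevelUP releases current at the time of writing. The `openWithBackend` helper is a made-up name for illustration and is not part of any package.

```javascript
// Sketch: wire an alternative LevelDOWN-compatible back-end into LevelUP by hand.
// `levelup` and `backend` are passed in so the helper does not care which
// back-end package happens to be installed. `openWithBackend` is a hypothetical name.
function openWithBackend (levelup, backend, location, options) {
  // copy the caller's options and point LevelUP at the chosen back-end
  var opts = Object.assign({}, options, { db: backend })
  return levelup(location, opts)
}

// Assumed usage, with the packages installed from npm:
// var db = openWithBackend(require('levelup'), require('leveldown-hyper'), '/path/to/db')
```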

So, as of today, we've released a new, small library called level-packager that does this bundling process so that you can feed it a LevelDOWN instance and it'll return a Level-type object that can be exported from a package like Level. This is meant to be used internally and it's now being used to support these new packages that are available in npm:

  • level-hyper bundles the HyperLevelDB version of LevelDOWN with LevelUP
  • level-basho bundles the Basho fork of LevelDB in LevelDOWN with LevelUP
  • level-lmdb bundles Node LMDB with LevelUP

The version numbers of these packages will track the version of LevelUP.

So you can now simply do:

var level = require('level-hyper')
var db = level('/path/to/db')
db.put('foo', 'woohoo!')

If you're already using Level then you can very easily switch it out with one of these alternatives to try them out.

Both HyperLevelDB and the Basho LevelDB fork are binary-compatible with Google's LevelDB, with one small caveat: with the latest release, LevelDB has switched to making .ldb files instead of .sst files inside a data store directory because of something about Windows backups (blah blah). Neither of the alternative forks knows anything about these new files yet so you may run into trouble if you have .ldb files in your store (although I'm pretty sure you can simply rename these to .sst and it'll be fine with any version).

Also, LMDB is completely different to LevelDB so you won't be able to open an existing data store. But you should be able to do something like this:

require('level')('/path/to/level.db').createReadStream()
  .pipe(require('level-lmdb')('/path/to/lmdb.db').createWriteStream())

Whoa...

A note about HyperLevelDB

Lastly, I'd like to encourage you to try the HyperLevelDB version if you are pushing hard on LevelDB's performance. The HyperDex fork is tuned for multi-threaded access for reads and writes and is therefore particularly suited to how we use it in Node. The Basho version doesn't show much performance difference mainly because they are optimising for Riak running 16 separate instances on the same server so multi-threaded access isn't as interesting for them. You should find significant performance gains if you're doing very heavy writes in particular with HyperLevelDB. Also, if you're interested in support for HyperLevelDB then pop in to ##leveldb on Freenode and bother rescrv (Robert Escriva), author of HyperLevelDB and our resident LevelDB expert.

It's also worth noting that HyperDex are interested in offering commercial support for people using LevelDB, not just HyperLevelDB but also Google's LevelDB. This means that anyone using either of these packages in Node should be able to get solid support if they are doing any heavy work in a commercial environment and need the surety of experts behind them to help pick up the pieces. I imagine this would cover things like LevelDB corruption and any LevelDB bugs you may run into (we're currently looking at a subtle batch-related LevelDB bug that's come along with the 1.14.0 release, they do exist!). Talk to Robert if you want more information about commercial support.

Should I use a single LevelDB or many to hold my data?

This is a long overdue post, so long in fact that I can't remember who I promised to do this for! Regardless, I keep on having discussions around this topic so I thought it worthwhile putting down some notes on what I believe to be the factors you should consider when making this decision.

What's the question?

It goes like this: You have an application that uses LevelDB. In particular I'm talking about Node.js applications here, but the same would apply if you're using LevelUP in the browser and also most of the other back-ends for LevelUP. You invariably end up with different kinds of data; sometimes the kinds of data you're storing are so different that it feels strange putting them into the same storage blob. Often though, you just have sets of not-very-related data that you need to store and you end up having to make a decision: do I put everything into a single LevelDB store or do I put things into their own, separate, LevelDB store?

This stuff doesn't belong together!

Coming from a relational database background, it took me a little while to displace the concept of discrete tables with the notion of namespacing within the same store. I can understand the temptation to want to keep things separate, not wanting to end up with a huge blob of data that just shouldn't be together. But this isn't the relational database world and you need to move on!

We have a set of LevelUP addons, such as sublevel, that exist mainly to provide you with the comfort of being able to separate your data by whatever criteria makes sense. bytewise is another tool that can serve a similar purpose and some people even use sublevel and bytewise together to achieve more complex organisation.

We have the tools at our disposal in Node.js to turn a one-dimensional storage array into a very complex, multidimensional storage system where unrelated, and semi-related data can coexist. So, if the only reason you want to store things in separate stores is because it just feels right to do so, you should probably be looking at what's making you think that way. You may need to update your assumptions.
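To make the namespacing idea concrete, here's a minimal sketch that uses nothing but sorted strings. The '!' separator and the helper names are assumptions for illustration, not the format sublevel actually uses, and the `start` / `end` pair is meant to mirror the range options a ReadStream accepts.

```javascript
// Sketch of namespacing inside a single sorted key/value store.
// SEP and all helper names here are hypothetical.
var SEP = '!'

function nsKey (namespace, key) {
  return namespace + SEP + key
}

// Bounds for a range read covering exactly one namespace. '\xff' sorts after
// any ASCII character, so it caps the range (this assumes ASCII keys).
function nsRange (namespace) {
  return { start: namespace + SEP, end: namespace + SEP + '\xff' }
}

// Stand-in for a range read: filter a sorted list of keys by the bounds.
function keysInRange (sortedKeys, range) {
  return sortedKeys.filter(function (k) {
    return k >= range.start && k <= range.end
  })
}

var keys = [
  nsKey('users', 'alice'),
  nsKey('sessions', 'abc123'),
  nsKey('users', 'bob')
].sort()

console.log(keysInRange(keys, nsRange('users'))) // [ 'users!alice', 'users!bob' ]
```

Unrelated data lives side by side in the one store, and a range read over a prefix only ever sees its own namespace.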

Technical considerations

That aside, there are some technical considerations for making this decision:

Size and performance

To be clear, LevelDB is fast and it can also store lots of data, it'll handle Gigabytes of data without too much sweat. However, there are some performance concerns when you start getting in to the Gigabyte range, mainly when you're trying to push data in at a high rate. Most use-cases don't do this so be honest about your performance needs. For most people LevelDB is simply fast.

However, if you do have a high-throughput scenario involving a large amount of data that you need to store then you may want to consider having a separate store to deal with the large data and another one to deal with the rest of your data so the performance isn't impacted across the board.

But again, be honest about what your workload is, you're probably not pushing Voxer amounts of data so don't prematurely optimise around the workload you'd like to think you have or are going to have one day in the distant future.

Cache

Caching is transparent by default with LevelDB so it's easy to forget about it when making these kinds of decisions but it's actually quite important for this particular question.

By default, you have an 8MB LRU cache with LevelDB and all reads use that cache, both for look-ups and for updating it with newly read values. So, you can have a lot of cache-thrash unless you're reading the same values again and again.

But, there is a fillCache (boolean) option for read operations (both get() and createReadStream() and its variations). So you can set this to false where you know you won't be needing fast access to those entries again and you don't want to push out other entries from the LRU.
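A small sketch of how that might look in practice. `db` is assumed to be a LevelUP-style instance, and the two helper names are made up for this example.

```javascript
// Sketch: scan every entry without displacing hot entries from the LRU cache.
// `db` is assumed to be a LevelUP-style instance; the helper name is hypothetical.
function scanWithoutCaching (db, onEntry, onEnd) {
  db.createReadStream({ fillCache: false })
    .on('data', function (entry) { onEntry(entry.key, entry.value) })
    .on('error', onEnd)
    .on('end', function () { onEnd(null) })
}

// Sketch: a one-off lookup that should not populate the cache.
function getOnce (db, key, callback) {
  db.get(key, { fillCache: false }, callback)
}
```

Reads that you do expect to repeat can keep using the defaults, so the cache fills up with the entries that actually benefit from it.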

So caching strategies can be separate for different types of data and are not a strong reason to keep things in a separate data store.

I always recommend that you should tinker with the cacheSize option when you're using LevelDB, it can be as large as you want to fit in the available memory of your machine. As a rule of thumb, somewhere between 2/3 and 3/4 of the available memory should be a maximum if you can afford it.
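As a back-of-the-envelope sketch of that rule of thumb: the helper below is hypothetical, and it uses the machine's total memory as a stand-in for "available" memory, which is an assumption you may want to tighten for your own deployment.

```javascript
var os = require('os')

// Sketch: derive a cacheSize (in bytes) as a fraction of system memory.
// `suggestCacheSize` is a hypothetical helper; it refuses anything above
// the 3/4 upper bound of the rule of thumb.
function suggestCacheSize (fraction, totalBytes) {
  if (!(fraction > 0 && fraction <= 0.75)) {
    throw new RangeError('fraction must be greater than 0 and at most 0.75')
  }
  // fall back to the machine's total memory when no figure is supplied
  var total = totalBytes === undefined ? os.totalmem() : totalBytes
  return Math.floor(total * fraction)
}

// Assumed usage with Level installed:
// var db = require('level')('/path/to/db', { cacheSize: suggestCacheSize(2 / 3) })
```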

Consider though what happens if you are using separate LevelDB stores, you now have to deal with juggling cacheSize between the stores. Often, you're probably going to be best served by having a single, large cache that can operate across all your data types and let the normal behaviour of your application determine what gets cached with occasional reliance on 'fillCache': false to fine-tune.

Consistency

As I discussed in my LXJS talk, the atomic batch is an important primitive for building solid database functionality with inherent consistency. When you're using sublevel, even though you have what operate like separate LevelUP instances for each sublevel, you still get to perform atomic batch operations between sublevels. Consider indexing where you may have a primary sublevel for the entries you're writing and a secondary sublevel for the indexing data used to reference the primary data for lookups. If you're running these as separate stores then you lose the benefits of the atomic batch, you just can't perform multiple operations with guaranteed consistency.

Try and keep the atomic batch in mind when building your application, instead of accepting the possibility of inconsistent state, use the batch to keep consistency.
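Here's a sketch of the indexing case using the array form of batch(). The 'user!' and 'email!' key prefixes, the shape of the `user` object and the helper names are all assumptions for illustration; with sublevel you'd let it manage the prefixes for you.

```javascript
// Sketch: write a primary record and its index entry in one atomic batch,
// so the index can never point at a record that was not written.
// Key prefixes and helper names are hypothetical.
function putUser (db, user, callback) {
  db.batch([
    { type: 'put', key: 'user!' + user.id, value: JSON.stringify(user) },
    { type: 'put', key: 'email!' + user.email, value: user.id }
  ], callback)
}

// Sketch: remove the record and its index entry together.
function delUser (db, user, callback) {
  db.batch([
    { type: 'del', key: 'user!' + user.id },
    { type: 'del', key: 'email!' + user.email }
  ], callback)
}
```

Split across two separate stores, each of these would become two independent writes, and a crash between them would leave a dangling index entry or an unindexed record.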

Back-end flexibility

OK, this one is a bit left-field, but remember that LevelUP is back-end-agnostic. It's inspired by LevelDB but it doesn't have to be Google's LevelDB that's storing data for you. It could be Basho's fork or HyperLevelDB. It could even be LMDB or something a little crazy like MemDOWN or mysqlDOWN!

If you're at all concerned about performance, and most people claim to be even though they're not building performance-critical applications, then you should be benchmarking your particular workload against your storage system. Each of the back-ends for LevelUP have different performance characteristics and different trade-offs that you need to understand and test against your needs. You may find that one back-end works for one kind of data in your application and another back-end works for another.
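A very crude sketch of what a write benchmark could look like. It only assumes a LevelUP-style put(key, value, callback), so the same function can be pointed at any back-end. The `timePuts` name and the 'bench!' keys are made up, and a real benchmark should use your own keys, values and access patterns.

```javascript
// Sketch: time `count` sequential put() calls against any LevelUP-style `db`.
// Calls back with the elapsed time in milliseconds.
function timePuts (db, count, callback) {
  var start = process.hrtime()
  var i = 0

  function next (err) {
    if (err) return callback(err)
    if (i === count) {
      var diff = process.hrtime(start) // [ seconds, nanoseconds ]
      return callback(null, diff[0] * 1e3 + diff[1] / 1e6)
    }
    var key = 'bench!' + i++
    // each write is issued only after the previous one has completed
    db.put(key, 'value-' + key, next)
  }

  next()
}
```

Run it against each candidate back-end with the same count and compare the numbers, ideally several times over, since a single run tells you very little.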

Summary

The TL;DR is: in most cases, a single LevelDB store is generally preferable unless you have a real reason for having separate ones.

Have I missed any considerations that you've come across when making this choice? Let me know in the comments.

Primitives for JS Databases (an LXJS adventure)

I gave a talk yesterday at LXJS in the "Infrastructure.js" block and tried to talk about JavaScript Database Primitives; i.e. the basic building blocks we have landed on for building more complex database solutions in JavaScript.

The talk certainly wasn't as good or clear as I wanted it to be, it worked much better in my head! A huge venue with over 300 talented JavaScripters, an absolutely massive screen, bright lights and loud amplification got the better of me and I wasn't able to pull the material together how I wanted to. The introvert within me is telling me to become a recluse for a little while just to recover! My hope is that at least one or two people are inspired to give database hacking a go because it's really not that difficult once you get your head around the primitives.

Edit: I wasn't trying to elicit sympathy here, I genuinely think that I wasn't clear on what I was trying to communicate. It went so well in my head, as it usually does, but I fell far short of what I wanted to express. I'll attempt to rectify some of that with a writeup (see next para).

Thankfully though, a portion of the material will be able to serve as the basis for the, long overdue, third part in my three part DailyJS series on LevelDB & Node.

In summary, inspired by LevelDB, we've ended up with a core set of primitives in LevelUP that can be used to build feature-rich and advanced database functionality. Atomic batch and ReadStream are the two non-trivial primitives; open, close, get, put and del are all pretty easy to understand as primitives, although del is perhaps redundant but we're opting for explicitness.

My slides are online but hopefully I'll be able to get my DailyJS article sorted out soon and I'll be able to explain what I was trying to get at.

ReadStream as a primitive query mechanism is not too hard to understand once you get your head around key sorting and the implications for key structure. Batch is a little more subtle and relates to consistency and our ability to augment basic operations to create more complex functionality while keeping the data store in a consistent state.
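A quick sketch of why key structure matters for a ReadStream: keys come back in lexicographic order, so anything numeric needs a fixed width before a range query behaves the way you'd expect. The key layout and helper names below are assumptions for illustration only.

```javascript
// Left-pad a non-negative integer with zeros to a fixed width.
function padded (n, width) {
  var s = String(n)
  while (s.length < width) s = '0' + s
  return s
}

// Hypothetical key layout: 'reading!<sensor>!<13-digit millisecond timestamp>'
function readingKey (sensor, timestamp) {
  return 'reading!' + sensor + '!' + padded(timestamp, 13)
}

// Range options selecting one sensor's readings between two timestamps.
function readingRange (sensor, from, to) {
  return { start: readingKey(sensor, from), end: readingKey(sensor, to) }
}

var unpadded = [9, 10, 100].map(function (n) { return 'r!' + n }).sort()
var fixed = [9, 10, 100].map(function (n) { return 'r!' + padded(n, 3) }).sort()

console.log(unpadded) // [ 'r!10', 'r!100', 'r!9' ]
console.log(fixed)    // [ 'r!009', 'r!010', 'r!100' ]
```

With padded keys, the `start` and `end` pair from readingRange() describes a contiguous slice of the store, which is exactly the kind of query a ReadStream can serve.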

I additionally raised "Buckets", or "Namespaces", as a primitive concept and discussed how sublevel has effectively become the standard for turning a one-dimensional data store into a multi-dimensional store able to encapsulate sophisticated functionality behind what is essentially just a key/value store.

Thanks to the LXJS team

It would be neglectful of me to not say how absolutely grateful I am to the LXJS team for putting so much effort into taking care of speakers; fantastic job.

LXJS is an amazing event, put on by a dedicated and very talented team of people committed to the JavaScript community and the JavaScript community in Portugal in particular. This conference sets a very high bar for community-driven conferences with the way it has managed to get so many locals (and internationals!) involved in running an event in their own time.

David Dias, Ana Hevesi, Pedro Teixeira, Luís Reis, Nuno Job, Tiago Rodrigues, Leo Xavier, Alexander Kustov, André Rodrigues and Bruno Coelho have managed to put on an amazing event and are some of the nicest and most talented people I've met. Thank you to you all and everyone else who put on LXJS 2013, your hard work is appreciated and should be an inspiration to everyone involved in our local JavaScript communities, running events or considering running events like this.

NodeConf.eu

Wow, NodeConf.eu was certainly a once-in-a-lifetime event ... although there's talk of a repeat performance next year (don't miss the chance when it comes around!).

Raise that flag

Dominic Tarr, @substack and Julian Gruber raising the NodeConf.eu flag

NodeConf.eu was held in Waterford, Ireland, on an Island, in a Castle and was organised by the Node lovin' company, nearForm, in particular Cian O'Maidin and his amazing assistant Catherine Bradley. Of course Mikeal Rogers had a significant role in organising the event too.

Waterford Castle


Pig

The welcome banquet ... yep

Instead of describing the talks, I'll defer to the excellent four part series by Paul, Adam, Luke and Ben of Clock where you'll find a great summary of the talks and events of the conference.

For my part, I was deeply honoured to be involved in the "Node Databases" track of the conference. We started off the NodeConf.eu talks with a 3-part show. My talk was titled "A Real Database Rethink" and was followed by Dominic Tarr who talked more about the Level* ecosystem and the various pieces of the Node Databases puzzle that's being built. Julian Gruber then closed us off with some amazing live-coding of some browser/server streaming LevelUP/multilevel wizardry.

A Real Database Rethink

The slides of my talk are online. I attempted to break down the definition of the term "database" by looking at where the concept comes from historically. It's actually a difficult thing to define and I don't believe there is any one agreed upon meaning. What I came up with is:

A tool for interacting with structured data, externalised from the core of our application

  • Persistence
  • Performance
  • Simplify access to complex data

And sometimes...

  • Shared access
  • Scalability

But even that's pretty rough.

Taking that definition, we can apply Node philosophy of small-core and vibrant user-land, along with the culture of extreme modularity afforded us by npm, and build a new kind of database; or at least apply new thinking to the "database".

The bulk of my talk was taken up with talking about LevelUP and the basics of the Level ecosystem. There's a table on slide #7 that I'm going to try and refine over time to help describe what the Level / NodeBase world is all about.

Level Me Up Scotty!

One of the three workshops available at NodeConf.eu was all about Node Databases. I took the same approach as at CampJS recently where I built Learn You The Node.js For Much Win!, a tool that owes a debt to stream-adventure, a self-guided workshop-in-your-terminal application by @substack and Max Ogden written for NodeConf (US).

This time around, I received some great help from both @substack and Julian Gruber who helped write some exercises, I also received help from Eugene Ware who wasn't even at the conference but was assisting with development from Australia. Raynos was also a great help in getting the application working well.

We ended up with Level Me Up Scotty!, or just levelmeup.

levelmeup

Dominic Tarr, Thorsten Lorenz, Paolo Fragomeni, Matteo Collina, Magnus Skog, Max Ogden and other experienced Levelers helped on and off while the workshops were happening; so we had plenty of expertise at hand whenever there were questions.

Workshops were unstructured and the organisers of each workshop all ended up agreeing that we should just let people come and go as they pleased. This suited us as the workshop was open-ended and designed not to be finished by most people within the originally planned hour (I think that was the original plan).

levelmeup is installed from npm (npm install levelmeup -g) and is fully self-guided. You run the levelmeup application and it steps you through some exercises designed to:

  • introduce you to the format of the workshops with a simple "Hello World" style exercise
  • introduce you to LevelUP and its basic operations
  • help you understand ReadStream and the range-queries it makes possible
  • encourage creative thought regarding key structure
  • introduce sublevel
  • introduce multilevel

There's more planned for the future of this workshop application too, Matteo even has a work-in-progress exercise that should be merged fairly soon.

nodeschool.io was hatched from NodeConf.eu and pulls together the three workshop applications currently available in npm. I believe this was an initiative of Brian J. Brennan and other Mozillians on the Open Badges project. workshopper is the engine that runs both learnyounode and levelmeup and we're trying to make it even easier for others to author their own workshop applications. There is already a Functional Javascript Workshop by Tim Oxley and there are more in development. Exciting times!

Level Me Up Workshoppers

Workshoppers stretching their brains with levelmeup

My experience with stream-adventure and learnyounode suggested that this format would prove to be relatively successful, and in the end I think we had most of the attendees come through at some point and sit down to have a crack at the workshop. This is particularly impressive given that Emily Rose, Elijah Insua and Matteo were running a NodeBots workshop which included Arduino and NodeCopter hacking (always popular!). And Max Bruning and TJ Fontaine were running a Manta / MDB / DTrace / SmartOS-magic workshop and their material was some of my favourite from NodeConf (US) so I'm sure people really enjoyed what they had to present.

Unfortunately I didn't get to attend these other workshops, I also missed out on some skeet!

Skeet

Karolina "don't mess with me" Szczur, photo by Matthew Bergman

But there was plenty of other experience to be had. It was also fantastic to meet so many people I only knew from IRC / Twitter / GitHub. For someone who lives in regional Australia and doesn't get a chance to socialise much with other nerds, this was a particularly special opportunity.

Shenanigans

Final night banquet shenanigans with Charlie McConnell and @substack ... the napkin hat thing is a story in itself, blame Jessica Lord, photo by Matthew Bergman

The Level* Gang

As an aside, NodeConf.eu had the largest concentration of LevelUP contributors and active Level* developers of any event that I'm aware of so far. So we took the opportunity to have our own little meeting. We even took minutes, of sorts.

There has been a long-standing plan to make a Level* / NodeBase website but being the disorganised rabble we are, it hasn't got off the ground. Karolina (and Jessica too I believe) are keen to help out on the design end but just need the content. So that's what we planned. There's a bunch of issues that form a TODO in the repo for this project. Hopefully we can all get on top of it sooner rather than later. We're also open to assistance from anyone else that would like to contribute.

Besides getting stuff done, it was just a pleasure to hang out with these people and talk shop.

A momentous event

The Level* Gang: Paolo, Dominic, @substack, Karolina, Magnus, Mikeal, Julian, Max, Matteo and Paul Fryzel. Raynos was around but missed this particular event, Thorsten was inside demoing his guitar-typing software.

Learn You The Node.js

CampJS has just finished, with a bigger crowd than last time around. It was lots of fun and, as usual, these events are more about meeting the people I collaborate and socialise with online than anything else. There was a particularly large turn-out of the hackers on #polyhack, our Australian programmers channel on Freenode. Even @mwotton, our resident Haskell-troll, was there! Lots of photos and news can be found on Storify. The next one will likely be near Melbourne in February some time and I highly recommend it if you can get there.

Learn You The Node.js For Much Win (presentation)

I was struck last CampJS how many JavaScript newbies were there, or at least people who deal with JavaScript as a secondary language and therefore only have a cursory understanding of it. And by extension, there were not many people who had much understanding of Node. So I wanted to present some intro-to-Node material this time.

I gave a 30 minute talk covering the very basics of what Node is, called Learn You The Node.js For Much Win. Obviously the title is inspired by Learn You a Haskell For Great Good and Learn You Some Erlang For Great Good. You can find my slides here (feel free to rip them off if you need to give a similar talk somewhere!). The video may be online at some point in the future.

Learn You The Node.js For Much Win (workshop)

The next morning, I gave a workshop on the same topic but it was much more hands-on. The inspiration for my workshop came from NodeConf, a couple of months earlier. @substack and @maxogden presented a workshop titled stream adventure which was a self-guided, interactive workshop for the terminal, built with Node. You can find it here and install it from npm with npm install stream-adventure -g, I highly recommend it.


I was so inspired that I stole their code and made my own workshop application! learnyounode. You can download and install it with npm install learnyounode -g.


The application itself is/was a series of 13 separate workshops. Starting off with a simple HELLO WORLD and ending with a JSON API HTTP server (contributed by the very clever @sidorares).

learnyounode

Nobody actually managed to finish the workshops in the allotted 60 minutes, although @alexdickson, an expert JavaScripter but Node-n00b was the first one I heard of finishing it not long after.

The workshops attempt to focus on some of the core concepts of Node. There's lots of console output because that's easiest to validate but it introduces filesystem I/O, both synchronous and asynchronous and moves straight on to networking because that's what Node is so good at. An HTTP CLIENT example, introduces HTTP and is expanded on in HTTP COLLECT which introduces streams. JUGGLING ASYNC builds on HTTP COLLECT to introduce the complexities of managing parallel asynchronous activities. From there, it switches from network clients to network servers, first a simple TCP server in TIME SERVER and then using streams to serve files in HTTP FILE SERVER and transforming data with HTTP UPPERCASERER. The final exercise presents you with a more complex, closer-to-real-world example, an HTTP API server with multiple end-points.

The entire workshop is designed to take longer than one hour; people ought to be able to take it away and complete it later. It's also designed to be suitable for complete n00bs and also people with some experience, it ought to make a fun challenge for anyone already experienced with Node to see how quickly they can complete the examples (I believe I earned the honour of being the first person at NodeConf to finish stream-adventure in the allotted time!).

The Node-experts at CampJS were thankfully helping out during the workshop so there wasn't much competition going on there.

Many thanks to these expert Node nerds who hovered and helped people during the workshop and also did some test-driving of the workshop prior to the event:

(I really hope I haven't missed anyone out there; so many quality nerds at CampJS!)

Tim Oxley making a contribution during the workshop, along with Christopher Giffard (left) and Eugene Ware (right)

I had the solutions to the workshop ready on the big-screen and walked through some of the early solutions and talked through what was going on. I didn't expect many people to listen to those bits and the workshop was designed so you could totally zone-out and do it at your own pace if that suited.

If anyone wants to run a similar style workshop for their local meet-up, using the same content, I'd love to receive contributions to learnyounode. Alternatively, make your own! I extracted the core framework from learnyounode and it now lives separately as workshopper.


I would love feedback from anyone in attendance or anyone that uses this tool to run their own workshops! learnyounode is already listed in Max Ogden's excellent The Art of Node, so I'm looking forward to contributions to help turn this into a really useful teaching tool.