<h2><a href="https://r.va.gg/2018/09/the-perils-of-private-politics-in-open-source.html">The perils of private politics in open source</a></h2>
<p><em>Rod Vagg, r.va.gg — 2018-09-05</em></p>
<h3 id="the-node-community-and-its-leadership-is-evolving">The Node community and its leadership are evolving</h3>
<p>Node Summit this year was an interesting and encouraging experience. The stage was full of fresh faces. Fresh faces who weren't there just because they were fresh faces—the project and its surrounding community have genuinely refreshed. A couple of moments in the "hallway track" were instructive as we saw some historical big-names of Node, like Mikeal Rogers, go unrecognised by the busy crowds of active Node users. Node has not only grown up, but it's moved forward as any successful project should, and it has new faces with fresh passion.</p>
<p>Node.js is a complex open source project to manage. There's a huge amount of activity surrounding it and it's become a critical piece of internet infrastructure, so it needs to be stable and trustworthy. Unfortunately this means that it needs corporate structures around it as it interacts with the world outside. One of the roles of the Node.js Foundation is to serve this purpose for the project.</p>
<h3 id="the-inevitability-of-politics">The inevitability of politics</h3>
<p>Politics is an unfortunate fact for open source projects of notable size. Over time, the cruft of corporate and personal politics builds up and creates messy baggage. Like an iceberg, the visible portion of politics represents only a small amount of what's built up over time and what's currently brewing. I had the displeasure of seeing behind the curtain of Node politics as I became more heavily involved, 6 or so years ago. It's not exactly a pretty sight, but I'm certain it's not unique to Node. For the most part, we have a tacit agreement to keep a lot of the messy stuff private. I suppose the theory here is that there's no reason to taint the rosy view of users and contributors who will never be directly impacted by it, and will never even have reason to see it. Speaking about a lot of these things would simply fall into the <em>gossip</em> category, in fact. So we set it aside and move on. But it never really goes away, the pile of cruft just gets bigger.</p>
<p>On many occasions I have watched as idealistic new contributors, technical and non-technical, are raised up to positions where they become exposed to the private politics. It's disheartening to stand by (or worse: be involved in the process), as someone with a rosy view of the Node community is led, nose first, into its smelly armpits. Of course the armpits are only a small part of the body, so having a rosy view isn't illegitimate if you don't intend to get all up into the smelly bits! But leadership requires a good amount of exposure to the smelly parts of Node.</p>
<p>Many people who step into the <em>elite</em> realms of the Technical Steering Committee (TSC) and Community Committee eventually discover that there's a lot behind the curtains. I use the term "elite" sarcastically here, because that's not what these bodies are intended to be. But the fact of having a curtain creates an unfortunate tendency toward separateness.</p>
<p>I'm confident that most, if not all, of the people currently on the TSC and the Community Committee, are in open source because they're attracted to <em>open</em> community. That's what's so great about Node and other thriving open source ecosystems. Many of us are in awkward positions where we have a foot in the corporate world and a foot in open source, so we have to learn to operate in different modes. We end up making choices about how we conduct ourselves regarding the community vs the corporate world, and it's not easy to stay conscious of the different requirements of our various roles.</p>
<p>There's a danger in being drawn up into these kinds of prominent positions in a complex open source project. We get exposed to the armpit, whether we like it or not. Making matters more complex, we are backed by, and therefore have to interact with, a corporation: the Node.js Foundation. The Foundation is run by a board of highly experienced corporate-political operators (and I mean no disrespect by this—it takes a significant amount of skill to navigate to the kinds of positions in large companies that make you an obvious choice to sit on such boards). Furthermore, the Node.js Foundation is essentially a shell that lives inside the Linux Foundation, a <em>very</em> rich source of corporate politics of the open source variety.</p>
<p>So we are faced with competing pressures: our personal passions for open communities, and the complexities of private corporate-style politics that don't fit very well with "openness" and "transparency". I've felt this since the Node.js Foundation began—I sat on the board for two terms at the beginning, participating in those politics. But I've also been one of the loudest voices for transparency and "open governance"—a model that I championed prior to it being adopted by io.js, followed by Node.js under the Foundation. But to be honest, my record is mixed, just like everyone else who has had to navigate similar positions. None of us leave unscathed.</p>
<p>A TSC representative to the board usually wants to be able to report important items to the TSC where they are allowed, and also use the TSC as a sounding-board as they participate in the corporate decision-making process. The dual pressures of "this is sensitive and private" vs "we should be open and transparent" are nasty. Corporate structures and the kinds of politics they engender are not inherently bad, they are arguably necessary in the world in which we exist. In professionalised open source, we are trying to squish together openness and the closed nature of the corporate world which creates a lot of tension and conflicting incentives. It's not hard to understand why many open source projects actively avoid this kind of "professionalisation".</p>
<h3 id="serving-as-counter-balance-to-the-corporate">Serving as counter balance to the corporate</h3>
<p>Here's the critical part that's so easy to lose sight of: <strong>those on the open source side should be the advocates for openness</strong>. That's a large reason we get to be at the big table and it's on us to keep the pressure on, to ensure that a tension continues to exist. Those on the other side are advocates (in some cases legally so) of the corporate approach. The lure of private politics is so strong that it will always have momentum on its side, and it takes very conscious effort to push back against it. From discussions during the formation of the Foundation, I know that there are many on the corporate side that <em>expect</em> open source folks to provide this kind of pressure, that's part of the system's design. Perhaps we should even insert our responsibility as advocates for openness and transparency as an explicit feature of project governance.</p>
<p>The temptation to "keep it private", "take it offline", or "get it right before going public", is natural, because it honestly makes many things easier in the short-term, i.e., it's the path of expedience. Our heated sociopolitical environment today makes this worse; with the potential for drama and mob behaviour leading to an understandable risk aversion.</p>
<p>And so we have private mailing lists, private sections of "public" meetings, one-on-one strategy discussions, huddles in the corners of conferences where we occasionally meet in person. We have to keep the wheels turning and it's easier to just push things through privately than have to deal with the friction of public discussion, feedback, criticism, <em>drama</em>.</p>
<p>But that's neglecting what our communities charge us with! Particularly for Node.js, where we have a governance model that makes it clear that the TSC doesn't own or run the project. <strong>The TSC is intended to simply be a decision-making fallback</strong>.</p>
<p>The project is intended to be owned and run by its contributors. The TSC should be the facilitators of that, and reluctantly involve itself collectively where individual contributors can't find a way forward. The Community Committee is intended to take an outward-facing role, making for a different kind of challenging bargain when they get sucked in. Today we have various stakeholders asking for "the TSC's opinion", or "the Community Committee's opinion" on matters. I even tried to get that when I was a board member, attempting to "represent the TSC," which I believed was my role at the time. But there really should be no such thing as the TSC or Community Committee's "opinion", it's frankly absurd when you consider that these bodies are supposed to be made up of a broad diversity of viewpoints.</p>
<p>The more we accept private decision-making and politicking, the more we undermine community-focused governance.</p>
<h3 id="a-busy-time-for-private-politicking">A busy time for private politicking</h3>
<p>It could be an artefact of my perspective, but it seems that we have a nexus of some very heavy private political discussions happening in Node.js-land at the moment. My fear is that it's the sign of a trend, and I hope this post can help serve as a corrective.</p>
<p>Some highlights that you probably won't see the context of on GitHub include:</p>
<ol>
<li>Discussions about the Node.js Foundation's annual conference, Node.js Interactive, which was renamed this year to JS Interactive, and then renamed again recently to Node+JS Interactive. The Foundation's executive decided to involve the TSC and Community Committee in that last decision and it was resolved entirely in private (as far as I'm aware) and then announced to the world. I personally thought the switch to "JS Interactive" was a mistake. I also thought that changing the name <em>again</em> was a mistake. To be honest (and in hindsight), however, I'd rather the executive didn't even draw the TSC and Community Committee in, as collectives, to these private discussions, because we've now become complicit in the private decision making process. It's really not a good look for either committee to be involved in surprise major announcements—that stands in stark contrast to our open decision making processes. Seek individual feedback, sure, but this also goes back to my point about the problems with seeking collective opinions.</li>
<li>Discussions about the efficacy of the Foundation as an executive body, particularly as it focuses on filling an empty Executive Director (i.e. CEO) chair. I'll admit guilt to fuelling a lot of this discussion myself, I've been a strong critic of the Foundation in recent times. However, those of us with something to say either need to be bold enough to be public with critique, take it directly to decision-makers involved, or butt out entirely. As an advocate of openness and transparency, I'd suggest that public discussion on these matters would be fruitful because so many people and organisations are impacted by them.</li>
<li>Discussions about very major restructuring of the Node.js Foundation itself. Having implications that would call for large changes to the by-laws, including changing the very purpose of the Foundation. I don't want to be the one to speak publicly about this, but I would like to see those who are driving this discussion be able to make their case in public sooner rather than later. This will again lead to surprise major announcements that the TSC and Community Committee will again be complicit in. The board should either make such changes and own the responsibility for it, or should set up an open process for feedback and discussion. The TSC and Community Committee should be rejecting the requests made of them to <em>be</em> the source of feedback and discussion prior to major changes. These bodies can facilitate broader discussions, but they are not the source of definitive opinion or truth for the Node.js community.</li>
<li>Discussions about major changes to Node.js project governance, instigated by parties external to the TSC and Community Committee, entirely in private and with significant political and ideological pressure. Large discussion threads and entire meetings have been devoted to these matters already, without one hint to the outside world that a flurry of pull requests may soon appear with little context. Given some of the ideological content there is potential for more <em>drama</em> so I have a lot of sympathy for people for wanting to take the easy path with this. I'm close enough to the centre of these matters that I will likely write publicly about them soon. I have very strong opinions about what's good for <em>open governance,</em> and maintaining a diversity of opinion and viewpoints on the critical bodies surrounding Node.js. My primary objection to the process conducted so far is that any discussions about change in open source governance by governing bodies <em>must</em> be conducted in the open. And that any private collective change-planning by these bodies undermines the community-focused open governance model that Node.js has adopted.</li>
</ol>
<h3 id="finding-a-balancing-point">Finding a balancing point</h3>
<p>I'd like to <em>not</em> give private politics any more legitimacy in open source. Leave it for the corporate realm. I'd like for the TSC and Community Committee to adopt an explicit position of being <em>against</em> closed-door discussions wherever possible and being advocates for openness and transparency. That will cause personal conflicts, it will mean difficulty for the Node.js Foundation board and the executive when they want to engage in "sensitive" topics. But the definition of "sensitive" on our side should be different. What's more, we are not there to solve their problems, it's precisely the opposite!</p>
<div style="text-align: center;"><img src="/2018/09/politics-venn.png" alt="The ideal dynamic between open source and its corporate partners" title="The ideal dynamic between open source and its corporate partners" width="800" height="505" /></div>
<p>Here are my initial suggestions for guidelines:</p>
<ol>
<li>If you want to call something "sensitive" then it better involve something personal about an individual who could be hurt by public discussion of the matter.</li>
<li>If you have a proposal for changing anything and you're not prepared to receive public feedback and critique, then it's not worth us discussing as groups.</li>
<li>The TSC and Community Committee should refuse, as much as possible, to be involved as collectives in decision-making processes that must be private for corporate governance or legal reasons. That's not their role and it's unfair to force them into an awkward corner.</li>
</ol>
<p>The wriggle-room between treating the TSC and Community Committee as "groups" vs pulling individuals from those groups into private politics as a proxy is a tricky one that every individual is going to have to negotiate. I would hope that having these groups explicitly state their preference for openness, while highlighting the risks, would go a long way to creating the healthy tension that we need with our corporate partners.</p>
<p><em>I hope it’s obvious, but this is all my personal opinion and does not necessarily represent that of my employer nor any bodies surrounding Node.js that I’m involved in.</em></p>
<h2><a href="https://r.va.gg/2018/08/node.js-and-the-hashwick-vulnerability.html">Node.js and the "HashWick" vulnerability</a></h2>
<p><em>2018-08-30</em></p>
<p>The following post was originally published on the <a href="https://nodesource.com/blog/node-js-and-the-hashwick-vulnerability">NodeSource Blog</a>. This text is copyright NodeSource and is reproduced with permission.</p>
<hr>
<p>Yesterday, veteran Node.js core contributor and former Node.js TSC member Fedor Indutny published an article on his personal blog detailing a newly-discovered vulnerability in V8. Named <a href="https://darksi.de/12.hashwick-v8-vulnerability/">HashWick</a>, this vulnerability will need to be addressed by Node.js, but as yet has not been patched.</p>
<p>This article will cover the details surrounding the disclosure yesterday, and explain some of the technical background. As a patch for Node.js is not yet available, I will also present some mitigation options for users and discuss how this vulnerability is likely to be addressed by Node.js.</p>
<h2 id="responsible-disclosure">Responsible disclosure</h2>
<p>Fedor originally reported this vulnerability to V8 and the Node.js security team in May. Unfortunately, the underlying issues are complex, and Node's use of older V8 engines complicates the process of finding and applying a suitable fix. The Node.js TSC delegated responsibility to the V8 team to come up with a solution.</p>
<p>After reporting the vulnerability, Fedor followed a standard practice of holding off public disclosure for 90 days, and although a fix has yet to land in Node, he published high-level details of his findings. </p>
<p>It is worth pointing out that Fedor’s disclosure does not contain code or specific details on how to exploit this vulnerability; moreover, to exploit HashWick a malicious party would need to tackle some fairly difficult timing analysis. However, knowledge that such a vulnerability exists, and can potentially be executed on a standard PC, is likely to spur some to reverse engineer the details for themselves.</p>
<p>These circumstances leave us all in an awkward situation while we wait for a fix, but I expect this disclosure to result in security releases in Node.js in the coming weeks.</p>
<h2 id="vulnerability-details">Vulnerability details</h2>
<p>There are three important concepts involved in this vulnerability:</p>
<ol>
<li>Hash functions and hash tables</li>
<li>Hash flooding attacks</li>
<li>Timing analysis</li>
</ol>
<h3 id="hash-functions">Hash functions</h3>
<p>Hash functions are a fundamental concept in computer science. They are typically associated with cryptography, but are widely used for non-cryptographic needs. A <a href="https://en.wikipedia.org/wiki/Hash_function">hash function</a> is simply any function that takes input data of some type and is able to repeatedly return output of a predictable size and range of values. An ideal hash function is one that exhibits apparent randomness and whose results spread evenly across the output range, regardless of input values.</p>
<p>To understand the utility of such functions, consider a "sharded" database system, divided into multiple storage backends. To route data storage and retrieval, you need a routing mechanism that knows which backend that data belongs in. Given a key, how should the routing mechanism determine where to <em>put</em> new data, and then where to <em>get</em> stored data when requested? A random routing mechanism isn't helpful here, unless you also want to store metadata telling you which random backend a particular key's value was placed in.</p>
<p>This is where hash functions come in handy. A hash function would allow you to take any given key and return a “backend identifier” value, directing the routing mechanism to assign data to a particular backend. Despite apparent randomness, a good hash function can thus distribute keys across all of your backends fairly evenly.</p>
<p>This concept also operates at the most basic levels of our programming languages and their runtimes. Most languages have hash tables of some kind; data structures that can store values with arbitrary keys. In JavaScript, almost any object can become a hash table because you can add string properties, and store whatever values you like. This is because <code>Object</code> is a form of hash table, and almost everything is related to <code>Object</code> in some way. <code>const foo = { hash: 'table' }</code> stores the value <code>'table'</code> at key <code>'hash'</code>. Even an <code>Array</code> can take the form of a hash table. Arrays in JavaScript are not limited to integer keys, and they can be as sparse as you like: <code>const a = [ 1, 2, 3 ]; a[1000] = 4; a['hash'] = 'table';</code>. The underlying storage of these hash tables in JavaScript needs to be practical and efficient.</p>
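<p>Those inline examples are runnable as-is:</p>

```javascript
// An Object used as a hash table: arbitrary string keys, arbitrary values.
const foo = { hash: 'table' };

// An Array can behave the same way: sparse integer keys and string keys.
const a = [1, 2, 3];
a[1000] = 4;         // the array is now sparse, with length 1001
a['hash'] = 'table'; // string keys don't affect the length at all
```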
<p>If a JavaScript object is backed by a memory location of a fixed size, the runtime needs to know where in that space a particular key's value should be located. This is where hash functions come in. An operation such as <code>a['hash']</code> involves taking the string <code>'hash'</code>, running it through a hash function, and determining exactly where in the object's memory storage the value belongs. But here's the catch: since we are typically dealing with small memory spaces (a new <code>Array</code> in V8 starts off with space for only 4 values by default), a hash function is likely to produce "collisions", where the output for <code>'hash'</code> may collide with the same location as <code>'foo'</code>. So the runtime has to take this into account. V8 deals with collision problems by simply incrementing the storage location by one until an empty space can be found. So if the storage location for <code>'hash'</code> is already occupied by the value of <code>'foo'</code>, V8 will move across one space, and store it there if that space is empty. If a new value has a collision with either of these spaces, then the incrementing continues until an empty space is found. This process of incrementing can become costly, adding time to data storage operations, which is why hash functions are so important: a good hash function will exhibit maximum randomness.</p>
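<p>The check-and-increment behaviour can be illustrated with a toy table built on a deliberately weak hash. This is a simplified sketch only; V8's real property storage is far more sophisticated:</p>

```javascript
// A toy fixed-size store with linear probing, illustrating the
// check-and-increment collision handling described above.
class ProbingTable {
  constructor(size) {
    this.keys = new Array(size).fill(null);
    this.values = new Array(size).fill(null);
  }
  // Deliberately weak hash: sum of character codes modulo table size,
  // so any permutation of a key ('hash' vs 'hsah') collides.
  hash(key) {
    let h = 0;
    for (const ch of key) h += ch.charCodeAt(0);
    return h % this.keys.length;
  }
  set(key, value) {
    let i = this.hash(key);
    // Increment (with wrap-around) until we find our key or a free slot.
    while (this.keys[i] !== null && this.keys[i] !== key) {
      i = (i + 1) % this.keys.length;
    }
    this.keys[i] = key;
    this.values[i] = value;
  }
  get(key) {
    let i = this.hash(key);
    while (this.keys[i] !== null) {
      if (this.keys[i] === key) return this.values[i];
      i = (i + 1) % this.keys.length;
    }
    return undefined;
  }
}
```

<p>Every collision adds one more probe step to both storage and lookup—the extra work that a flood of colliding keys multiplies.</p>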
<h3 id="hash-flooding-attacks">Hash flooding attacks</h3>
<p>Hash flooding attacks take advantage of predictability, or poor randomness, in hash functions to overwhelm a target and force it to work hard to store or look up values. These attacks essentially bypass the utility of a hash function by forcing excessive work to find storage locations.</p>
<p>In our sharded data store example above, a hash flood attack may involve an attacker knowing exactly how keys are resolved to storage locations. By forcing the storage or look-up of values in a single backend, an attacker may be able to overwhelm the entire storage system by placing excessive load on that backend, thereby bypassing any load-sharing advantage that a bucketing system normally provides.</p>
<p>In Node.js, if an attacker knows exactly how keys are converted to storage locations, they may be able to send a server many object property keys that resolve to the same location, potentially causing an increasing amount of work as V8 performs its check-and-increment operations finding places to store the values. Feed enough of this colliding data to a server and it'll end up spending most of its time simply trying to figure out how to store and address it. This could be as simple as feeding a JSON string to a server that is known to parse input JSON. If that JSON contains an object with many keys that all collide, the object construction process will be very expensive. This is the essence of a denial-of-service (DoS) attack: force the server to do an excessive amount of work, preventing it from being able to perform its normal functions.</p>
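<p>To see the shape of such an attack without targeting any real system, consider a deliberately weak, unkeyed hash: because it is fully predictable, colliding keys can be generated mechanically. The functions below are purely illustrative.</p>

```javascript
// Illustration only: a weak hash where any permutation of a string
// collides, because it just sums character codes.
function weakHash(key) {
  let h = 0;
  for (const ch of key) h += ch.charCodeAt(0);
  return h;
}

// Generate `count` distinct permutations of `base`, all of which hash
// to the same value under weakHash.
function collidingKeys(base, count) {
  const keys = new Set([base]);
  const chars = base.split('');
  while (keys.size < count) {
    // Fisher–Yates shuffle to produce another permutation.
    for (let i = chars.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [chars[i], chars[j]] = [chars[j], chars[i]];
    }
    keys.add(chars.join(''));
  }
  return [...keys];
}
```

<p>A JSON body built from thousands of such keys would concentrate all of its property stores on one location of a table using this hash. V8's defence is that its real hash is keyed with a random seed, which is exactly what this vulnerability undermines.</p>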
<p>Hash flooding is a well known attack type, and standard mitigation involves very good hash functions, combined with additional randomness: <strong><em>keyed hash functions</em></strong>. A keyed hash function is a hash function that is seeded with a random key. That same seed is provided with every hash operation, so that together, the seed and an input value yield the same output value. Change the seed, and the output value is entirely different. In this way, it is not good enough to simply know the particular hash function being used, you also need to know the random seed the system is using.</p>
<p>V8 uses a keyed hash function for its object property storage operations (and other operations that require hash functions). It generates a random key at start-up and keeps on using that key for the duration of the application's lifetime. To execute a hash flood type attack against V8, you need to know the random seed it's using internally. This is precisely what Fedor has figured out how to do—determine the hash seed used by an instance of V8 by inspecting it from the outside. Once you have the seed, you can perform a hash flood attack and render a Node.js server unresponsive, or even crash it entirely.</p>
<h3 id="timing-attacks">Timing attacks</h3>
<p>We covered timing attacks in some detail in our <a href="https://nodesource.com/blog/node-js-security-release-summary-august-2018">deep dive of the August 2018 Node.js security releases</a>. A timing attack is a method of determining sensitive data or program execution steps, by analyzing the time it takes for operations to be performed. This can be done at a very low level, such as most of the recent high-profile vulnerabilities reported against CPUs that rely on memory look-up timing and the timing of other CPU operations.</p>
<p>At the application level, a timing attack could simply analyze the amount of time it takes to compare strings and make strong guesses about what's being compared. In a sensitive operation such as <code>if (inputValue == 'secretPassword') ...</code>, an attacker may feed many string variations and analyze the timing. The time it takes to process <code>inputValue</code>s of <code>'a'</code>, <code>'b'</code> ... <code>'s'</code> may give enough information to assume the first character of the secret. Since timing differences are so tiny, it may take many passes and an averaging of results to make strong enough inferences. Timing attacks often involve a <em>lot</em> of testing, and a timing attack against a remote server will usually involve sending a <em>lot</em> of data.</p>
<p>Fedor's attack against V8 involves using timing differences to work out the hash seed in use. He claims that by sending approximately 2G of data to a Node.js server, he can collect enough information to reverse engineer the seed value. Thanks to quirks in JavaScript and in the way V8 handles object construction, an external attacker can force many increment-and-store operations. By collecting enough timing data on these operations, combined with knowledge of the hash algorithm being used (which is no secret), a sophisticated analysis can unearth the seed value. Once you have the seed, a hash flood attack is fairly straightforward.</p>
<h2 id="mitigation">Mitigation</h2>
<p>There are a number of ways a Node.js developer can foil this type of attack without V8 being patched, or at least make it more difficult. These also represent good practice in application architecture so they are worth implementing regardless of the impact of this specific vulnerability.</p>
<p>The front line for mitigating timing attacks against publicly accessible network services is <strong>rate limiting</strong>. Note that Fedor needs to send 2G of data to determine the hash seed. A server that implements basic rate limiting for clients is likely to make such an attack more difficult or impractical to execute. Unfortunately, such rate limiting needs to be applied <em>before</em> too much internal V8 processing is allowed to happen. A <code>JSON.parse()</code> on an input string <em>before</em> telling the client that they have exceeded the maximum requests for their IP address won't help mitigate. Additionally, rate limiting may not mitigate against distributed timing attacks, although these are much more difficult to execute due to the variability in network conditions across multiple clients, leading to very fuzzy timing data.</p>
<p>Other types of <strong>input limiting</strong> will also be useful. If your service blindly applies a <code>JSON.parse()</code>, or other operation, to any length of input, it will be much easier for an attacker to unearth important timing information. Ensure that you have basic input limit checks in place and your network services don't blindly process whatever they are provided.</p>
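<p>A minimal sketch of such a limit in Node.js, capping the body size before anything is parsed (the helper name and error handling are this example's own, not a standard API):</p>

```javascript
// Read a request body up to `limit` bytes; abort early if it's exceeded,
// before any expensive parsing work is done on the input.
function readLimitedBody(req, limit, callback) {
  const chunks = [];
  let received = 0;
  req.on('data', (chunk) => {
    received += chunk.length;
    if (received > limit) {
      req.destroy(); // stop reading; don't buffer or parse anything more
      callback(new Error('payload too large'));
      callback = () => {}; // make sure the callback can't fire again
      return;
    }
    chunks.push(chunk);
  });
  req.on('end', () => callback(null, Buffer.concat(chunks).toString()));
}
```

<p>Inside an <code>http</code> request handler, you would only call <code>JSON.parse()</code> on the body once this callback succeeds.</p>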
<p>Standard <strong>load balancing</strong> approaches make such attacks more difficult too. If a client cannot control which Node.js instance it is talking to for any given connection, it will be much more difficult to perform a useful timing analysis of the type Fedor has outlined. Likewise, if a client has no way to determine which unique instance it has been talking to (such as a cookie that identifies the server instance), such an attack may be impossible given a large enough cluster.</p>
<h3 id="the-future-for-v8">The future for V8</h3>
<p>As Fedor outlined in his post, the best mitigation comes from V8 fixing its weak hash function. The two suggestions he has are:</p>
<ol>
<li>Increase the hash seed size from 32 bits to 64 bits</li>
<li>Replace the hash function with something that exhibits better randomness</li>
</ol>
<p>The key size suggestion simply increases the complexity and cost of an attack, but doesn't make it go away. Any sufficiently motivated attacker with enough resources may be able to perform the same attack, just on a different scale. Instead of 2G of data, an attacker may need to send far more, which may be impractical in many cases.</p>
<p>A change of hash function would follow a practice adopted by many runtimes and platforms that require hash functions but need to protect against hash flood attacks. <a href="https://en.wikipedia.org/wiki/SipHash">SipHash</a> was developed specifically for this use and has been slowly adopted as a standard since its introduction 6 years ago. Perl, Python, Rust and Haskell all use SipHash in some form for their hash table data structures.</p>
<p>SipHash has properties similar to constant-time operations used to mitigate against other forms of timing attacks. By analyzing the timing of the hash function, you cannot (as far as we know) make inference about the seed being used. SipHash is also fast in comparison to many other common and secure keyed hash functions, although it may not be faster than the more naive operation V8 is currently using. Ultimately, it’s up to the V8 authors to come up with an appropriate solution that takes into account the requirement for security and the importance of speed.</p>
<h2><a href="https://r.va.gg/2018/08/background-briefing-august-node.js-security-releases.html">Background Briefing: August Node.js Security Releases</a></h2>
<p><em>2018-08-29</em></p>
<p>The following post was originally published on the NodeSource Blog as <a href="https://nodesource.com/blog/node-js-security-release-summary-august-2018">Node.js Security Release Summary - August 2018</a>. This text is copyright NodeSource and is reproduced with permission. This is a deep-dive into the security vulnerabilities described in my brief summary on the Node.js blog as <a href="https://nodejs.org/en/blog/vulnerability/august-2018-security-releases/">August 2018 Security Releases</a>.</p>
<hr>
<p>This month's Node.js security releases are primarily focused on upgrades to the OpenSSL library. There are also two minor security-related flaws in Node.js' <code>Buffer</code> object. All of the flaws addressed in the OpenSSL upgrade and the fixes to <code>Buffer</code> can be classified as either "low" or "very low" in severity. However, this assessment is generic and may not be appropriate for your own Node.js application. It is important to understand the basics of the flaws being addressed and make your own impact assessment. Most users will not be impacted at all by the vulnerabilities being patched, but specific use-cases may cause a high severity impact. You may also be exposed via packages you use from npm, so upgrading as soon as practical is always recommended.</p>
<p>Node.js switched to the new 1.1.0 release line of OpenSSL for version 10 earlier this year. Before Node.js 10 becomes LTS in October, we expect to further upgrade it to OpenSSL 1.1.1 which will add TLS 1.3 support. Node.js' current LTS lines, 8 ("Carbon") and 6 ("Boron") will continue to use OpenSSL 1.0.2.</p>
<p>In the meantime, OpenSSL continues to support their 1.1.0 and 1.0.2 release lines with a regular stream of security fixes and improvements and Node.js has adopted a practice of shipping new releases with these changes included shortly after their release upstream. Where there are non-trivial "security" fixes, Node.js will generally ship LTS releases with only those security fixes so users have the ability to drop in low-risk upgrades to their deployments. This is the case for this month's releases.</p>
<p>The August OpenSSL releases of versions 1.1.0i and 1.0.2p are technically labelled "bug-fix" releases <a href="https://mta.openssl.org/pipermail/openssl-announce/2018-August/000129.html">by the OpenSSL team</a> but they do include security fixes! The reason this isn't classified as a security release is that those security fixes have already been disclosed and the code is available on GitHub. They are low severity, and one of the three security items included doesn't even have a CVE number assigned to it. However, this doesn't mean they should be ignored. You should be aware of the risks and possible attack vectors before making decisions about rolling out upgrades.</p>
<h2 id="openssl-client-dos-due-to-large-dh-parameter-cve-2018-0732-https-www-openssl-org-news-secadv-20180612-txt-">OpenSSL: Client DoS due to large DH parameter (<a href="https://www.openssl.org/news/secadv/20180612.txt">CVE-2018-0732</a>)</h2>
<p>All actively supported release lines of Node.js are impacted by this flaw. Patches are included in both OpenSSL 1.1.0i (Node.js 10) and 1.0.2p (Node.js 6 LTS "Boron" and Node.js 8 LTS "Carbon").</p>
<p>This fixes a potential denial of service (DoS) attack against <em>client</em> connections by a malicious server. During a TLS communication handshake, where both client and server agree to use a cipher-suite using DH or DHE (Diffie–Hellman, in both ephemeral and non-ephemeral modes), a malicious server can send a very large prime value to the client. Because this has been unbounded in OpenSSL, the client can be forced to spend an unreasonably long period of time to generate a key, potentially causing a denial of service.</p>
<p>We would expect to see a higher severity for this bug if it were reversed and a client could impose this tax on servers. But in practice, there are more limited scenarios where a denial of service is practical against client connections.</p>
<p>The <a href="https://github.com/openssl/openssl/commit/ea7abeeab">fix</a> for this bug in OpenSSL limits the prime modulus to 10,000 bits. Primes in excess of this limit will simply fail the DH handshake and a standard SSL error will be emitted.</p>
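<p>To illustrate the kind of bound OpenSSL now enforces, here is a sketch of rejecting an oversized peer-supplied prime before any expensive arithmetic is attempted. The function names are hypothetical, for illustration only; the real check happens inside OpenSSL's DH code, not in JavaScript:</p>

```javascript
// Illustrative only: the kind of upper bound OpenSSL now enforces
// (10,000 bits) on a peer-supplied DH prime modulus.
const MAX_DH_PRIME_BITS = 10000;

// Bit length of a non-negative BigInt.
function bitLength(n) {
  return n === 0n ? 0 : n.toString(2).length;
}

// Hypothetical guard: reject oversized primes before spending CPU on
// modular exponentiation with them.
function checkDhPrime(prime) {
  if (bitLength(prime) > MAX_DH_PRIME_BITS) {
    throw new Error('DH parameter too large');
  }
  return true;
}
```

The point of the bound is that key generation cost grows with the size of the prime, so capping the size caps the work a malicious server can force a client to do.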
<p>Scenarios where Node.js users may need to be concerned about this flaw include those where your application is making client TLS connections to untrusted servers, where significant CPU costs in attempting to establish that connection are likely to cause cascading impact in your application. A TLS connection could be for HTTPS, encrypted HTTP/2 or a plain TLS socket. An "untrusted server" is one outside of your control and not in the control of trustworthy third-parties. An application would likely need to be forced to make a large number of these high-cost connections for an impact to be felt, but you should assess your architecture to determine if such an impact is likely, or even possible.</p>
<h2 id="openssl-cache-timing-vulnerability-in-rsa-key-generation-cve-2018-0737-https-www-openssl-org-news-secadv-20180416-txt-">OpenSSL: Cache timing vulnerability in RSA key generation (<a href="https://www.openssl.org/news/secadv/20180416.txt">CVE-2018-0737</a>)</h2>
<p>Node.js is not impacted by this vulnerability as it doesn't expose or use RSA key generation functionality in OpenSSL. However, it is worth understanding some of the background of this vulnerability as we are seeing an increasing number of software and hardware flaws relating to potential timing attacks. Programming defensively so as to not expose the timing of critical operations in your application is just as important as sanitizing user input while constructing SQL queries. Unfortunately, timing attacks are not as easy to understand, or as obvious, so tend to be overlooked.</p>
<p>Side-channel attacks are far from new, but there is more interest in this area of security, and researchers have been focusing more attention on novel ways to extract hidden information. <a href="https://spectreattack.com/">Spectre and Meltdown</a> are the two recent high-profile examples that target CPU design flaws. CVE-2018-0737 is another example, and itself uses hardware-level design flaws. A <a href="https://eprint.iacr.org/2018/367.pdf">paper</a> by Alejandro Cabrera Aldaya, Cesar Pereida García, Luis Manuel Alvarez Tapia and Billy Bob Brumley from Universidad Tecnológica de la Habana (CUJAE), Cuba, and Tampere University of Technology, Finland outlines a cache-timing attack on RSA key generation, the basis of this OpenSSL flaw.</p>
<p>The CVE-2018-0737 flaw relies on a "<a href="https://www.usenix.org/node/184416">Flush+Reload attack</a>" which targets the last-level of cache on the system (L3, or level-3 cache on many modern processors). This type of attack exploits the way that Intel x86 architectures structure their cache and share it between processors and processes for efficiency. By setting up a local process that shares an area of cache memory with another process you wish to attack, you can make high-confidence inferences about the code being executed in that process. The attack is called "Flush+Reload" because the process executing the attack, called the "spy", causes a flush on the area of cache containing a piece of critical code, then waits a small amount of time and reloads that code in the cache. By measuring the amount of time the reload takes, the spy can infer whether the process under attack loaded, and therefore executed, the code in question or not. This attack looks at code being executed, not data, but in many cryptographic calculations, the sequence of operations can tell you all you need to know about what data is being generated or operated on. These attacks have been successfully demonstrated against different implementations of RSA, ECDSA and even AES. The attack has been shown to work across virtual machines in shared environments under certain circumstances. One researcher even demonstrated the ability to detect the sequence of operations executed by a user of <code>vi</code> on a shared machine.</p>
<p>An important take-away about cache-timing attacks is that they require local access to the system under attack. They are an attack type that probes the physical hardware in some way to gather information. Public clouds are usually not vulnerable because of the way cache is configured and partitioned, but we shouldn't assume we won't see new novel timing attacks that impact public clouds in the future. Of course browsers blur the definition of "local code execution", so we shouldn't be complacent! CVE-2018-0737 is marked as "Low" severity by the OpenSSL team because of the requirement for local access, the difficulty in mounting a successful attack and the rare circumstances in which an attack is feasible.</p>
<p>The best protection against Flush+Reload and many other classes of timing attacks is to use <strong>constant-time operations</strong> for cryptographic primitives and operations that expose potentially sensitive information. If an operation follows a stable code path and takes a constant amount of time regardless of input or output then it can be hard, or impossible to make external inference about what is going on. An operation as simple as <code>if (userInput === 'supersecretkey') { ... }</code> can be vulnerable to a timing attack if an attacker has the ability to execute this code path enough times. In 2014, as the PHP community debated switching certain operations to constant-time variants, <a href="https://blog.ircmaxell.com/2014/11/its-all-about-time.html">Anthony Ferrara</a> wrote a great piece about timing attacks and the types of mitigations available. Even though it addresses PHP specifically, the same concepts are universal.</p>
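<p>A toy illustration of why the naive comparison leaks: the loop below counts character comparisons instead of measuring wall-clock time, but the principle is the same. The more of the guess that matches, the more work is done before the early exit, and an attacker who can time many attempts can recover a secret one character at a time:</p>

```javascript
// Toy illustration: an early-exit string comparison does an amount of
// work proportional to the length of the matching prefix. The `steps`
// counter stands in for the timing signal an attacker would measure.
function naiveEqual(guess, secret) {
  naiveEqual.steps = 0;
  if (guess.length !== secret.length) return false;
  for (let i = 0; i < secret.length; i++) {
    naiveEqual.steps++;
    if (guess[i] !== secret[i]) return false; // early exit leaks position
  }
  return true;
}
```

A wrong first character exits after one comparison; a guess that is correct up to the last character does nearly a full pass, and that difference is observable from outside.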
<p>The fix that OpenSSL applied for CVE-2018-0737 was a straight-forward switch to constant-time operations for the code in question. For RSA, this has the effect of masking the operations being performed from side-channel inspection, such as the use of cache.</p>
<p>Be aware that Node.js has a <a href="https://nodejs.org/docs/latest-carbon/api/crypto.html#crypto_crypto_timingsafeequal_a_b"><code>crypto.timingSafeEqual()</code></a> operation that can be used whenever performing sensitive comparisons. Using this function, our vulnerable operation becomes <code>if (crypto.timingSafeEqual(Buffer.from(userInput), Buffer.from('supersecretkey'))) { ... }</code> and we stop exposing timing information to potential attackers. Note that <code>crypto.timingSafeEqual()</code> throws if the two <code>Buffer</code>s have different lengths, so inputs must be normalized to the same length first.</p>
<h2 id="openssl-ecdsa-key-extraction-local-side-channel">OpenSSL: ECDSA key extraction local side-channel</h2>
<p>All actively supported release lines of Node.js are impacted by this flaw. Patches are included in both OpenSSL 1.1.0i (Node.js 10) and 1.0.2p (Node.js 6 LTS "Boron" and Node.js 8 LTS "Carbon").</p>
<p>This flaw does not have a CVE due to OpenSSL policy to not assign itself CVEs for local-only vulnerabilities that are more academic than practical. This vulnerability was discovered by <a href="https://www.nccgroup.trust/us/our-research/technical-advisory-return-of-the-hidden-number-problem/">Keegan Ryan at NCC Group</a> and impacts many cryptographic libraries including LibreSSL, BoringSSL, NSS, WolfCrypt, Botan, libgcrypt, MatrixSSL, and of course OpenSSL. A CVE was assigned for this issue specifically for libgcrypt, CVE-2018-0495.</p>
<p>This flaw is very similar to the RSA key generation cache-timing flaw above in that it also relies on cache timing and an attacker must be able to execute code on the local machine being attacked. It also uses Flush+Reload to infer the operations being performed, but this time it examines the Digital Signature Algorithm (DSA) and the Elliptic Curve Digital Signature Algorithm (ECDSA), and a little more information is required to mount a successful attack. In an attack scenario, the victim uses a private key to create several signatures. The attacker observes the resulting signatures and must know the messages being signed. Then, the cache-timing side-channel is used to infer the order of operations and work backward to find the private key.</p>
<p>This attack could be used against TLS, or SSH, and there are mechanisms in both that would give an attacker enough information to perform a successful attack under certain circumstances. The key requirement, again, is local access to a server performing the DSA or ECDSA signing operation, or access to a virtual machine on the same host, as long as the cache isn't partitioned as it often is for public clouds.</p>
<p>Unlike the RSA flaw, a fix is not as simple as switching to constant-time operations. Instead, the <a href="https://github.com/openssl/openssl/pull/6523">fix</a> involves adding <a href="https://en.wikipedia.org/wiki/Blinding_(cryptography)">“blinding”</a> to the calculation. Blinding is a technique that can mask the underlying operation from side-channel inspection by inserting unpredictability which can be later reversed. This specific fix addresses the problematic addition (<code>+</code>) operation which exposes the side-channel leak. It does this by adding a random value as noise to both sides of the equation. Now, when observing the operation, it is theoretically impossible to remove the noise and discover the important information that would leak data.</p>
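<p>A conceptual toy, not real cryptography, to show the shape of the blinding idea: mix a random value into the secret before operating on it, then remove it afterwards. An observer of the intermediate values sees only randomized data, yet the final answer is unchanged:</p>

```javascript
// Conceptual toy only. Real blinding operates on modular arithmetic
// inside the signature algorithm, not plain BigInt addition.
function blindedAdd(secret, addend) {
  const r = BigInt(Math.floor(Math.random() * 1e9)); // random blinding value
  const blinded = secret + r;       // observable intermediate is randomized
  const result = blinded + addend;  // do the work on blinded data
  return result - r;                // unblind: same answer as secret + addend
}
```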
<h2 id="unintentional-exposure-of-uninitialized-memory-in-buffer-creation-cve-2018-7166-">Unintentional exposure of uninitialized memory in <code>Buffer</code> creation (CVE-2018-7166)</h2>
<p>All versions of Node.js 10 are impacted by this flaw. Prior release lines are not impacted.</p>
<p>Node.js TSC member Сковорода Никита Андреевич (Nikita Skovoroda / <a href="https://github.com/chalker">@ChALkeR</a>) discovered an argument processing flaw that causes <code>Buffer.alloc()</code> to return uninitialized memory. This method is intended to be safe and only return initialized, or cleared, memory.</p>
<p>Memory is not automatically cleared after use by most software and it is not generally cleared within Node.js during an application's lifetime when memory is freed from internal use. This means that a call to <code>malloc()</code> (system memory allocation) usually returns a block of memory that contains data stored by the previous user of that block who <code>free()</code>d it without clearing it. This can cause problems if an attacker can find a way to create these blocks and inspect their contents as secrets usually pass through memory—passwords, credit card numbers, etc. Allocate enough blocks of uncleared memory and you're bound to find something interesting.</p>
<p>In the browser, you have no way to allocate uninitialized memory, so a malicious site can't inspect your memory to find sensitive data arising from your interactions with another site. <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer"><code>ArrayBuffer</code></a> and the various <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray"><code>TypedArray</code></a> types will only ever give you initialized, or zeroed memory—memory that contains only <code>0</code>s.</p>
<p>Historically, for the sake of performance, Node.js has acted more like a traditional un-sandboxed server-side runtime that doesn't need the same kinds of protections as browsers. Unfortunately, many JavaScript programmers are not as attuned to the risks of using uninitialized memory. Additionally, the <code>Buffer</code> constructor itself has some usability flaws that have led to many expert programmers exposing uninitialized memory to potential attackers. <a href="https://github.com/websockets/ws">ws</a>, the very popular WebSocket library, authored by skilled programmers, <a href="https://github.com/websockets/ws/releases/tag/1.0.1">famously exposed uninitialized memory</a> to client connections over the network by means of a simple remote <code>ping()</code> call that passed an integer instead of a string.</p>
<p>The usability concerns around <code>Buffer</code> led to the deprecation of the <code>Buffer()</code> constructor and introduction of new factory methods: <a href="https://nodejs.org/api/buffer.html#buffer_buffer_from_buffer_alloc_and_buffer_allocunsafe"><code>Buffer.from()</code>, <code>Buffer.alloc()</code>, <code>Buffer.allocUnsafe()</code></a>, and the <a href="https://nodejs.org/api/buffer.html#buffer_the_zero_fill_buffers_command_line_option"><code>--zero-fill-buffers</code></a> command line argument. It's worth noting that from version 1.0, <a href="https://nodesource.com/products/nsolid">N|Solid</a>, NodeSource's enterprise Node.js runtime, included a <code>"zeroFillAllocations"</code> option in its <a href="https://docs.nodesource.com/latest/docs#polices">policies</a> feature to address similar concerns.</p>
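<p>The factory methods make the safety trade-off explicit at each call site, as this brief sketch shows:</p>

```javascript
const zeroed = Buffer.alloc(16);      // always zero-filled: safe to expose
const fast = Buffer.allocUnsafe(16);  // fast, but may contain stale memory
const copied = Buffer.from('abc');    // copies its input, never uninitialized

// An allocUnsafe() Buffer must be completely overwritten before it is
// ever sent anywhere outside the process.
fast.fill(0);
```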
<p>Unfortunately, the root cause of the <code>Buffer</code> constructor's usability concerns—too much flexibility in argument types—is still with us, this time in <a href="https://nodejs.org/api/buffer.html#buffer_buf_fill_value_offset_end_encoding"><code>Buffer#fill()</code></a>, whose signature is far too flexible: <code>Buffer#fill(value[, offset[, end]][, encoding])</code>. Internal re-use of this function, and its flexible argument parsing, by <code>Buffer.alloc()</code> exposes a bug that allows a supposedly <em>safe</em> allocation method to return <em>unsafe</em> (i.e. uninitialized) memory blocks.</p>
<p><code>Buffer.alloc()</code> allows a third argument, <code>encoding</code>. When there is a second argument, <code>fill</code>, this and the <code>encoding</code> argument are passed blindly to the internal <code>fill()</code> implementation as second and third arguments. This is where it encounters the familiar <code>Buffer()</code> constructor problem:</p>
<div class="highlight"><pre><span class="kd">function</span> <span class="nx">_fill</span><span class="p">(</span><span class="nx">buf</span><span class="p">,</span> <span class="nx">val</span><span class="p">,</span> <span class="nx">start</span><span class="p">,</span> <span class="nx">end</span><span class="p">,</span> <span class="nx">encoding</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">if</span> <span class="p">(</span><span class="k">typeof</span> <span class="nx">val</span> <span class="o">===</span> <span class="s1">'string'</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="nx">start</span> <span class="o">===</span> <span class="kc">undefined</span> <span class="o">||</span> <span class="k">typeof</span> <span class="nx">start</span> <span class="o">===</span> <span class="s1">'string'</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">encoding</span> <span class="o">=</span> <span class="nx">start</span><span class="p">;</span>
      <span class="nx">start</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
      <span class="nx">end</span> <span class="o">=</span> <span class="nx">buf</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="k">typeof</span> <span class="nx">end</span> <span class="o">===</span> <span class="s1">'string'</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">encoding</span> <span class="o">=</span> <span class="nx">end</span><span class="p">;</span>
      <span class="nx">end</span> <span class="o">=</span> <span class="nx">buf</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="c1">// ...</span>
</pre></div>
<p>The intention here is that by only passing three arguments, with the third one being <code>encoding</code>, the flexible argument parsing rules would enter the top set of instructions and set <code>encoding = start</code>, <code>start = 0</code>, <code>end = buf.length</code>, precisely what we want for a <code>Buffer</code> fully initialized with the provided <code>val</code>. However, because <code>Buffer.alloc()</code> does minimal type checking of its own, the <code>encoding</code> argument could be a number and this whole block of argument rewriting would be skipped and <code>start</code> could be set to some arbitrary point in the <code>Buffer</code>, even the very end, leaving the whole memory block uninitialized:</p>
<pre><code>&gt; Buffer.alloc(20, 1)
&lt;Buffer 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01&gt;
&gt; Buffer.alloc(20, 'x')
&lt;Buffer 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78&gt;
&gt; Buffer.alloc(20, 1, 20)
&lt;Buffer 80 be 6a 01 01 00 00 00 ff ff ff ff ff ff ff ff 00 00 00 00&gt;
// whoops!
</code></pre><p>This is only a security concern if you are allowing unsanitized user input to control the third argument to <code>Buffer.alloc()</code>. Unless you are fully sanitizing and type-checking everything coming in from an external source and know precisely what types are required by your dependencies, you should not assume that you are not exposed.</p>
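<p>One defensive measure is to make sure attacker-influenced values can never reach the flexible argument parser at all. A hypothetical sketch (<code>strictAlloc</code> is not a real Node.js API, just an illustration of the type-checking discipline):</p>

```javascript
// Hypothetical defensive wrapper: refuse to forward a non-string
// encoding into Buffer.alloc(), so flexible argument parsing can never
// be steered by attacker-controlled input.
function strictAlloc(size, fill, encoding) {
  if (encoding !== undefined && typeof encoding !== 'string') {
    throw new TypeError('encoding must be a string');
  }
  return Buffer.alloc(size, fill, encoding);
}
```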
<p>The <a href="https://github.com/nodejs/node/commit/40a7beeddac9b9ec9ef5b49157daaf8470648b08">fix</a> for CVE-2018-7166 simply involves being explicit with internal arguments passed from <code>alloc()</code> to <code>fill()</code> and bypassing the argument shifting code entirely. Avoiding argument cleverness is a good rule to adopt in any case for robustness and security.</p>
<h2 id="out-of-bounds-oob-write-in-buffer-cve-2018-12115-">Out of bounds (OOB) write in <code>Buffer</code> (CVE-2018-12115)</h2>
<p>All actively supported release lines of Node.js are impacted by this flaw.</p>
<p>Node.js TSC member Сковорода Никита Андреевич (Nikita Skovoroda / <a href="https://github.com/chalker">@ChALkeR</a>) discovered an OOB write in <code>Buffer</code> that can be used to write to memory outside of a <code>Buffer</code>'s memory space. This can corrupt unrelated <code>Buffer</code> objects or cause the Node.js process to crash.</p>
<p><code>Buffer</code> objects expose areas of raw memory in JavaScript. Under the hood, this is done in different ways depending on how the <code>Buffer</code> is created and how big it needs to be. For <code>Buffer</code>s less than 8k bytes in length created via <code>Buffer.allocUnsafe()</code> and from most uses of <code>Buffer.from()</code>, this memory is allocated from a pool. This pool is made up of areas of block-allocated memory larger than an individual <code>Buffer</code>. So <code>Buffer</code>s created sequentially will often occupy adjoining memory space. In other cases, memory space may sit adjacent with some other important area of memory used by the current application—likely an internal part of V8 which makes heaviest use of memory in a typical Node.js application.</p>
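<p>The pooling behaviour can be observed directly: the backing <code>ArrayBuffer</code> of a small <code>Buffer.allocUnsafe()</code> allocation is the whole shared pool, while a large allocation gets its own backing store. A sketch, assuming the default <code>Buffer.poolSize</code> of 8 KiB has not been changed:</p>

```javascript
// Small allocUnsafe() Buffers are carved out of a shared pre-allocated
// pool; allocations larger than half the pool get their own memory.
const small = Buffer.allocUnsafe(16);
const big = Buffer.allocUnsafe(9000);

const smallIsPooled = small.buffer.byteLength === Buffer.poolSize; // whole pool, not 16
const bigIsNotPooled = big.buffer.byteLength === 9000;             // exactly-sized
```

This is why sequentially created small <code>Buffer</code>s often sit in adjoining memory, which is exactly what makes an out-of-bounds write on one of them dangerous to its neighbours.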
<p>CVE-2018-12115 centers on <code>Buffer#write()</code> when working with UCS-2 encoding (recognized by Node.js under the names <code>'ucs2'</code>, <code>'ucs-2'</code>, <code>'utf16le'</code> and <code>'utf-16le'</code>) and takes advantage of its two-bytes-per-character arrangement.</p>
<p>Exploiting this flaw involves confusing the UCS-2 string encoding utility in Node.js by telling it you wish to write new contents in the second-to-last position of the current <code>Buffer</code>. Since one byte is not enough for a single UCS-2 character, it should be rejected without changing the target <code>Buffer</code>, just like any <code>write()</code> with zero bytes is. The UCS-2 string encoding utility is written with the assumption that it has at least one whole character to write, but by breaking this assumption we end up setting the "maximum number of characters to write" to <code>-1</code>, which, when passed to V8 to perform the <a href="https://v8docs.nodesource.com/node-10.6/d2/db3/classv8_1_1_string.html#a79d9a617e12421ae3afb7a2060eb6fe4">write</a>, is interpreted as "all of the buffer you provided".</p>
<p>UCS-2 encoding can therefore be tricked into writing as many bytes as you want from the second-to-last position of a <code>Buffer</code> on to the next area of memory. This memory space may be occupied by another <code>Buffer</code> in the application, or by some other semi-random memory space within the application, corrupting state and potentially causing an immediate segmentation fault crash. At best this can be used for a denial of service by forcing a crash. At worst, it could be used to overwrite sensitive data to trick an application into unintended behavior.</p>
<p>As with CVE-2018-7166, exploiting this flaw requires the passing of unsanitized data through to <code>Buffer#write()</code>, possibly in both the data to be written and the position for writing. Unfortunately, this is not an easy scenario to recognize and such code has been found to exist in npm packages available today.</p>
<p>The <a href="https://github.com/nodejs/node/commit/88105c998ef9d3f54aa8f22b82ec8cc31cbfac95">fix</a> for CVE-2018-12115 involves checking for this underflow and bailing early when there really are no full UCS-2 characters to write.</p>
<h1>The Truth About Rod Vagg</h1>
<p><em>Published 2017-08-25 at <a href="https://r.va.gg/2017/08/the-truth-about-rod-vagg.html">https://r.va.gg/2017/08/the-truth-about-rod-vagg.html</a></em></p>
<p><em>NOTE: This post is copied from <a href="https://github.com/nodejs/CTC/issues/165#issuecomment-324798494">https://github.com/nodejs/CTC/issues/165#issuecomment-324798494</a> and the primary intended audience was the Node.js CTC.</em></p>
<hr>
<p><em>Dear reader from <code>${externalSource}</code>: I neither like nor support personal abuse or attacks. If you are showing up here getting angry at any party involved, I would ask you to refrain from targeting them, privately or in public. Specifically to people who think they may be supporting me by engaging in abusive behaviour: I do not appreciate, want or need it, in any form and it is not helpful in any way.</em></p>
<p>Yep, this is a long post, but no apologies for the length this time. Buckle up.</p>
<p>I'm sad that we have reached this point, and that the CTC is being asked to make such a difficult decision. One of the reasons that we initially split the TSC into two groups was to insulate the technical <em>doers</em> on the CTC from the overhead of administrative and political tedium. I know many of you never imagined you'd have to deal with something like this when you agreed to join and that this is a very uncomfortable experience for you.</p>
<p>It's obvious that we never figured out a suitable structure that made the TSC a useful, functional, and healthy body that might be able to deal more effectively with these kinds of problems, more isolated from the CTC. I'm willing to accept a sizeable share of the blame for not improving our organisational structure during my tenure in leadership.</p>
<h2 id="my-response">My response</h2>
<p>Regarding the request for me to resign from the CTC: absent clear justification that my removal is for the benefit of the Node.js project, or a case for my removal that is not built primarily on hearsay and innuendo, I respectfully decline.</p>
<p>There are two primary reasons for which I am standing my ground.</p>
<p>I cannot, in good conscience, give credence to the straw-man version of me being touted loudly on social media and on GitHub. This caricature of me, with its vague notions regarding my "toxicity", my propensity for "harassment", the "systematic" breaking of rules and other slanderous claims against my character, has no basis in fact. I will not dignify these attacks by taking tacit responsibility through voluntary resignation.</p>
<p>Secondly, and arguably more importantly for the CTC: I absolutely will not take responsibility for the precedent that is currently being set. The dogged pursuit of a leader of this project, the strong-arm tactics being deployed with the goal of having me voluntarily resign, and my eventual removal from this organisation are not the behaviour of a healthy, productive, or inclusive community.</p>
<p>My primary concern is that the consequences of these actions endanger the future health of the Node.js project. I do not believe that I am an irreplaceable snowflake (I’m entirely replaceable), but there is reason to pause before making this an acceptable part of how we conduct our governance and our internal relationships.</p>
<p>However, while I am not happy that the burden of this decision has been foisted upon all of you, I am content to stand and be judged by this group. You are the creative force behind Node.js and the legitimate owners of this project; my respect for you as individuals and as a group, and for your rightful position as final arbiters of the technical Node.js project, makes me entirely comfortable living with whatever decision you arrive at regarding my removal.</p>
<p>I will break the rest of this post into the following sections: </p>
<ul>
<li>My critique of the process so far</li>
<li>My response to list of complaints made against me <a href="https://github.com/nodejs/TSC/issues/310">to the TSC</a></li>
<li>Addressing the claims, often repeated across the internet, that I am a hindrance to progress on inclusivity and diversity</li>
<li>The independence of the technical group, the new threats posed to that independence</li>
<li>The threats posed to future leadership of the project</li>
</ul>
<h3 id="the-process-so-far">The process so far</h3>
<p>My personal experience so far has been approximately as follows:</p>
<ul>
<li>Some time ago I received notification via email that there were complaints against me. No details were provided and I was informed that I would neither receive those details nor be involved in whatever process was to take place. Further, TSC members were not allowed to speak to me directly about these matters, <em>including</em> my work colleagues also on the TSC. I was never provided with an opportunity to understand the specific charges against me or be involved in any discussions on this topic from that point onward.</li>
<li>3 days ago, I saw <a href="https://github.com/nodejs/TSC/issues/310">nodejs/TSC#310</a> at the same time as the public. <strong>This was the first time that I had seen the list of complaints</strong>. It was the first that I heard that there was a vote taking place regarding my position.</li>
<li>At no point have I been provided with an opportunity to answer to these complaints, correct the factual errors contained in them (see below), apologise and make amends where possible, or provide additional context that may further explain accusations against me.</li>
<li>At no point have I been approached by a member of the TSC or CTC regarding any of these items beyond what the record we have here on GitHub shows—primarily in the threads involved and in the moderation repository. That record is open for you to view, including whatever due diligence was undertaken either by my accusers or by those executing the process. I have had interactions with only a single member of the TSC regarding one of these matters, in private email and in person, which on both occasions involved me attempting to coax out the source of bad feelings that I had sensed and attempting to (relatively blindly) make amends.</li>
</ul>
<p>I hope you can appreciate that, to me, this process seems rather unfair. Regardless of whether it is informed or dictated by our governance documents, as has been claimed, it should be changed so that in future accused parties have at least the chance to respond to accusations.</p>
<h3 id="response-to-the-list-of-complaints">Response to the list of complaints</h3>
<p>I am including the text that was redacted from <a href="https://github.com/nodejs/TSC/issues/310">nodejs/TSC#310</a> as it is already in the public domain, on social media, also on GitHub and now in the press. Please note that I did not ask for this text to be redacted.</p>
<h4 id="1-">1.</h4>
<blockquote>
<p>In [link to moderation repository discussion, not copied here out of respect for additional parties involved], Rod’s first action was to apologize to a contributor who had been repeatedly moderated. Rod did not discuss the issue with other members of the CTC/TSC first. The result undermined the moderation process as it was occurring. It also undercut the authority as moderators of other CTC/TSC members.</p>
</blockquote>
<p>Rather than delving into the details of this complaint, I will simply say that I was unaware at the time that the actions I had taken were inappropriate and had caused hurt to some CTC/TSC members involved in this matter. Having had this belatedly explained to me (again, something I have had to coax out, not offered freely to me), I issued a private statement to the TSC and CTC via email at the beginning of this month offering my sincere apologies. (I did this without knowing whether it was part of the list of complaints against me.) The most relevant part of my private statement is this:</p>
<blockquote>
<p>In relation to my behaviour in the specific: I should not have weighed in so heavily, or at all, in this instance as I lacked so much of the context of what was obviously a very sensitive matter that was being already dealt with by some of you (in a very taxing way, as I understand it). I missed those signals entirely and weighed in without tact, took sides against some of you—apologising to [unnecessary details withheld] on behalf of some of you was an absurd thing for me to do without having being properly involved prior to this. And for this I unreservedly apologise!</p>
</blockquote>
<p>I don't know if this apology was acknowledged during the process of dealing with the complaints against me. It has not been acknowledged in the publication of the complaints-handling process, nor does it seem to have had any impact on the parties involved, who continue to hold the matter against me. I can only assume that they either dismiss my sincerity or do not consider apologies a sufficient means of rectifying these kinds of missteps.</p>
<p>In this matter I accept responsibility and have already attempted to make amends and prevent a similar issue from recurring. It disappoints me that it is still used as an active smear against me. Again, had I been given clear feedback regarding my misstep earlier, I would have attempted to resolve this situation sooner. </p>
<h4 id="2-">2.</h4>
<blockquote>
<p>In nodejs/board#58 and nodejs/moderation#82 Rod did not moderate himself when asked by another foundation director and told them he would take it to the board. He also ignored the explicit requests to not name member companies and later did not moderate the names out of his comments when requested. Another TSC member needed to follow up later to actually clean up the comments. Additionally he discussed private information from the moderation repo in the public thread, which is explicitly against the moderation policy.</p>
</blockquote>
<p>My response to this complaint is as follows:</p>
<ol>
<li>This thread unfortunately involves a significant amount of background corporate politics, personal relationship difficulties and other matters which conspired to raise the temperature, for me at least. This is not an excuse, simply an explanation for what may have appeared to some to be a heated interjection on my part.</li>
<li>I <em>did</em> edit my post very soon after—I was the first to edit my posts in there after the quick discussion that followed in the moderation repository and I realised I had made a poor judgement call with my choice of words. I both removed my reading of intent into the words of another poster and removed the disclosure of matters discussed in a private forum.</li>
<li>I do not recall being asked to <em>remove</em> the names of the companies involved; I have only now seen that they have been edited out of my post. I cannot find any evidence that such a request was even made. This would have been a trivial matter and I would have complied without argument had I seen such a request. Without additional evidence, it is rather troubling to find this forming the basis of a complaint.</li>
<li>A board member asking another board member (me) to edit their postings seemed to me to be a board matter, hence my suggestion to take it to the board. I was subsequently corrected on this—as it is a TSC-owned repository it was therefore referred to the TSC for adjudication.</li>
</ol>
<p>I considered the remaining specifics of this issue to have been resolved and have not been informed otherwise since this event took place. Yet I now find that the matters are still active and I am the target of criticism rather than that criticism being aimed at the processes that apparently resolved the matter in the first place. Why was I never informed that my part in the resolution was unsatisfactory and why was I not provided a chance to rectify additional perceived misdeeds?</p>
<h4 id="3-">3.</h4>
<blockquote>
<p>Most recently Rod tweeted in support of an inflammatory anti-Code-of-Conduct article. As a perceived leader in the project, it can be difficult for outsiders to separate Rod’s opinions from that of the project. Knowing the space he is participating in and the values of our community, Rod should have predicted the kind of response this tweet received. <a href="https://twitter.com/rvagg/status/887652116524707841">https://twitter.com/rvagg/status/887652116524707841</a></p>
<p>His tweeting of screen captures of immature responses suggests pleasure at having upset members of the JavaScript community and others. As a perceived leader, such behavior reflects poorly on the project. <a href="https://twitter.com/rvagg/status/887790865766268928">https://twitter.com/rvagg/status/887790865766268928</a></p>
<p>Rod’s public comments on these sorts of issues is a reason for some to avoid project participation. <a href="https://twitter.com/captainsafia/status/887782785221615618">https://twitter.com/captainsafia/status/887782785221615618</a></p>
<p>It is evidence to others that Node.js may not be serious about its commitment to community and inclusivity. <a href="https://twitter.com/nodebotanist/status/887724138516951049">https://twitter.com/nodebotanist/status/887724138516951049</a></p>
</blockquote>
<ol>
<li>The post I linked to was absolutely <strong>not</strong> an anti-Code-of-Conduct article. It was an article written by an Associate Professor of Evolutionary Psychology at the University of New Mexico, discussing free speech in general and suggesting a case against speech codes <em>in American university campuses</em>. In sharing this, I hoped to encourage meaningful discussion regarding the possible shortcomings of some standard Code of Conduct language. My intent was not to suggest that the Node.js project should not have a Code of Conduct in place.</li>
<li>"Rod should have predicted the kind of response this tweet received" is a deeply normative statement. I did not predict the storm generated, and assumed that open discussion on matters of speech policing was still possible, and that my personal views would not be misconstrued as the views of the broader Node.js leadership group or community. I obviously chose the wrong forum. If TSC/CTC members are going to be held responsible for attempting to share or discuss personal views on personal channels, then that level of accountability should be applied equally across technical and Foundation leadership.</li>
<li>"His tweeting of screen captures of immature responses suggests pleasure" is an assumption about my feelings at the time. I find this especially ironic in the context of complaint number 2 (above): I was criticised for reading the intention of another individual into their words, yet that is precisely what is being done here. This claim is absolutely untrue; I do not take pleasure in upsetting people. I will refrain from justifying my actions further on this matter, but this accusation is baseless and disingenuous.</li>
<li>To re-state for further clarity, I <strong>have not made a case against Codes of Conduct in general</strong>, but rather, would like to see ongoing discussion about how such social guidelines could be improved upon, as they clearly have impact on open source project health.</li>
<li>I have never made a case against the Node.js Code of Conduct.</li>
<li>I have a clear voting record for adopting the Node.js project's Code of Conduct and for various changes made to it. Codes of Conduct have been adopted by a number of my own projects which have been moved from my own GitHub account to that of the Node.js Foundation.</li>
</ol>
<p>I will refrain from further justifying a tweet. As with all of you, I bring my own set of opinions and values to our diverse mix and we work to find an acceptable common space for us all to operate within. I don’t ask that you agree with me, but within reason I hope that mutual respect is stronger than a single disagreement. I cannot accept that my opinions on these matters form a valid reason for my removal. I have submitted myself to our Code of Conduct as a participant in this project. I have been involved in the application of our Code of Conduct. But I do not accept it as a sacred text that is above critique or even <em>discussion</em>.</p>
<p>While not a matter for the TSC or CTC, there is a Board member of the Foundation who, by their own admission, has repeatedly discussed sensitive and private Board matters publicly on Twitter, causing ongoing consternation and legal concern for the Board. As far as I know, this individual has not been asked to resign. I consider this type of behaviour to be considerably more problematic for the Foundation than my tweeting of a link to an article completely unrelated to Node.js.</p>
<p>Taking action against me on the basis of this tweet, while ignoring the many tweets and other social media posts that stand in direct conflict to the goals of the Foundation by other members of our technical team, its leadership and other members of the Foundation and its various bodies, strikes me as a deeply unequal (and, it must be said, un-inclusive) application of the rules. </p>
<p>If it is the case that the TSC/CTC is setting limits on personal discussion held outside the context of the project repo, then these limits should be applied to all members of both groups without prejudice.</p>
<h4 id="board-accusations">Board accusations</h4>
<p>In addition to the above list, we now have <a href="https://github.com/nodejs/board/issues/67">new claims</a> from the Node.js Foundation board. It appears to suggest that I have engaged, or continue to engage, in <em>“antagonistic, aggressive or derogatory behavior”</em>, with no supporting evidence provided. Presumably the supporting evidence is the list in <a href="https://github.com/nodejs/TSC/issues/310">nodejs/TSC#310</a>, to which I have responded above.</p>
<p>I can’t respond to an unsupported claim such as this. It is presented entirely without merit, and I cannot consider it anything other than malicious, self-serving, and an obvious attempt to emotionally manipulate the TSC and CTC by charging the existing claims with a completely new level of seriousness through the sprinkling of an assortment of stigmatic <em>evil person</em> descriptors.</p>
<p>To say that I am disappointed that a majority of the Board would agree to conduct themselves in such an unprofessional and immature manner is an understatement. However, this is neither the time nor the place for me to address their attempts to smear, defame and <em>unperson</em> me. After requesting of me directly that I “fall on my sword” and not receiving the answer it wanted, the Board has chosen to make it clear where it collectively thinks the high moral ground lies in this matter. As I have already expressed to them, I believe they have made a poor assessment of the facts, have not made the correct choice in their moral stance, and have now stood by and encouraged additional smears against me.</p>
<p>I will have more to say on the Board’s role and our relationship to it below, however.</p>
<h3 id="that-i-am-a-barrier-to-inclusivity-efforts">That I am a barrier to inclusivity efforts</h3>
<p>This is a refrain that is often repeated on social media about me and it's never been made clear, to me at least, how this is justified.</p>
<p>By most objective measures, the Node.js project has been healthier and more open to outsiders during my 2-year tenure in leadership than at any time in its history. One of the great pleasures I've had during this time has been showing and celebrating this on the conference circuit. We have record numbers of contributors: in total, per month, and unique per month. Our issue tracker is so busy with activity that very few of us can stay subscribed to the firehose any more. We span the globe to the point that our core and working group meetings are very difficult to schedule and usually end up leaving people out. We regularly have to work to overcome language and cultural barriers as we continue to expand.</p>
<p>When I survey the contributor base, the collaborator list, the CTC membership, I see true diversity across many dimensions. Claims that I am a barrier to inclusivity and the building of a diverse contributor base are at odds with the prominent role I've had in the project during its explosive growth.</p>
<p>My assessment of the claim that I am a hindrance to inclusivity efforts is that it hinges on the singular matter of moderation and control of the discourse that occurs amongst the technical team. From the beginning I have strongly maintained that the technical team should retain authority over its own space, and that its independence includes the ability to enforce the rules of social interaction and discussion as it sees fit. This has led to disagreements with individuals who would rather insert external arbiters into the moderation process; arbiters who have not earned the right to stand in judgement of technical team members, and who have not been held to the same standards by which technical team members are judged to earn their place in the project.</p>
<p>On this matter I remain staunchly opposed to the dilution of independence of the technical team and will continue to advocate for its ability to make such critical decisions for itself. This is not only a question of moral (earned) authority, but of the risk of subversion of our organisational structures by individuals who are attracted to the project by the possibility of pursuing a personal agenda, regardless of the impact this has on the project itself. I see current moves in this direction, as in this week’s moderation policy proposal at <a href="https://github.com/nodejs/TSC/pull/276">nodejs/TSC#276</a>, as presenting such a risk. I don't expect everyone to agree with me on this, but I have just as much right as everyone else to make my case and not be vilified in my attempts to convince enough of the TSC to prevent such changes.</p>
<p>Further, regarding the other smears against my character that now circulate regularly on social media and GitHub: I would ask that, if you are using any of these as the basis of your judgement of me, you request supporting evidence from those making or repeating such smears. It's been an educational experience to watch a caricatured narrative about my character grow into the monster that it is today, and it saddens me when people I respect take this narrative at face value without bothering to scratch the surface to see if there is any basis in fact.</p>
<p>The use of language such as “systematic” and “pattern” to avoid having to outline specifics should be seen for what it is: a baseless smear. I have a large body of text involving many hundreds of social interactions scattered through the Node.js project and its various repositories on GitHub. If any such “systematic” behavioural problems existed, it should not be difficult to provide clear documentation of them.</p>
<h3 id="threats-to-the-independence-of-the-technical-group">Threats to the independence of the technical group</h3>
<p>We now face the unprecedented move by the Node.js Foundation Board to <a href="https://github.com/nodejs/board/issues/67">inject itself</a> directly into our decision-making process. The message being: the TSC voted the wrong way, and it should vote again until it produces the “right” outcome.</p>
<p>This echoes the sentiment being <a href="https://github.com/nodejs/community-committee/issues/111">expressed in the Community Committee</a> and elsewhere: since there were accusations, there must be guilt, and the fault lies in the inability of the TSC to deal with that guilt. No credence is paid to the possibility that <em>perhaps</em> the TSC evaluated the facts and reached a consensus that no further action was necessary.</p>
<p>I have some sympathy for the position of the Node.js Foundation board. These are tough times in the Silicon Valley environment, particularly with the existing concerns surrounding diversity, inclusivity, and tolerance. I can understand how <em>rumors</em> of similarly unacceptable behavior can pose a threat, even absent any evidence of such behavior. That said, I do not believe that it is in the long-term interests of Node.js or its Foundation to pander to angry mobs, as they represent a small fraction of our stakeholders and their demands are rarely rational. In this case, I believe that a majority of outsiders will be viewing this situation with bemusement at best. It saddens me that there is no recognition of the fact that appeasing angry and unverified demands by activists only leads to greater demands and less logical discussion of these issues. If we accept this precedent then we place the future health of this project in jeopardy, as we will have demonstrated that we allow outsiders to adjust our course to suit personal or private agendas, as long as they can concoct a story to create outrage and dispense mob justice without reproach.</p>
<p>While difficult, I believe that it is important for the technical team to continue to assert its independence, to the board and to other outside influences. We are not children who need adult supervision; treating us as such undermines so much of what we have built over these last few years and erodes the feelings of ownership of the project that we have instilled in our team of collaborators.</p>
<h3 id="the-threat-to-future-leadership-of-the-project">The threat to future leadership of the project</h3>
<p>Finally, I want to address a critical issue which has been overlooked but now poses a real problem for our future: how to grow, enable and support leadership in such a difficult environment.</p>
<p>My tenure in leadership easily represents the most difficult years of my life. The challenges I have had to face have forced me to grow in ways I never expected. I'm thankful for the chance to meet these challenges, however, and even though it's taken a toll on my health, I'll be glad to have had the experience when I look back.</p>
<p>One of my tasks as a leader, particularly serving in the role of bridge between the Board and the technical team, has involved maintaining that separation and independence while also shielding the technical team from the intense corporate and personal politics that constantly exist, and are being exercised, within and around the Foundation. This role forced me to take strong positions on many issues and to stand up to pressure applied from many different directions. In doing what I felt was best to support my technical team members I’m sure I’ve put people off-side—that's an unfortunate consequence of good intentions, but not an uncommon one. I wouldn't say I've made enemies so much as had to engage in <em>very</em> difficult conversations and involve myself in the surfacing of many disagreements that are difficult, and sometimes impossible, to resolve.</p>
<p>Having to involve yourself in a wide variety of decision-making processes inevitably requires that you make tough calls or connect yourself in some way to controversial discussions. I'm sure our current leadership can attest to the awkward positions they have found themselves in, and the difficult conversations they have had to navigate, including this one!</p>
<p>I'll never pretend I don't have limitations in the skills, both intellectual and emotional, required to navigate these tough waters. But when I consider the sheer number of dramas, controversies, and difficult conversations I've had to be involved in—and when I consider the thousands of pages of text I have left littered across GitHub and the other forums we use to get things done—I come to this conclusion: if the best reason you can find to force my resignation is the above list of infractions, <strong>given the weight of content you could dredge through, then you're either not trying very hard or I should be pretty proud of myself for keeping a more level head than I had imagined</strong>.</p>
<p>That aside, my greatest concern for the future of leadership, arising from the actions currently being pursued, is that we've painted ourselves into a corner regarding the leaders we're going to have available. The message that the Board has chosen to send today can rightly be interpreted as this: if the mob comes calling, if the narrative of evil is strong enough, then regardless of the objective facts, the Foundation does not have your back. As developers and leaders, we are being signalled that the Foundation will not stand up for us when things get tough. Combine this with a difficult and thankless job, where the result of exercising your duties could be career-killing, and the likely outcome is that leadership will be left to:</p>
<ul>
<li>Individuals who are comfortable giving in to the whims of the outside activists, whatever the demands, slowly transforming this project into something entirely different and focused on matters not associated with making Node.js a success</li>
<li>Individuals who are capable but shrewd enough to avoid responsibility</li>
<li>Individuals who are capable and take on responsibility, exercise backbone when standing against pressure groups and mob tactics but get taken down because the support structures either abandon them or turn against them</li>
</ul>
<p>This kind of pattern is already evident across the professionalised open source sphere, and Node.js is about to set a new low bar. Do not be surprised as quality leaders become more difficult to find, or become unconvinced that the exercise of leadership duties is in their personal interest at all.</p>
<p>This is a great challenge for modern open source and I'm so sad that I am being forced to be involved in the setting of our current trajectory. I hope we can find space in the future to have the necessary dialog to find a way out of the hole being dug.</p>
<h3 id="in-summary">In summary</h3>
<p>Obviously I hope that you agree that (a) this action against me is unwarranted, is based on flawed and/or irrelevant claims of “misbehaviour” and is based in malicious intent, and that (b) allowing this course of action to be an acceptable part of our governance procedures will have detrimental consequences for the future health of the project.</p>
<p>I ask the CTC to reject this motion, for the TSC to reject the demand by the Board for my suspension, and that we as a technical team send a signal that our independence is critical to the success of the project, despite the accusations of an angry mob.</p>
<p>Thank you if you dignified my words by reading this far!</p>
<h2 id="why-i-dont-use-nodes-core-stream-module">Why I don't use Node's core 'stream' module</h2>
<p><em>Published 2014-06-14, <a href="https://r.va.gg/2014/06/why-i-dont-use-nodes-core-stream-module.html">https://r.va.gg/2014/06/why-i-dont-use-nodes-core-stream-module.html</a></em></p>
<p><em>This article was originally offered to nearForm for publishing and appeared for some time on their blog from early 2014 (at this URL: <a href="http://www.nearform.com/nodecrunch/dont-use-nodes-core-stream-module">http://www.nearform.com/nodecrunch/dont-use-nodes-core-stream-module</a>). It has since been deleted. I'd rather not speculate about the reasons for the deletion but I believe the article contains a very important core message so I'm now republishing it here.</em></p>
<h2 id="tl-dr">TL;DR</h2>
<p>The "readable-stream" package available in npm is a mirror of the Streams2 and Streams3 implementations in Node-core. You can guarantee a stable streams base, regardless of what version of Node you are using, if you only use "readable-stream".</p>
<h2 id="the-good-ol-days">The good 'ol days</h2>
<p>Prior to Node 0.10, implementing a stream meant extending the core <code>Stream</code> object. This object was simply an <code>EventEmitter</code> that added a special <code>pipe()</code> method to do the streaming magic.</p>
<p>Implementing a stream usually started with something like this:</p>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">Stream</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'stream'</span><span class="p">).</span><span class="nx">Stream</span>
<span class="kd">var</span> <span class="nx">util</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'util'</span><span class="p">)</span>
<span class="kd">function</span> <span class="nx">MyStream</span> <span class="p">()</span> <span class="p">{</span>
<span class="nx">Stream</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="k">this</span><span class="p">)</span>
<span class="p">}</span>
<span class="nx">util</span><span class="p">.</span><span class="nx">inherits</span><span class="p">(</span><span class="nx">MyStream</span><span class="p">,</span> <span class="nx">Stream</span><span class="p">)</span>
<span class="c1">// stream logic, implemented however you want</span>
</pre></div>
<p>If you ever had to write a non-trivial stream implementation for pre-0.10 Node without using a helper library (such as <a href="https://github.com/dominictarr/through">through</a>), you know what a nightmare the state-management can be. The actual implementation of a custom stream involves a lot more than just the above code.</p>
<h2 id="welcome-to-node-0-10">Welcome to Node 0.10</h2>
<p>Thankfully, Streams2 came along with a brand new set of base Stream implementations that do a whole lot more than <code>pipe()</code>. The biggest win for stream implementers comes from the fact that state-management is almost entirely taken care of for you. You simply need to provide concrete implementations of some abstract methods to make a fully functional stream, even for non-trivial workloads.</p>
<p>Implementing a stream now looks something like this:</p>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">Readable</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'stream'</span><span class="p">).</span><span class="nx">Readable</span>
<span class="c1">// `Stream` is still provided for backward-compatibility</span>
<span class="c1">// Use `Writable`, `Duplex` and `Transform` where required</span>
<span class="kd">var</span> <span class="nx">util</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'util'</span><span class="p">)</span>
<span class="kd">function</span> <span class="nx">MyStream</span> <span class="p">()</span> <span class="p">{</span>
<span class="nx">Readable</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="p">{</span> <span class="cm">/* options, maybe `objectMode:true` */</span> <span class="p">})</span>
<span class="p">}</span>
<span class="nx">util</span><span class="p">.</span><span class="nx">inherits</span><span class="p">(</span><span class="nx">MyStream</span><span class="p">,</span> <span class="nx">Readable</span><span class="p">)</span>
<span class="c1">// stream logic, implemented mainly by providing concrete method implementations:</span>
<span class="nx">MyStream</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">_read</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">size</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// ... </span>
<span class="p">}</span>
</pre></div>
<p>State-management is handled by the base-object and you interact with internal methods, such as <code>this.push(chunk)</code> in the case of a <code>Readable</code> stream.</p>
<p>While the internal streams implementations are an order-of-magnitude more complex than the previous core-streams implementation, most of it is there to make life an order-of-magnitude easier for those of us implementing custom streams. Yay!</p>
<h2 id="backward-compatibility">Backward-compatibility</h2>
<p>When every new major stable release of Node occurs, anyone releasing public packages in npm has to make a decision about which versions of Node they support. As a general rule, the authors of the most popular packages in npm will support the current stable version of Node and the previous stable release.</p>
<p>Streams2 was designed with backwards-compatibility in mind. Streams using <code>require('stream').Stream</code> as a base will still mostly work as you'd expect and they will also work when piped to streams that extend the other classes. Streams2 streams won't work like classic EventEmitter objects when you pipe them together, as old-style streams do. But when you pipe a Streams2 stream and an old-style EventEmitter-based stream together, Streams2 will fall-back to "compatibility-mode" and operate in a backward-compatible way.</p>
<p>So Streams2 are great and mostly backward-compatible (aside from some tricky edge cases). But what about when you want to implement Streams2 and run on Node 0.8? And what about open source packages in npm that want to still offer Node 0.8 compatibility while embracing the new Streams2-goodness?</p>
<h3 id="-readable-stream-to-the-rescue">"readable-stream" to the rescue</h3>
<p>During the 0.9 development phase, prior to the 0.10 release, Isaac developed the new Streams2 implementation in a package that was released in npm and usable on older versions of Node. The <a href="https://github.com/isaacs/readable-stream">readable-stream</a> package is essentially a mirror of the streams implementation of Node-core but is available in npm. This is a pattern we will hopefully be seeing more of as we march towards Node 1.0. Already there is a <a href="https://github.com/isaacs/core-util-is">core-util-is</a> package that makes available the shiny new <code>is</code> type-checking functions in the 0.11 core 'util' package.</p>
<p><strong>readable-stream</strong> gives us the ability to use Streams2 on versions of Node that don't even have Streams2 in core. So a common pattern for supporting older versions of Node while still being able to hop on the Streams2-bandwagon starts off something like this, assuming you have "readable-stream" as a dependency:</p>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">Readable</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'stream'</span><span class="p">).</span><span class="nx">Readable</span> <span class="o">||</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'readable-stream'</span><span class="p">).</span><span class="nx">Readable</span>
</pre></div>
<p>This works because there is no <code>Readable</code> object on the core 'stream' package in 0.8 and prior, so if you are running on an older version of Node it skips straight to the "readable-stream" package to get the required implementation.</p>
<h2 id="streams3-a-new-flavour">Streams3: a new flavour</h2>
<p>The <strong>readable-stream</strong> package is still being used to track the changes to streams coming in 0.12. The upcoming Streams3 implementation is more of a tweak than a major change. It contains an attempt to make "compatibility mode" more of a first-class citizen of the API and also some improvements to pause/resume behaviour.</p>
<p>Like Streams2, the aim with Streams3 is for backward (and forward) compatibility but there are limits to what can be achieved on this front.</p>
<p>While this new streams implementation will likely be an improvement over the current Streams2 implementation, it is part of the <em>unstable</em> development branch of Node and is so far not without its edge cases, which can break code designed against the pure 0.10 versions of Streams2.</p>
<h2 id="what-is-your-base-implementation-">What is your base implementation?</h2>
<p>Looking back at the code used to fetch the base Streams2 implementation for building custom streams, let's consider what we're actually getting with different versions of Node:</p>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">Readable</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'stream'</span><span class="p">).</span><span class="nx">Readable</span> <span class="o">||</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'readable-stream'</span><span class="p">).</span><span class="nx">Readable</span>
</pre></div>
<ul>
<li><em>Node 0.8 and prior:</em> we get whatever is provided by the readable-stream package in our dependencies.</li>
<li><em>Node 0.10:</em> we get the particular version of Streams2 that comes with the version of Node we're using.</li>
<li><em>Node 0.11:</em> we get the particular version of Streams3 that comes with the version of Node we're using.</li>
</ul>
<p>This may not be interesting if you have full control over all deployments of your custom stream implementations and which version(s) of Node they will be used on. But it can cause problems for open source libraries distributed via npm, whose users may still be stuck on 0.8 (for some, the upgrade path is not an easy one, for various reasons), on 0.10, or even trying out some of the new Node and V8 features available in 0.11.</p>
<p>What you end up with is a very unstable base upon which to build your streams implementation. This is particularly acute since the vast bulk of the code used to construct the stream logic is coming from either Node-core or the readable-stream package. Any <em>bugs</em> fixed in later Node 0.10 releases will obviously still be present for people still stuck on earlier 0.10 releases even if the readable-stream dependency has the <em>fixed</em> version.</p>
<p>Then, when your streams code is run on Node 0.11, suddenly it's a Streams3 stream which has slightly different behaviour to what most of your users are experiencing.</p>
<p>One of the ways these subtle differences are exposed is in bug reports. Users may report a bug that only occurs on their particular combination of core streams and readable-stream, and it may not be obvious that the problem relates to base-stream implementation edge cases they are stumbling upon, wasting time for everyone.</p>
<p>And what about stability? The fragmentation introduced by all of the possible combinations means that your otherwise stable library is having instability foisted upon it from the outside. This is one of the costs of relying on a featureful standard-library (core) within a rapidly developing, pre-v1 platform. But we can do something about it by taking control of the exact version of the base streams objects we want to extend regardless of what is bundled in the version of Node being used. <strong>readable-stream</strong> to the rescue!</p>
<h2 id="taking-control">Taking control</h2>
<p>To control exactly what code your streams implementation is building on, simply pin the version of readable-stream and use only it, avoiding <code>require('stream')</code> completely. Then you get to make the choice when to upgrade to Streams3, even if that's some time <em>after</em> Node 0.12.</p>
<p><strong>readable-stream</strong> comes in two major versions, <strong>v1.0.x</strong> and <strong>v1.1.x</strong>. The former tracks the Streams2 implementation in Node 0.10, including bug-fixes and minor improvements as they are added. The latter tracks Streams3 as it develops in Node 0.11; we may see a v1.2.x branch for Node 0.12.</p>
<p>Any library worth using should be following the basics of semver minor and patch versions (the merits and finer points of major versioning are still something worth debating). readable-stream gives you proper patch-level versioning so if you pin to <code>"~1.0.0"</code> you'll get the latest Node 0.10 Streams2 implementation, including any fixes and minor non-breaking improvements. The patch-level version of 1.0.x and 1.1.x should mirror the patch-level versions of Node core releases as we proceed.</p>
<p>When you're ready to start using Streams3 you can pin to <code>"~1.1.0"</code>, but you should hold off until much closer to Node 0.12, if not after its formal release.</p>
<h2 id="small-core-ftw-">Small core FTW!</h2>
<p>Being able to control precisely the versions of dependencies your code uses reduces the scope for bugs introduced by version incompatibilities or new and unproven implementations.</p>
<p>When we rely on a bulky standard-library to build our libraries and applications, we're relying on a shifting sand that we have little control over. This is particularly a problem for open source libraries whose users have legitimate (and sometimes not-so-legitimate) reasons for using versions that you'd rather not have to support.</p>
<p>Streams2 is a powerful abstraction, but the implementation is far from simple. The Streams2 code is some of the most complex JavaScript you'll find in Node core. Unless you want to have a detailed understanding of how they work and be able to track the changes as they develop, you should pin your Streams2 dependency in the same way as you pin all your other dependencies. Opt for <strong>readable-stream</strong> over what Node-core offers:</p>
<div class="highlight"><pre><span class="p">{</span>
<span class="nt">"name"</span><span class="p">:</span> <span class="s2">"mystream"</span><span class="p">,</span>
<span class="err">...</span>
<span class="nt">"dependencies"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"readable-stream"</span><span class="p">:</span> <span class="s2">"~1.0.0"</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">Readable</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'readable-stream'</span><span class="p">).</span><span class="nx">Readable</span>
<span class="kd">var</span> <span class="nx">util</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'util'</span><span class="p">)</span>
<span class="kd">function</span> <span class="nx">MyStream</span> <span class="p">()</span> <span class="p">{</span>
<span class="nx">Readable</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="k">this</span><span class="p">)</span>
<span class="p">}</span>
<span class="nx">util</span><span class="p">.</span><span class="nx">inherits</span><span class="p">(</span><span class="nx">MyStream</span><span class="p">,</span> <span class="nx">Readable</span><span class="p">)</span>
<span class="nx">MyStream</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">_read</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">size</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// ... </span>
<span class="p">}</span>
</pre></div>
<h2 id="addendum-through2-">Addendum: "through2"</h2>
<p>If the boilerplate of the Streams2 base objects ("classes") is too much for you or triggers some past-life Java PTSD, you can just opt for the "through2" package in npm to get the job done.</p>
<p><a href="https://github.com/rvagg/through2">through2</a> is based on Dominic Tarr's <a href="https://github.com/dominictarr/through">through</a> but is built for Streams2, whereas "through" is a pure Streams1 style. The API isn't quite the same but the flexibility and simplicity are.</p>
<p>through2 gives you a <code>DuplexStream</code> as a base to implement any kind of stream you like, be it purely readable, purely writable or fully duplex. In fact, you can even use through2 to implement a <code>PassThrough</code> stream by not providing an implementation!</p>
<p>From the examples:</p>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">through2</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'through2'</span><span class="p">)</span>
<span class="nx">fs</span><span class="p">.</span><span class="nx">createReadStream</span><span class="p">(</span><span class="s1">'ex.txt'</span><span class="p">)</span>
<span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">through2</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">chunk</span><span class="p">,</span> <span class="nx">enc</span><span class="p">,</span> <span class="nx">callback</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">chunk</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">chunk</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="o">==</span> <span class="mi">97</span><span class="p">)</span>
<span class="nx">chunk</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">122</span> <span class="c1">// swap 'a' for 'z'</span>
<span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk</span><span class="p">)</span>
<span class="nx">callback</span><span class="p">()</span>
<span class="p">}))</span>
<span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">fs</span><span class="p">.</span><span class="nx">createWriteStream</span><span class="p">(</span><span class="s1">'out.txt'</span><span class="p">))</span>
</pre></div>
<p>Or an object stream:</p>
<div class="highlight"><pre><span class="nx">fs</span><span class="p">.</span><span class="nx">createReadStream</span><span class="p">(</span><span class="s1">'data.csv'</span><span class="p">)</span>
<span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">csv2</span><span class="p">())</span>
<span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">through2</span><span class="p">.</span><span class="nx">obj</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">chunk</span><span class="p">,</span> <span class="nx">enc</span><span class="p">,</span> <span class="nx">callback</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">data</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">name</span> <span class="o">:</span> <span class="nx">chunk</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="p">,</span> <span class="nx">address</span> <span class="o">:</span> <span class="nx">chunk</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>
<span class="p">,</span> <span class="nx">phone</span> <span class="o">:</span> <span class="nx">chunk</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span>
<span class="p">}</span>
<span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span>
<span class="nx">callback</span><span class="p">()</span>
<span class="p">}))</span>
<span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'data'</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">data</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">all</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span>
<span class="p">})</span>
<span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'end'</span><span class="p">,</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
<span class="nx">doSomethingSpecial</span><span class="p">(</span><span class="nx">all</span><span class="p">)</span>
<span class="p">})</span>
</pre></div>
NodeSchool comes to Australia2014-06-14T00:00:00.000Zhttps://r.va.gg/2014/06/nodeschool-comes-to-australia.html
<p><strong><a href="http://nodeschool.io">NodeSchool</a></strong> has its genesis ultimately at NodeConf 2013 where <a href="https://twitter.com/substack">@substack</a> introduced us to <a href="https://github.com/substack/stream-adventure">stream-adventure</a>. I took the concept home to <a href="http://campjs.com/">CampJS</a> and wrote <strong><a href="https://github.com/rvagg/learnyounode/">learnyounode</a></strong> for the <em>introduction to Node.js</em> workshop I was to run. As part of the process I extracted a package called <a href="https://github.com/rvagg/workshopper">workshopper</a> to do the work of making a terminal workshop experience. Most of the logic originally came from stream-adventure. A short time after CampJS, I created <a href="https://github.com/rvagg/levelmeup">levelmeup</a> for <a href="http://nodeconf.eu/">NodeConf.eu</a> and suddenly we had a selection of what have come to be known as <em>"workshoppers"</em>. At NodeConf.eu, <a href="https://twitter.com/brianloveswords">@brianloveswords</a> suggested the NodeSchool concept and registered the <a href="http://nodeschool.io">domain</a>, @substack provided the artwork and the ball was rolling.</p>
<p>Today, workshopper is depended on by 22 packages in npm, most of which are workshoppers that you can install and use to learn aspects of Node.js or JavaScript in general. The curated list of usable workshoppers is maintained at <a href="http://nodeschool.io">nodeschool.io</a>.</p>
<p>learnyounode itself is now being downloaded at a rate of roughly 200 <em>per day</em>. That's at least 200 more people each day wanting to learn how to <em>do Node.js</em>.</p>
<div style="margin: 0 auto; text-align: center;">
<img src="https://nodei.co/npm-dl/learnyounode.png?months=6" alt="learnyounode downloads">
</div>
<p>I can't recall exactly how <em>"NodeSchool IRL"</em> events started but it was probably <a href="http://twitter.com/maxogden">@maxogden</a> who has been responsible for a large number of these events. There have now been over 30 of these events around the world and the momentum is only increasing. The beauty of this format is that it's low-cost and low-effort to make it happen. All you need is a venue where nerds can show up with their computers and some basic guidance. There have even been a few events without experienced Node.js mentors, but that's no great barrier as the lessons are largely self-guided and work particularly well when pairs or groups of people work together on solutions.</p>
<div style="margin: 0 auto; text-align: center;">
<img src="https://raw.githubusercontent.com/nodeschool/nodeschool.github.io/master/images/nodeschool-hex.png" style="width: 300px;">
</div>
<h2 id="nodeschool-comes-to-australia">NodeSchool comes to Australia</h2>
<p>It's surprising, given all of the NodeSchool activity around the world, that we haven't yet had a single NodeSchool event in Australia. CampJS had learnyounode last year and this year there were <a href="https://github.com/nodeschool/discussions/issues/323">3 brand new workshoppers</a> introduced there, so it's the closest thing we've had.</p>
<p>Next weekend, on the <strong>21st of June</strong>, we are attempting a <strong>coordinated Australian NodeSchool</strong> event. At the moment, that coordination amounts to having events hosted in Sydney, Melbourne and Hobart; unfortunately the timing has been difficult for Brisbane and we haven't managed to bring anyone else out of the woodwork. But we will be attempting to do this regularly, plus we'd like to encourage meet-up hosts to use the format now and again with their groups.</p>
<h2 id="nodeschool-in-sydney">NodeSchool in Sydney</h2>
<p>I'll be at NodeSchool in Sydney next weekend. It will be proudly hosted by <a href="http://nicta.com.au/">NICTA</a> who have a space for up to 60 people. NICTA are currently doing some interesting work with WebRTC; you should catch up with <a href="https://twitter.com/DamonOehlman">@DamonOehlman</a> if this is something you're interested in. <a href="https://www.tabcorp.com.au/">Tabcorp</a> will also be a major sponsor of the event. Tabcorp have been building a new digital team with the back-end almost entirely in Node.js, and they are doing a great job of engaging with and contributing to existing and new open source projects. They are also hiring, so be sure to catch up with <a href="https://twitter.com/romainprieto">@romainprieto</a> if you're doing PHP, Java or some other abomination and want to be doing Node!</p>
<p>Thanks to the sponsorship, we'll be able to provide some catering for the event. Currently we're looking at providing lunch but we may be expanding that to providing some breakfast treats. We'll also be providing refreshments for everyone attending throughout the day.</p>
<p>Start time is 9.30am, end is 4pm. The plan is to spend the first half of the day doing introductory Node.js which will mainly mean working through learnyounode. The second half of the day will be less structured and we'll encourage attendees to work on other workshoppers that they find interesting. Thankfully we have some amazing Node.js programmers in Sydney and they'll be available as mentors.</p>
<p>We are currently <em>selling</em> tickets for $5; the money will contribute towards the event and there is no profit involved. We don't <em>need</em> to charge for the event, but given the generally dismal turnout for tech meet-ups that are free, we feel that providing a small commitment barrier will help us maximise the use of the space we have available. <strong>If the money is a barrier for you please contact us!</strong> We don't want anyone to miss out. Also, we have special "mentor" tickets available for experienced Node.js programmers who are able to assist. If you think you fit into this category please contact us too.</p>
<p>You can <strong>sign up for Sydney NodeSchool at <a href="https://ti.to/nodeschool/sydney-june-2014/">https://ti.to/nodeschool/sydney-june-2014/</a></strong>. If you are tempted, don't sit on the fence because spots are limited and as of writing the tickets are almost 1/2 gone.</p>
<h2 id="nodeschool-in-melbourne">NodeSchool in Melbourne</h2>
<p>NodeSchool in Melbourne is being supported by <a href="http://www.thoughtworks.com/">ThoughtWorks</a> who have been doing Node in Australia for a while now. If you're interested in their services or want to chat about employment opportunities you should catch up with <a href="https://github.com/lfendy">Liauw Fendy</a>.</p>
<p><a href="https://twitter.com/sidorares">@sidorares</a> is putting in a large amount of the legwork for Melbourne's event. He was a major contributor to the original learnyounode and is a huge asset to the Melbourne Node community. Along with Andrey, Melbourne has a large number of expert Node.js hackers, many of whom will be available as mentors. This will be a treat for Melbournians, so this is not something you should miss if you are in town! Potential mentors should contact Andrey.</p>
<p>You can <strong>sign up for Melbourne NodeSchool at <a href="https://ti.to/nodeschool/melbourne-june-2014/">https://ti.to/nodeschool/melbourne-june-2014/</a></strong>.</p>
<h2 id="nodeschool-in-hobart">NodeSchool in Hobart</h2>
<p>Hobart is lucky to have <a href="http://joshgilli.es/">@joshgillies</a>, a local tech-community organiser responsible for many Tasmanian web and JavaScript events. The event is being hosted at <a href="http://typewriterfactory.com/">The Typewriter Factory</a>, a business workspace that Josh helps run. Sponsorship is being provided by <a href="http://www.acs.org.au/">ACS</a> who will be helping support the venue and also provide some catering.</p>
<p>You can <strong>sign up for Hobart NodeSchool at <a href="https://ti.to/nodeschool/hobart-june-2014/">https://ti.to/nodeschool/hobart-june-2014/</a></strong>.</p>
Testing code against many Node versions with Docker2013-11-26T00:00:00.000Zhttps://r.va.gg/2013/11/testing-code-against-many-node-versions-with-docker.html
<p>I haven't found reason to play with <a href="http://www.docker.io">Docker</a> until now, but I've finally come up with an excellent use-case.</p>
<p><a href="https://github.com/rvagg/nan">NAN</a> is a project that helps build native Node.js add-ons while maintaining compatibility with Node and V8 from Node versions 0.8 onwards. V8 is currently undergoing major internal changes which is making add-on development very difficult; NAN's purpose is to abstract that pain. Instead of having to manage the difficulties of keeping your code compatible across Node/V8 versions, NAN does it for you. But this means that we have to be sure to keep NAN tested and compatible with all of the versions it claims to support.</p>
<p><a href="https://travis-ci.org/">Travis</a> can help a little with this. It's possible to use <a href="https://github.com/creationix/nvm">nvm</a> to test across different versions of Node, we've tried this with NAN (see the <a href="https://github.com/rvagg/nan/blob/ba82a9c1fba01b3df553ac624aeaf15ca3688315/.travis.yml">.travis.yml</a>). Ideally you'd have better choice of Node versions, but Travis have had some <a href="https://github.com/travis-ci/travis-ci/issues/1328">difficulty</a> keeping up. Also, npm bugs make it difficult, with a high failure rate from npm install problems, like <a href="https://travis-ci.org/rvagg/nan/jobs/14440485">this</a> and <a href="https://travis-ci.org/rvagg/nan/jobs/14474613">this</a>, so I don't even publish the badge on the NAN README.</p>
<p>The other problem with Travis is that it's a CI solution, not a proper testing solution. Even if it worked well, it's not really that helpful in the development process; you need rapid feedback that your code is working on your target platforms (this is one reason why I love back-end development more than front-end development!).</p>
<p>Enter Docker and <strong><a href="https://github.com/rvagg/dnt">DNT</a></strong></p>
<div style="margin: 0 auto;">
<img src="https://www.docker.com/sites/default/files/legal/small_v.png" width="114" height="114">
<img src="https://nodejs.org/images/logos/nodejs-dark.png" width="212" height="114">
<img src="https://img.pandawhale.com/29490-Picard-applause-clapping-gif-s5nz.gif" width="151" height="114">
</div>
<h3 id="dnt-docker-node-tester">DNT: Docker Node Tester</h3>
<p>Docker is a tool that simplifies the use of Linux containers to create lightweight, isolated compute "instances". Solaris and its variants have had this functionality for years in the form of "zones" but it's a fairly new concept for Linux and Docker makes the whole process a lot more friendly.</p>
<p><strong>DNT</strong> contains two tools that work with Docker and Node.js to set-up containers for testing and run your project's tests in those containers.</p>
<div style="margin: 0 auto;">
<img src="https://r.va.gg/images/2013/11/nan-dnt.png">
</div>
<p><strong>DNT</strong> includes a <code>setup-dnt</code> script that sets up the most basic Docker images required to run Node.js applications, nothing extra. It first creates an image called "dev_base" that uses the default Docker "ubuntu" image and adds the build tools required to compile and install Node.js.</p>
<p>Next it creates a "node_dev" image that contains a complete copy of the Node.js <a href="http://github.com/joyent/node">source repository</a>. Finally, it creates the required series of images: for each Node version, an image with that version of Node installed and ready to use.</p>
<p>Setting up a project is a matter of creating a <em>.dntrc</em> file in the root directory of the project. This configuration file involves setting a <code>NODE_VERSIONS</code> variable with a list of all of the versions of Node you want to test against, and this can include "master" to test the latest code from the Node repository. You also set a <code>TEST_CMD</code> variable with a series of commands required to set up, compile and execute your tests. The <code>setup-dnt</code> command can be run against a <em>.dntrc</em> file to make sure that the appropriate Docker images are ready. The <code>dnt</code> command can then be used to execute the tests against all of the Node versions you specified.</p>
<p>Since Docker containers are completely isolated, <strong>DNT</strong> can run tests in parallel as long as the machine has the resources. The default is to use the number of cores on the computer as the concurrency level but this can be configured if not appropriate.</p>
<p>Currently <strong>DNT</strong> is designed to parse TAP test output by reading the final line as either "ok" or "not ok" to report test status back on the command-line. It is configurable but you need to supply a command that will transform test output to either an "ok" or "not ok" (<code>sed</code> to the rescue?).</p>
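<p>For example, a hypothetical filter for a test runner that prints an "N passing" summary line might look like this (the summary format and the sed expression are illustrative only, not DNT defaults):</p>

```shell
# Map a hypothetical "N passing" summary line to TAP-style "ok";
# anything that doesn't match becomes "not ok"
echo "25 passing" | sed -e 's/^[0-9][0-9]* passing$/ok/' -e 't' -e 's/.*/not ok/'
# → ok
echo "3 failing"  | sed -e 's/^[0-9][0-9]* passing$/ok/' -e 't' -e 's/.*/not ok/'
# → not ok
```

<p>The <code>t</code> command branches past the catch-all substitution when the first one succeeds.</p>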
<h3 id="how-i-m-using-it">How I'm using it</h3>
<p>My primary use-case is for testing <strong>NAN</strong>. The test suite needs a lot of work so being able to test against all the different V8 and Node APIs while coding is super helpful; particularly when tests run so quickly! My NAN <em>.dntrc</em> file tests against master, all of the 0.11 releases since 0.11.4 (0.11.0 to 0.11.3 are explicitly not supported by NAN) and the last 5 releases of the 0.10 and 0.8 series. At the moment that's 17 versions of Node in all and on my computer the test suite takes approximately 20 seconds to complete across all of these releases.</p>
<p><strong>The NAN <a href="https://raw.github.com/rvagg/nan/master/.dntrc">.dntrc</a></strong></p>
<div class="highlight"><pre><span class="nv">NODE_VERSIONS</span><span class="o">=</span><span class="s2">"\</span>
<span class="s2"> master \</span>
<span class="s2"> v0.11.9 \</span>
<span class="s2"> v0.11.8 \</span>
<span class="s2"> v0.11.7 \</span>
<span class="s2"> v0.11.6 \</span>
<span class="s2"> v0.11.5 \</span>
<span class="s2"> v0.11.4 \</span>
<span class="s2"> v0.10.22 \</span>
<span class="s2"> v0.10.21 \</span>
<span class="s2"> v0.10.20 \</span>
<span class="s2"> v0.10.19 \</span>
<span class="s2"> v0.10.18 \</span>
<span class="s2"> v0.8.26 \</span>
<span class="s2"> v0.8.25 \</span>
<span class="s2"> v0.8.24 \</span>
<span class="s2"> v0.8.23 \</span>
<span class="s2"> v0.8.22 \</span>
<span class="s2">"</span>
<span class="nv">OUTPUT_PREFIX</span><span class="o">=</span><span class="s2">"nan-"</span>
<span class="nv">TEST_CMD</span><span class="o">=</span><span class="s2">"\</span>
<span class="s2"> cd /dnt/test/ && \</span>
<span class="s2"> npm install && \</span>
<span class="s2"> node_modules/.bin/node-gyp --nodedir /usr/src/node/ rebuild && \</span>
<span class="s2"> node_modules/.bin/tap js/*-test.js; \</span>
<span class="s2">"</span>
</pre></div>
<p>Next I configured <strong><a href="https://github.com/rvagg/node-leveldown">LevelDOWN</a></strong> for <strong>DNT</strong>. Its needs are much simpler: the tests need to compile the add-on and run a lot of node-tap tests.</p>
<p><strong>The LevelDOWN <a href="https://raw.github.com/rvagg/node-leveldown/master/.dntrc">.dntrc</a></strong></p>
<div class="highlight"><pre><span class="nv">NODE_VERSIONS</span><span class="o">=</span><span class="s2">"\</span>
<span class="s2"> master \</span>
<span class="s2"> v0.11.9 \</span>
<span class="s2"> v0.11.8 \</span>
<span class="s2"> v0.10.22 \</span>
<span class="s2"> v0.10.21 \</span>
<span class="s2"> v0.8.26 \</span>
<span class="s2">"</span>
<span class="nv">OUTPUT_PREFIX</span><span class="o">=</span><span class="s2">"leveldown-"</span>
<span class="nv">TEST_CMD</span><span class="o">=</span><span class="s2">"\</span>
<span class="s2"> cd /dnt/ && \</span>
<span class="s2"> npm install && \</span>
<span class="s2"> node_modules/.bin/node-gyp --nodedir /usr/src/node/ rebuild && \</span>
<span class="s2"> node_modules/.bin/tap test/*-test.js; \</span>
<span class="s2">"</span>
</pre></div>
<p>Another native Node add-on that I've set up with <strong>DNT</strong> is my <a href="https://github.com/rvagg/node-libssh">libssh bindings</a>. This one is a little more complicated because you need to have some non-standard libraries installed before compiling. My <em>.dntrc</em> adds some extra <code>apt-get</code> sauce to fetch and install those packages. It means the tests take a little longer but it's not prohibitive. An alternative would be to configure the <em>node_dev</em> base-image to include these packages so that all of my versioned images have them too.</p>
<p><strong>The node-libssh <a href="https://raw.github.com/rvagg/node-libssh/master/.dntrc">.dntrc</a></strong></p>
<div class="highlight"><pre><span class="nv">NODE_VERSIONS</span><span class="o">=</span><span class="s2">"master v0.11.9 v0.10.22"</span>
<span class="nv">OUTPUT_PREFIX</span><span class="o">=</span><span class="s2">"libssh-"</span>
<span class="nv">TEST_CMD</span><span class="o">=</span><span class="s2">"\</span>
<span class="s2"> apt-get install -y libkrb5-dev libssl-dev && \</span>
<span class="s2"> cd /dnt/ && \</span>
<span class="s2"> npm install && \</span>
<span class="s2"> node_modules/.bin/node-gyp --nodedir /usr/src/node/ rebuild --debug && \</span>
<span class="s2"> node_modules/.bin/tap test/*-test.js --stderr; \</span>
<span class="s2">"</span>
</pre></div>
<p><a href="https://github.com/rvagg/node-levelup">LevelUP</a> isn't a native add-on but it does use LevelDOWN which requires compiling. For the DNT config I'm removing <em>node_modules/leveldown/</em> prior to <code>npm install</code> so it gets rebuilt each time for each new version of Node.</p>
<p><strong>The <a href="https://raw.github.com/rvagg/node-levelup/master/.dntrc">LevelUP .dntrc</a></strong></p>
<div class="highlight"><pre><span class="nv">NODE_VERSIONS</span><span class="o">=</span><span class="s2">"\</span>
<span class="s2"> master \</span>
<span class="s2"> v0.11.9 \</span>
<span class="s2"> v0.11.8 \</span>
<span class="s2"> v0.10.22 \</span>
<span class="s2"> v0.10.21 \</span>
<span class="s2"> v0.8.26 \</span>
<span class="s2">"</span>
<span class="nv">OUTPUT_PREFIX</span><span class="o">=</span><span class="s2">"levelup-"</span>
<span class="nv">TEST_CMD</span><span class="o">=</span><span class="s2">"\</span>
<span class="s2"> cd /dnt/ && \</span>
<span class="s2"> rm -rf node_modules/leveldown/ && \</span>
<span class="s2"> npm install --nodedir=/usr/src/node && \</span>
<span class="s2"> node_modules/.bin/tap test/*-test.js --stderr; \</span>
<span class="s2">"</span>
</pre></div>
<h3 id="what-s-next-">What's next?</h3>
<p>I have no idea but I'd love to have helpers flesh this out a little more. It's not hard to imagine this forming the basis of a local CI system as well as a general testing tool. The speed even makes it tempting to run the tests on every git commit, or perhaps on every save.</p>
<p>If you'd like to contribute to development then please submit a pull request; I'd be happy to discuss anything you might think would improve this project. I'm keen to share ownership with anyone making significant contributions, as I do with most of my open source projects.</p>
<p>See the <strong><a href="https://github.com/rvagg/dnt">DNT</a></strong> GitHub repo for installation and detailed usage instructions.</p>
LevelDOWN v0.10 / managing GC in native V8 programming2013-11-18T00:00:00.000Zhttps://r.va.gg/2013/11/leveldown-v0.10-managing-gc-in-native-v8-programming.html
<p><img src="https://twimg0-a.akamaihd.net/profile_images/3360574989/92fc472928b444980408147e5e5db2fa_bigger.png" alt="LevelDB"></p>
<p>Today we released version 0.10 of <a href="https://github.com/rvagg/node-leveldown">LevelDOWN</a>. LevelDOWN is the package that directly binds LevelDB into Node-land. It's mainly C++ and is a fairly raw & direct interface to LevelDB. <a href="https://github.com/rvagg/node-levelup">LevelUP</a> is the package that we recommend most people use for LevelDB in Node as it takes LevelDOWN and makes it much more Node-friendly, including the addition of those lovely <em>ReadStreams</em>.</p>
<p>Normally I wouldn't write a post about a minor release like this but this one seems significant because of a number of small changes that culminate in a <em>relatively</em> major release.</p>
<p><strong><em>In this post:</em></strong></p>
<ul>
<li><strong>V8 <code>Persistent</code> references</strong></li>
<li><strong><code>Persistent</code> in LevelDOWN; some removed, some added</strong></li>
<li><strong>Leaks!</strong></li>
<li><strong>Snappy 1.1.1</strong></li>
<li><strong>Some embarrassing bugs</strong></li>
<li><strong>Domains</strong></li>
<li><strong>Summary</strong></li>
<li><strong><em>A final note on Node 0.11.9</em></strong></li>
</ul>
<h3 id="v8-persistent-references">V8 <code>Persistent</code> references</h3>
<p>The main story of this release is <code>v8::Persistent</code> references. For the uninitiated, V8 internally has two different ways to track "handles", which are references to JavaScript objects and values currently active in a running program: <code>Local</code> references and <code>Persistent</code> references. <code>Local</code> references are the most common; they are the references you get when you create an object, pass it around within a function, and do the normal work you do with an object. <code>Persistent</code> references are a special case that is all about <em>Garbage Collection</em>. An object that has at least one active <code>Persistent</code> reference to it is not a candidate for garbage collection; <code>Persistent</code> references must be explicitly destroyed before they release the object and make it available to the garbage collector.</p>
<p>Prior to V8 3.2x.xx <em>(I don't know the exact version, does it matter? It roughly corresponds to Node v0.11.3.)</em>, both kinds of handle were equally easy to create and interchange; you could swap one for the other whenever you needed to. My guess is that the V8 team decided that this was a little <em>too</em> easy and that a major cause of memory leaks in C++ V8 code was the ease with which you could swap a <code>Local</code> for a <code>Persistent</code> and then forget to destroy the <code>Persistent</code>. So they tweaked the "ease" equation and it's become quite difficult.</p>
<p><code>Persistent</code> and <code>Local</code> no longer share the same type hierarchy and the way you instantiate and assign a <code>Persistent</code> has become quite awkward. You now have to go through enough gymnastics to create a <code>Persistent</code> that it makes you ask the question: <em>"Do I really need this to be a <code>Persistent</code>?"</em> Which I guess is a good thing for memory leaks. <a href="https://github.com/rvagg/nan">NAN</a> to the rescue though! We've somewhat papered over those difficulties with the capabilities introduced in NAN; it's still not as easy as it once was, but it's not a total headache.</p>
<p>So, you understand <code>v8::Persistent</code> now? Great, so back to LevelDOWN.</p>
<h3 id="-persistent-in-leveldown-some-removed-some-added-"><code>Persistent</code> in LevelDOWN; some removed, some added!</h3>
<p><strong>Some removed</strong></p>
<p>Recently, <a href="https://github.com/mcollina">Matteo</a> noticed that when you're performing a <code>Batch()</code> operation in LevelDB, there is an explicit copy of the data that you're feeding into that batch. When you construct a Batch operation in LevelDB you start off with a short string representing the batch and then build on that string as you build your batch with both <code>Put()</code> and <code>Del()</code> operations. You end up with a long string containing all of your write data: keys and values. Then when you call <code>Write()</code> on the Batch, that string gets fed directly into the main LevelDB store as a single write—which is where the atomicity of Batch comes from.</p>
<p>Both the chained-form and array-form <code>batch()</code> operations work this way internally in LevelDOWN.</p>
<p>However, with almost all operations in LevelDOWN, we perform the actual writes and reads against LevelDB in libuv worker threads. So we have to create the "descriptor" for work in the main V8 Node thread and then hand that off to libuv to perform the work in a separate thread. Once the work is completed we get the results back in the main V8 Node thread from where we can trigger a callback. This is where <code>Persistent</code> references come in.</p>
<p>Before we hand off the work to libuv, we need to make <code>Persistent</code> references to any V8 object that we want to survive across the asynchronous operation. Obviously the main candidate for this is <code>callback</code> functions. Consider this code:</p>
<div class="highlight"><pre><span class="nx">db</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">,</span> <span class="nx">value</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'foo = %s'</span><span class="p">,</span> <span class="nx">value</span><span class="p">)</span>
<span class="p">})</span>
</pre></div>
<p>What we've actually done is create an anonymous closure for our callback. It has nothing referencing it, so as far as V8 is concerned it's a candidate for garbage collection once the current thread of execution is completed. In Node, however, we're doing asynchronous work with it and need it to survive until we actually call it. So we receive the <code>callback</code> function as a <code>Local</code> in our C++ but then assign it to a <code>Persistent</code> so GC doesn't touch it. Once we're done with our async work we can call the function and destroy the <code>Persistent</code>, effectively turning it back into a <code>Local</code> and freeing it up for GC.</p>
<p>Without the <code>Persistent</code>, the behaviour is indeterminate. It depends on the version of V8, the GC settings, the workload currently in the program and the amount of time the async work takes to complete. If the GC is aggressive enough and has a chance to run before our async work is complete, the <code>callback</code> will disappear and we'll end up trying to call a function that no longer exists. This can obviously lead to runtime errors and will most likely crash our program.</p>
<p>In LevelDOWN, if you're passing in <code>String</code> objects for keys and values, then to pull out the data and turn it into a form that LevelDB can use we have to do an explicit <em>copy</em>. Once we've copied the data from the <code>String</code>, we don't need to care about the original object and GC can get its hands on it as soon as it wants. So we can leave <code>String</code> objects as <code>Local</code> references while we are building the descriptor for our async work.</p>
<p><code>Buffer</code> objects are a different matter altogether. Because we have access to the raw character array of a <code>Buffer</code>, we can feed that data straight into LevelDB and this saves us one <em>copy</em> operation (which can be a significant performance boost if the data is large or you're doing lots of operations—so prefer <code>Buffer</code>s where convenient if you need higher perf). When building the descriptor for the async work, we are just passing a character array to the LevelDB data structures that we're setting up. Because the data is shared with the original <code>Buffer</code> we have to make sure that GC doesn't clean up that <code>Buffer</code> before we have a chance to use the data. So we make a <code>Persistent</code> reference for it which we clean up after the async work is complete. So you can do this without worrying about GC:</p>
<div class="highlight"><pre><span class="nx">db</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span>
<span class="k">new</span> <span class="nx">Buffer</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">)</span>
<span class="p">,</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'crypto'</span><span class="p">).</span><span class="nx">randomBytes</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
<span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'foo is now some random data!'</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">)</span>
</pre></div>
<p>This has been the case in LevelDOWN for all operations since pretty much the beginning. But back to Matteo's observation: if LevelDB's data structures perform an explicit copy on the data we feed it, then perhaps we don't need to keep the original data safe from GC? For a <code>batch()</code> call it turns out that we don't! When we're constructing the Batch descriptor, as we feed data into it with both <code>Put()</code> and <code>Del()</code>, it takes a copy of our data to create its internal representation. So even when we're using <code>Buffer</code> objects on the JavaScript side, we're done with them before the call down into LevelDOWN is completed, so there's no reason to hold a <code>Persistent</code> reference! For other operations we're still doing some copying during the asynchronous cycle, but the removal of the overhead of creating and deleting <code>Persistent</code> references for <code>batch()</code> calls is fantastic news for those doing bulk data loading (like Max Ogden's <a href="https://github.com/maxogden/dat">dat</a> project which needs to bulk load a <em>lot</em> of data).</p>
<p><strong>Some added</strong></p>
<p>Another gem from Matteo was a report of crashes during certain <code>batch()</code> operations. Difficult to reproduce and occurring only under very particular circumstances, the crash was mostly triggered by the kinds of workloads LevelGraph generates. Thanks to some simple C++ debugging we traced it to a dropped reference, obviously by GC. The code in question boiled down to something like this:</p>
<div class="highlight"><pre><span class="kd">function</span> <span class="nx">doStuff</span> <span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">batch</span> <span class="o">=</span> <span class="nx">db</span><span class="p">.</span><span class="nx">batch</span><span class="p">()</span>
<span class="nx">batch</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">,</span> <span class="s1">'bar'</span><span class="p">)</span>
<span class="nx">batch</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'done'</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
<span class="p">})</span>
<span class="p">}</span>
</pre></div>
<p>In this code, the <code>batch</code> object is actually a LevelDOWN <code>Batch</code> object created in C++-land. During the <code>write()</code> operation, which is asynchronous, we end up with no hard references to <code>batch</code> in our code because the JS thread has yielded and moved on, and the <code>batch</code> is contained within the scope of the <code>doStuff()</code> function. Because most of the asynchronous operations we perform are relatively quick, this normally doesn't matter. But for writes to LevelDB, if you have enough data in your write and you have enough data already in your data store, you can trigger a compaction upstream, which can delay the write and give V8's GC time to clean up references that might be important and for which you have no <code>Persistent</code> handles.</p>
<p>In this case, we weren't actually creating internal <code>Persistent</code> references for some of our objects: <code>Batch</code> here, but also <code>Iterator</code>. Normally this isn't a problem because to use these objects you <em>generally</em> keep references to them yourself in your own code.</p>
<p>We managed to debug Matteo's crash by adjusting his test code to look something like this and watching it succeed without a crash:</p>
<div class="highlight"><pre><span class="kd">function</span> <span class="nx">doStuff</span> <span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">batch</span> <span class="o">=</span> <span class="nx">db</span><span class="p">.</span><span class="nx">batch</span><span class="p">()</span>
<span class="nx">batch</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">,</span> <span class="s1">'bar'</span><span class="p">)</span>
<span class="nx">batch</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'done'</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
<span class="nx">batch</span><span class="p">.</span><span class="nx">foo</span> <span class="o">=</span> <span class="s1">'bar'</span>
<span class="p">})</span>
<span class="p">}</span>
</pre></div>
<p>By reusing <code>batch</code> inside our <code>callback</code> function, we're creating some work that V8 can't optimise away and therefore has to assume isn't a noop. Because the <code>batch</code> variable is now also referenced by the <code>callback</code> function, and we already have an internal <code>Persistent</code> for the <code>callback</code>, GC has to pass over <code>batch</code> until that <code>Persistent</code> is destroyed.</p>
<p>So the solution is simply to create a <code>Persistent</code> for the internal objects that need to survive across asynchronous operations and make no assumptions about how they'll be used in JavaScript-land. In our case we've gone for assigning a <code>Persistent</code> just prior to every asynchronous operation and destroying it after. The alternative would be to have a <code>Persistent</code> assigned upon the creation of objects we care about but sometimes we want GC to do its work:</p>
<div class="highlight"><pre><span class="kd">function</span> <span class="nx">dontDoStuff</span> <span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">batch</span> <span class="o">=</span> <span class="nx">db</span><span class="p">.</span><span class="nx">batch</span><span class="p">()</span>
<span class="nx">batch</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">,</span> <span class="s1">'bar'</span><span class="p">)</span>
<span class="c1">// nothing else, wut?</span>
<span class="p">}</span>
</pre></div>
<p>I don't know why you would write that code but perhaps you have a use-case where you want the ability to start constructing a batch but then decide not to follow through with it. GC should be able to take care of your mess like it does with all of the other messes you create in your daily adventures with JavaScript.</p>
<p>So we are only assigning a <code>Persistent</code> when you do a <code>write()</code> with a chained-batch operation in LevelDOWN since it's the only asynchronous operation. So in <code>dontDoStuff()</code> GC will come along and rid us of <code>batch</code>, <code>'foo'</code> and <code>'bar'</code> when it has the next opportunity and our C++ code will have the appropriate destructors called that will clean up any other objects we have created along the way, like the internal LevelDB <code>Batch</code> with its copy of our data.</p>
<h3 id="leaks-">Leaks!</h3>
<p>We've been having some trouble with leaks in LevelUP/LevelDOWN lately <em>(<a href="https://github.com/rvagg/node-levelup/issues/171">LevelDOWN/#171</a>, <a href="https://github.com/mcollina/levelgraph/issues/40">LevelGraph/#40</a>)</em>. And it turns out that these leaks aren't related to <code>Persistent</code> references, which shouldn't be a surprise since it's so easy to leak with non-GC code, particularly if you spend most of your day programming in a language with GC.</p>
<p>With the help of <a href="http://valgrind.org/">Valgrind</a> we tracked the leak down to the omission of a <code>delete</code> in the destructor of the asynchronous work descriptor for array-batch operations. The internal LevelDB representation of a Batch wasn't being cleaned up unless you were using the chained-form of LevelDOWN's <code>batch()</code>. This one has been dogging us for a few releases now and it's been a headache particularly for people doing bulk-loading of data so I hope we can finally put it behind us!</p>
<h3 id="snappy-1-1-1">Snappy 1.1.1</h3>
<p>Google released a new version of Snappy, version 1.1.1. I don't really understand how Google uses <a href="http://semver.org/">semver</a>; we get very simple LevelDB releases with the minor version bumped and then we get versions of Snappy released with non-trivial changes with only the patch version bumped. I suspect that Google doesn't know how it uses semver either and there's no internal policy on it.</p>
<p>Anyway, Snappy 1.1.1 has some fixes, some minor speed and compression improvements but most importantly it breaks compilation on Windows. So we had to figure out how to fix that for this release. Ugh. I also took the opportunity to clean up some of the compilation options for Snappy and we may see some improvements in the way it works now... perhaps.</p>
<h3 id="some-embarrassing-bugs">Some embarrassing bugs</h3>
<p><a href="https://github.com/kytwb">Amine Mouafik</a> is new to the LevelDOWN repository but has picked up some rather embarrassing bugs/omissions that are probably my fault. It's great to have more eyes on the C++ code; there aren't enough JavaScript programmers with the confidence to dig into messy C++-land.</p>
<p>Firstly, on our standard LevelDOWN releases, it turns out that we haven't actually been enabling the internal <strong>bloom filter</strong>. The bloom filter was introduced in LevelDB to speed up read operations by avoiding having to scan through whole blocks to find the data a read is looking for. So that's now enabled for 0.10.</p>
<p>Then he discovered that we had been <strong>turning off compression</strong> by default! I believe this happened with the switch to NAN, at roughly Node version 0.11.4. The signature for reading boolean options from V8 objects changed from the internal <code>LD_BOOLEAN_OPTION_VALUE</code> and <code>LD_BOOLEAN_OPTION_VALUE_DEFTRUE</code> macros (defaulting to <code>false</code> and <code>true</code> respectively when the option isn't supplied) to NAN's unified <code>NanBooleanOptionValue</code>, which takes an optional <code>defaultValue</code> argument that can be used to make the default <code>true</code>.</p>
<p>Well, this code:</p>
<div class="highlight"><pre><span class="kt">bool</span> <span class="n">compression</span> <span class="o">=</span>
<span class="n">NanBooleanOptionValue</span><span class="p">(</span><span class="n">optionsObj</span><span class="p">,</span> <span class="n">NanSymbol</span><span class="p">(</span><span class="s">"compression"</span><span class="p">));</span>
</pre></div>
<p>is now this:</p>
<div class="highlight"><pre><span class="kt">bool</span> <span class="n">compression</span> <span class="o">=</span>
<span class="n">NanBooleanOptionValue</span><span class="p">(</span><span class="n">optionsObj</span><span class="p">,</span> <span class="n">NanSymbol</span><span class="p">(</span><span class="s">"compression"</span><span class="p">),</span> <span class="nb">true</span><span class="p">);</span>
</pre></div>
<p>so if you don't supply a <code>"compression"</code> boolean option in your db setup operation then it'll now actually be turned on!</p>
<h3 id="domains">Domains</h3>
<p>We've finally caught up with properly supporting Node's <a href="http://nodejs.org/docs/latest/api/domain.html">domains</a> by switching all C++ <code>callback</code> calls from standard V8 <code>callback->Call(...)</code> to Node's own <code>node::MakeCallback(callback, ...)</code> which does the same thing but also does lots of additional things, including accounting for domains. This change was also included in NAN version 0.5.0.</p>
<h3 id="summary">Summary</h3>
<p><strong>Go and upgrade!</strong></p>
<p>leveldown@0.10.0 is packaged with the new levelup@0.18.0 and level@0.18.0 which have their minor versions bumped purely for this LevelDOWN release.</p>
<p>Also released are the packages:</p>
<ul>
<li>leveldown-hyper@0.10.0</li>
<li>leveldown-basho@0.10.0</li>
<li>rocksdb@0.10.0 (based on the same LevelDOWN code) (Linux only)</li>
<li>level-hyper@0.18.0 (levelup on leveldown-hyper)</li>
<li>level-basho@0.18.0 (levelup on leveldown-basho)</li>
<li>level-rocks@0.18.0 (levelup on rocksdb) (Linux only)</li>
</ul>
<p>I'll write more about these packages in the future since they've gone largely under the radar for most people. If you're interested in catching up then please join <strong>##leveldb</strong> on Freenode where there's a bunch of Node database people and also a few non-Node LevelDB people like <a href="https://twitter.com/rescrv">Robert Escriva</a>, author of HyperLevelDB and all-round LevelDB expert.</p>
<h3 id="-a-final-note-on-node-0-11-9-"><em>A final note on Node 0.11.9</em></h3>
<p>There will be a LevelDOWN@0.10.1 very soon that will increment the NAN dependency to 0.6.0 when it's released. This new version of NAN will specifically deal with Node 0.11.9 compatibility, where there are more breaking V8 changes that will cause compile errors for any addon not taking them into account. So if you're living on the edge in Node then we should have a release soon enough for you!</p>
All the levels!2013-10-09T00:00:00.000Zhttps://r.va.gg/2013/10/all-the-levels.html
<p>When we completely separated <a href="https://github.com/rvagg/node-levelup">LevelUP</a> and <a href="https://github.com/rvagg/node-leveldown">LevelDOWN</a> so that installing LevelUP didn't automatically get you LevelDOWN, we set up a new package called <strong><a href="https://github.com/Level/level">Level</a></strong> that has them both as a dependency so you just need to do <code>var level = require('level')</code> and everything is done for you.</p>
<p>But, we now have more than just the vanilla (Google) LevelDB in LevelDOWN. We also have a HyperLevelDB version and a Basho fork. These are maintained on branches in the LevelDOWN repo and are now usually released every time a new LevelDOWN is released. They are called <strong>leveldown-hyper</strong> and <strong>leveldown-basho</strong> in npm but you need to plug them into LevelUP yourself to make them work. We also have <a href="https://github.com/rvagg/lmdb">Node LMDB</a>, which is LevelDOWN-compatible, and a few others.</p>
<p>So, as of today, we've released a new, small library called <strong><a href="https://github.com/level/level-packager">level-packager</a></strong> that does this bundling process so that you can feed it a LevelDOWN instance and it'll return a Level-type object that can be exported from a package like <strong>Level</strong>. This is meant to be used internally and it's now being used to support these new packages that are available in npm:</p>
<ul>
<li><strong><a href="https://github.com/Level/level-hyper">level-hyper</a></strong> bundles the HyperLevelDB version of LevelDOWN with LevelUP</li>
<li><strong><a href="https://github.com/Level/level-basho">level-basho</a></strong> bundles the Basho fork of LevelDB in LevelDOWN with LevelUP</li>
<li><strong><a href="https://github.com/Level/level-lmdb">level-lmdb</a></strong> bundles Node LMDB with LevelUP</li>
</ul>
<p>The version numbers of these packages will track the version of LevelUP.</p>
<p>So you can now simply do:</p>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">level</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'level-hyper'</span><span class="p">)</span>
<span class="kd">var</span> <span class="nx">db</span> <span class="o">=</span> <span class="nx">level</span><span class="p">(</span><span class="s1">'/path/to/db'</span><span class="p">)</span>
<span class="nx">db</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">,</span> <span class="s1">'woohoo!'</span><span class="p">)</span>
</pre></div>
<p>If you're already using <strong>Level</strong> then you can very easily switch it out with one of these alternatives to try them out.</p>
<p>Both HyperLevelDB and the Basho LevelDB fork are binary-compatible with Google's LevelDB, with one small caveat: with the latest release, LevelDB has switched to making <em>.ldb</em> files instead of <em>.sst</em> files inside a data store directory because of something about Windows backups (blah blah). Neither of the alternative forks knows anything about these new files yet so you may run into trouble if you have <em>.ldb</em> files in your store (although I'm pretty sure you can simply rename these to <em>.sst</em> and it'll be fine with any version).</p>
<p>Also, LMDB is completely different to LevelDB so you won't be able to open an existing data store. But you should be able to do something like this:</p>
<div class="highlight"><pre><span class="nx">require</span><span class="p">(</span><span class="s1">'level'</span><span class="p">)(</span><span class="s1">'/path/to/level.db'</span><span class="p">).</span><span class="nx">createReadStream</span><span class="p">()</span>
<span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">require</span><span class="p">(</span><span class="s1">'level-lmdb'</span><span class="p">)(</span><span class="s1">'/path/to/lmdb.db'</span><span class="p">).</span><span class="nx">createWriteStream</span><span class="p">())</span>
</pre></div>
<p>Whoa...</p>
<h3 id="a-note-about-hyperleveldb">A note about HyperLevelDB</h3>
<p>Lastly, I'd like to encourage you to try the HyperLevelDB version if you are pushing hard on LevelDB's performance. The HyperDex fork is tuned for multi-threaded access for reads and writes and is therefore particularly suited to how we use it in Node. The Basho version doesn't show much performance difference mainly because they are optimising for Riak running 16 separate instances on the same server so multi-threaded access isn't as interesting for them. You should find significant performance gains if you're doing very heavy writes in particular with HyperLevelDB. Also, if you're interested in support for HyperLevelDB then pop in to ##leveldb on Freenode and bother <em><a href="https://twitter.com/rescrv">rescrv</a></em> (Robert Escriva), author of HyperLevelDB and our resident LevelDB expert.</p>
<p>It's also worth noting that HyperDex are interested in offering commercial support for people using LevelDB, not just HyperLevelDB but also Google's LevelDB. This means that anyone using either of these packages in Node should be able to get solid support if they are doing any heavy work in a commercial environment and need the surety of experts behind them to help pick up the pieces. I imagine this would cover things like LevelDB corruption and any LevelDB bugs you may run into (we're currently looking at a subtle <a href="https://github.com/rvagg/node-levelup/issues/171">batch-related LevelDB bug</a> that's come along with the 1.14.0 release, they do exist!). Talk to Robert if you want more information about commercial support.</p>
Should I use a single LevelDB or many to hold my data?2013-10-03T00:00:00.000Zhttps://r.va.gg/2013/10/should-i-use-a-single-leveldb-or-many-to-hold-my-data.html
<p>This is a long overdue post, so long in fact that I can't remember who I promised to do this for! Regardless, I keep on having discussions around this topic so I thought it worthwhile putting down some notes on what I believe to be the factors you should consider when making this decision.</p>
<h3 id="what-s-the-question-">What's the question?</h3>
<p>It goes like this: you have an application that uses LevelDB. In particular I'm talking about Node.js applications here, but the same would apply if you're using LevelUP in the browser, and also to most of the other back-ends for LevelUP. You invariably end up with different kinds of data; sometimes the kinds of data you're storing are so different that it feels strange putting them into the same storage blob. Often though, you just have sets of not-very-related data that you need to store and you end up having to make a decision: <strong>do I put everything into a single LevelDB store or do I put things into their own, separate, LevelDB stores?</strong></p>
<h3 id="this-stuff-doesn-t-belong-together-">This stuff doesn't <em>belong</em> together!</h3>
<p>Coming from a relational database background, it took me a little while to displace the concept of discrete <em>tables</em> with the notion of <em>namespacing</em> within the same store. I can understand the temptation to keep things separate, not wanting to end up with a huge blob of data that just <em>shouldn't be together</em>. But this isn't the relational database world and you need to move on!</p>
<p>We have a set of LevelUP addons, such as <a href="https://github.com/dominictarr/level-sublevel">sublevel</a>, that exist mainly to provide you with the comfort of being able to separate your data by whatever criteria makes sense. <a href="https://github.com/deanlandolt/bytewise">bytewise</a> is another tool that can serve a similar purpose and some people even use sublevel and bytewise together to achieve more complex organisation.</p>
<p><strong>We have the tools at our disposal in Node.js to turn a one-dimensional storage array into a very complex, multidimensional storage <em>system</em> where unrelated, and semi-related data can coexist.</strong> So, if the only reason you want to store things in separate stores is because it just <em>feels</em> right to do so, you should probably be looking at what's making you think that way. You may need to update your assumptions.</p>
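<p>To make that concrete, here's a dependency-free sketch of roughly the idea behind <strong>sublevel</strong>: namespaces become key prefixes within the one store. The separator, method set and stub <code>db</code> here are illustrative only, not sublevel's actual implementation:</p>

```javascript
// Illustrative sketch only: real sublevel also handles separators, nesting,
// ReadStreams and batches; this just shows the key-prefixing idea.
function sublevel(db, ns) {
  var prefix = '!' + ns + '!'
  return {
    put: function (key, value, cb) { db.put(prefix + key, value, cb) },
    get: function (key, cb) { db.get(prefix + key, cb) }
  }
}

// A stub "db" so the sketch runs without LevelDB itself
var store = {}
var db = {
  put: function (k, v, cb) { store[k] = v; if (cb) cb(null) },
  get: function (k, cb) { cb(null, store[k]) }
}

var users = sublevel(db, 'users')
var posts = sublevel(db, 'posts')
users.put('rod', 'Rod Vagg')
posts.put('rod', 'All the levels!')
// The two 'rod' keys don't collide: they live at '!users!rod' and '!posts!rod'
```

<p>Everything still lives in one flat keyspace, which is the point: the "separation" is an organisational layer on top of a single store, not a second store.</p>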
<h3 id="technical-considerations">Technical considerations</h3>
<p>That aside, there are some technical considerations for making this decision:</p>
<h4 id="size-and-performance">Size and performance</h4>
<p>To be clear, <strong>LevelDB is fast</strong> and it can also store <strong>lots of data</strong>; it'll handle Gigabytes of data without too much sweat. However, there <em>are</em> some performance concerns when you start getting into the Gigabyte range, mainly when you're trying to push data in at a high rate. Most use-cases don't do this so be honest about your performance needs. For most people LevelDB is simply fast.</p>
<p>However, if you do have a high-throughput scenario involving a large amount of data that you need to store then you may want to consider having a separate store to deal with the large data and another one to deal with the rest of your data so the performance isn't impacted across the board.</p>
<p>But again, be honest about what your workload is, you're probably not pushing <a href="http://voxer.com">Voxer</a> amounts of data so don't prematurely optimise around the workload you'd like to think you have or are going to have one day in the distant future.</p>
<h4 id="cache">Cache</h4>
<p>Caching is transparent by default with LevelDB so it's easy to forget about it when making these kinds of decisions but it's actually quite important for this particular question.</p>
<p>By default, you have an 8M LRU cache with LevelDB and <em>all</em> reads use that cache, for look-ups and also for updating with newly read values. So, you can have a lot of cache-thrash unless you're reading the same values again and again. </p>
<p>But, there is a <code>fillCache</code> (boolean) option for read operations (both <code>get()</code> and <code>createReadStream()</code>, including its variations). So you can set this to <code>false</code> where you know you won't be needing fast access to those entries again and you don't want to push out other entries from the LRU.</p>
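<p>To make the thrash concrete, here's a toy LRU sketch in plain JavaScript. It's purely illustrative; LevelDB's real cache is native code and more sophisticated, but the eviction behaviour it demonstrates is the one described above:</p>

```javascript
// Toy LRU: a normal read inserts the value, evicting the least-recently-used
// entry when over capacity; a fillCache=false style read would simply skip
// the insertion step entirely, leaving the cache untouched.
function makeLRU (capacity) {
  const map = new Map() // Map iteration order doubles as recency order

  return {
    get (key) {
      if (!map.has(key)) return undefined
      const value = map.get(key)
      map.delete(key)
      map.set(key, value) // refresh recency
      return value
    },
    put (key, value) {
      if (map.has(key)) map.delete(key)
      map.set(key, value)
      if (map.size > capacity) {
        map.delete(map.keys().next().value) // evict least-recently-used
      }
    },
    has (key) {
      return map.has(key)
    }
  }
}

const cache = makeLRU(2)
cache.put('hot', 'value') // an entry we'd like to keep cached

// a bulk scan that fills the cache pushes it out...
cache.put('scan:1', 'a')
cache.put('scan:2', 'b')
console.log(cache.has('hot')) // false: evicted by the scan
// ...whereas a scan reading with fillCache:false would have left it alone
```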
<p>So caching strategies can be separate for different types of data and are not a strong reason to keep things in a separate data store.</p>
<p>I always recommend tinkering with the <code>cacheSize</code> option when you're using LevelDB; it can be as large as you can afford within the available memory of your machine. As a rule of thumb, somewhere between 2/3 and 3/4 of the available memory should be a maximum.</p>
<p>Consider, though, what happens if you're using separate LevelDB stores: you now have to juggle <code>cacheSize</code> between the stores. Often, you're going to be best served by having a single, large cache that can operate across all your data types, letting the normal behaviour of your application determine what gets cached, with occasional reliance on <code>fillCache: false</code> to fine-tune. </p>
<h4 id="consistency">Consistency</h4>
<p>As I discussed in my <a href="https://r.va.gg/presentations/lxjs2013/">LXJS</a> talk, the <em>atomic batch</em> is an important primitive for building solid database functionality with inherent <em>consistency</em>. When you're using <strong>sublevel</strong>, even though each sublevel operates like a separate LevelUP instance, you still get to perform atomic batch operations across sublevels. Consider indexing, where you may have a primary sublevel for the entries you're writing and a secondary sublevel for the indexing data used to reference the primary data for lookups. If you're running these as separate stores then you lose the benefits of the atomic batch: you just can't perform multiple operations with guaranteed consistency.</p>
<p>Try to keep the atomic batch in mind when building your application: instead of accepting the possibility of inconsistent state, use the batch to maintain consistency.</p>
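<p>The indexing pattern can be sketched as a single batch. Key prefixes stand in for sublevels here so the example is self-contained, and the <code>entryBatch()</code> helper and key layout are made up for illustration:</p>

```javascript
// Build the primary entry and its secondary index entry as one batch so
// they can be written atomically: both succeed or neither does, and the
// index can never point at a primary entry that was never written.
function entryBatch (id, person) {
  return [
    // primary entry
    { type: 'put', key: 'data:' + id, value: JSON.stringify(person) },
    // secondary index: find ids by name
    { type: 'put', key: 'index:name:' + person.name + ':' + id, value: id }
  ]
}

const ops = entryBatch('1234', { name: 'rvagg' })
// to be applied with: db.batch(ops, callback)
console.log(ops.map(function (op) { return op.key }))
// [ 'data:1234', 'index:name:rvagg:1234' ]
```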
<h4 id="back-end-flexibility">Back-end flexibility</h4>
<p>OK, this one is a bit left-field, but remember that LevelUP is back-end-agnostic. It's inspired by LevelDB but it doesn't have to be Google's LevelDB that's storing data for you. It could be Basho's fork or HyperLevelDB. It could even be LMDB or something a little crazy like MemDOWN or mysqlDOWN! </p>
<p>If you're at all concerned about performance (and most people claim to be, even though they're not building performance-critical applications), then you should be benchmarking your particular workload against your storage system. Each of the back-ends for LevelUP has different performance characteristics and different trade-offs that you need to understand and test against your needs. You may find that one back-end works for one kind of data in your application and another back-end works for another.</p>
<h3 id="summary">Summary</h3>
<p>The TL;DR is: in most cases, a single LevelDB store is generally preferable unless you have a <em>real</em> reason for having separate ones.</p>
<p>Have I missed any considerations that you've come across when making this choice? Let me know in the comments.</p>
<h2>Primitives for JS Databases (an LXJS adventure)</h2>
<p><em>2013-10-03 · <a href="https://r.va.gg/2013/10/primitives-for-js-databases-an-lxjs-adventure.html">https://r.va.gg/2013/10/primitives-for-js-databases-an-lxjs-adventure.html</a></em></p>
<p>I gave a talk yesterday at <strong><a href="http://2013.lxjs.org">LXJS</a></strong> in the <em>"Infrastructure.js"</em> block and tried to talk about JavaScript Database Primitives; i.e. the basic building blocks we have landed on for building more complex database solutions in JavaScript.</p>
<p>The talk certainly wasn't as good or clear as I wanted it to be, it worked much better in my head! A huge venue with over 300 talented JavaScripters, an absolutely massive screen, bright lights and loud amplification got the better of me and I wasn't able to pull the material together how I wanted to. The introvert within me is telling me to become a recluse for a little while just to recover! My <em>hope</em> is that at least one or two people are inspired to give <em>database hacking</em> a go because it's really not that difficult once you get your head around the primitives.</p>
<p><strong><em>Edit:</em></strong> <em>I wasn't trying to elicit sympathy here, I genuinely think that I wasn't clear on what I was trying to communicate. It went so well in my head, as it usually does, but I fell far short of what I wanted to express. I'll attempt to rectify some of that with a writeup (see next para).</em></p>
<p>Thankfully though, a portion of the material will be able to serve as the basis for the long-overdue third part in my <a href="http://dailyjs.com/2013/04/19/leveldb-and-node-1/">three</a> <a href="http://dailyjs.com/2013/05/03/leveldb-and-node-2/">part</a> <a href="http://dailyjs.com">DailyJS</a> series on LevelDB & Node.</p>
<p>In summary, inspired by LevelDB, we've ended up with a core set of primitives in <a href="https://github.com/rvagg/node-levelup">LevelUP</a> that can be used to build feature-rich and advanced database functionality. <strong>Atomic batch</strong> and <strong>ReadStream</strong> are the two non-trivial primitives; open, close, get, put and del are all pretty easy to understand as primitives, although <em>del</em> is perhaps redundant; we're opting for explicitness.</p>
<p>My <a href="https://r.va.gg/presentations/lxjs2013">slides are online</a> but hopefully I'll be able to get my DailyJS article sorted out soon and I'll be able to explain what I was trying to get at.</p>
<p>ReadStream as a primitive query mechanism is not too hard to understand once you get your head around key sorting and the implications for key structure. Batch is a little more subtle and relates to consistency and our ability to augment basic operations to create more complex functionality while keeping the data store in a consistent state.</p>
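<p>A plain-JavaScript sketch of the key-sorting idea, with a made-up <code>log:</code> key structure; this is conceptually what a <code>createReadStream()</code> range query does for you:</p>

```javascript
// Because keys are kept in sorted order, a structured key gives you range
// queries almost for free: picking a day's entries is just a lexicographic
// range over the sorted key space.
const keys = [
  'log:2013-10-02:002',
  'log:2013-10-01:001',
  'log:2013-10-03:001',
  'log:2013-10-02:001'
].sort() // the store keeps its keys sorted for us

function range (sortedKeys, start, end) {
  return sortedKeys.filter(function (k) { return k >= start && k <= end })
}

// everything logged on 2013-10-02, however many entries exist
console.log(range(keys, 'log:2013-10-02:', 'log:2013-10-02:\xff'))
// [ 'log:2013-10-02:001', 'log:2013-10-02:002' ]
```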
<p>I additionally raised "Buckets", or "Namespaces", as a primitive concept and discussed how <a href="https://github.com/dominictarr/level-sublevel">sublevel</a> has effectively become the standard for turning a one-dimensional data store into a multi-dimensional store able to encapsulate sophisticated functionality behind what is essentially just a key/value store.</p>
<h3 id="thanks-to-the-lxjs-team">Thanks to the LXJS team</h3>
<p>It would be neglectful of me to not say how absolutely grateful I am to the LXJS team for putting so much effort into taking care of speakers; fantastic job.</p>
<p>LXJS is an amazing event, put on by a dedicated and very talented team of people committed to the JavaScript community and the JavaScript community in Portugal in particular. This conference sets a very high bar for community-driven conferences with the way it has managed to get so many locals (and internationals!) involved in running an event in their own time.</p>
<p><strong>David Dias, Ana Hevesi, Pedro Teixeira, Luís Reis, Nuno Job, Tiago Rodrigues, Leo Xavier, Alexander Kustov, André Rodrigues and Bruno Coelho</strong> have managed to put on an amazing event and are some of the nicest and most talented people I've met. Thank you to you all and everyone else who put on LXJS 2013; your hard work is appreciated and should be an inspiration to everyone involved in our local JavaScript communities, running events or considering running events like this.</p>
<h2>NodeConf.eu</h2>
<p><em>2013-09-27 · <a href="https://r.va.gg/2013/09/nodeconf.eu.html">https://r.va.gg/2013/09/nodeconf.eu.html</a></em></p>
<p>Wow, <strong><a href="http://nodeconf.eu/">NodeConf.eu</a></strong> was certainly a once-in-a-lifetime event ... although there's talk of a repeat performance next year (don't miss the chance when it comes around!).</p>
<div style="text-align: center;">
<img src="https://r.va.gg/images/2013/09/nodeconfeu_raiseflag.jpg" alt="Raise that flag">
<p style="text-align: center;">Dominic Tarr, @substack and Julian Gruber raising the NodeConf.eu flag</p>
</div>
<p>NodeConf.eu was held in Waterford, Ireland, on an <strong>Island</strong>, in a <strong>Castle</strong> and was organised by the Node lovin' company, <a href="http://nearform.com/">nearForm</a>, in particular <a href="http://cianomaidin.com/">Cian O'Maidin</a> and his amazing assistant Catherine Bradley. Of course <a href="http://futurealoof.com/">Mikeal Rogers</a> had a significant role in organising the event too.</p>
<div style="text-align: center;">
<img src="https://r.va.gg/images/2013/09/nodeconfeu_castle.jpg" alt="Waterford Castle">
<p style="text-align: center;"><a href="http://waterfordcastle.com/">Waterford Castle</a></p>
</div>
<div style="text-align: center;">
<img src="https://r.va.gg/images/2013/09/nodeconfeu_pig.jpg" alt="Pig">
<p style="text-align: center;">The welcome banquet ... yep</p>
</div>
<p>Instead of describing the talks, I'll defer to the <a href="http://clock.co.uk/tech-blogs/nodeconfeu-2013-part-one">excellent</a> <a href="http://clock.co.uk/tech-blogs/nodeconfeu-2013-part-two">four</a> <a href="http://clock.co.uk/tech-blogs/nodeconfeu-2013-part-three">part</a> <a href="http://clock.co.uk/tech-blogs/nodeconfeu-reflection">series</a> by Paul, Adam, Luke and Ben of <a href="http://clock.co.uk/">Clock</a> where you'll find a great summary of the talks and events of the conference.</p>
<p>For my part, I was deeply honoured to be involved in the <em>"Node Databases"</em> track of the conference. We started off the NodeConf.eu talks with a 3-part show. My talk was titled "A Real Database Rethink" and was followed by <a href="https://twitter.com/dominictarr">Dominic Tarr</a> who talked more about the Level* ecosystem and the various pieces of the Node Databases puzzle that's being built. <a href="http://juliangruber.com/">Julian Gruber</a> then closed us off with some amazing live-coding of some browser/server streaming LevelUP/multilevel <a href="https://github.com/juliangruber/nodeconfeu-13">wizardry</a>.</p>
<h3 id="a-real-database-rethink">A Real Database Rethink</h3>
<p>The slides of my talk are <a href="https://r.va.gg/presentations/nodeconfeu.2013/">online</a>. I attempted to break down the definition of the term <em>"database"</em> by looking at where the concept comes from historically. It's actually a difficult thing to define and I don't believe there is any one agreed upon meaning. What I came up with is:</p>
<blockquote>
<p>A tool for interacting with structured data, externalised from the core of our application</p>
<ul>
<li>Persistence</li>
<li>Performance</li>
<li>Simplify access to complex data</li>
</ul>
<p>And sometimes...</p>
<ul>
<li>Shared access</li>
<li>Scalability</li>
</ul>
</blockquote>
<p>But even that's pretty rough.</p>
<p>Taking that definition, we can apply the Node philosophy of a small core and a vibrant user-land, along with the culture of extreme modularity afforded us by npm, and build a new kind of database; or at least apply new thinking to the "database".</p>
<p>The bulk of my talk was taken up with talking about LevelUP and the basics of the Level* ecosystem. There's a table on slide #7 that I'm going to try and refine over time to help describe what the Level* / NodeBase world is all about.</p>
<h3 id="level-me-up-scotty-">Level Me Up Scotty!</h3>
<p>One of the three workshops available at NodeConf.eu was all about Node Databases. I took the same approach as at <a href="http://campjs.com/">CampJS</a> recently where I built <a href="https://r.va.gg/2013/08/learn-you-the-node.js.html">Learn You The Node.js For Much Win!</a>, a tool that owes a debt to <a href="https://github.com/substack/stream-adventure">stream-adventure</a>, a self-guided workshop-in-your-terminal application by <a href="https://twitter.com/substack">@substack</a> and <a href="https://twitter.com/maxogden">Max Ogden</a> written for NodeConf (US).</p>
<p>This time around, I received some great help from both @substack and Julian Gruber, who helped write some exercises; I also received help from <a href="http://twitter.com/eugeneware">Eugene Ware</a>, who wasn't even at the conference but was assisting with development from Australia. <a href="http://twitter.com/raynos2">Raynos</a> was also a great help in getting the application working well.</p>
<p>We ended up with <strong><em>Level Me Up Scotty!</em></strong>, or just <strong>levelmeup</strong>.</p>
<div style="text-align: center;">
<img src="https://raw.github.com/rvagg/levelmeup/master/levelmeup.png" alt="levelmeup">
</div>
<p>Dominic Tarr, <a href="https://twitter.com/thlorenz">Thorsten Lorenz</a>, <a href="https://twitter.com/hij1nx">Paolo Fragomeni</a>, <a href="http://www.matteocollina.com/">Matteo Collina</a>, <a href="https://twitter.com/ralphtheninja">Magnus Skog</a>, Max Ogden and other experienced <em>Levelers</em> helped on and off while the workshops were happening; so we had plenty of expertise at hand whenever there were questions.</p>
<p>Workshops were unstructured and the organisers of each workshop all ended up agreeing that we should just let people come and go as they pleased. This suited us as the workshop was open-ended and designed not to be finished by most people within the planned hour <em>(I think an hour was the original plan)</em>.</p>
<p><strong><a href="https://github.com/rvagg/levelmeup">levelmeup</a></strong> is installed from npm (<code>npm install levelmeup -g</code>) and is fully self-guided. You run the <code>levelmeup</code> application and it steps you through some exercises designed to:</p>
<ul>
<li>introduce you to the format of the workshops with a simple "Hello World" style exercise</li>
<li>introduce you to LevelUP and its basic operations</li>
<li>help you understand ReadStream and the range-queries it makes possible</li>
<li>encourage creative thought regarding key structure</li>
<li>introduce <a href="https://github.com/dominictarr/level-sublevel">sublevel</a></li>
<li>introduce <a href="https://github.com/juliangruber/multilevel">multilevel</a></li>
</ul>
<p>There's more planned for the future of this workshop application too; Matteo even has a <a href="https://github.com/rvagg/levelmeup/pull/19">work-in-progress exercise</a> that should be merged fairly soon.</p>
<p><strong><a href="http://nodeschool.io/">nodeschool.io</a></strong> was hatched from NodeConf.eu and pulls together the three workshop applications currently available in npm. I believe this was an initiative of <a href="https://twitter.com/brianloveswords">Brian J. Brennan</a> and other Mozillans on the <a href="http://openbadges.org/">Open Badges</a> project. <strong><a href="https://github.com/rvagg/workshopper">workshopper</a></strong> is the engine that runs both learnyounode and levelmeup and we're trying to make it even easier for others to author their own workshop applications. There is already a <a href="https://github.com/timoxley/functional-javascript-workshop/">Functional JavaScript Workshop</a> by <a href="https://twitter.com/secoif">Tim Oxley</a> and there are more in development. Exciting times!</p>
<div style="text-align: center;">
<img src="https://r.va.gg/images/2013/09/nodeconfeu_levelmeup.jpg" alt="Level Me Up Workshoppers">
<p style="text-align: center;">Workshoppers stretching their brains with <strong>levelmeup</strong></p>
</div>
<p>My experience with <strong>stream-adventure</strong> and <strong>learnyounode</strong> suggested that this format should prove to be relatively successful but ultimately I think we had most of the attendees come through at some point and sit down to have a crack at the workshop. This is particularly impressive given that <a href="http://nexxylove.tumblr.com/">Emily Rose</a>, <a href="http://tmpvar.com/">Elijah Insua</a> and Matteo were running a NodeBots workshop which included Arduino and NodeCopter hacking (always popular!). And <a href="https://twitter.com/mrbruning">Max Bruning</a> and <a href="https://twitter.com/tjfontaine">TJ Fontaine</a> were running a Manta / MDB / DTrace / SmartOS-magic workshop and their material was some of my favourite from NodeConf (US) so I'm sure people really enjoyed what they had to present.</p>
<p>Unfortunately I didn't get to attend these other workshops, I also missed out on some skeet!</p>
<div style="text-align: center;">
<img src="https://farm6.staticflickr.com/5338/9726258926_e3ea4a656f_z.jpg" alt="Skeet">
<p style="text-align: center;">Karolina <em>"don't mess with me"</em> Szczur, photo by <a href="http://www.flickr.com/photos/matthewbergman/sets/72157635446400980/">Matthew Bergman</a></p>
</div>
<p>But there was plenty of other <em>experience</em> to be had. It was also fantastic to meet so many people I only knew from IRC / Twitter / GitHub. For someone who lives in regional Australia and doesn't get a chance to socialise much with other nerds, this was a particularly special opportunity.</p>
<div style="text-align: center;">
<img src="https://farm8.staticflickr.com/7392/9783982165_43ca4edef2_z.jpg" alt="Shenanigans">
<p style="text-align: center;">Final night banquet shenanigans with <a href="https://twitter.com/Av1anFlu">Charlie McConnell</a> and @substack ... the napkin hat thing is a story in itself, blame <a href="https://twitter.com/jllord">Jessica Lord</a>, photo by <a href="http://www.flickr.com/photos/matthewbergman/sets/72157635446400980/">Matthew Bergman</a></p>
</div>
<h3 id="the-level-gang">The Level* Gang</h3>
<p>As an aside, NodeConf.eu had the largest concentration of LevelUP contributors and active Level* developers of any event that I'm aware of so far. So we took the opportunity to have our own little meeting. We even took minutes, <a href="https://github.com/karolinaszczur/leveldb.org/blob/master/meetup-nodeland">of sorts</a>.</p>
<p>There has been a long-standing plan to make a Level* / NodeBase website but being the disorganised rabble we are, it hasn't got off the ground. Karolina (and Jessica too I believe) are keen to help out on the design end but just need the content. So that's what we planned. There's a bunch of issues that form a TODO in the <a href="https://github.com/karolinaszczur/leveldb.org/issues">repo</a> for this project. Hopefully we can all get on top of it sooner rather than later. We're also open to assistance from anyone else that would like to contribute.</p>
<p>Besides getting stuff done, it was just a pleasure to hang out with these people and talk <em>shop</em>.</p>
<div style="text-align: center;">
<img src="https://r.va.gg/images/2013/09/nodeconfeu_levelgang.jpg" alt="A momentous event">
<p style="text-align: center;"><strong>The Level* Gang</strong>: Paolo, Dominic, @substack, Karolina, Magnus, Mikeal, Julian, Max, Matteo and <a href="https://twitter.com/paulfryzel">Paul Fryzel</a>. Raynos was around but missed this particular <em>event</em>, Thorsten was inside demoing his guitar-typing software.</p>
</div>
<h2>Learn You The Node.js</h2>
<p><em>2013-08-14 · <a href="https://r.va.gg/2013/08/learn-you-the-node.js.html">https://r.va.gg/2013/08/learn-you-the-node.js.html</a></em></p>
<p><strong><a href="http://campjs.com/">CampJS</a></strong> has just finished, with a bigger crowd than last time around. It was lots of fun, and as usual, these events are more about meeting the people I collaborate and socialise with online than anything else. There was a particularly large turn-out of the hackers on #polyhack, our Australian programmers channel on Freenode. Even <a href="https://twitter.com/mwotton">@mwotton</a>, our resident Haskell-troll, was there! Lots of photos and news can be found on <a href="http://storify.com/campjs/campjs-ii">Storify</a>. The next one will likely be near Melbourne in February some time and I highly recommend it if you can get there.</p>
<h3 id="learn-you-the-node-js-for-much-win-presentation-">Learn You The Node.js For Much Win (presentation)</h3>
<p>I was struck last CampJS how many JavaScript newbies were there, or at least people who deal with JavaScript as a secondary language and therefore only have a cursory understanding of it. And by extension, there were not many people who had much understanding of Node. So I wanted to present some intro-to-Node material this time.</p>
<p>I gave a 30 minute talk covering the very basics of <em>what Node <strong>is</strong></em>, called <strong>Learn You The Node.js For Much Win</strong>. Obviously the title is inspired by <em><a href="http://learnyouahaskell.com/">Learn You a Haskell For Great Good</a></em> and <em><a href="http://learnyousomeerlang.com/">Learn You Some Erlang For Great Good</a></em>. You can find my slides <a href="https://r.va.gg/presentations/campjs-learn-you-node/">here</a> (feel free to rip them off if you need to give a similar talk somewhere!). The video may be online at some point in the future.</p>
<h3 id="learn-you-the-node-js-for-much-win-workshop-">Learn You The Node.js For Much Win (workshop)</h3>
<p><img src="https://pbs.twimg.com/media/BRWaBeeCcAA9R7v.jpg" style="border-radius:4px; border: solid 2px white; box-shadow: 1px 1px 15px rgba(0,0,0,0.4);"></p>
<p>The next morning, I gave a workshop on the same topic but it was much more hands-on. The inspiration for my workshop came from <a href="http://www.nodeconf.com/">NodeConf</a>, a couple of months earlier. <a href="https://twitter.com/substack">@substack</a> and <a href="https://twitter.com/maxogden">@maxogden</a> presented a workshop titled <strong>stream adventure</strong> which was a self-guided, interactive workshop for the terminal, built with Node. You can find it <a href="https://github.com/substack/stream-adventure">here</a> and install it from npm with <code>npm install stream-adventure -g</code>, I highly recommend it.</p>
<p><a href="https://nodei.co/npm/stream-adventure/"><img src="https://nodei.co/npm/stream-adventure.png?downloads=true&stars=true" alt="NPM"></a></p>
<p>I was so inspired that I stole their code and made my own workshop application! <strong><a href="https://github.com/rvagg/learnyounode/">learnyounode</a></strong>. You can download and install it with <code>npm install learnyounode -g</code>.</p>
<p><a href="https://nodei.co/npm/learnyounode/"><img src="https://nodei.co/npm/learnyounode.png?downloads=true&stars=true" alt="NPM"></a></p>
<p>The application itself is/was a series of 13 separate exercises, starting off with a simple <em>HELLO WORLD</em> and ending with a JSON API HTTP server (contributed by the very clever <a href="https://twitter.com/sidorares">@sidorares</a>).</p>
<p><img src="https://raw.github.com/rvagg/learnyounode/master/learnyounode.png" alt="learnyounode"></p>
<p>Nobody actually managed to finish the workshops in the allotted 60 minutes, although <a href="http://twitter.com/alexdickson">@alexdickson</a>, an expert JavaScripter but Node-n00b, was the first one I heard of finishing it, not long after.</p>
<p>The workshops attempt to focus on some of the core concepts of Node. There's lots of console output because that's easiest to validate, but it introduces filesystem I/O, both synchronous and asynchronous, and moves straight on to networking because that's what Node is so good at. An <em>HTTP CLIENT</em> example introduces HTTP and is expanded on in <em>HTTP COLLECT</em>, which introduces streams. <em>JUGGLING ASYNC</em> builds on <em>HTTP COLLECT</em> to introduce the complexities of managing parallel asynchronous activities. From there, it switches from network clients to network servers: first a simple TCP server in <em>TIME SERVER</em>, then using streams to serve files in <em>HTTP FILE SERVER</em> and transforming data with <em>HTTP UPPERCASERER</em>. The final exercise presents you with a more complex, closer-to-real-world example, an HTTP API server with multiple end-points.</p>
<p>The entire workshop is designed to take longer than 1 hour; people ought to be able to take it away and complete it later. It's also designed to be suitable for complete n00bs as well as people with some experience, and it ought to make a fun challenge for anyone already experienced with Node to see how quickly they can complete the examples (I believe I earned the honour of being the first person at NodeConf to finish stream-adventure in the allotted time!).</p>
<p>The Node-experts at CampJS were thankfully helping out during the workshop so there wasn't much competition going on there.</p>
<p>Many thanks to these expert Node nerds who hovered and helped people during the workshop and also did some test-driving of the workshop prior to the event:</p>
<ul>
<li><a href="https://twitter.com/nicholasf">Nicholas Faiz</a></li>
<li><a href="https://twitter.com/cgiffard">Christopher Giffard</a></li>
<li><a href="https://twitter.com/secoif">Tim Oxley</a> (who also poured his heart and soul into organising CampJS)</li>
<li><a href="http://twitter.com/deoxxa">Conrad Pankoff</a></li>
<li><a href="https://twitter.com/sidorares">Andrey Sidorov</a> (who also contributed the final exercise of the workshop)</li>
<li><a href="https://twitter.com/EugeneWare">Eugene Ware</a> (who was also brilliant all weekend, running the local <a href="http://en.wikipedia.org/wiki/Sneakernet">sneakernet</a> because the network was so flakey)</li>
</ul>
<p><em>(I really hope I haven't missed anyone out there; so many quality nerds at CampJS!)</em></p>
<p><img src="https://lh5.googleusercontent.com/-tKp0U1N7XNw/UgngKk01qqI/AAAAAAAAAoc/xxAOCTqMCZ0/w600-h800-no/campJS+%252870+of+118%2529.jpg" style="border-radius:4px; border: solid 2px white; box-shadow: 1px 1px 15px rgba(0,0,0,0.4);"></p>
<p><em>Tim Oxley making a contribution during the workshop, along with Christopher Giffard (left) and Eugene Ware (right)</em></p>
<p>I had the <a href="https://r.va.gg/presentations/campjs-learn-you-node/workshop.html">solutions</a> to the workshop ready on the big-screen and walked through some of the early solutions and talked through what was going on. I didn't expect many people to listen to those bits and the workshop was designed so you could totally zone-out and do it at your own pace if that suited.</p>
<p>If anyone wants to run a similar style workshop for their local meet-up, using the same content, I'd love to receive contributions to <strong>learnyounode</strong>. Alternatively, make your own! I extracted the core framework from <strong>learnyounode</strong> and it now lives separately as <strong><a href="https://github.com/rvagg/workshopper">workshopper</a></strong>.</p>
<p><a href="https://nodei.co/npm/workshopper/"><img src="https://nodei.co/npm/workshopper.png?downloads=true&stars=true" alt="NPM"></a></p>
<p>I would love feedback from anyone in attendance or anyone that uses this tool to run their own workshops! <strong>learnyounode</strong> is already listed in Max Ogden's excellent <strong><a href="https://github.com/maxogden/art-of-node">The Art of Node</a></strong>, so I'm looking forward to contributions to help turn this into a really useful teaching tool.</p>
<h2>LevelDOWN Alternatives</h2>
<p><em>2013-06-07 · <a href="https://r.va.gg/2013/06/leveldown-alternatives.html">https://r.va.gg/2013/06/leveldown-alternatives.html</a></em></p>
<p>Since <strong><a href="https://github.com/rvagg/node-levelup">LevelUP</a></strong> v0.9, <strong><a href="https://github.com/rvagg/node-leveldown/">LevelDOWN</a></strong> has been an optional dependency, allowing you to switch in alternative back-ends.</p>
<p>We have <strong><a href="https://github.com/rvagg/node-memdown">MemDOWN</a></strong>, a pure in-memory data-store, allowing you to run LevelUP against transient, and very fast storage.</p>
<p>We also have <strong><a href="https://github.com/maxogden/level.js">level.js</a></strong> which works against <strong>IndexedDB</strong>, allowing you to run LevelUP in the browser!</p>
<p>Since LevelUP just needs some basic primitives and a sorted bi-directional iterator, we can swap out the back-end with numerous alternatives.</p>
<p>The easy targets are the forks of LevelDB that purport to <em>fix</em> or <em>improve</em> LevelDB in some way. I have another post brewing on what I think about the claims made in this area and how we ought to approach them, but that can come later. For now I have some packages in npm for you to try for yourself!</p>
<h2 id="basho">Basho</h2>
<p>First of all we have <strong>leveldown-basho</strong> which bundles the <a href="https://github.com/basho/leveldb">Basho LevelDB fork</a> into LevelDOWN. See Matthew Von-Maszewski's <a href="https://speakerdeck.com/basho/optimizing-leveldb-for-performance-and-scale-ricon-east-2013">slides</a> from the recent Ricon East 2013 for more information on what they've tried to do with LevelDB.</p>
<p>In summary, Basho's aim is to optimise LevelDB "for the server", particularly for high write throughput. They use more than one compaction thread and relax the rules a little on overlapping keys for the lower levels. Plus a few other things that I won't get into here.</p>
<div class="highlight"><pre><span class="nv">$ </span>npm install levelup leveldown-basho
</pre></div>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">levelup</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'levelup'</span><span class="p">)</span>
<span class="p">,</span> <span class="nx">leveldown</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'leveldown-basho'</span><span class="p">)</span>
<span class="p">,</span> <span class="nx">db</span> <span class="o">=</span> <span class="nx">levelup</span><span class="p">(</span><span class="s1">'/path/to/db'</span><span class="p">,</span> <span class="p">{</span> <span class="nx">db</span><span class="o">:</span> <span class="nx">leveldown</span> <span class="p">})</span>
<span class="c1">// go to work on `db`</span>
</pre></div>
<p><em>Disclaimer: some of the LevelDOWN and LevelUP tests are failing on the current build for this release. I don't believe they should impact standard usage, but your mileage may vary...</em></p>
<h2 id="hyperdex">HyperDex</h2>
<p>Next, we have <strong>leveldown-hyper</strong>, which bundles a fork by the people behind <a href="http://hyperdex.org/">HyperDex</a>, a key-value store. Again their aim is to optimise LevelDB for a server environment. You can see some of their claims about performance <a href="http://hyperdex.org/performance/leveldb/">here</a>. I don't know as much about this fork; I'll investigate further when I have time, but they are also using multiple compaction threads to do the background work.</p>
<div class="highlight"><pre><span class="nv">$ </span>npm install levelup leveldown-hyper
</pre></div>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">levelup</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'levelup'</span><span class="p">)</span>
<span class="p">,</span> <span class="nx">leveldown</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'leveldown-hyper'</span><span class="p">)</span>
<span class="p">,</span> <span class="nx">db</span> <span class="o">=</span> <span class="nx">levelup</span><span class="p">(</span><span class="s1">'/path/to/db'</span><span class="p">,</span> <span class="p">{</span> <span class="nx">db</span><span class="o">:</span> <span class="nx">leveldown</span> <span class="p">})</span>
<span class="c1">// go to work on `db`</span>
</pre></div>
<h2 id="-i-strike-lies-strike-i-benchmarks-"><i><strike>Lies!</strike></i> Benchmarks!</h2>
<p>OK, benchmarks kind of suck, particularly microbenchmarks. It's really hard to test something that's meaningful for everyone's use-case. But you can make pretty pictures with them and they can tell something of a story, even if it's just the first page of a novel.</p>
<p>So here we go. I've put together a simplistic benchmark that tries to test the kind of situation that these two forks are aiming to optimise for. In particular, high-throughput writes. There's a common claim that LevelDB has problems with writes because the compaction thread can hold up levels 0 and 1 while it's working on higher levels, and you really want to be flushing the new data as soon as possible so you can get more in. (Again, I have more to say on this & the claims about "fixing" the problem in a later post.)</p>
<p>I have a sorted-write benchmark in the <a href="https://github.com/rvagg/node-leveldown/tree/master/bench">LevelDOWN repo</a> that tries to push in 10M pre-sorted entries as fast as possible, fully utilising Node's worker-threads for the job. So this isn't your typical browser scenario. An important point here is that <strong>Node is a unique environment when looking at LevelDB performance</strong>. It's not going to be a straightforward mapping of benchmark results obtained with other LevelDB bindings onto what we can achieve in Node.</p>
<p>Because there are so many entries, instead of recording the time for individual writes, I've recorded average time for batches of 1000 writes. Below you can see what the write-times look like when plotted over time. There are a bunch of outliers that are above the maximum Y of 0.6ms, but not enough to warrant distracting from the interesting behaviour below 0.6ms so I chopped it off there.</p>
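<p>The measurement approach can be sketched as follows. The store here is a hypothetical in-memory stand-in (a plain <code>Map</code>), not LevelDOWN, and the entry count is scaled down; the point is only the per-batch averaging used for the plots.</p>

```javascript
// Sketch of the per-batch timing used for the plots: record the average
// write time over every batch of 1000 writes rather than each individual
// write. The `store` below is a hypothetical in-memory stand-in, NOT
// LevelDOWN -- only the measurement technique is the point here.

var store = new Map()

function put (key, value) {
  store.set(key, value)
}

function benchmark (totalEntries, batchSize) {
  var averages = []
  for (var start = 0; start < totalEntries; start += batchSize) {
    var t0 = process.hrtime()
    for (var i = start; i < start + batchSize; i++) {
      put('key' + i, 'value' + i) // pre-sorted keys, as in the real benchmark
    }
    var diff = process.hrtime(t0) // [ seconds, nanoseconds ]
    var ms = diff[0] * 1e3 + diff[1] / 1e6
    averages.push(ms / batchSize) // average ms per write in this batch
  }
  return averages
}

// 100k entries here rather than 10M, purely to keep the sketch quick
var averages = benchmark(100000, 1000)
```

Each element of <code>averages</code> becomes one point on the plots; outliers above the Y cut-off are simply clipped when charting.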
<p><strong>It is important to note that I'm using the default options</strong> here and this is where I'll probably cop some flak. Basho in particular advocate a healthy amount of "tuning" to achieve appropriate performance. In particular the write-buffer defaults to only 4M and you can push data in faster (at the cost of compactions later on) by increasing this. I think the forks may even have additional tunables of their own that you can fiddle with. But, this whole tuning thing is a rabbit hole I don't dare go down right now!</p>
<p>I'm running this on an i7-2630QM CPU, plenty of RAM and an SSD.</p>
<p>You can see that we managed to push in the 10M entries in just over 95 seconds with the plain <strong>Google LevelDB (v1.10.0)</strong>.</p>
<p><img src="https://r.va.gg/images/2013/06/write_sorted_times_g.png" height=500 width=800 align="center" /></p>
<hr>
<p>Next up we have the HyperDex fork. The main difference here is that we have it working slightly faster in total and the write-times have been trimmed down a bit to be more consistent. Not a bad effort with default settings, quite a nice picture.</p>
<p><img src="https://r.va.gg/images/2013/06/write_sorted_times_h.png" height=500 width=800 align="center" /></p>
<hr>
<p>Lastly we can see what Basho have done. They've been on this case for a lot longer than HyperDex have and their fork, internally at least, diverges quite a bit from Google's LevelDB.</p>
<p>We can see that the write-time has been considerably flattened, which is in line with what Basho claim and are aiming for; the consistency here is <strong>very</strong> impressive. Unfortunately we've ended up with a total time that is <strong>double</strong> what it took Google's LevelDB to get the 10M entries in!</p>
<p>This probably has something to do with the tunables, or perhaps I've messed something up; anything's possible!</p>
<p><img src="https://r.va.gg/images/2013/06/write_sorted_times_b.png" height=500 width=800 align="center" /></p>
<hr>
<h2 id="so-">So?</h2>
<p>If you take anything away from this, here's what I think it should be: <strong>Do your own benchmarks if performance <em>really</em> is an issue for you</strong>. You're going to need some kind of benchmark suite that is tailored to your particular application. This will not only let you choose the appropriate storage system but it will give you something to work with when you start to get into the mire that is "tunables".</p>
<p>It's likely I won't be able to leave this alone and will be posting more benchmarks with some tweaking and tuning. I'd love to have input from others on this too of course! The code for this is all in the LevelDOWN repo with both of these forks under appropriately named branches.</p>
LevelUP v0.9 Released2013-05-21T00:00:00.000Zhttps://r.va.gg/2013/05/levelup-v0.9-released.html
<p><img src="https://twimg0-a.akamaihd.net/profile_images/3360574989/92fc472928b444980408147e5e5db2fa_bigger.png" alt="LevelDB"></p>
<p>As per my <a href="https://r.va.gg/2013/05/levelup-v0.9-some-major-changes.html">previous post</a>, <strong><a href="https://github.com/rvagg/node-levelup">LevelUP</a> v0.9 has been released</strong>!</p>
<p>I'm doing a quick post about this release because it's got more changes in it than we normally see, including some things worth explaining.</p>
<h3 id="relationship-to-leveldown">Relationship to LevelDOWN</h3>
<p>The biggest change is the removal of <a href="https://github.com/rvagg/node-leveldown/">LevelDOWN</a> as a dependency; you should <a href="https://r.va.gg/2013/05/levelup-v0.9-some-major-changes.html">review what I've already said about this</a> as it will impact you if you're currently using LevelUP. In short, you'll either need to explicitly <code>npm install leveldown</code> or switch to using the new <a href="https://github.com/level/level">Level</a> package which bundles them both.</p>
<p>Along with this change, we also get better <a href="http://browserify.org/">Browserify</a> support. See <a href="https://github.com/maxogden/level.js">level.js</a> for more information on this.</p>
<h3 id="chained-batch">Chained batch</h3>
<p>The other major change is the introduction of a new <strong>chained batch</strong> syntax, additional to the existing batch syntax. This method of creating and writing batch operations is much closer to the way LevelDB does batches and under certain circumstances you may find improved performance from using this method.</p>
<p>If you call <code>db.batch()</code> with no arguments, you'll get a <code>Batch</code> object back which has the following operations: <code>put()</code>, <code>del()</code>, <code>clear()</code> and <code>write()</code>. The first three are chainable so you can call them one after the other to build your batch. <code>write()</code> is the only method that takes a callback because it submits the batch. Until you call <code>write()</code>, the batch is transient and can be discarded.</p>
<p>Example from the <a href="https://github.com/rvagg/node-levelup#readme">README</a>:</p>
<div class="highlight"><pre><span class="nx">db</span><span class="p">.</span><span class="nx">batch</span><span class="p">()</span>
<span class="p">.</span><span class="nx">del</span><span class="p">(</span><span class="s1">'father'</span><span class="p">)</span>
<span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s1">'name'</span><span class="p">,</span> <span class="s1">'Yuri Irsenovich Kim'</span><span class="p">)</span>
<span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s1">'dob'</span><span class="p">,</span> <span class="s1">'16 February 1941'</span><span class="p">)</span>
<span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s1">'spouse'</span><span class="p">,</span> <span class="s1">'Kim Young-sook'</span><span class="p">)</span>
<span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s1">'occupation'</span><span class="p">,</span> <span class="s1">'Clown'</span><span class="p">)</span>
<span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="kd">function</span> <span class="p">()</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Done!'</span><span class="p">)</span> <span class="p">})</span>
</pre></div>
<h3 id="some-love-for-writestream">Some love for WriteStream</h3>
<p>WriteStream got some attention in this release. On the main <code>createWriteStream()</code> method and on individual <code>write()</code> calls, you can now pass some new options:</p>
<ul>
<li><code>'type'</code> can switch from the default <code>'put'</code> to <code>'del'</code> so you can make a WriteStream that only deletes when you <code>write({ key: 'foo' })</code>, or you can make individual writes delete: <code>write({ type: 'del', key: 'foo' })</code>.</li>
<li><code>'keyEncoding'</code> and <code>'valueEncoding'</code> will switch from default encodings for the current LevelUP instance. Again, you can specify them on the main <code>createWriteStream()</code> or on individual <code>write()</code> calls.</li>
</ul>
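<p>To make those semantics concrete, here's a sketch of how the <code>'type'</code> option resolves per write. The <code>applyWrite()</code> function and <code>Map</code>-backed store are illustrative stand-ins, not LevelUP's actual WriteStream implementation.</p>

```javascript
// Sketch of the 'type' option semantics described above: each entry written
// to the stream is either a put or a del, with a per-write 'type' overriding
// the stream-wide default. The Map-backed store and applyWrite() here are
// stand-ins, NOT LevelUP's real WriteStream internals.

var store = new Map()

function applyWrite (entry, defaultType) {
  var type = entry.type || defaultType || 'put' // 'put' is the overall default
  if (type === 'del') store.delete(entry.key)
  else store.set(entry.key, entry.value)
}

// plain stream, default 'put':
applyWrite({ key: 'foo', value: 'bar' })
// per-write delete, as with write({ type: 'del', key: 'foo' }):
applyWrite({ type: 'del', key: 'foo' })
// a stream created with type 'del' deletes on a plain write({ key: ... }):
applyWrite({ key: 'baz', value: 'x' })
applyWrite({ key: 'baz' }, 'del')
```

A real WriteStream also buffers and batches writes under the hood, but the type resolution follows the order shown: per-write option first, then the stream-wide default.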
<h3 id="other-changes">Other changes</h3>
<ul>
<li>A <a href="https://github.com/rvagg/node-levelup/pull/128">race condition</a> was fixed that allowed a <code>put()</code> to write to the store before an iterator was obtained when calling <code>createReadStream()</code>.</li>
<li>ReadStream no longer emits a <code>'ready'</code> event.</li>
<li>The <code>db</code> property on LevelUP instances can be used to get access to LevelDOWN or whatever LevelDOWN-substitute you are using (this was previously <code>_db</code>).</li>
<li>Some very LevelDB-specific methods have been deprecated on LevelUP and the documentation now recommends either directly using LevelDOWN or calling via the <code>db</code> property. Specifically:<ul>
<li><code>db.db.approximateSize()</code></li>
<li><code>leveldown.repair()</code></li>
<li><code>leveldown.destroy()</code></li>
</ul>
</li>
<li>LevelDOWN got a new LevelDB method: <code>getProperty()</code> that currently understands 3 properties:<ul>
<li><code>db.db.getProperty('leveldb.num-files-at-levelN')</code>: returns the number of files at level <em>N</em>, where <em>N</em> is an integer representing a valid level (e.g. "0").</li>
<li><code>db.db.getProperty('leveldb.stats')</code>: returns a multi-line string describing statistics about LevelDB's internal operation.</li>
<li><code>db.db.getProperty('leveldb.sstables')</code>: returns a multi-line string describing all of the <em>sstables</em> that make up the contents of the current database.</li>
</ul>
</li>
<li>Significantly improved ReadStream performance (up to 50% faster).</li>
<li>Some LevelDOWN memory leaks were discovered and fixed.</li>
<li>LevelDOWN upgraded to LevelDB@1.10.0, <a href="https://groups.google.com/forum/#!topic/node-levelup/bly-MiUzrZw">details here</a>.</li>
</ul>
<h3 id="who-you-should-thank">Who you should thank</h3>
<p>A lot of people put work into this release. There's a <a href="https://github.com/rvagg/node-levelup#contributors">team of people</a> that can claim ownership of LevelUP, LevelDOWN and related projects and most of them have been involved in this release. You should follow these people on Twitter and GitHub!</p>
<ul>
<li><strong>Dominic Tarr</strong> (<a href="https://github.com/dominictarr">GitHub/dominictarr</a> / <a href="http://twitter.com/dominictarr">Twitter/@dominictarr</a>) contributed to the ReadStream fixes and is just a generally valuable & awesome sage in the LevelDB + Node community.</li>
<li><strong>Julian Gruber</strong> (<a href="https://github.com/juliangruber">GitHub/juliangruber</a> / <a href="http://twitter.com/juliangruber">Twitter/@juliangruber</a>) contributed the encoding options for WriteStreams and most of the work on the new chained <code>batch()</code>.</li>
<li><strong>Matteo Collina</strong> (<a href="https://github.com/mcollina">GitHub/mcollina</a> / <a href="https://twitter.com/matteocollina">Twitter/@matteocollina</a>) contributed the <code>'type'</code> options for WriteStreams and most of the work on performance improvements to ReadStreams.</li>
<li><strong>David Björklund</strong> (<a href="https://github.com/kesla">GitHub/kesla</a> / <a href="http://twitter.com/david_bjorklund">Twitter/@david_bjorklund</a>) also contributed work on ReadStream performance.</li>
<li><strong>Max Ogden</strong> (<a href="https://github.com/maxogden">GitHub/maxogden</a> / <a href="http://twitter.com/maxogden">Twitter/@maxogden</a>) and <strong>Anton Whalley</strong> (<a href="https://github.com/No9">GitHub/No9</a> / <a href="https://twitter.com/antonwhalley">Twitter/@antonwhalley</a>) both worked on extracting most of the LevelDOWN test suite into <a href="https://github.com/rvagg/node-abstract-leveldown">AbstractLevelDOWN</a> to form a LevelDOWN-spec that's also runnable in browser environments.</li>
</ul>
<p>And others, who you can find in <a href="https://github.com/rvagg/node-levelup/pull/129">this 0.9 WIP thread</a>, plus additional users who found & reported issues.</p>
LevelUP v0.9 - Some Major Changes2013-05-20T00:00:00.000Zhttps://r.va.gg/2013/05/levelup-v0.9-some-major-changes.html
<p><img src="https://twimg0-a.akamaihd.net/profile_images/3360574989/92fc472928b444980408147e5e5db2fa_bigger.png" alt="LevelDB"></p>
<p><a href="https://github.com/rvagg/node-levelup">LevelUP</a> is still quite young and bound to go through some major shifts. It's best to not be too tied to immature APIs early in a project's lifetime.</p>
<p>That said, we're very interested in stability so we try to keep breaking changes to a minimum. However, we're about to publish version 0.9 and there's one change that's not exactly a "breaking" change in the normal sense, but it is something that I need to explain because it will impact almost everyone currently using LevelUP.</p>
<h3 id="severing-the-dependency-on-leveldown">Severing the dependency on LevelDOWN</h3>
<p>LevelUP depends on <a href="https://github.com/rvagg/node-leveldown/">LevelDOWN</a> to do its <em>LevelDB thing</em>. LevelDOWN was once part of LevelUP until we split it off to a discrete project that focuses entirely on acting as a direct C++ bridge between LevelDB and Node. We get to focus on making LevelUP an awesome LevelDB-ish interface without being tied directly to LevelDB implementation details (e.g. Iterators vs Streams).</p>
<p>In fact, a new project was spawned to define the LevelDOWN interface that LevelUP requires. <a href="https://github.com/rvagg/node-abstract-leveldown">AbstractLevelDOWN</a> is a set of strict tests for the functionality that LevelUP uses and it also implements a basic abstract shell that can be extended to create additional back-ends for LevelUP.</p>
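<p>To illustrate the kind of back-end this makes possible, here's a toy in-memory store exposing a minimal async <code>put()</code>/<code>get()</code>/<code>del()</code> shape. This is a sketch only: it extends no real base class and invents nothing from AbstractLevelDOWN's actual API. A real back-end extends AbstractLevelDOWN and passes its test suite (MemDOWN, listed below, is the real thing).</p>

```javascript
// Toy illustration of a pluggable back-end: an in-memory store with the
// minimal asynchronous put/get/del shape that a LevelUP-style consumer
// expects. Illustrative ONLY -- it does not extend the real
// AbstractLevelDOWN class; see MemDOWN for a genuine implementation.

function MemStore () {
  this._store = new Map()
}

MemStore.prototype.put = function (key, value, callback) {
  this._store.set(String(key), value)
  process.nextTick(callback) // stay asynchronous, like a real back-end
}

MemStore.prototype.get = function (key, callback) {
  var self = this
  var k = String(key)
  process.nextTick(function () {
    if (!self._store.has(k)) return callback(new Error('NotFound'))
    callback(null, self._store.get(k))
  })
}

MemStore.prototype.del = function (key, callback) {
  this._store.delete(String(key))
  process.nextTick(callback)
}

// usage, mirroring the LevelUP callback style:
var db = new MemStore()
db.put('name', 'LevelUP', function () {
  db.get('name', function (err, value) {
    // value is the stored string
  })
})
```

The value of the AbstractLevelDOWN test suite is exactly that it pins down these contracts (asynchrony, error behaviour, key coercion) so that back-ends are interchangeable.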
<p>So far, there are 3 projects worth mentioning that extend AbstractLevelDOWN:</p>
<ul>
<li><p><strong><a href="https://github.com/maxogden/level.js">level.js</a></strong> operates on top of <a href="https://developer.mozilla.org/en-US/docs/IndexedDB">IndexedDB</a> (which is in turn implemented on top of <a href="https://code.google.com/p/leveldb/">LevelDB</a> in Chrome!).</p>
</li>
<li><p><strong><a href="https://github.com/No9/node-leveldown-gap">leveldown-gap</a></strong> is another browser implementation that uses <a href="https://developer.mozilla.org/en-US/docs/Web/Guide/DOM/Storage#localStorage">localStorage</a> and is designed to be able to work in <a href="http://phonegap.com/">PhoneGap</a> applications.</p>
</li>
<li><p><strong><a href="https://github.com/rvagg/node-memdown">MemDOWN</a></strong> is a pure in-memory implementation that doesn't touch the disk. It's obviously not good for persistent data but sometimes that's not what you need.</p>
</li>
</ul>
<p>Plus some other efforts to adapt other embedded and non-embedded data stores to the LevelDOWN interface. Additionally, there are other versions of LevelDB that can be used, including the fork that <a href="http://basho.com/">Basho</a> maintains for use in <a href="http://basho.com/riak/">Riak</a>. (I have a branch of LevelDOWN that uses this version of LevelDB that I'll release as soon as I can explain and demonstrate the performance differences to vanilla LevelDB for Node users).</p>
<p>In short, LevelUP doesn't <em>need</em> LevelDOWN in the way it once did and LevelUP is turning into a more generic interface to sorted key/value storage systems, albeit with a distinct LevelDB-flavour.</p>
<p>Since version 0.8 we've supported a <code>'db'</code> option when you create a LevelUP instance. This option can be used to provide an alternative LevelDOWN-compatible back-end. Unfortunately, LevelDOWN being defined as a strict dependency of LevelUP means that each time you install it you have to compile LevelDOWN, even if you don't want it. So, we've removed it as a dependency but it's still <em>wired up</em> so that the only thing you need to do is install LevelDOWN alongside LevelUP and it'll take care of the rest.</p>
<div class="highlight"><pre><span class="nv">$ </span>npm install levelup leveldown
</pre></div>
<p>From version 0.9 onwards, you'll need to do this, or you'll see an (informative) error.</p>
<h3 id="introducing-level-">Introducing "Level"</h3>
<p>To make life easier, we're publishing an additional package to npm that bundles both LevelUP and LevelDOWN as dependencies and exposes LevelUP directly. The <strong><a href="https://github.com/level/level">Level</a></strong> package is a very simple wrapper that exists purely as a convenience. It'll track the same versioning as LevelUP so it's a straight substitution.</p>
<div class="highlight"><pre><span class="nv">$ </span>npm install level
</pre></div>
<p>You can simply change your <code>"dependencies"</code> from <strong>"levelup"</strong> to <strong>"level"</strong>, plus you can use it just like LevelUP:</p>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">levelup</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'level'</span><span class="p">)</span>
<span class="kd">var</span> <span class="nx">db</span> <span class="o">=</span> <span class="nx">levelup</span><span class="p">(</span><span class="s1">'./my.db'</span><span class="p">)</span>
<span class="nx">db</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s1">'yay!'</span><span class="p">,</span> <span class="s1">'it works!'</span><span class="p">)</span>
</pre></div>
<h3 id="switching-things-up">Switching things up</h3>
<p>Now we have a properly pluggable back-end, expect to see a growing array of choice and innovation. The most exciting space at the moment is browser-land. Consider <strong>level.js</strong>:</p>
<div class="highlight"><pre><span class="kd">var</span> <span class="nx">levelup</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'levelup'</span><span class="p">)</span>
<span class="p">,</span> <span class="nx">leveljs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'level-js'</span><span class="p">)</span>
<span class="nb">window</span><span class="p">.</span><span class="nx">db</span> <span class="o">=</span> <span class="nx">levelup</span><span class="p">(</span><span class="s1">'foo'</span><span class="p">,</span> <span class="p">{</span> <span class="nx">db</span><span class="o">:</span> <span class="nx">leveljs</span> <span class="p">})</span>
<span class="nx">db</span><span class="p">.</span><span class="nx">put</span><span class="p">(</span><span class="s1">'name'</span><span class="p">,</span> <span class="s1">'LevelUP string'</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">db</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s1">'name'</span><span class="p">,</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">,</span> <span class="nx">value</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'name='</span> <span class="o">+</span> <span class="nx">value</span><span class="p">)</span>
<span class="p">})</span>
<span class="p">})</span>
</pre></div>
<p>Yep, that's browser code. Simply <code>npm install levelup level-js</code> and run the module through <a href="http://browserify.org/">Browserify</a> and you get the full LevelUP API in your browser!</p>
<hr>
<p>Stay tuned! This is just one step in the quest for a truly modular database system that lets you build a database that suits your applications and not the other way around.</p>
Node.ninjas Presentation - LevelDB and Node Sitting in a Tree2013-05-09T00:00:00.000Zhttps://r.va.gg/2013/05/node.ninjas-presentation-leveldb-and-node-sitting-in-a-tree.html
<p>I'm giving a presentation at <a href="http://www.meetup.com/sydney-node-ninjas/">Node.ninjas</a> tonight in Sydney. I've put together a talk about LevelDB and Node that covers:</p>
<ol>
<li>What LevelDB <em>is</em> and the basics of how it works</li>
<li>A quick introduction to the core LevelDB libraries in Node: <a href="https://github.com/rvagg/node-levelup">LevelUP</a> and <a href="https://github.com/rvagg/node-leveldown/">LevelDOWN</a></li>
<li>Some preaching about the awesomeness of modularity around a small, extensible core; including a whirlwind tour of the current, flourishing, LevelDB+Node ecosystem</li>
</ol>
<p>It's this last point that excites me the most. There are some very smart people building some very clever pieces of the <em>Node Database</em> puzzle. What's more, people are actually building functional databases in Node now; I've just collected a list from npm of what look like functional databases that use LevelDB:</p>
<ul>
<li>Rumours</li>
<li>LevelGraph</li>
<li>PushDB</li>
<li>NeutrinoDB</li>
<li>PlumbDB</li>
<li>Syncstore</li>
</ul>
<p>And a few more that look like a work in progress. Plus, I'm sure there's more people out there we've never even heard of who are cooking up some amazing things using the LevelDB+Node combination!</p>
<p><strong>The slides to my talk are <a href="https://r.va.gg/presentations/node.ninjas/">here</a>.</strong></p>
LevelDB and Node: Getting Up and Running2013-05-04T00:00:00.000Zhttps://r.va.gg/2013/05/leveldb-and-node-getting-up-and-running.html
<p>This is the second article in a three-part series on LevelDB and how it can be used in Node.</p>
<ul class="parts">
<li><a href="http://dailyjs.com/2013/04/19/leveldb-and-node-1/">Part 1: What is LevelDB Anyway?</a></li>
<li><a href="http://dailyjs.com/2013/05/03/leveldb-and-node-2/"><strong>Part 2: Getting Up and Running</strong></a></li>
</ul>
<p>Our first article covered the basics of LevelDB and its internals. If you haven't already read it you are encouraged to do so as we will be building upon this knowledge as we introduce the Node interface in this article.</p>
<p><img src="http://dailyjs.com/images/posts/leveldb.png" alt="LevelDB"></p>
<p>There are two primary libraries for using LevelDB in Node, <strong><a href="https://github.com/rvagg/node-leveldown">LevelDOWN</a></strong> and <strong><a href="https://github.com/rvagg/node-levelup">LevelUP</a></strong>.</p>
<p><strong>LevelDOWN</strong> is a pure C++ interface between Node.js and LevelDB. Its API provides limited <em>sugar</em> and is mostly a straightforward mapping of LevelDB's operations into JavaScript. All I/O operations in LevelDOWN are asynchronous and take advantage of LevelDB's thread-safe nature to parallelise reads and writes.</p>
<p><strong>LevelUP</strong> is the library that the majority of people will use to interface with LevelDB in Node. It wraps LevelDOWN to provide a more Node.js-style interface. Its API provides more <em>sugar</em> than LevelDOWN, with features such as optional arguments.</p>
<p>LevelUP exposes iterators as Node.js-style object streams. A LevelUP <strong>ReadStream</strong> can be used to read sequential entries, forward or reverse, to and from any key.</p>
<p>LevelUP handles JSON and other encoding types for you. For example, when operating on a LevelUP instance with JSON value-encoding, you simply pass in your objects for writes and they are serialised for you. Likewise, when you read them, they are deserialised and passed back in their original form.</p>
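<p>Under the hood, a JSON value-encoding is essentially a stringify/parse round-trip on either side of the store. A minimal sketch of that behaviour (stand-in functions over a plain <code>Map</code>, not LevelUP's actual internals):</p>

```javascript
// Sketch of what a JSON value-encoding buys you: objects are serialised on
// write and deserialised back into objects on read. These stand-in functions
// mirror the behaviour described above; they are NOT LevelUP's internals.

var store = new Map()

function putJSON (key, value) {
  store.set(key, JSON.stringify(value)) // stored as a plain string
}

function getJSON (key) {
  var raw = store.get(key)
  return raw === undefined ? undefined : JSON.parse(raw)
}

putJSON('user:1', { name: 'Rod', tags: ['node', 'leveldb'] })
var user = getJSON('user:1')
// user is a real object again, not a string
```

With LevelUP you get this transparently by passing a JSON value-encoding option at construction time, rather than wrapping every call yourself.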
<p><strong>Continue reading this article on <a href="http://dailyjs.com/2013/05/03/leveldb-and-node-2/">DailyJS.com</a></strong></p>
LevelDB and Node: What is LevelDB Anyway?2013-05-01T06:30:00.000Zhttps://r.va.gg/2013/05/leveldb-and-node-what-is-leveldb-anyway.html
<p>This is the first article in a three-part series on LevelDB and how it can be used in Node.</p>
<p>This article will cover the LevelDB basics and internals to provide a foundation for the next two articles. The second and third articles will cover the core LevelDB Node libraries: <a href="https://github.com/rvagg/node-levelup">LevelUP</a>, <a href="https://github.com/rvagg/node-leveldown">LevelDOWN</a> and the rest of the LevelDB ecosystem that's appearing in Node-land.</p>
<p><img src="http://dailyjs.com/images/posts/leveldb.png" alt="LevelDB"></p>
<h3 id="what-is-leveldb-">What is LevelDB?</h3>
<p>LevelDB is an <em>open-source</em>, <em>dependency-free</em>, <em>embedded key/value data store</em>. It was developed in 2011 by Jeff Dean and Sanjay Ghemawat, researchers from Google. It's written in C++ although it has third-party bindings for most common programming languages. Including JavaScript / Node.js of course.</p>
<p>LevelDB is based on ideas in Google's BigTable but does not share code with BigTable, which allows it to be licensed for open source release. Dean and Ghemawat developed LevelDB as a replacement for SQLite as the backing-store for Chrome's IndexedDB implementation.</p>
<p>It has since seen very wide adoption across the industry, serves as the back-end to a number of new databases, and is now the recommended storage back-end for Riak.</p>
<p><strong>Continue reading this article on <a href="http://dailyjs.com/2013/04/19/leveldb-and-node-1/">DailyJS.com</a></strong></p>
Node.js Dublin Presentation - LevelDB2013-05-01T06:00:00.000Zhttps://r.va.gg/2013/05/node.js-dublin-presentation-leveldb.html
<p>I visited lovely Dublin last month to attend <a href="http://peerconf.com/">PeerConf</a>. While there I got to meet a great bunch of Irish programmers at <a href="http://www.nodejsdublin.com/">Node.js Dublin</a>, a semi-regular Node.js meet-up that happens in the <a href="https://www.engineyard.com/">Engine Yard</a> office in Dublin.</p>
<p>I was invited to give a presentation on LevelDB and the work that I've been doing on it in Node.js. I was followed by <a href="https://github.com/dominictarr">Dominic Tarr</a> who's doing some amazing work on top of LevelDB.</p>
<p>You can view my slides <a href="https://r.va.gg/presentations/nodejsdub/">here</a> but a written version is currently being spread over 3 parts on <a href="http://dailyjs.com">DailyJS</a>. More about that soon!</p>
Announcing Bean v1.0.02012-09-08T12:41:12.000Zhttps://r.va.gg/2012/09/bean-v1.html
<p>In my <a href="http://rod.vagg.org/2012/08/bean_v1/" title="Towards Bean v1.0 (or: How event managers do their thing)">previous post</a> about Bean I discussed in detail the work that has gone in to a v1 release and how it will differ from the v0.4 branch.</p>
<p>Bean version 1.0.0 has now been released, you can download it from the <a href="https://github.com/fat/bean">GitHub repository</a> or you can fetch it from <a href="https://npmjs.org/package/bean">npm</a> for your Ender builds.</p>
<p>Here's a quick summary of the changes, but for a more in-depth look you should refer to my <a href="http://rod.vagg.org/2012/08/bean_v1/" title="Towards Bean v1.0 (or: How event managers do their thing)">previous post</a>.</p>
<blockquote>
<p><b><code>on()</code> argument ordering</b>: the new signature is now <code>.on(events[, selector], handlerFn)</code>, which will work on both Bean as a standalone library and when bundled in Ender. In Ender, the following aliases also pass through <code>on()</code> so the same arguments work: <code>addListener()</code>, <code>bind()</code>, <code>listen()</code> and <code>one()</code> (which of course will only trigger once). Plus all the specific shortcuts such as <code>click()</code>, <code>keyup()</code> etc. although these methods have the first argument hardwired.</p>
<p><code>add()</code> is left intact with the same argument ordering for standalone Bean and <code>delegate()</code> has the same signature, the same as jQuery's equivalent.</p>
<p><b><code>off()</code> is the new <code>remove()</code></b>: although <code>remove()</code> is still available in standalone Bean.</p>
<p><b>Bean attaches a single handler to the DOM for each event type on each element</b>: as outlined above, Bean will iterate over all handlers for each triggered event and (mostly) reuse the same Event object for each call.</p>
<p><b><code>Event.stopImmediatePropagation()</code>:</b> is available across all supported browsers, it will stop the processing of all handlers for the current event at the current element (i.e. the event will still bubble).</p>
<p><b>The selector engine argument to <code>add()</code> is now completely removed</b>: you used to have to pass a selector engine in as the last argument for delegated events. Now you must set it once at start-up with <code>setSelectorEngine()</code>. This is automatically taken care of for you in an Ender build.</p>
<p><b>A duplicate-handler check is no longer performed when you <i>add</i></b>: performance testing showed that this was a massive slow-down and is simply not something that Bean should be responsible for. If you want to add the same handler twice then that's your business and responsibility.</p>
<p><b>Namespace matching for event <code>fire()</code>ing now matches namespaces using an <i>and</i> instead of an <i>or</i>:</b> so for example, firing namespaces 'a.b' will fire any event with <i>both</i> 'a' and 'b' rather than <i>either</i> 'a' or 'b'. This is compatible with jQuery and is arguably a much more sensible and helpful way to deal with namespaces. You can find some discussion on this <a href="https://github.com/fat/bean/pull/68">on GitHub</a>.</p>
<p><b>Lots of internal improvements for speed, code size, etc.</b>.</p>
</blockquote>
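<p>The new <i>and</i> rule for namespace matching is easy to express in code. A sketch of the matching logic (illustrative only, not Bean's internals):</p>

```javascript
// Sketch of the namespace rule described above: firing 'a.b' matches a
// handler only if the handler carries BOTH namespaces. Illustrative only,
// not Bean's actual implementation.

function matchesAll (handlerNamespaces, firedNamespaces) {
  return firedNamespaces.every(function (ns) {
    return handlerNamespaces.indexOf(ns) !== -1
  })
}

// a handler registered as 'click.a.b' has namespaces ['a', 'b']
matchesAll(['a', 'b'], ['a', 'b']) // fires: handler has both 'a' and 'b'
matchesAll(['a'], ['a', 'b'])      // does not fire: handler lacks 'b'
```

Under the old <i>or</i> rule the second case would also have matched, since the handler carries at least one of the fired namespaces.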
<p>There was one remaining question to be resolved—whether <code>Event.stop()</code> would also trigger <code>Event.stopImmediatePropagation()</code>. I've decided to <b>not</b> include it and leave it to the user to decide whether they want to prevent triggering of other listeners on the same event/element.</p>
<p>And that's it! Please give it a spin and open an issue on GitHub if you have any bugs to report or questions to be answered.</p>
How Ender bundles libraries for the browser2012-08-24T03:21:38.000Zhttps://r.va.gg/2012/08/ender-bundling.html
<p>I was asked an interesting Ender question on IRC (#enderjs on Freenode) and as I was answering it, it occurred to me that the subject would be an ideal way to explain how Ender's multi-library bundling works. So here is that explanation!</p><p>The original question went something like this:</p><blockquote><p>When a browser first visits my page, they only get served Bonzo (a DOM manipulation library) as a stand-alone library, but on returning visits they are also served Qwery (a selector engine), Bean (an event manager) and a few other modules in an Ender build. Can I integrate Bonzo into the Ender build on the browser for repeat visitors?</p></blockquote><h3>Wait, what's Ender?</h3><p>Let's step back a bit and start with some basics. The way I generally explain Ender to people is that it's two different things:</p><ol><li>It's a build tool, for bundling JavaScript libraries together into a single file. The resulting file constitutes a new "framework" based around the jQuery-style DOM element collection pattern: <code>$('selector').method()</code>. The constituent libraries provide the functionality for the <em>methods</em> and may also provide the selector engine functionality.</li><li>It's an <em>ecosystem</em> of JavaScript libraries. Ender promotes a small collection of libraries as a base, called <strong>The Jeesh</strong>, which together provide a large portion of the functionality normally required of a JavaScript framework, but there are many more libraries compatible with Ender that add extra functionality. Many of the libraries available for Ender are also usable outside of Ender as stand-alone libraries.</li></ol><p><em><strong>Continue reading this article on <a href="http://dailyjs.com/2012/08/23/ender-tutorial/">DailyJS.com</a></strong></em></p>
Towards Bean v1.0 (or: How event managers do their thing)2012-08-10T13:20:56.000Zhttps://r.va.gg/2012/08/bean_v1.html
<p><b><a href="https://github.com/fat/bean">Bean</a></b> is the event manager included in <b><a href="http://ender.no.de/">Ender's</a></b> starter pack, <i>The Jeesh</i>. If you want to do jQuery-style <code>bind()</code>, <code>on()</code> etc. with Ender, then use Bean.</p>
<p>At the time of writing, we're on version <i>0.4.11</i>. There's also been a <i>0.5-wip</i> ("work in progress") branch for a while now that's included some improvements I've been holding off for a major release. I also put together a <a href="https://github.com/fat/bean/issues/milestones">0.5 milestone</a> on GitHub with some ideas. The major item impacting on the external API is a switch to the <code>on()</code> argument order found in <a href="http://api.prototypejs.org/dom/Event/on/">Prototype</a>, <a href="http://api.jquery.com/on/">jQuery</a> and <a href="https://github.com/madrobby/zepto/blob/753e80114f0618bd7ce865508e0ff2085d0bfb5f/src/event.js#L166">Zepto</a>. Considering the significance of the changes in the new branch, I think that perhaps a <b>1.0</b> release would be warranted.</p>
<h3>Delegated <code>on()</code> argument ordering</h3>
<p>Until now, Bean's <code>add()</code> has followed the same argument ordering as jQuery's <code><a href="http://api.jquery.com/bind/">bind()</a></code> for standard events, and <code><a href="http://api.jquery.com/delegate/">delegate()</a></code> for delegated events; so the signature looks something like this: <code>.add([selector, ]events, handlerFn)</code> (<code>on()</code> exists in the Ender bridge and does the same thing). The proposal was to change this to match the arguably more sensible ordering used by the other major libraries: <code>.on(events[, selector], handlerFn)</code>. This is now in the <i>0.5-wip</i> branch.</p>
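<p>The overloaded middle argument can be resolved by inspecting its type. Here's a minimal sketch of how an event manager might normalise the new signature (this is hypothetical illustration, not Bean's actual code):</p>

```javascript
// Hypothetical sketch (not Bean's actual code) of normalising the
// overloaded signature: on(events, fn) or on(events, selector, fn).
function normaliseOnArgs(events, selectorOrFn, maybeFn) {
  if (typeof selectorOrFn === 'function') {
    // Two-argument form: no delegation selector supplied
    return { events: events, selector: null, fn: selectorOrFn };
  }
  // Three-argument form: the delegation selector sits in the middle
  return { events: events, selector: selectorOrFn, fn: maybeFn };
}
```

<p>With this in place, both <code>on('click', fn)</code> and <code>on('click', 'a.external', fn)</code> resolve to the same internal shape.</p>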
<h3>Performance</h3>
<p>Speed was another issue that I wanted to address for a new major release. Benchmarks have shown that Bean is under-performing in some areas and I believed it could do better. The process of analysing and addressing Bean's performance has been quite instructive and I've narrowed it down to some key trade-offs that authors of event libraries have to deal with. One of the reasons I wanted to write this post was to outline some of these and solicit feedback from the wider Bean-using community.</p>
<h4>Performance trade-off #1: record keeping</h4>
<p>When you call <code>Element.attachEvent()</code> (IE8 and below) or <code>Element.addEventListener()</code> (new browsers) you pass in a handler function that's called when the event in question is triggered. To stop that function being triggered you have to call <code>Element.detachEvent()</code> or <code>Element.removeEventListener()</code> and pass in that same function so the browser knows which handler you want to remove. Event managers like Bean and jQuery make that easier so you can do things like <code>bean.remove(element, 'click')</code> to remove all handlers; but Bean needs to know which handlers it needs to remove so it must keep records. The biggest change back in v0.4 of Bean was a switch to an internal registry that didn't molest DOM elements, external objects or external functions to attach identifiers so they could be later recalled. Previously, a <code>__uid</code> property was set on each DOM element that you set a handler on and your handler function itself had a <code>__uid</code> property set on it. jQuery does this too: it has a global <code>jQuery.guid</code> integer that it increments and attaches to pretty much everything. Don't be surprised when you find a <code>guid</code> property on your object/function/element once jQuery has got its fingers on it. This type of record keeping is fast and easy, but molesting other people's objects isn't very cool and there are alternatives.</p>
<p>My first major contribution to Bean was to switch it over to a registry similar to the one Diego Perini has implemented in <a href="https://github.com/dperini/nwevents/">NWEvents</a>. Bean now iterates and compares rather than looking up directly. It adds some overhead but I managed to squeeze in enough performance gains in other areas to make v0.4 generally faster than v0.3 even with the registry switch.</p>
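<p>The registry approach can be sketched roughly like this (a deliberate simplification, not Bean's or NWEvents' actual code): entries live in an internal list, nothing is written onto your elements or functions, and removal iterates and compares instead of looking up an id:</p>

```javascript
// Simplified registry sketch: no ids are attached to elements or
// handler functions, so removal must iterate and compare entries.
function Registry() { this.entries = []; }

Registry.prototype.add = function (element, type, fn) {
  this.entries.push({ element: element, type: type, fn: fn });
};

// Remove entries matching element/type and, optionally, a specific fn
Registry.prototype.remove = function (element, type, fn) {
  this.entries = this.entries.filter(function (e) {
    return !(e.element === element && e.type === type && (!fn || e.fn === fn));
  });
};

Registry.prototype.handlers = function (element, type) {
  return this.entries.filter(function (e) {
    return e.element === element && e.type === type;
  }).map(function (e) { return e.fn; });
};
```

<p>The cost is the linear scan on each lookup; the benefit is that other people's objects stay untouched.</p>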
<h4>Performance trade-off #2: synthesising the Event object</h4>
<p>The DOM Level 3 Events specification outlines a base <a href="http://www.w3.org/TR/DOM-Level-3-Events/#interface-Event">Event object interface</a>, along with specific event types that extend this and add extra attributes and methods. This is the object that you get when your event handler is triggered by the DOM, it's the object that you read <code>keyCode</code> from for keyboard events and the object that you call <code>preventDefault()</code> and <code>stopPropagation()</code> on.</p>
<p>The problem we have is that nobody actually implements the full spec as-is and we also have to deal with older browsers which have all sorts of interesting attributes and methods on their Event objects. The stand-out difference is that in IE8 and below, instead of calling <code>Event.preventDefault()</code> to prevent the default browser behaviour (e.g. following a link click or accepting a keypress), you have to <code>Event.returnValue = false</code>. And, instead of calling <code>Event.stopPropagation()</code> to stop the event from bubbling up the DOM to parent elements, you have <code>Event.cancelBubble = true</code>.</p>
<p>So, the standard practice is for event managers to either create an Event object for you and set up the properties and methods based on the underlying <i>actual</i> Event object (as in Bean, jQuery and most others), or <i>fix</i> the Event object (as in Prototype). The performance trade-off here is that this is not cheap to do, especially for <i>every</i> event you need to react to. But there are ways to speed it up.</p>
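<p>In outline, the synthesised object wraps the native one and papers over the IE differences. A hedged sketch of the idea (the property names are illustrative; this is not Bean's implementation):</p>

```javascript
// Minimal sketch of synthesising an Event object over a native one:
// standard methods are provided, falling back to the old IE mechanisms
// (returnValue / cancelBubble) when the native methods are missing.
function synthesise(nativeEvent) {
  var event = {
    originalEvent: nativeEvent,
    preventDefault: function () {
      if (nativeEvent.preventDefault) nativeEvent.preventDefault();
      else nativeEvent.returnValue = false; // IE8 and below
    },
    stopPropagation: function () {
      if (nativeEvent.stopPropagation) nativeEvent.stopPropagation();
      else nativeEvent.cancelBubble = true; // IE8 and below
    }
  };
  // Copy only a whitelist of expected properties rather than everything
  // found on the native object (the names here are illustrative)
  ['type', 'target', 'keyCode'].forEach(function (key) {
    if (key in nativeEvent) event[key] = nativeEvent[key];
  });
  return event;
}
```

<p>The whitelist loop is where the v0.4 gains came from: only expected properties are read, avoiding the slow quirky ones.</p>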
<p>In Bean v0.4 we introduced a property "whitelist" which provided significant performance gains. In v0.3 and prior, Bean would try and copy every property and method that it found on the original Event object over to a new object (<code>{}</code>). It turns out that accessing some of these properties on some browsers comes with a significant performance penalty, and often you just don't need them because they are specific quirks of individual browsers. Since v0.4, Bean has been only looking at a list of properties that it expects to find on particular types of event objects and ignoring the rest. In the 0.5-wip branch, I started caching event "fixers" for each event type as they were encountered, so it's a little faster to figure out exactly what needs to be done as events are triggered.</p>
<p>But, it's still costly, so that's where the next performance trade-off comes in.</p>
<h4>Performance trade-off #3: hijacking event handler management</h4>
<p>Given that synthesising the Event object is so costly, and you end up doing it multiple times for a single event if you have more than one handler for that event, event managers have a trick up their sleeve to alleviate the pain. NWEvents, jQuery and others don't directly attach your event handler to the DOM; instead, they attach a single internal handler that is responsible for triggering any number of handlers you register for a given event on a particular element.</p>
<p>Consider the following code:</p>
<div class="highlight"><pre><span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="mi">100</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">$</span><span class="p">(</span><span class="s1">'#el'</span><span class="p">).</span><span class="nx">bind</span><span class="p">(</span><span class="s1">'click'</span><span class="p">,</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span> <span class="p">})</span>
<span class="p">}</span>
</pre></div>
<p>This code works in both Bean and jQuery; the difference is that Bean v0.4 and earlier add 100 handlers directly to the DOM element to listen for that event, while jQuery adds just one and iterates over the others when the event is triggered. The new version of Bean does the same.</p>
<p>The reason this helps with performance is that we don't have to make a new Event object each time the event is triggered, we can reuse the same one across handlers.</p>
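<p>The single-DOM-handler idea can be sketched like this (an illustration of the technique, not Bean's implementation): one root function is attached to the DOM, and when it fires it synthesises a single event object and walks the list of user handlers with it:</p>

```javascript
// One root handler is registered with the DOM per element/event type;
// it synthesises a single event object and iterates the user handlers,
// so the (costly) synthesis happens once per event, not once per handler.
function createRootHandler(handlers) {
  return function rootHandler(nativeEvent) {
    // Synthesised once, then reused across every handler in the list
    var event = { type: nativeEvent.type, originalEvent: nativeEvent };
    for (var i = 0; i < handlers.length; i++) {
      handlers[i](event);
    }
  };
}
```

<p>Because the manager controls the loop, handler ordering is guaranteed regardless of what the browser would do natively.</p>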
<p>There's another major advantage to this approach, and perhaps a more important reason to implement an event manager this way: you get to hide some odd browser quirks. As <a href="https://twitter.com/kitcambridge">Kit Cambridge</a> <a href="https://twitter.com/kitcambridge/status/230775465542049792">noted recently</a>, older versions of Internet Explorer generally fire their handlers in LIFO order, yet the W3C spec for <code>addEventListener()</code> specifies FIFO order! In fact, it's even worse, because the <a href="https://twitter.com/kitcambridge/status/230775716239794176">Microsoft documentation</a> says that they may actually be triggered in <i>random order</i>! But if you only have a single real handler then you get complete control over the order.</p>
<p>The benefits go further though, we get to implement some nice features that are completely missing from older browsers and even some current browsers. The most notable is <code>Event.stopImmediatePropagation()</code>. This is a method that was introduced with DOM Level 3, so it's missing from IE8 and below, but surprisingly it's also missing from the current version of Opera! Perhaps the pressure is off because jQuery implements it as part of their relatively complete DOM Level 3 Events implementation using this single-DOM-handler method.</p>
<h5><code>stopImmediatePropagation()</code></h5>
<p>Bean has included a custom <code>Event.stop()</code> method since v0.4; it's modelled on the <a href="http://api.prototypejs.org/dom/Event/stop/">same method</a> in Prototype. It's also found in MooTools and some other libraries. This method combines both <code>Event.stopPropagation()</code> and <code>Event.preventDefault()</code> in a short and sweet little utility method. But <i>"stop"</i> is slightly misleading, because you can stop the default behaviour of the browser and you can stop the event bubbling up the DOM, but you <b>can't stop other event handlers for this event <i>at this element</i> from firing</b>. That's where the new <code>Event.stopImmediatePropagation()</code> comes in: it halts the processing of the event handler list for the current event at the current element (i.e. it can be used at any point in the bubbling process and it'll stop processing just the handlers at the element it was called at).</p>
<p>If an event manager takes the single-DOM-handler approach, it has to implement <code>stopImmediatePropagation()</code> itself because the native version no longer has an effect: the browser only sees a single handler. But you also get the benefit that it now works in any browser the event manager supports.</p>
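<p>In the single-DOM-handler model, <code>stopImmediatePropagation()</code> reduces to a flag that the dispatch loop checks. Roughly (again, a sketch of the technique, not Bean's code):</p>

```javascript
// stopImmediatePropagation() as a flag checked by the dispatch loop:
// once a handler sets it, the remaining handlers at this element are
// skipped, while bubbling to parent elements is unaffected.
function dispatch(handlers, event) {
  var stopped = false;
  event.stopImmediatePropagation = function () { stopped = true; };
  for (var i = 0; i < handlers.length && !stopped; i++) {
    handlers[i](event);
  }
}
```

<p>Because the loop lives in the library, this works even in browsers whose native Event object lacks the method entirely.</p>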
<p>At the time of writing this article I haven't decided whether I think that <code>Event.stop()</code> should also bundle <code>Event.stopImmediatePropagation()</code>. I'm leaning towards including it because <i>"stop"</i> should mean <b>stop</b> and the combination of all three methods would certainly do this.</p>
<h3>List of changes for Bean 1.0</h3>
<p><b><code>on()</code> argument ordering</b>: the new signature is now <code>.on(events[, selector], handlerFn)</code>, which will work on both Bean as a standalone library and when bundled in Ender. In Ender, the following aliases also pass through <code>on()</code> so the same arguments work: <code>addListener()</code>, <code>bind()</code>, <code>listen()</code> and <code>one()</code> (which of course will only trigger once). Plus all the specific shortcuts such as <code>click()</code>, <code>keyup()</code> etc., although these methods have the first argument hardwired.</p>
<p><code>add()</code> keeps its existing argument ordering in standalone Bean, and <code>delegate()</code> retains the same signature as jQuery's equivalent.</p>
<p><b><code>off()</code> is the new <code>remove()</code></b>: although <code>remove()</code> is still available in standalone Bean.</p>
<p><b>Bean attaches a single handler to the DOM for each event type on each element</b>: as outlined above, Bean will iterate over all handlers for each triggered event and (mostly) reuse the same Event object for each call.</p>
<p><b><code>Event.stopImmediatePropagation()</code>:</b> is available across all supported browsers, it will stop the processing of all handlers for the current event at the current element (i.e. the event will still bubble).</p>
<p><b>The selector engine argument to <code>add()</code> is now completely removed</b>: you used to have to pass a selector engine in as the last argument for delegated events. Now you must set it once at start-up with <code>setSelectorEngine()</code>. This is automatically taken care of for you in an Ender build.</p>
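<p>Mechanically, a once-only selector engine hook can look something like this (a hypothetical sketch of the pattern; the function names and the engine's <code>(selector, root) → elements</code> shape are assumptions, not Bean's internals):</p>

```javascript
// Hypothetical sketch: the selector engine is configured once at
// start-up and used for delegated-event matching, instead of being
// passed into every add() call.
var selectorEngine = null;

function setSelectorEngine(engine) {
  selectorEngine = engine;
}

// Does `element` match `selector` (under `root`) per the configured engine?
function matchesSelector(element, selector, root) {
  if (!selectorEngine) throw new Error('no selector engine set');
  return selectorEngine(selector, root).indexOf(element) !== -1;
}
```

<p>An engine like Qwery slots straight into this shape, which is presumably why an Ender build can wire it up automatically.</p>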
<p><b>A duplicate-handler check is no longer performed when you <i>add</i></b>: performance testing showed that this was a massive slow-down and is simply not something that Bean should be responsible for. If you want to add the same handler twice then that's your business and responsibility.</p>
<p><b>Namespace matching for event <code>fire()</code>ing now matches namespaces using an <i>and</i> instead of an <i>or</i>:</b> so, for example, firing namespaces 'a.b' will fire any event with <i>both</i> 'a' and 'b' rather than <i>either</i> 'a' or 'b'. This is compatible with jQuery and is arguably a much more sensible and helpful way to deal with namespaces. You can find some discussion on this <a href="https://github.com/fat/bean/pull/68">on GitHub</a>.</p>
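<p>The <i>and</i> semantics can be illustrated with a small matcher (hypothetical, not Bean's internals): a handler is triggered only if it carries <i>every</i> namespace named in the <code>fire()</code> call:</p>

```javascript
// 'and' namespace matching: every namespace named in the fire() call
// must be present on the handler for it to be triggered.
function namespaceMatch(firedNamespaces, handlerNamespaces) {
  return firedNamespaces.every(function (ns) {
    return handlerNamespaces.indexOf(ns) !== -1;
  });
}
```

<p>So firing 'a.b' matches a handler registered with namespaces 'a' and 'b' (and possibly more), but not one registered with only 'a'.</p>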
<p><b>Lots of internal improvements for speed, code size, etc.</b></p>
<h3>Deconstructing performance (benchmarks)</h3>
<p>We've had a benchmark suite since v0.4 to help measure the impact of changes, so I've extended it to help compare some versions of Bean. The benchmarks use <a href="http://benchmarkjs.com/">benchmark.js</a>.</p>
<p>There are 3 versions of Bean included here:</p>
<ul>
<li><b>Bean 0.4</b>: The current release of Bean, specifically version 0.4.11-1, source <a href="https://github.com/fat/bean/tree/3ded4e905ef89a344729953be69e438583968679">here</a>.</li>
<li><b>Bean 0.5a</b>: An unreleased version of Bean in the 0.5-wip branch. Specifically most of the changes listed above are included here <i>except</i> for the single-DOM-handler change. This is here to assess the impact of this change and deciding whether it's a worthwhile "improvement". Source <a href="https://github.com/fat/bean/tree/989f33520c1ef6cb07e138a4176b14f3184adef6">here</a>.</li>
<li><b>Bean 1.0a</b>: The main difference between this and 0.5a is the single-DOM-handler change. Source <a href="https://github.com/fat/bean/tree/4bcf05ffe12cfcf0bf8744d2ebc1627c554eed92">here</a>.</li>
</ul>
<p>I'll have some notes about my own analysis of these numbers below, but first I should mention that these benchmarks are not particularly helpful in showing how the libraries perform under real use patterns. I consider them mainly to be proxies for identifying the performance of particular behaviours within the libraries. You'll note that there are a lot of tests for <code>add()</code> / <code>on()</code>; that's simply because it's the easiest thing to test reliably, and also because I haven't bothered coming up with useful tests for other things. It's very difficult to test the actual event triggering, which would be the most interesting bit, although the <code>fire()</code> tests give us a little insight. The tests at the bottom try to capture a full add/fire/remove cycle, but even this isn't particularly helpful. These benchmarks can be found in the Bean repo so if you want to tinker then feel free, I'd love to have additional input.</p>
<p>So, more so than most benchmarks, take these with a very large grain of salt or two!</p>
<p><i>(The numbers are ops/sec, so higher is better in all cases)</i></p>
<style>
table.results { font-family: "Lucida Grande", "Lucida Sans Unicode", Verdana, sans-serif; margin-bottom: 1.4em; }
table.results th:first-child, table.results td { text-align: right; }
table.results tbody tr:nth-child(2n+1) { background-color: rgba(0,0,0,0.075); }
table.results tr > * { padding: 0 3px; }
</style>
<h4>Chrome</h4>
<table class="results">
<thead>
<tr><th></th>
<th>Bean 0.4</th>
<th>Bean 0.5a</th>
<th>Bean 1.0a</th>
<th>NWEvents</th>
<th>jQuery</th>
</tr>
</thead>
<tbody>
<tr><th>add(element, event, fn)</th><td>25,760</td><td>66,580</td><td>185,147</td><td>18,133</td><td>142,161</td></tr>
<tr><th>add(unique element, event, fn)</th><td>33,024</td><td>99,208</td><td>36,481</td><td>18,634</td><td>50,554</td></tr>
<tr><th>add(element, custom, fn)</th><td>28,728</td><td>56,607</td><td>165,189</td><td>11,248</td><td>119,593</td></tr>
<tr><th>add(unique element, custom, fn)</th><td>36,150</td><td>78,260</td><td>34,308</td><td>24,409</td><td>44,761</td></tr>
<tr><th>add(element, event.namespace, fn)</th><td>30,082</td><td>64,435</td><td>189,468</td><td></td><td>136,486</td></tr>
<tr><th>add(unique element, event.namespace, fn)</th><td>33,702</td><td>101,915</td><td>34,678</td><td></td><td>33,637</td></tr>
<tr><th>add(element, selector, event, fn)</th><td>25,180</td><td>42,274</td><td>119,339</td><td>2,909</td><td>76,171</td></tr>
<tr><th>add(unique element, selector, event, fn)</th><td>27,328</td><td>91,156</td><td>30,308</td><td>1,069</td><td>35,696</td></tr>
<tr><th>add({})</th><td>15,594</td><td>27,312</td><td>59,434</td><td></td><td></td></tr>
<tr><th>fire(event)</th><td>576</td><td>492</td><td>6,860</td><td>9,797</td><td>21,821</td></tr>
<tr><th>fire(custom)</th><td>165,222</td><td>164,418</td><td>161,243</td><td>240,961</td><td>86,291</td></tr>
<tr><th>fire(namespace)</th><td>29,742</td><td>28,721</td><td>27,666</td><td></td><td></td></tr>
<tr><th>element add / click / remove</th><td>18,579</td><td>17,425</td><td>14,760</td><td>1,748</td><td>2,775</td></tr>
<tr><th>element add / fire / remove</th><td>31,230</td><td>28,344</td><td>15,802</td><td>1,127</td><td>2,763</td></tr>
<tr><th>object add / fire / remove</th><td>58,927</td><td>53,139</td><td>49,549</td><td>107,700</td><td>18,619</td></tr>
</tbody>
</table>
<h4>Firefox</h4>
<table class="results">
<thead>
<tr><th></th>
<th>Bean 0.4</th>
<th>Bean 0.5a</th>
<th>Bean 1.0a</th>
<th>NWEvents</th>
<th>jQuery</th>
</tr></thead>
<tbody>
<tr><th>add(element, event, fn)</th><td>20,404</td><td>45,030</td><td>100,546</td><td>13,826</td><td>63,159</td>
</tr><tr><th>add(unique element, event, fn)</th><td>16,708</td><td>67,417</td><td>19,625</td><td>16,810</td><td>29,130</td>
</tr><tr><th>add(element, custom, fn)</th><td>16,691</td><td>42,601</td><td>134,535</td><td>13,368</td><td>59,774</td>
</tr><tr><th>add(unique element, custom, fn)</th><td>24,159</td><td>55,312</td><td>21,235</td><td>13,475</td><td>27,877</td>
</tr><tr><th>add(element, event.namespace, fn)</th><td>17,414</td><td>53,639</td><td>101,427</td><td></td><td>55,321</td>
</tr><tr><th>add(unique element, event.namespace, fn)</th><td>23,735</td><td>59,751</td><td>22,034</td><td></td><td>27,576</td>
</tr><tr><th>add(element, selector, event, fn)</th><td>18,766</td><td>54,571</td><td>92,602</td><td>2,317</td><td>36,753</td>
</tr><tr><th>add(unique element, selector, event, fn)</th><td>22,094</td><td>56,026</td><td>16,705</td><td>964</td><td>22,102</td>
</tr><tr><th>add({})</th><td>9,126</td><td>17,104</td><td>32,093</td><td></td><td></td>
</tr><tr><th>fire(event)</th><td>260</td><td>266</td><td>3,391</td><td>3,120</td><td>11,154</td>
</tr><tr><th>fire(custom)</th><td>61,845</td><td>59,950</td><td>61,742</td><td>93,033</td><td>45,978</td>
</tr><tr><th>fire(namespace)</th><td>28,910</td><td>27,379</td><td>23,127</td><td></td><td></td>
</tr><tr><th>element add / click / remove</th><td>7,644</td><td>6,220</td><td>6,005</td><td>1,284</td><td>4,845</td>
</tr><tr><th>element add / fire / remove</th><td>11,288</td><td>10,954</td><td>7,458</td><td>788</td><td>9,115</td>
</tr><tr><th>object add / fire / remove</th><td>45,165</td><td>37,934</td><td>37,306</td><td>38,097</td><td>12,490</td>
</tr></tbody>
</table>
<h4>IE9</h4>
<table class="results">
<thead>
<tr><th></th>
<th>Bean 0.4</th>
<th>Bean 0.5a</th>
<th>Bean 1.0a</th>
<th>NWEvents</th>
<th>jQuery</th>
</tr></thead>
<tbody>
<tr><th>add(element, event, fn)</th><td>925</td><td>944</td><td>209,714</td><td>4,321</td><td>117,343</td>
</tr><tr><th>add(unique element, event, fn)</th><td>13,559</td><td>113,944</td><td>10,568</td><td>3,012</td><td>58,929</td>
</tr><tr><th>add(element, custom, fn)</th><td>946</td><td>1,004</td><td>219,631</td><td>4,329</td><td>128,570</td>
</tr><tr><th>add(unique element, custom, fn)</th><td>7,557</td><td>123,288</td><td>12,620</td><td>3,191</td><td>32,610</td>
</tr><tr><th>add(element, event.namespace, fn)</th><td>880</td><td>826</td><td>87,932</td><td></td><td>53,737</td>
</tr><tr><th>add(unique element, event.namespace, fn)</th><td>11,823</td><td>103,977</td><td>12,001</td><td></td><td>28,053</td>
</tr><tr><th>add(element, selector, event, fn)</th><td>655</td><td>802</td><td>57,619</td><td>382</td><td>21,159</td>
</tr><tr><th>add(unique element, selector, event, fn)</th><td>11,649</td><td>96,597</td><td>11,404</td><td>139</td><td>24,756</td>
</tr><tr><th>add({})</th><td>53</td><td>49</td><td>17,735</td><td></td><td></td>
</tr><tr><th>fire(event)</th><td>290,543</td><td>286,385</td><td>293,547</td><td>71,396</td><td>22,794</td>
</tr><tr><th>fire(custom)</th><td>229,241</td><td>223,189</td><td>216,943</td><td>78,395</td><td>23,081</td>
</tr><tr><th>fire(namespace)</th><td>17,507</td><td>11,848</td><td>16,018</td><td></td><td></td>
</tr><tr><th>element add / click / remove</th><td>10,228</td><td>9,697</td><td>9,260</td><td>478</td><td>8,345</td>
</tr><tr><th>element add / fire / remove</th><td>13,062</td><td>10,587</td><td>18,577</td><td>155</td><td>6,094</td>
</tr><tr><th>object add / fire / remove</th><td>30,924</td><td>29,096</td><td>28,904</td><td>39,761</td><td>7,634</td>
</tr></tbody>
</table>
<p>First, let me say that the IE results don't make a whole lot of sense so I'm going to suggest that the Chrome and Firefox benchmarks are the best indicators of general performance characteristics across browsers. The IE results have similar patterns to the others but there's way too much strangeness in there for me to take them seriously! IE8 has difficulty running all the benchmarks without locking up and I don't care enough to persevere there so I'm ignoring that too. Safari crashes and Opera has very similar results to Firefox and Chrome.</p>
<p><i>(Just to clarify, it's only the benchmarks that have trouble running in older versions of IE, the Bean test suite still runs on IE6 and above and has been beefed up even more in the 0.5-wip branch.)</i></p>
<h4>Some observations</h4>
<ul>
<li>The gains for <code>add()</code> from Bean v0.4 to v0.5a are largely from removing the <b>duplicate handler check</b>.</li>
<li>The reason for the duplicate tests for <b><i>"element"</i> vs <i>"unique element"</i> in the <code>add()</code> benchmarks</b> is to demonstrate the costs and benefits involved in the single-DOM-handler model. You can see that the numbers switch between the non-unique / unique tests for Bean v0.5a and v1.0a. Also, jQuery suffers significantly when you feed it unique elements because it has to add DOM handlers each time.</li>
<li>The poor performance for Bean v0.4 and v0.5a in <b><code>fire()</code> benchmarks</b> is mostly attributable to <b>Event object synthesising</b>, rather than the speed of the browser-native handler list management. This is important because while firing native-style events (e.g. <code>fire('click')</code>, which is what we're testing here) is not a common activity, we're having to synthesise the event object each time a handler is triggered. So, this is where Bean finds the most <i>win</i> in switching to a single-DOM-handler model.</li>
<li>Bean loses performance between v0.5a and v1.0a in the unique element <code>add()</code> benchmarks, this can mostly be explained by the <b>overhead of managing the root handler that it needs to attach to the DOM</b>. The handler is stored in the internal registry and each time you <code>add()</code> it needs to work out if you already have a root handler attached to the DOM or not for the given event / element. jQuery gets to take some shortcuts by polluting the DOM and handler functions with <code>guid</code> properties. However, the numbers suggest to me that there is some additional performance that could be squeezed out of Bean in this area.</li>
<li>Bean is fairly liberal with its <b>whitelist of properties</b> to copy from the original Event object, jQuery is a bit more restrictive with its similar system, this may slow Bean down very slightly.</li>
<li><b>Delegated events</b> are not represented well here, but the results would be very interesting because of the additional work required.</li>
</ul>
<h3>File size</h3>
<p>A lot of users of Bean are file-size-sensitive, so it's important to highlight that there are costs to these performance improvements. Minified, gzipped, the sizes for each of these versions of Bean are:</p>
<table class="results">
<tbody>
<tr><th>Bean 0.4</th><td>3870 bytes</td></tr>
<tr><th>Bean 0.5a</th><td>3959 bytes</td></tr>
<tr><th>Bean 1.0a</th><td>4176 bytes</td></tr>
</tbody>
</table>
<p>I've tried <i>really</i> hard to keep the size under 4kb but the additional overhead in managing the single-DOM-handler is too much to achieve that, even though I've managed to shave many precious bytes off in other areas of the code in the process (which unfortunately can't be seen in these numbers!).</p>
<p>We're still well under the minified, gzipped size of the jQuery events module by itself, even though we implement very similar functionality and jQuery gets to leverage lots of internal sugar not contained within the events module.</p>
<h3>Request for feedback</h3>
<p>After all that, what I really want is feedback! At this point I'm happy to release a proper version 1.0, I think it's major enough to warrant a jump past 0.5. I'd really like to hear feedback from people that have doubts that the changes are worth it, particularly the single-DOM-handler change.</p>
<h3>Using the 1.0 pre-release</h3>
<p>I've started using it in production and am very happy with the results so far, I'd love to have feedback from anyone else who wants to give it a spin.</p>
<p>The new version of Bean is in npm with the tag <i>dev</i> so you can include it in your Ender builds by referring to <i>bean@dev</i> as the package name.</p>
<p>For stand-alone, you can grab it from the <a href="https://github.com/fat/bean/tree/0.5-wip">0.5-wip branch</a> on GitHub.</p>
<p>Thanks for getting this far!</p>
mod_geoip2_xff update2012-07-06T02:47:17.000Zhttps://r.va.gg/2012/07/mod_geoip2_xff-update.html
<p>Thanks to a contribution from <a href="https://plus.google.com/105599514712357912650/posts">Kevin Gaudin</a>, I have a new release of my <a href="http://www.maxmind.com/app/mod_geoip">mod_geoip2</a> fork. (The history starts <a href="http://rod.vagg.org/2012/04/a-mod_geoip2-that-properly-handles-x-forwarded-for/">here</a>.)</p>
<p>You can find the source here: <a href="https://github.com/rvagg/mod_geoip2_xff">https://github.com/rvagg/mod_geoip2_xff</a></p>
<p>Kevin's addition provides a fall-back to the standard remote IP address of the client if no public IP address is found in the <em>X-Forwarded-For</em> header. Previously, my implementation fell back to the default mod_geoip2 behaviour of taking the first IP address in the <em>X-Forwarded-For</em> header, or the last if you set <em>GeoIPUseLastXForwardedForIP</em> in your config.</p>
<p>I also took the opportunity to clean things up a little and introduce a config option to turn on the special <em>X-Forwarded-For</em> handling. You now have to set <strong>GeoIPUseLeftPublicXForwardedForIP</strong> to <strong>On</strong> to activate it.</p>
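<p>For reference, opting in might look something like this in an Apache config (a sketch: the surrounding directives are the stock mod_geoip2 ones and the database path is illustrative; check the repo README for the authoritative list):</p>

```apache
<IfModule mod_geoip.c>
  GeoIPEnable On
  GeoIPDBFile /usr/share/GeoIP/GeoIP.dat
  # Opt in to the left-most-public X-Forwarded-For handling
  GeoIPUseLeftPublicXForwardedForIP On
</IfModule>
```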
<p>Thanks to Kevin, and additional contributions are welcome!</p>
<p><strong>Update July 7th 2012</strong>: Since I was in C-mode, I went ahead and implemented something I've tried to get working in the past: <strong>hostname lookups on the X-Forwarded-For host!</strong> I got intimate with APR and worked out how to use Apache to do the resolution so there isn't the lengthy timeout of raw syscalls. If you set <strong>GeoIPEnableHostnameLookups</strong> to <strong>On</strong>, you'll get a <strong>GEOIP_HOST</strong> environment variable to use.</p>
<p>I've also decided to start making tarballs available off GitHub for your convenience: <a href="https://github.com/rvagg/mod_geoip2_xff/downloads">https://github.com/rvagg/mod_geoip2_xff/downloads</a></p>
Data URI + SVG2012-05-23T05:16:22.000Zhttps://r.va.gg/2012/05/data-uri-svg.html
<p>Data URIs are great when you want to serve small resources that there's no point serving up in a combined sprite. Consider <a href="http://microjs.com">microjs.com</a> which serves up an HTML file plus a single JavaScript file containing the latest data used to build the site. The build logic is in an embedded script, the CSS is also embedded, so it's pretty lean considering what you see and the amount of data displayed. But notice the 3 icons for each project: 2 GitHub icons and a Twitter icon. They are PNG images, combined as a sprite, but to avoid an additional HTTP request to fetch them they are simply embedded in the CSS, which is itself embedded in the page:</p>
<div class="highlight"><pre><span class="nc">.title</span> <span class="nc">.stat</span> <span class="nt">span</span> <span class="p">{</span>
<span class="k">background-image</span><span class="o">:</span> <span class="k">url</span><span class="p">(</span><span class="err">"</span><span class="n">data</span><span class="o">:</span><span class="n">image</span><span class="o">/</span><span class="n">png</span><span class="p">;</span><span class="n">base64</span><span class="o">,</span><span class="n">iVBORw0KGgoAAAANSUhE</span><span class="o">...</span>
<span class="p">}</span>
</pre></div>
<p>Easy and quick and fairly well supported across browsers.</p>
<p>But Data URIs can do so much more, including embed SVG!</p>
<div class="highlight"><pre>url("data:image/svg+xml,<span class="nt">&lt;svg</span> <span class="na">viewBox=</span><span class="s">'0 0 40 40'</span> <span class="na">height=</span><span class="s">'25'</span> <span class="na">width=</span><span class="s">'25'</span>
<span class="na">xmlns=</span><span class="s">'http://www.w3.org/2000/svg'</span><span class="nt">&gt;&lt;path</span> <span class="na">fill=</span><span class="s">'rgb(91, 183, 91)'</span> <span class="na">d=</span><span class="s">'M2.379,</span>
<span class="s">14.729L5.208,11.899L12.958,19.648L25.877,6.733L28.707,9.561L12.958,25.308Z'</span>
<span class="nt">/&gt;&lt;/svg&gt;</span>")
</pre></div>
<p>The above will produce a 25px square image but the SVG is drawn in a 40x40 coordinate box, because I'm using <a href="http://raphaeljs.com/icons/">Raphaël Icon</a> paths (you can try it yourself by replacing the <code>d=''</code> content with the path data you get when you click on any of the icons on the <a href="http://raphaeljs.com/icons/">Raphaël Icons</a> page).</p>
<p>SVG of course gives you perfectly scalable graphics, embedding in a Data URI in your CSS lets you use them in the same way that you use other CSS images, minus the need to fetch them via an additional HTTP request.</p>
<p><strong>What's the catch?</strong></p>
<p>It's the web, of course there's a catch, and of course it involves Internet Explorer!</p>
<p>For a start you don't get SVG support in IE8 and below, which is a bit of a problem right now because IE8 is still very much with us, largely because IE9 isn't available for Windows XP users. But there's more to it than that. IE adheres to the <a href="http://www.ietf.org/rfc/rfc2397.txt">spec</a> more strictly than other browsers. There are 2 types of encoding for Data URIs, <em>base64</em> and <em>non-base64</em>. If you leave the <code>;base64</code> off your string then most browsers let you get away with anything that doesn't conflict with standard CSS, so basically don't use <code>"</code>, or if you do, escape them as <code>\"</code>. What the Data URI spec says is:</p>
<p><blockquote>...the data (as a sequence of octets) is represented using ASCII encoding for octets inside the range of safe URL characters and using the standard %xx hex encoding of URLs for octets outside that range.</blockquote>
And IE doesn't let you have it any other way. So you either encode your SVG into Base64 or escape it with <code>%xx</code>'s, which kind of loses some of the elegance of SVG in CSS. But at least you'll get IE9+ support.</p>
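<p>To make those two forms concrete, here's a minimal sketch, in Java, of generating both a Base64 and a <code>%xx</code>-escaped Data URI from raw SVG markup at build time. The <code>SvgDataUri</code> class name and the character whitelist are my own choices: the whitelist keeps only unreserved URL characters, which is stricter than the spec demands but safe everywhere, IE included.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the two Data URI forms discussed above.
public class SvgDataUri {

    // Base64 form: works in IE9+, but inflates the SVG by roughly a third.
    public static String base64(String svg) {
        return "data:image/svg+xml;base64,"
                + Base64.getEncoder().encodeToString(svg.getBytes(StandardCharsets.UTF_8));
    }

    // %xx form: keeps the markup mostly readable. We escape every octet
    // outside the unreserved set (ASCII letters, digits, -_.~), which is
    // more conservative than the spec requires but keeps IE happy.
    public static String escaped(String svg) {
        StringBuilder sb = new StringBuilder("data:image/svg+xml,");
        for (byte b : svg.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xff);
            if ((c < 128 && Character.isLetterOrDigit(c)) || "-_.~".indexOf(c) >= 0) {
                sb.append(c);
            } else {
                sb.append(String.format("%%%02X", b & 0xff));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String svg = "<svg xmlns='http://www.w3.org/2000/svg' width='25' height='25'/>";
        System.out.println(base64(svg));
        System.out.println(escaped(svg));
    }
}
```

<p>Either output can be dropped straight into a <code>url("…")</code> in your CSS; the Base64 encoder also emits the trailing <code>=</code> padding for you.</p>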
<p>So here are some examples to <a href="http://jsfiddle.net/rvagg/exULa/">fiddle</a> with. Click through to the CSS tab to see the gory details. The first icon is Base64 encoded, the second icon is URL escaped (<code>%xx</code>), and the rest are plain SVG, so you'll get different results viewing in IE9 vs the rest.</p>
<p>SVG in Data URIs is an elegant solution (and a bit of fun) but only really useful at the moment if you don't need to support IE8 and below.</p>
<p><strong>Update 17th Sept 2012</strong></p>
<p>Below in the comments, Ben reports on his (much more rigorous) research into browser support; refer to that if you're serious about using SVG in Data URIs. An interesting result of his work comes from the <a href="https://code.google.com/p/chromium/issues/detail?id=137277">issue</a> he filed with Chromium (I don't know if this is a generic WebKit thing or not, but you could easily test it if you're interested). It turns out that Chromium/WebKit requires Base64 Data URIs to be multiples of 4 characters, so you just need to pad with <code>=</code> characters.</p>
A mod_geoip2 that properly handles X-Forwarded-For2012-04-22T04:55:42.000Zhttps://r.va.gg/2012/04/a-mod_geoip2-that-properly-handles-x-forwarded-for.html
<p>This is just a short follow-up to my original post on <em><a title="Wrangling the X-Forwarded-For Header" href="http://rod.vagg.org/2011/07/wrangling-the-x-forwarded-for-header/">Wrangling the X-Forwarded-For Header</a></em>, where I promised that one of the things I would follow up with was how to get MaxMind's mod_geoip2 to handle X-Forwarded-For according to the rule:</p>
<p><p style="text-align: center;"><strong><em>Always use the leftmost non-private address</em></strong>.</p>
Well, since it's turning out to be such a popular post I thought I'd better get it done to help anyone else who's searching around for solutions. So, I've put up the code on my GitHub account here:</p>
<p><p style="text-align: center;"><strong><a href="https://github.com/rvagg/mod_geoip2_xff">https://github.com/rvagg/mod_geoip2_xff</a></strong></p>
I'm maintaining a <em>maxmind</em> branch that contains the original code from MaxMind and the <em>master</em> contains my changes, so you can see a nice <a href="https://github.com/rvagg/mod_geoip2_xff/compare/maxmind...master">diff</a> of what I've done.</p>
<p>I have to warn that I haven't done any serious C programming for more than 15 years or so, my code probably isn't fantastic, and I'm open to outside contributions from anyone with suggestions. The approach I've taken is to embed the regexes of my previous post into the module and walk through the IP addresses looking for a non-private match.</p>
<p>Since my initial release, based on MaxMind's 1.2.5, they've put out a 1.2.7 which includes the addition of a <em>GeoIPUseLastXForwardedForIP</em> flag. I can imagine what prompted this addition, but as I said in my previous post this isn't the way to get the best IP address. As of writing, my current master branch is based on 1.2.7 and has this new flag, but because the <em>first_public_ip_in_list</em> check is done first the flag is mostly useless.</p>
<p>If anyone wants to hassle MaxMind on my behalf then feel free, I sent them an email a couple of months ago about this but received no answer.</p>
<p><strong>Update 6-July-2012</strong>: A new release with some changes, details <a href="http://rod.vagg.org/2012/07/mod_geoip2_xff-update/">here</a>.</p>
JavaScript and Semicolons2012-04-20T06:10:16.000Zhttps://r.va.gg/2012/04/javascript-and-semicolons.html
<p>In syntax terms, JavaScript is in the broad C-family of languages. The C-family is diverse and includes languages such as C (obviously), C++, Objective-C, Perl, Java, C# and the newer Go from Google and Rust from Mozilla. Common themes in these languages include:</p>
<p><ul>
<li>The use of curly braces to surround blocks.</li>
<li>The general insignificance of white space (spaces, tabs, new lines) except in very limited cases. Indentation is optional and is therefore a matter of style and preference, plus programs can be written on as few or as many lines as you want.</li>
<li>The use of semicolons to end statements, expressions and other constructs. Semicolons become the delimiter that the new line character is in white-space-significant languages.</li>
</ul>
JavaScript’s rules for curly braces, white space and semicolons are consistent with the C-family, and its formal specification, known as the ECMAScript Language Specification, makes this clear:</p>
<p><blockquote>Certain ECMAScript statements (empty statement, variable statement, expression statement, do-while statement, continue statement, break statement, return statement, and throw statement) must be terminated with semicolons.</blockquote>
But it doesn’t end there: JavaScript introduces what’s known as <strong>Automatic Semicolon Insertion (ASI)</strong>. The specification continues:</p>
<p><blockquote>Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.</blockquote>
The general C-family rules for semicolons can be found in most teaching material for JavaScript and have been advocated by most of the prominent JavaScript personalities since 1995. In a <a href="https://brendaneich.com/2012/04/the-infernal-semicolon/">recent post</a>, JavaScript’s inventor, Brendan Eich, described ASI as “a syntactic error correction procedure” (as in “<a href="https://brendaneich.com/2012/04/the-infernal-semicolon/#comment-12268">parsing error</a>”, rather than “user error”).</p>
<p><strong><em>The rest of this article about semicolons in JavaScript can be found on <a title="JavaScript and Semicolons" href="http://dailyjs.com/2012/04/19/semicolons/">DailyJS</a>.</em></strong></p>
Minifying HTML in the Servlet container2011-11-23T04:38:44.000Zhttps://r.va.gg/2011/11/minifying-html-in-the-servlet-container.html
<p>Google's <a title="mod_pagespeed" href="http://code.google.com/speed/page-speed/docs/module.html">mod_pagespeed</a> is great. I've been using it for a while now on <a title="FeedXL Horse Nutrition" href="http://feedxl.com">feedxl.com</a> but the only filter that I actually find really useful is <a href="http://code.google.com/speed/page-speed/docs/filter-whitespace-collapse.html">Collapse Whitespace</a>; the rest of the filters I either already do myself as part of the site build process or I don't want applied. But, I imagine that there are a lot of admins out there that would really benefit from all of the clever things it can do.</p>
<p>Unfortunately it's just an Apache2 module so it's a bit difficult to use the cleverness elsewhere. I recently launched a new service that serves content directly from Apache Tomcat without passing through an Apache2 web server like I would normally do (because there was just no need!). Having got used to the nice whitespace optimisations you can get from mod_pagespeed I decided to implement a simple version of my own for Tomcat. With dynamic content you're better off not trying to optimise whitespace during generation; leave it for post-processing so your logic can stay clear.</p>
<p>So, enter <strong>HTMLMinifyFilter</strong>. It's nowhere near as clever as mod_pagespeed but it'll do for basic needs. The core of it is a regular expression that will remove certain patterns and it's configurable so you decide which patterns to include.</p>
<div class="highlight"><pre><span class="kn">package</span> <span class="n">au</span><span class="o">.</span><span class="na">com</span><span class="o">.</span><span class="na">xprime</span><span class="o">.</span><span class="na">misc</span><span class="o">.</span><span class="na">webapp</span><span class="o">.</span><span class="na">filter</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.io.*</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.util.regex.*</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">javax.servlet.*</span><span class="o">;</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">HTMLMinifyFilter</span> <span class="kd">implements</span> <span class="n">Filter</span> <span class="o">{</span>
<span class="kd">private</span> <span class="n">Pattern</span> <span class="n">regex</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">doFilter</span><span class="o">(</span><span class="n">ServletRequest</span> <span class="n">req</span><span class="o">,</span> <span class="n">ServletResponse</span> <span class="n">res</span><span class="o">,</span> <span class="n">FilterChain</span> <span class="n">chain</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">IOException</span><span class="o">,</span> <span class="n">ServletException</span> <span class="o">{</span>
<span class="n">HttpServletResponse</span> <span class="n">response</span> <span class="o">=</span> <span class="o">(</span><span class="n">HttpServletResponse</span><span class="o">)</span> <span class="n">res</span><span class="o">;</span>
<span class="n">ResponseWrapper</span> <span class="n">wrapper</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ResponseWrapper</span><span class="o">(</span><span class="n">response</span><span class="o">);</span>
<span class="n">chain</span><span class="o">.</span><span class="na">doFilter</span><span class="o">(</span><span class="n">req</span><span class="o">,</span> <span class="n">wrapper</span><span class="o">);</span>
<span class="n">String</span> <span class="n">html</span> <span class="o">=</span> <span class="n">wrapper</span><span class="o">.</span><span class="na">toString</span><span class="o">();</span>
<span class="k">if</span> <span class="o">(</span><span class="n">regex</span> <span class="o">!=</span> <span class="kc">null</span> <span class="o">&&</span> <span class="n">response</span><span class="o">.</span><span class="na">getContentType</span><span class="o">()</span> <span class="o">!=</span> <span class="kc">null</span> <span class="o">&&</span> <span class="n">response</span><span class="o">.</span><span class="na">getContentType</span><span class="o">().</span><span class="na">startsWith</span><span class="o">(</span><span class="s">"text/html"</span><span class="o">))</span>
<span class="n">html</span> <span class="o">=</span> <span class="n">regex</span><span class="o">.</span><span class="na">matcher</span><span class="o">(</span><span class="n">html</span><span class="o">).</span><span class="na">replaceAll</span><span class="o">(</span><span class="s">""</span><span class="o">);</span>
<span class="n">response</span><span class="o">.</span><span class="na">setContentLength</span><span class="o">(</span><span class="n">html</span><span class="o">.</span><span class="na">getBytes</span><span class="o">().</span><span class="na">length</span><span class="o">);</span>
<span class="n">PrintWriter</span> <span class="n">out</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="na">getWriter</span><span class="o">();</span>
<span class="n">out</span><span class="o">.</span><span class="na">write</span><span class="o">(</span><span class="n">html</span><span class="o">);</span>
<span class="n">out</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">destroy</span><span class="o">()</span> <span class="o">{</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">init</span><span class="o">(</span><span class="n">FilterConfig</span> <span class="n">config</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">ServletException</span> <span class="o">{</span>
<span class="n">StringBuffer</span> <span class="n">pattern</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StringBuffer</span><span class="o">();</span>
<span class="n">appendIf</span><span class="o">(</span><span class="n">config</span><span class="o">,</span> <span class="s">"strip-linestart-whitespace"</span><span class="o">,</span> <span class="n">pattern</span><span class="o">,</span> <span class="s">"(?<=^)[ \\t]+"</span><span class="o">);</span>
<span class="n">appendIf</span><span class="o">(</span><span class="n">config</span><span class="o">,</span> <span class="s">"strip-lineend-whitespace"</span><span class="o">,</span> <span class="n">pattern</span><span class="o">,</span> <span class="s">"[ \\t]+(?:$)"</span><span class="o">);</span>
<span class="n">appendIf</span><span class="o">(</span><span class="n">config</span><span class="o">,</span> <span class="s">"strip-multiple-whitespace"</span><span class="o">,</span> <span class="n">pattern</span><span class="o">,</span> <span class="s">"[ \\t](?=[ \\t])"</span><span class="o">);</span>
<span class="n">appendIf</span><span class="o">(</span><span class="n">config</span><span class="o">,</span> <span class="s">"strip-blank-lines"</span><span class="o">,</span> <span class="n">pattern</span><span class="o">,</span> <span class="s">"(\\n[ \\t]*)+(?=\\n)"</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">pattern</span><span class="o">.</span><span class="na">length</span><span class="o">()</span> <span class="o">!=</span> <span class="mi">0</span><span class="o">)</span>
<span class="n">regex</span> <span class="o">=</span> <span class="n">Pattern</span><span class="o">.</span><span class="na">compile</span><span class="o">(</span><span class="n">pattern</span><span class="o">.</span><span class="na">toString</span><span class="o">(),</span> <span class="n">Pattern</span><span class="o">.</span><span class="na">MULTILINE</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">private</span> <span class="kt">void</span> <span class="nf">appendIf</span><span class="o">(</span><span class="n">FilterConfig</span> <span class="n">config</span><span class="o">,</span> <span class="n">String</span> <span class="n">configKey</span><span class="o">,</span> <span class="n">StringBuffer</span> <span class="n">pattern</span><span class="o">,</span> <span class="n">String</span> <span class="n">s</span><span class="o">)</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span><span class="n">config</span><span class="o">.</span><span class="na">getInitParameter</span><span class="o">(</span><span class="n">configKey</span><span class="o">)</span> <span class="o">!=</span> <span class="kc">null</span> <span class="o">&&</span> <span class="n">config</span><span class="o">.</span><span class="na">getInitParameter</span><span class="o">(</span><span class="n">configKey</span><span class="o">).</span><span class="na">equals</span><span class="o">(</span><span class="s">"true"</span><span class="o">))</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span><span class="n">pattern</span><span class="o">.</span><span class="na">length</span><span class="o">()</span> <span class="o">!=</span> <span class="mi">0</span><span class="o">)</span>
<span class="n">pattern</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="sc">'|'</span><span class="o">);</span>
<span class="n">pattern</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="n">s</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="kd">static</span> <span class="kd">class</span> <span class="nc">ResponseWrapper</span> <span class="kd">extends</span> <span class="n">HttpServletResponseWrapper</span> <span class="o">{</span>
<span class="kd">private</span> <span class="n">CharArrayWriter</span> <span class="n">output</span><span class="o">;</span>
<span class="kd">public</span> <span class="nf">ResponseWrapper</span><span class="o">(</span><span class="n">HttpServletResponse</span> <span class="n">response</span><span class="o">)</span> <span class="o">{</span>
<span class="kd">super</span><span class="o">(</span><span class="n">response</span><span class="o">);</span>
<span class="k">this</span><span class="o">.</span><span class="na">output</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">CharArrayWriter</span><span class="o">();</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">String</span> <span class="nf">toString</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">output</span><span class="o">.</span><span class="na">toString</span><span class="o">();</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">PrintWriter</span> <span class="nf">getWriter</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">PrintWriter</span><span class="o">(</span><span class="n">output</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
<p><h3>How does it work?</h3>
We start off by wrapping our response in an object that will supply a CharArrayWriter so we can capture and process whatever the rest of the stack is doing (credit for this idea goes <a href="http://stackoverflow.com/questions/5009650/where-can-i-find-a-java-servlet-filter-that-applies-regex-to-the-output">here</a>). We can then process the output with our regular expression(s) and pass it to the real response.</p>
<p>Before I explain what the regular expressions do I want to caution that this won't be satisfactory in certain situations. It's not aware of <code>&lt;script&gt;</code>, <code>&lt;pre&gt;</code> or any other content where whitespace may be important, so unless you're sure stripping whitespace doesn't matter you may want to find a more intelligent solution.</p>
<p>I've split the regex up into 4 optional parts which you turn on with init-parameters (explained later); matches of each of these are replaced with an empty string:</p>
<p><h4>strip-linestart-whitespace - (?&lt;=^)[ \t]+</h4>
This regex will match whitespace at the beginning of any line. You'll notice that I'm not using \s for my whitespace match; this is because with multi-line pattern matching it'll also match \n and \r, which we want to handle separately. The (?&lt;=^) at the beginning is a zero-width positive look-behind for <em>line-start</em>; it anchors the match at the start of the line without consuming anything, so we only strip out the whitespace.</p>
<p>This option is likely to make the biggest impact on HTML minification on dynamic content because we love to use indentation to define structure.</p>
<p><h4>strip-lineend-whitespace - [ \t]+(?:$)</h4>
Same deal as the linestart regex, but this time we have (?:$), a non-capturing group wrapping the <em>line end</em> anchor; $ is zero-width so nothing extra is consumed.</p>
<p>This will pick up any sloppiness in your HTML (I wish I could do this in Microsoft Word when I have to edit other people's documents; you can't see it, <strong>but it's still there</strong>!).</p>
<p><h4>strip-multiple-whitespace - [ \t](?=[ \t])</h4>
Here we match any whitespace character that has another whitespace character immediately after it, using a look-ahead so the second character isn't consumed. Remember that we are replacing matches with an empty string, so every character in a run of whitespace except the last is stripped, leaving a single space intact.</p>
<p>This is probably going to be the most dangerous if you have content where whitespace is important, e.g. <code>&lt;script&gt;</code>, <code>&lt;pre&gt;</code>.</p>
<p><h4>strip-blank-lines - (\n[ \t]*)+(?=\n)</h4>
This is similar in spirit: we match one or more newlines, each optionally followed by trailing whitespace, with a look-ahead for a final newline that isn't consumed. So a run of blank lines collapses down to a single newline, and we get rid of any lines that don't contain content.</p>
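<p>To see the four rules working together, here's a small standalone sketch of the combined expression, built the same way <code>init()</code> ORs the parts together (using look-aheads for the collapse rules so a matched run always leaves one character behind); the class name and sample input are mine, not part of the filter:</p>

```java
import java.util.regex.Pattern;

// Standalone sketch of the four whitespace rules combined into one alternation.
public class MinifyDemo {

    static final Pattern ALL = Pattern.compile(
            "(?<=^)[ \\t]+"            // strip-linestart-whitespace
            + "|[ \\t]+(?:$)"          // strip-lineend-whitespace
            + "|[ \\t](?=[ \\t])"      // strip-multiple-whitespace (keeps one space)
            + "|(\\n[ \\t]*)+(?=\\n)", // strip-blank-lines (keeps one newline)
            Pattern.MULTILINE);

    static String minify(String html) {
        // Every match is replaced with an empty string, just like the filter does.
        return ALL.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<div>\n    <p>hello   world</p>   \n\n\n</div>";
        System.out.println(minify(html));
        // prints:
        // <div>
        // <p>hello world</p>
        // </div>
    }
}
```

<p>Leading indentation, trailing spaces and the blank lines all disappear, while single spaces between words survive.</p>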
<p><h3>Configuration</h3>
You simply put the filter into your classpath somewhere and wire it up in web.xml. You first define the filter reference and any parameters:</p>
<div class="highlight"><pre><span class="nt"><filter></span>
<span class="nt"><filter-name></span>htmlMinifyFilter<span class="nt"></filter-name></span>
<span class="nt"><filter-class></span>au.com.xprime.misc.webapp.filter.HTMLMinifyFilter<span class="nt"></filter-class></span>
<span class="nt"><init-param></span>
<span class="nt"><param-name></span>strip-linestart-whitespace<span class="nt"></param-name></span>
<span class="nt"><param-value></span>true<span class="nt"></param-value></span>
<span class="nt"></init-param></span>
<span class="nt"><init-param></span>
<span class="nt"><param-name></span>strip-lineend-whitespace<span class="nt"></param-name></span>
<span class="nt"><param-value></span>true<span class="nt"></param-value></span>
<span class="nt"></init-param></span>
<span class="nt"><init-param></span>
<span class="nt"><param-name></span>strip-multiple-whitespace<span class="nt"></param-name></span>
<span class="nt"><param-value></span>true<span class="nt"></param-value></span>
<span class="nt"></init-param></span>
<span class="nt"><init-param></span>
<span class="nt"><param-name></span>strip-blank-lines<span class="nt"></param-name></span>
<span class="nt"><param-value></span>true<span class="nt"></param-value></span>
<span class="nt"></init-param></span>
<span class="nt"></filter></span>
</pre></div>
<p>Any of the parameters can be set to <i>false</i> or omitted altogether to turn that part off.</p>
<p>Then you need to wire up the filter to any incoming URIs, which is done just like servlet-mapping (but still hopelessly unhelpful; why can't we have proper regular expressions for these?). You'll notice that I'm only using a Writer, so even though it checks for a text/html response before it does any rewriting, you won't want it touching any binary data because we don't wrap getOutputStream(). So either make sure the filter only gets applied to text/html URIs or modify the filter to be binary-safe. I only have a few URIs that I want to apply this to so I've put them in manually with one of these per URI:</p>
<div class="highlight"><pre><span class="nt"><filter-mapping></span>
<span class="nt"><filter-name></span>htmlMinifyFilter<span class="nt"></filter-name></span>
<span class="nt"><url-pattern></span>/myuri<span class="nt"></url-pattern></span>
<span class="nt"></filter-mapping></span>
</pre></div>
<p>But you can also do the simple url-pattern matching with <code>*.ext</code> or <code>/*</code>, etc.</p>
<p>And there you go! Cheap and easy HTML minification from within the Servlet container.</p>
Handling X-Forwarded-For in Java and Tomcat2011-07-30T04:49:50.000Zhttps://r.va.gg/2011/07/handling-x-forwarded-for-in-java-and-tomcat.html
<p>This is the first follow-up to my <a title="Wrangling the X-Forwarded-For Header" href="http://rod.vagg.org/2011/07/wrangling-the-x-forwarded-for-header/">post on X-Forwarded-For</a>, I'll assume you've at least scanned that article.</p>
<p><h3>Revision of the security issues</h3>
It's important to recap the security message of my previous post. <strong>Don't assume that the content of the X-Forwarded-For header is either correct or syntactically valid</strong>. The header is not hard to spoof and there are only certain situations where you may be able to trust parts of the content of the header.</p>
<p>So, my simple advice is not to use this header for anything <em>important</em>. Don't use it for authentication purposes or anything else that has security implications. It really should only be used for your own information purposes or to provide customised content for the user, where it's OK for that customisation to be based on false information, because that will always be a possibility.</p>
<p>We use it on <a href="http://feedxl.com/">FeedXL</a> for IP address geolocation using <a href="http://www.maxmind.com/app/country">GeoIP</a> to serve country specific information to visitors. Ultimately it doesn't really matter a whole lot if we get it wrong; while there are differences in the content, they aren't major. It may cause some confusion but that confusion can be resolved if the customer wants to contact us. You sign up to FeedXL based on your country, but we still let you select your country from a list even though we pre-select the one we guess from your IP address. And if you sign up to the wrong country you won't get access to the correct database for your country; hardly a major security issue, more of an inconvenience. If you're spoofing X-Forwarded-For then you're probably not the kind of person who's going to get confused by the content anyway; you're probably just poking around and not really interested in our product!</p>
<p><h3>Extracting a useful IP address</h3>
I ended my last post with a generalised rule for extracting the most likely useful IP address from the X-Forwarded-For header:</p>
<p><blockquote><strong><em>Always use the leftmost non-private address</em></strong>.</blockquote>
And I gave a couple of regular expressions to help with this process: <code>([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})</code> or <code>(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})</code> to match an IP address, and <code>(^127\.0\.0\.1)|(^10\.)|(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|(^192\.168\.)</code> to match a private IP address.</p>
<p><h3>Java use cases</h3>
In my Java code I have 2 uses for the IP address from X-Forwarded-For, both of these come up because we're working behind a load balancer (Amazon's <a href="http://aws.amazon.com/elasticloadbalancing/">Elastic Load Balancing</a>) and don't have direct access to the remote host information:</p>
<p><ul>
<li>Looking up the country information in the GeoIP database using their Java API. Most of our use of GeoIP is with <a href="http://geolite.maxmind.com/download/geoip/api/mod_geoip2/">mod_geoip</a> in <a href="http://httpd.apache.org/">Apache </a>but we also want to occasionally use it from within a <a href="http://www.oracle.com/technetwork/java/javaee/servlet/index.html">servlet</a>. For example, on our sign-up page we pre-select the country at the top of the page based on your IP address, this is done within Java.</li>
<li>More interesting logging from <a href="http://tomcat.apache.org/">Tomcat</a>: if I want to have <a href="http://tomcat.apache.org/tomcat-6.0-doc/config/valve.html#Access_Log_Valve">AccessLogValve</a> turned on, the host information isn't very interesting behind a load balancer.</li>
</ul>
A generic parser would serve both of these purposes!</p>
<p><h3>Parsing X-Forwarded-For</h3>
I have created a simple utility class to do the parsing, called from wherever I need either an <strong>IP address</strong> or a <strong>hostname</strong>.</p>
<div class="highlight"><pre><span class="kn">package</span> <span class="n">au</span><span class="o">.</span><span class="na">com</span><span class="o">.</span><span class="na">xprime</span><span class="o">.</span><span class="na">webapp</span><span class="o">.</span><span class="na">util</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.net.Inet4Address</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.net.InetAddress</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.net.UnknownHostException</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.util.regex.Matcher</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.util.regex.Pattern</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">javax.servlet.http.HttpServletRequest</span><span class="o">;</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">InetAddressUtil</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">IP_ADDRESS_REGEX</span> <span class="o">=</span> <span class="s">"([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3})"</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">PRIVATE_IP_ADDRESS_REGEX</span> <span class="o">=</span> <span class="s">"(^127\\.0\\.0\\.1)|(^10\\.)|(^172\\.1[6-9]\\.)|(^172\\.2[0-9]\\.)|(^172\\.3[0-1]\\.)|(^192\\.168\\.)"</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="n">Pattern</span> <span class="n">IP_ADDRESS_PATTERN</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="n">Pattern</span> <span class="n">PRIVATE_IP_ADDRESS_PATTERN</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="n">String</span> <span class="nf">findNonPrivateIpAddress</span><span class="o">(</span><span class="n">String</span> <span class="n">s</span><span class="o">)</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span><span class="n">IP_ADDRESS_PATTERN</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">IP_ADDRESS_PATTERN</span> <span class="o">=</span> <span class="n">Pattern</span><span class="o">.</span><span class="na">compile</span><span class="o">(</span><span class="n">IP_ADDRESS_REGEX</span><span class="o">);</span>
<span class="n">PRIVATE_IP_ADDRESS_PATTERN</span> <span class="o">=</span> <span class="n">Pattern</span><span class="o">.</span><span class="na">compile</span><span class="o">(</span><span class="n">PRIVATE_IP_ADDRESS_REGEX</span><span class="o">);</span>
<span class="o">}</span>
<span class="n">Matcher</span> <span class="n">matcher</span> <span class="o">=</span> <span class="n">IP_ADDRESS_PATTERN</span><span class="o">.</span><span class="na">matcher</span><span class="o">(</span><span class="n">s</span><span class="o">);</span>
<span class="k">while</span> <span class="o">(</span><span class="n">matcher</span><span class="o">.</span><span class="na">find</span><span class="o">())</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(!</span><span class="n">PRIVATE_IP_ADDRESS_PATTERN</span><span class="o">.</span><span class="na">matcher</span><span class="o">(</span><span class="n">matcher</span><span class="o">.</span><span class="na">group</span><span class="o">(</span><span class="mi">0</span><span class="o">)).</span><span class="na">find</span><span class="o">())</span>
<span class="k">return</span> <span class="n">matcher</span><span class="o">.</span><span class="na">group</span><span class="o">(</span><span class="mi">0</span><span class="o">);</span>
<span class="n">matcher</span><span class="o">.</span><span class="na">region</span><span class="o">(</span><span class="n">matcher</span><span class="o">.</span><span class="na">end</span><span class="o">(),</span> <span class="n">s</span><span class="o">.</span><span class="na">length</span><span class="o">());</span>
<span class="o">}</span>
<span class="k">return</span> <span class="kc">null</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="n">String</span> <span class="nf">getAddressFromRequest</span><span class="o">(</span><span class="n">HttpServletRequest</span> <span class="n">request</span><span class="o">)</span> <span class="o">{</span>
<span class="n">String</span> <span class="n">forwardedFor</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="na">getHeader</span><span class="o">(</span><span class="s">"X-Forwarded-For"</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">forwardedFor</span> <span class="o">!=</span> <span class="kc">null</span> <span class="o">&&</span> <span class="o">(</span><span class="n">forwardedFor</span> <span class="o">=</span> <span class="n">findNonPrivateIpAddress</span><span class="o">(</span><span class="n">forwardedFor</span><span class="o">))</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span>
<span class="k">return</span> <span class="n">forwardedFor</span><span class="o">;</span>
<span class="k">return</span> <span class="n">request</span><span class="o">.</span><span class="na">getRemoteAddr</span><span class="o">();</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="n">String</span> <span class="nf">getHostnameFromRequest</span><span class="o">(</span><span class="n">HttpServletRequest</span> <span class="n">request</span><span class="o">)</span> <span class="o">{</span>
<span class="n">String</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">getAddressFromRequest</span><span class="o">(</span><span class="n">request</span><span class="o">);</span>
<span class="k">try</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">Inet4Address</span><span class="o">.</span><span class="na">getByName</span><span class="o">(</span><span class="n">addr</span><span class="o">).</span><span class="na">getHostName</span><span class="o">();</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="n">Exception</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// resolution failed; fall through and return the raw address</span>
<span class="o">}</span>
<span class="k">return</span> <span class="n">addr</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="n">InetAddress</span> <span class="nf">getInet4AddressFromRequest</span><span class="o">(</span><span class="n">HttpServletRequest</span> <span class="n">request</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">UnknownHostException</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">Inet4Address</span><span class="o">.</span><span class="na">getByName</span><span class="o">(</span><span class="n">getAddressFromRequest</span><span class="o">(</span><span class="n">request</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
<p><em>(Download <a href="http://src.vagg.org/java/InetAddressUtil.java">here</a>)</em></p>
<p>Given an <code>HttpServletRequest</code> we can call either <code>getAddressFromRequest()</code> or <code>getHostnameFromRequest()</code> to get the data we need.</p>
<p>We first apply the general IP address regular expression, then in <code>findNonPrivateIpAddress()</code> we loop through each match, working from the beginning (left) of the string. This way we never have to look at the commas in the string and don't care whether there are any spaces. We also get to skip over any nonsense data that may be in the string: if you spoof the header with a random string of characters then it'll simply be ignored. The code is quite strict in that it'll only accept non-private IP addresses from the header; otherwise it resorts to the remote address of the request as a fall-back.</p>
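<p>To see the matching behaviour in isolation, here's that loop extracted into a self-contained sketch. The private-range pattern is copied from the class above; the general dotted-quad pattern is an assumption mirroring the simple regex described elsewhere in these posts (the <code>IP_ADDRESS_REGEX</code> constant itself isn't shown in the snippet):</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XffSketch {
    // Assumed general dotted-quad pattern (the class's IP_ADDRESS_REGEX isn't shown above)
    static final Pattern IP = Pattern.compile(
            "([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3})");
    // Private-range pattern copied from InetAddressUtil above
    static final Pattern PRIVATE_IP = Pattern.compile(
            "(^127\\.0\\.0\\.1)|(^10\\.)|(^172\\.1[6-9]\\.)|(^172\\.2[0-9]\\.)|(^172\\.3[0-1]\\.)|(^192\\.168\\.)");

    // Same loop as findNonPrivateIpAddress(): return the first match, left to
    // right, that looks like an IP address but isn't a private one
    static String findNonPrivateIpAddress(String s) {
        Matcher matcher = IP.matcher(s);
        while (matcher.find()) {
            if (!PRIVATE_IP.matcher(matcher.group(0)).find())
                return matcher.group(0);
        }
        return null;
    }

    public static void main(String[] args) {
        // Private client address first, so the proxy's public address wins
        System.out.println(findNonPrivateIpAddress("10.208.4.38, 58.163.175.187")); // 58.163.175.187
        // Spoofed junk yields no match at all
        System.out.println(findNonPrivateIpAddress("completely bogus header value")); // null
    }
}
```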
<p>Our hostname resolution is also prepared for failure and will return the original IP address if it can't get you a hostname.</p>
<p>Instead of calling <code>request.getRemoteAddr()</code> and <code>request.getRemoteHost()</code> directly from our own code, we simply wrap them with <code>InetAddressUtil.getAddressFromRequest(request)</code> and <code>InetAddressUtil.getHostnameFromRequest(request)</code>.</p>
<h3>Extending Tomcat logging</h3>
<p>You enable request logging in Tomcat by attaching an AccessLogValve to your context or host. It mirrors the custom formatting options that you'll find in <a href="http://httpd.apache.org/docs/2.0/mod/mod_log_config.html">Apache's CustomLog</a>. So you can print a <strong>%h</strong> for the request hostname, but behind a load balancer you'll just get the name or address of the load balancer that's forwarding the request. You could also use <strong>%{X-Forwarded-For}i</strong> to get access to the raw header value, but this will be either a single IP address or a comma-separated string of IP addresses. That may be useful for your purposes but not mine; I want a hostname!</p>
<p>Unfortunately, AccessLogValve doesn't lend itself to easy extension: there are two <code>createAccessLogElement()</code> methods that you'd ideally be able to override in your own subclass, returning a new custom <code>AccessLogElement</code> for the character you've chosen to represent your log element.</p>
<p>The best we can do is override the protected <code>createLogElements()</code>, copying the functionality from there and extending it with our own. However, in my extension of AccessLogValve I've assumed that the Tomcat boys will eventually <a href="https://issues.apache.org/bugzilla/show_bug.cgi?id=51588">fix</a> the access modifiers for the <code>createAccessLogElement()</code> methods, so I've just copied the whole class, named it <code>AccessLogValve_</code> and changed the modifiers myself. The plan is to remove this in the future and take the _ off the extended class name in my code.</p>
<p>Here's my extended AccessLogValve:</p>
<div class="highlight"><pre><span class="kn">package</span> <span class="n">au</span><span class="o">.</span><span class="na">com</span><span class="o">.</span><span class="na">xprime</span><span class="o">.</span><span class="na">catalina</span><span class="o">.</span><span class="na">valves</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.util.Date</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.catalina.connector.Request</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.catalina.connector.Response</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">au.com.xprime.webapp.util.InetAddressUtil</span><span class="o">;</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">AccessLogValve</span> <span class="kd">extends</span> <span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">catalina</span><span class="o">.</span><span class="na">valves</span><span class="o">.</span><span class="na">AccessLogValve_</span> <span class="o">{</span>
<span class="kd">protected</span> <span class="kd">class</span> <span class="nc">ForwardedForAddrElement</span> <span class="kd">implements</span> <span class="n">AccessLogElement</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">addElement</span><span class="o">(</span><span class="n">StringBuffer</span> <span class="n">buf</span><span class="o">,</span> <span class="n">Date</span> <span class="n">date</span><span class="o">,</span> <span class="n">Request</span> <span class="n">request</span><span class="o">,</span> <span class="n">Response</span> <span class="n">response</span><span class="o">,</span> <span class="kt">long</span> <span class="n">time</span><span class="o">)</span> <span class="o">{</span>
<span class="n">buf</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="n">InetAddressUtil</span><span class="o">.</span><span class="na">getAddressFromRequest</span><span class="o">(</span><span class="n">request</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="kd">protected</span> <span class="kd">class</span> <span class="nc">ForwardedForHostElement</span> <span class="kd">extends</span> <span class="n">ForwardedForAddrElement</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">addElement</span><span class="o">(</span><span class="n">StringBuffer</span> <span class="n">buf</span><span class="o">,</span> <span class="n">Date</span> <span class="n">date</span><span class="o">,</span> <span class="n">Request</span> <span class="n">request</span><span class="o">,</span> <span class="n">Response</span> <span class="n">response</span><span class="o">,</span> <span class="kt">long</span> <span class="n">time</span><span class="o">)</span> <span class="o">{</span>
<span class="n">buf</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="n">InetAddressUtil</span><span class="o">.</span><span class="na">getHostnameFromRequest</span><span class="o">(</span><span class="n">request</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="kd">protected</span> <span class="n">AccessLogElement</span> <span class="nf">createAccessLogElement</span><span class="o">(</span><span class="kt">char</span> <span class="n">pattern</span><span class="o">)</span> <span class="o">{</span>
<span class="n">AccessLogElement</span> <span class="n">accessLogElement</span> <span class="o">=</span> <span class="kd">super</span><span class="o">.</span><span class="na">createAccessLogElement</span><span class="o">(</span><span class="n">pattern</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">accessLogElement</span> <span class="k">instanceof</span> <span class="n">StringElement</span><span class="o">)</span> <span class="o">{</span>
<span class="k">switch</span> <span class="o">(</span><span class="n">pattern</span><span class="o">)</span> <span class="o">{</span>
<span class="k">case</span> <span class="sc">'f'</span> <span class="o">:</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">ForwardedForAddrElement</span><span class="o">();</span>
<span class="k">case</span> <span class="sc">'F'</span> <span class="o">:</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">ForwardedForHostElement</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">return</span> <span class="n">accessLogElement</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
<p><em>(Download <a href="http://src.vagg.org/java/AccessLogValve.java">here</a> and AccessLogValve_ <a href="http://src.vagg.org/java/AccessLogValve_.java">here</a>)</em></p>
<p>Which gives me <strong>%f</strong> for the X-Forwarded-For IP address and <strong>%F</strong> for the X-Forwarded-For hostname. My valve pattern looks like this:</p>
<p><code style="padding-left: 30px;">pattern="%F %f %h %l %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot;"</code></p>
<p>Simply compile, place together in a JAR, put it in your Tomcat lib directory then make sure you use the right class name when building your AccessLogValve descriptor. The lazy can find a JAR (including source) <a href="http://src.vagg.org/java/xprime_accesslogvalve.jar">here</a>.</p>
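<p>As a sketch of what that descriptor might look like, here's a Valve entry for context.xml (or inside a Host in server.xml). The className matches the package declared in the source above; the directory/prefix/suffix values are purely illustrative:</p>

```xml
<!-- in context.xml, or inside a <Host> element in server.xml -->
<Valve className="au.com.xprime.catalina.valves.AccessLogValve"
       directory="logs" prefix="access_log." suffix=".txt"
       pattern="%F %f %h %l %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot;"/>
```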
<p>Next I'll be getting dirty with C and hack mod_geoip to do something similar.</p>
Wrangling the X-Forwarded-For Header2011-07-29T06:15:36.000Zhttps://r.va.gg/2011/07/wrangling-the-x-forwarded-for-header.html
<p>Until recently, we've served pages directly from the server for <a title="FeedXL D. I. Y. Horse Nutrition" href="http://feedxl.com/">FeedXL.com</a> but we've since moved to a load balancing situation with multiple servers behind a load balancer.</p>
<h3>AWS &amp; ELB</h3>
<p>We use <a title="Amazon Web Services" href="http://aws.amazon.com">Amazon Web Services</a> to host FeedXL and are now using their <strong><a title="AWS Elastic Load Balancing" href="http://aws.amazon.com/elasticloadbalancing/">Elastic Load Balancing</a> (ELB)</strong> service to spread the load across 3 <em>Availability Zones</em> in the main datacentre we operate from. We're doing this primarily for high availability rather than to handle heavy load, but the added benefit is that it lets us scale up really easily if we see any sudden spikes in traffic. We're using some small instances at the front running <a href="http://httpd.apache.org/">Apache</a> to handle the main traffic. The dynamic content is passed on to larger back-end instances running our webapp in <a href="http://tomcat.apache.org/">Tomcat</a>.</p>
<p>A couple of our important <a href="http://aws.amazon.com/ebs/">EBS</a> volumes were among the last to be restored during <a href="http://alestic.com/2011/04/ec2-outage">Judgement Day</a>, April 2011, and while we had regular snapshots we hesitated for too long before rebuilding our service in a different Availability Zone (or Region), partly because of the lack of clear information about the outage from Amazon (we were continually given the impression that things would be back online soon, so why not wait just a tiny bit longer for normal service rather than restore from slightly older snapshots?). Probably like many AWS customers impacted by the outage, we've increased our spend to boost our redundancy and better handle outages of this kind. We now span multiple Availability Zones and have increased the quality of our off-Region backups. I'm pretty sure that in the end Amazon has done very well out of their rather embarrassing incident, with many customers keen to avoid their own embarrassment the next time it happens.</p>
<p>However, switching to ELB hasn't been without hiccups.</p>
<h3>GeoIP</h3>
<p>We rely very heavily on <a href="http://www.maxmind.com/app/country">GeoIP</a> from MaxMind to serve content customised to each country. We have a large amount of functionality built right into our Apache configuration that uses both <a href="http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html">rewrites</a> and <a href="http://httpd.apache.org/docs/2.0/mod/mod_include.html">SSI</a> to make our static content relatively dynamic. We even do spelling correction for UK/US English depending on where you view our site from! The main reason we customise content, though, is because FeedXL is a different product for each country. We have to maintain country-specific feeds databases and we also mostly deal with local currencies, so our price details change a little depending on where you are. We've had a very good experience with GeoIP, with only a few mismatches reported by customers, and they've always been corporate networks where traffic is routed internationally (Australia->USA or NZ->AU for example) or satellite connections without a likely country of origin.</p>
<p>The way that <a href="http://geolite.maxmind.com/download/geoip/api/mod_geoip2/">mod_geoip</a> for Apache works is that it takes the request IP address and looks it up in its database to find the (most likely) country of origin; you then get environment variables in your Apache request: GEOIP_COUNTRY_CODE &amp; GEOIP_COUNTRY_NAME. You can use these with mod_rewrite to do all sorts of crazy things, plus mod_include lets you do more straightforward things with your content. For example, if we want to make a North America specific announcement we might wrap our announcement block in <code>&lt;!--#if expr='"$GEOIP_COUNTRY_CODE" = "US" || "$GEOIP_COUNTRY_CODE" = "CA"' --&gt; <em>... content ...</em> &lt;!--#endif --&gt;</code>.</p>
<p>However, one of the most important catches of load balancing is that your requests come to your web server from the load balancer itself and not the original client, so you don't get the raw IP address of the client built into your request. Instead, with ELB and most other load balancers you need to use the <a href="http://en.wikipedia.org/wiki/X-Forwarded-For"><strong>X-Forwarded-For</strong> </a>HTTP header.</p>
<h3>X-Forwarded-For</h3>
<p>The X-Forwarded-For header was first introduced by <a href="http://www.squid-cache.org/">Squid</a> as a means of passing on the IP address of the client to the server. It has since been widely adopted by other proxy servers and load balancers, so it's pretty much considered a <em>standard</em> even if it technically isn't one.</p>
<p>What you are supposed to get as your header is this:</p>
<pre><code>X-Forwarded-For: clientIP, server1IP, server2IP, server3IP
</code></pre><p>The client IP address should be first, followed by the first proxy server, followed by any other servers, in a comma-separated list. The final server that passes the request on to you won't be in the list: <span style="text-decoration: underline;">a proxy server or load balancer will only append the address of the server it received the request from if the X-Forwarded-For header was passed to it</span>; otherwise it just constructs a new X-Forwarded-For with just the client address in it. The address of the last server in the complete <em>chain</em> is simply the address of the client making the request to your server. But, as usual in the web world, there are no guarantees.</p>
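<p>The hop-by-hop rule underlined above can be sketched in a few lines. This is a toy simulation, not any real proxy's code; the addresses are illustrative documentation-range values:</p>

```java
// Each hop appends the address it received the request from, but only if an
// X-Forwarded-For header was already present; otherwise it starts a new
// header containing just the client address.
public class XffChain {
    static String forward(String incomingXff, String receivedFromAddr) {
        return incomingXff == null ? receivedFromAddr
                                   : incomingXff + ", " + receivedFromAddr;
    }

    public static void main(String[] args) {
        // client (203.0.113.7) -> proxy: proxy starts a fresh header
        String atProxy = forward(null, "203.0.113.7");
        // proxy (198.51.100.2) -> load balancer: proxy's address is appended;
        // the load balancer itself never appears in the list the server sees
        String atOrigin = forward(atProxy, "198.51.100.2");
        System.out.println("X-Forwarded-For: " + atOrigin); // client first, proxy second
    }
}
```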
<p>Apache kindly gives you an HTTP_X_FORWARDED_FOR environment variable (although I can't find official documentation on this so I'm not sure of the specifics of what conditions may prevent you from getting this variable). You could use this in custom modules or standard modules that use environment variables such as mod_rewrite. If you want to log with it then you could configure your <code>LogFormat</code> to print it out with <code>%{X-Forwarded-For}i</code> to make your logs more interesting than just showing the load balancer hostname as <code>%h</code>.</p>
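<p>As a sketch, a LogFormat along the lines of Apache's standard "combined" format with the raw header value prepended might look like this (the nickname is arbitrary):</p>

```apache
# "combined" format with the raw X-Forwarded-For value prepended
LogFormat "%{X-Forwarded-For}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" xff_combined
CustomLog logs/access_log xff_combined
```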
<p>mod_geoip has a configuration switch, <code>GeoIPScanProxyHeaders On</code> that tells it to use X-Forwarded-For (or HTTP_X_FORWARDED_FOR) to determine the client IP address rather than just the remote address.</p>
<p>There are some important catches to consider before you proceed to use this header to do anything interesting:</p>
<ol>
<li>Most importantly, headers can be crafted by anyone: <strong>never trust a header value unless you are certain that it can't be spoofed</strong>. I'd actually simplify that to just <em>never trust a header value</em>. So if you are going to use it, don't use it for anything that has security implications.</li>
<li>The client IP address that you get from the first entry may not actually be the address that you want. Most of the time requests will probably come directly from the browser of your visitor, but what if they are behind a proxy server within a private network themselves? The IP address you end up with could be something like 10.1.34.121, which is of no value because it only tells you that they are sitting on a private network <em>somewhere</em> in the world.</li>
</ol>
<h3>Security Implications</h3>
<p>This is pretty straightforward. If you're handling traffic behind a load balancer then you may be able to guarantee that your traffic comes from the load balancer, so the header is constructed by it; but consider the situation where X-Forwarded-For contains a chain of addresses, potentially from untrusted sources. If the header contains at least one <em>server</em> IP address then the <em>client</em> IP address will have been passed on by the upstream server with no way for your load balancer to verify its correctness; all it's doing is adding the address of the requesting host onto the end of the list.</p>
<p>There's also the possibility of direct connections to your web server(s). Are your servers walled off from the outside world, with only the load balancer able to communicate with them? Is there a possibility that a client can make a direct connection to your server and construct its own X-Forwarded-For header? On AWS, all standard instances have a public IP address, but you can set up your <a href="http://docs.amazonwebservices.com/AWSEC2/2007-08-29/DeveloperGuide/distributed-firewall-concepts.html">security groups</a> to only allow access to port 80 from your load balancer. This is probably a good idea for many reasons.</p>
<p>Basically, I would suggest working on the assumption that X-Forwarded-For is only <em>likely</em> to be correct, nothing more.</p>
<h3>Best Guess IP Address</h3>
<p>When using X-Forwarded-For, the assumption normally made is that the first IP address in the list is the client address that you can use to do interesting things with, like IP address geolocation (à la GeoIP). But what about <a href="http://en.wikipedia.org/wiki/Private_network">private addresses</a>? What about the casual browser at McDonalds using their WiFi with a 10.x.x.x address, or a company network with a 192.168.x.x internal address structure? You'll end up with a very unhelpful address that tells you nothing interesting about the client.</p>
<p>There are 3 sets of address ranges in <a href="http://en.wikipedia.org/wiki/IPv4">IPv4</a> (let's ignore <a href="http://en.wikipedia.org/wiki/IPv6">IPv6</a> for now) that are reserved for private networks. Normally these are hidden behind <a href="http://en.wikipedia.org/wiki/Network_address_translation">NAT</a> gateways, and often traffic is forced, either manually or automatically, to route through a proxy server of some kind. The address ranges are:</p>
<ul>
<li>10.0.0.0 – 10.255.255.255</li>
<li>172.16.0.0 – 172.31.255.255</li>
<li>192.168.0.0 – 192.168.255.255</li>
</ul>
<p>You can thank these beauties for extending the life of IPv4 way beyond what it would otherwise have been.</p>
<p>If you have a client behind one of these networks and it's not routed through a proxy server then you'll probably just get the IP address of the NAT gateway which is likely to be the address you want to use. If the request is routed through a proxy server then you may get an X-Forwarded-For that looks something like this:</p>
<pre><code>X-Forwarded-For: 10.208.4.38, 58.163.175.187
</code></pre><p>Where the address you probably want is actually the (proxy) server address on the end rather than the private client address.</p>
<p>You may also have a chain of multiple servers, perhaps you have a downstream proxy server going through a larger upstream one before heading out of the network, so you may get something like this:</p>
<pre><code>X-Forwarded-For: 10.208.4.38, 58.163.1.4, 58.163.175.187
</code></pre><p>Or, the downstream proxy server could be within the private network, perhaps a departmental proxy server connecting to a company-wide proxy server and then this may happen:</p>
<pre><code>X-Forwarded-For: 10.208.4.38, 10.10.30.23, 58.163.175.187
</code></pre><p>This could of course be even more complex as you may have a longer chain of proxy servers (although I've never actually seen anyone chain more than 2 layers of proxy servers together in a network before).</p>
<p>So what general rule should we construct for extracting our usable client IP from these addresses?</p>
<p>Of course, I'm suggesting that the rule <strong><em>always use the leftmost address</em></strong> is not correct, as there is a good chance it may be a private IP address if there is more than 1 address in the list. Unfortunately this is the rule that mod_geoip adopts: if it finds a comma it just chops the string off at that comma. We immediately found this led to unsatisfactory results with ELB, as we had more requests than we expected originating from private networks routed through proxy servers; and we heard about it in the form of error reports from our users (<em>"where's the log in link?"</em>--it's not normally displayed in countries where we haven't released FeedXL).</p>
<p>An alternative would be <strong><em>always use the rightmost address</em></strong>, which would probably get you a pretty good guess in almost all cases. If there is more than one IP address in the list then the rightmost address will probably be the address where the request left whatever corporate or internal network the client was hidden behind, even if there are multiple layers. However, multiple layers of IP addresses suggest a fairly large network, possibly widely dispersed. There's also a chance that you have one proxy server piggybacking off a higher-capacity upstream proxy server: for example, some ISPs run their own very large proxy servers that customers can use, and these may make ideal upstream connections for internal proxy servers, with caching at both levels. The ISP proxy server is likely to be located in a very different place to the client though, and if you're trying to pin down the IP address of the client using something like <a href="http://www.maxmind.com/app/city">GeoIP City</a> then you'll probably get the wrong city.</p>
<p>So, here's the rule that I suggest would be the best general case rule to allow you to extract the address most likely to be physically close to the real client:</p>
<p style="padding-left: 30px;"><strong><em>Always use the leftmost <span style="text-decoration: underline;">non-private</span> address</em></strong>.</p>
<p>We can do this because the rules are clear about what is and what is not a private IP address (see above).</p>
<h3>Doing It the Regular Expression Way</h3>
<p>First, remember that the X-Forwarded-For header is not very trustworthy. You don't even want to assume that it contains IP addresses! So, before you check whether an entry is a private IP address or not, you should first check that it's an IP address at all.</p>
<p>Here's a simple regular expression to match an IP address: <code>([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})</code> or alternatively, if you're working in an environment that supports \d, this will do the same thing: <code>(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})</code> (with or without the parentheses, but as you'll see, they are useful for the next step).</p>
<p>Then you'll want to check if an IP address is private or not; here's a regular expression that'll do that for you, given a valid IP address: <code>(^127\.0\.0\.1)|(^10\.)|(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|(^192\.168\.)</code>. This matches all of the address ranges listed above, plus 127.0.0.1 as a bonus (quite possible in our chain!).</p>
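<p>You can sanity-check that pattern against the ranges above with a few lines of Java (the class and method names here are just for illustration; note the dots need a second level of escaping in a Java string literal):</p>

```java
import java.util.regex.Pattern;

public class PrivateIpCheck {
    // The private-range regex from the paragraph above, escaped for Java
    static final Pattern PRIVATE = Pattern.compile(
            "(^127\\.0\\.0\\.1)|(^10\\.)|(^172\\.1[6-9]\\.)|(^172\\.2[0-9]\\.)|(^172\\.3[0-1]\\.)|(^192\\.168\\.)");

    static boolean isPrivate(String ip) {
        return PRIVATE.matcher(ip).find();
    }

    public static void main(String[] args) {
        System.out.println(isPrivate("10.208.4.38"));     // true
        System.out.println(isPrivate("172.31.255.255"));  // true, top of the 172.16-31 range
        System.out.println(isPrivate("172.32.0.1"));      // false, just outside it
        System.out.println(isPrivate("58.163.175.187"));  // false
    }
}
```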
<p>So a general algorithm could be something like this: walk through the string starting from the first match of our general IP address regular expression through to the last. For each match, check if the matched component matches our private IP address regular expression, if it does then proceed to the next address in the list, if it doesn't match then we have the IP address we want. If we get to the end of the list without finding an IP address that isn't private then we have to have some kind of generic fall-back.</p>
<p>Exactly what your fall-back might be depends on your environment and whether your trust the server passing you the request or not. In the case of ELB, if it's working properly we should never need the fall-back case. For FeedXL our fall-back for any failure during the GeoIP process is to just assume that they are coming from the country where most of our customers are from (currently Australia).</p>
<p>I have 2 follow-up posts to make after this one, first I'll show how I deal with X-Forwarded-For in both Tomcat and our own Java software, then I'll show how I've hacked mod_geoip to use the algorithm outlined above with excellent results.</p>
<p><em><strong>Follow-up #1: <a title="Permanent Link to Handling X-Forwarded-For in Java and Tomcat" href="http://rod.vagg.org/2011/07/handling-x-forwarded-for-in-java-and-tomcat/" rel="bookmark">Handling X-Forwarded-For in Java and Tomcat</a></strong></em></p>
<p><strong><em>Follow-up #2: <a title="A mod_geoip2 that properly handles X-Forwarded-For" href="http://rod.vagg.org/2012/04/a-mod_geoip2-that-properly-handles-x-forwarded-for/">A mod_geoip2 that properly handles X-Forwarded-For</a></em></strong></p>
<h3>Update July 30th 2011</h3>
<p>I've just stumbled upon <a title="X-Forwarded-For Spoofer" href="https://addons.mozilla.org/en-US/firefox/addon/x-forwarded-for-spoofer/">this</a>, an "X-Forwarded-For Spoofer" add-on for Firefox, and I love the description; it sums up the security concerns:</p>
<blockquote><p><em>Some clients add X-Forwarded-For to HTTP requests in an attempt to help servers identify the originating IP address of a request. Some clients, however, can set X-Forwarded-For to any arbitrary value. Some servers assume X-Forwarded-For is unassailable. No server should.</em></p>
<p><em>With this add-on, you can assign an arbitrary IP address to the X-Forwarded-For field, attempt to perform XSS by including HTML in this field, or even attempt SQL injection.</em></p></blockquote>
<p>May be useful for testing and debugging your web application.</p>