On Monday, Stephen Chin from Oracle visited me at the 10gen offices as part of his NightHacking tour. In the video we talk about my sessions at JavaOne and the Agile presentation I'm giving at Devoxx, and I do some very basic hacking using the MongoDB Java driver, attempting to showcase Gradle at the same time. It was a fun experience, even if it's scary being live-streamed and recorded!
So, I’ve finished my first full week in the new job and I’ve learnt lots of new stuff. Which is great, because that’s usually why you change jobs.
- They don’t use SQL. Who knew?
- There are different flavours. There’s a graphy one and key-value things and… others…
- They’re “scalable” (yes, yes, it’s web scale).
- Some/many/all(?) embrace the idea of eventual consistency.
I was suspicious of the hype surrounding NoSQL, partly because it’s associated with the meaningless marketing term “Big Data” and partly because I’m a cynic who sneers at things that get too popular. Here’s what I think when I hear the following terms:
- Cloud - Fire your systems people and ditch your comms room!
- Big Data - Parse Twitter in order to learn how to read your customers’ minds!
- NoSQL - Stop paying Oracle!
- Functional - We couldn’t get good enough at mainstream programming languages so we switched to something more difficult!
I don’t know if it’s healthy to be this cynical, but I’m too old to jump on every bandwagon that comes along.
Anyway. Back to the people who now pay my bills.
It’s unfortunate that the lack of SQL is the thing that captured the imagination, rather than the lack of tables and a relational structure. SQL was never (in my mind) a particularly evil thing; it’s a pretty good language for saying “I want this stuff from this place that fits these criteria”, and that’s something we’re going to have to do at some point whatever the technology.
|Series of database tables and their relationships. Honest.|
(Yes, I’m experimenting again. This time with my shiny new iPad, a stylus and Penultimate. It’s good for ad-hoc drawings, but lacks the precision of the graphics tablet and flexibility of GIMP).
At the very high level, it seems like there are four (ish) types of NoSQL databases:
- Column Family
Column family databases feel to me, as a newbie to the field, similar to key/value, which I’ll come on to. I’ve mostly heard Cassandra used as an example of this type of NoSQL database. I guess the way I think of this, and of course I could be wrong/over-simplifying, is a unique key linked to a set of key/values:
Which I’m translating into groups of key/value pairs, with the ID as a sort of header:
|Key/value pairs grouped by ID|
You need the key in order to look up all the details about me. The way I hear it, it’s great for writing data, but it’s less flexible for ad-hoc queries.
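To make my mental model a bit more concrete, here's a rough sketch in plain Java of "a unique key linked to a set of key/values". It's a toy in-memory map of maps, not the API of Cassandra or any other real product, and the class name, row keys and column names are all made up for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnFamilySketch {
    // Row key -> (column name -> value). A toy in-memory model, not a real database client.
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();

    // Writing is cheap: find (or create) the row for this key and set one column.
    public void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new LinkedHashMap<>()).put(column, value);
    }

    // Reading needs the row key; there is no "find rows where column = x" in this model.
    public Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, Collections.emptyMap());
    }

    public static void main(String[] args) {
        ColumnFamilySketch people = new ColumnFamilySketch();
        people.put("person-1", "name", "Example Person");
        people.put("person-1", "city", "London");
        people.put("person-2", "name", "Someone Else"); // rows need not share columns

        System.out.println(people.get("person-1"));
    }
}
```

Everything goes through the row key, which is why writes and key lookups are simple, while "who lives in London?" would mean walking every row.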
These types of NoSQL database (e.g. Riak) are pretty much as schema-less as you get - just dump key-value pairs into them. To be honest, the best description I found was on dba.stackexchange.com, so I’m not going to re-write that with my (at this point) limited understanding.
|Never ending lists of key/values|
From what I’ve heard so far, both Key/Value and Column Family databases embrace eventual consistency. I don’t know how much of that is a function of their data model and how much is decided by the individual products. For some people eventual consistency is a deal-breaker, but in many cases it seems to me that it’s just a matter of getting your head around this and designing your application appropriately.
|Graph of nodes with annotated relationships|
I’d be interested in what the architectural trade-offs in using this model are.
Now MongoDB falls into category four, the document database. And as a NoSQL n00b, this is now the product and area I know most about, and am clearly going to be more excited about since 10gen are indoctrinating me in the MongoDB way.
Documents are a familiar structure for developers, especially if they’ve been working with JSON. So, a document might be:
To me, this looks like it maps onto my domain-shaped Object Model more easily than a relational database, which always needs some sort of O-R mapping (whether you do this with Hibernate or use Spring to do it yourself, you’re still mapping tables into objects and vice versa). What I like about the document format is the nested sub-documents for data that belongs together. In relational databases you often end up denormalising for performance anyway, so why not just accept that up front and have it as part of the thing you’re storing?
|A document with sub-documents. Think XML/JSON.|
This does have a cost, of course - nothing is without trade-offs. Every time you request this document, you get the whole lot. You can’t have the person without the address. So, you do need to understand the relationships (still) and whether you’re usually going to want to get all that data at the same time or whether you might want to make two separate calls.
Which brings me on to another thing which is familiar from relational days - foreign keys. A field in your document can be the ID of another document, so you can follow the links through and retrieve other documents associated with the starting one. Again, there are trade-offs here - each link you follow is a different request to the database. These database requests can be very quick, but if you wanted this data every time, you’d probably want it embedded in your first document to save the additional call. I guess it’s a latency vs throughput question really - a single query which returns a chunky document, or multiple queries that return smaller ones.
|Documents can link to other documents.|
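Here's a small sketch of the embed-or-link choice, using plain Java maps as stand-in documents. This is deliberately not the MongoDB Java driver: `DocumentSketch`, `findById`, the field names and the request counter are all hypothetical, and exist only to show that an embedded sub-document arrives in one request while a linked document costs a second one.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class DocumentSketch {
    // Stand-in for a collection: document ID -> document. Not the MongoDB driver API.
    private final Map<String, Map<String, Object>> collection = new HashMap<>();
    private int requests = 0; // counts simulated round trips to the database

    public void save(String id, Map<String, Object> document) {
        collection.put(id, document);
    }

    public Map<String, Object> findById(String id) {
        requests++;
        return collection.get(id);
    }

    public int requestCount() {
        return requests;
    }

    // Option 1: the address is a sub-document and always comes back with the person.
    public static Map<String, Object> personWithEmbeddedAddress() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "1 Example Street");
        address.put("city", "London");

        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "Example Person");
        person.put("address", address);
        return person;
    }

    // Option 2: the person only stores the ID of a separate address document.
    public static Map<String, Object> personWithLinkedAddress(String addressId) {
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "Example Person");
        person.put("addressId", addressId);
        return person;
    }

    public static void main(String[] args) {
        DocumentSketch db = new DocumentSketch();
        db.save("person-embedded", personWithEmbeddedAddress());

        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "London");
        db.save("address-1", address);
        db.save("person-linked", personWithLinkedAddress("address-1"));

        // One request returns the person and the address together.
        db.findById("person-embedded");
        System.out.println("Requests so far (embedded): " + db.requestCount());

        // Following the link costs a second request.
        Map<String, Object> linked = db.findById("person-linked");
        db.findById((String) linked.get("addressId"));
        System.out.println("Requests so far (embedded + linked): " + db.requestCount());
    }
}
```

Which option is better depends on whether you nearly always want the address with the person, which is the same question you'd be asking in the relational world.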
So schema design is still important in document databases even if you don’t have a relational schema. No new technology is an excuse to stop thinking about the problem you’re trying to solve and understanding the trade-offs in design.
One of the advantages, it seems, of something like MongoDB over some of the key/value databases is the ability to write ad-hoc queries and to tune for those queries. The data is structured (it’s in a document) and it doesn’t have to be in the same structure every time - not every document relating to a person needs all the fields that another person might have. But you can still query for people who have blue cars or people who live in London, or people whose surnames begin with G. If you find yourself doing the same query a number of times, you can add indexes to MongoDB the same way you would a relational database.
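As a rough illustration of the idea (and only the idea), here's a plain Java sketch of ad-hoc queries over documents that don't all have the same fields, plus a very naive "index". None of this is MongoDB's query language or driver API: the class, the `find` and `indexOn` helpers, and the sample people are assumptions I've made up, and a real database index is a far more sophisticated structure than a hash map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class AdHocQuerySketch {
    // Builds a document from alternating field names and values.
    static Map<String, Object> doc(String... keysAndValues) {
        Map<String, Object> document = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            document.put(keysAndValues[i], keysAndValues[i + 1]);
        }
        return document;
    }

    // Made-up people; note that not every document has every field.
    static List<Map<String, Object>> people() {
        return Arrays.asList(
                doc("surname", "Green", "city", "London"),
                doc("surname", "Smith", "carColour", "blue"),
                doc("surname", "Gray", "city", "Seville", "carColour", "blue"));
    }

    // Ad-hoc query: scan every document and keep the ones matching the criteria.
    static List<Map<String, Object>> find(List<Map<String, Object>> docs,
                                          Predicate<Map<String, Object>> criteria) {
        return docs.stream().filter(criteria).collect(Collectors.toList());
    }

    // Naive index: field value -> documents with that value, so a repeated query skips the scan.
    static Map<Object, List<Map<String, Object>>> indexOn(List<Map<String, Object>> docs,
                                                          String field) {
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> document : docs) {
            Object value = document.get(field);
            if (value != null) { // documents without the field are simply not indexed
                index.computeIfAbsent(value, k -> new ArrayList<>()).add(document);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = people();

        System.out.println(find(people, p -> "blue".equals(p.get("carColour"))));
        System.out.println(find(people, p -> "London".equals(p.get("city"))));
        System.out.println(find(people, p -> String.valueOf(p.get("surname")).startsWith("G")));

        // The query we keep repeating gets an index instead of a full scan.
        System.out.println(indexOn(people, "city").get("London"));
    }
}
```

The point is that the missing fields don't get in the way of the query, and the index is something you add later once you know which queries you keep running.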
Seems like I’m getting into more of the nitty-gritty MongoDB details, so I’ll stop there and leave that for another time.
Classing a whole swathe of products as “NoSQL” is misleading and confusing. The only thing they all share in common is that they are not traditional relational databases. Other than that, some of them are as different from each other as they are from relational databases. I haven’t even mentioned caching technologies - these products have functionality which overlaps with NoSQL databases as well. But even then, the purposes are somewhat different, and not even mutually exclusive.
As with anything, it’s really important to understand the strengths and weaknesses of a technology, and the demands of your domain. These different ways of organising data, and different products, are going to perform really well in certain circumstances, and pretty poorly when used in others. Getting an understanding of what those strengths and weaknesses are is going to be important in making the correct product/architecture/design decisions.
None of this information is new; there’s a lot of material on the web about the different types of NoSQL databases. I’m writing it more for my own benefit than anything else, as my memory is notoriously shocking. For more in-depth (and probably more accurate) reading there’s:
- Martin Fowler’s NoSQL Distilled
- …and his introduction to the subject
- Tim Berglund (@tlberglund) did a great overview of three types at JAX London last week. There’s a video of the same content (different conference) here.
- http://nosql-database.org/ appears to list all the products that fall under the massive umbrella, but isn’t the most usable of sites.
- And yes, I used Wikipedia. Which is probably where I went wrong…
A hastily thrown-together list of some of the places to get more information on how to write performant code.
- Java Performance Tuning Course (Kirk Pepperdine, Java Champion)
- Java Performance Tuning (Kirk and Jack Shirazi)
- Non Functional Requirements (Simon Brown, Coding the Architecture)
- Performance Myths (Martin Thompson)
- Understanding Java Garbage Collection (Gil Tene, Azul)
- JHiccup for testing how long your system takes to do nothing (also Azul)
- Vanilla Java - Core Java for simpler, faster applications (Peter Lawrey)
- JClarity Performance Community (Ben, Martijn, Kirk et al)
- We Don’t Need No Stinkin’ Locks (Mike Barker, LMAX)
Seemed like a quiet conference this year. Not really sure why, maybe it was the layout of the massive (and extremely dark) main room; maybe it was the awkward L-shape of the communal space; or maybe this year people were more interested in listening to the (really very good) sessions rather than participating or meeting other people. Whatever the reason, it felt quiet and almost low-key.
Performance seemed pretty high on the agenda, as you’d expect from a London conference, with a number of things on offer:
- A great keynote from Kirk Pepperdine and Martijn Verburg, covering a massive range of things to care about when thinking about performance on the first night
- A high-level talk about Java Performance from yours truly (which I may run again for the LJC if there’s interest, but it’s more likely to be a one-off)
- A deep dive into writing lock-free code by Mike Barker
- And a talk from Kirk exploring your GC logs.
- Commuting through Victoria Station sucks. I knew this last year but it’s just got worse.
- The iPad + stylus combo is not as precise as the graphics tablet, so I’m probably going back to that for illustrations. But I’d still love to do free-drawing with the iPad on the projector at some point.
- Not everyone can follow the deep-dive tech talks, but they still prefer them to introductory talks, maybe because they feel like they’re learning something (well, that’s my opinion).
The time has come, and I’m moving on from LMAX. I’ve had an incredible (nearly) four years working for one of the most radical finance firms in the world, during which time I feel I’ve learnt more than the rest of my work experience put together, and had the pleasure to work with some of the smartest and most interesting people I’ve ever met.
I’ve been invited to join 10gen and their MongoDB driver team, a challenge I am really looking forward to. After years in finance and in the IT departments of other organisations, I’m finally working for a product firm, and an open source one. I expect it will be very different from anything else I’ve been involved in.
I hope this means I will be blogging even more, and that I’ll have opportunities to abuse my graphics tablet producing more ridiculous scrawlings. I also hope this will give me an opportunity to meet more people as I travel around. So, as if this were a goodbye e-mail to the company or an out-of-office reply, I should finish with: any further enquiries about the Disruptor should be addressed to the Google Groups list - there are people on there waaay smarter than me anyway.
I’ve produced a very cut-down version of the presentation I’ve been giving at a lot of conferences, giving a high-level overview of the Disruptor. This serves as a quick intro to the concepts behind it.
My slides are usually pretty useless without me (or someone else) talking over them, so for more context don’t forget there’s always my original blog posts (the Magic Ring Buffer, Reading from it, Writing to it, Wiring it up), which are now pretty dated, and the Java Magazine article I wrote at the start of the year.