Monday, July 29, 2013

Cassandra vs MongoDB: The basics

Cassandra and MongoDB are two of the more popular NoSQL databases. I've been using Cassandra extensively over the past 6 months and I've recently started using MongoDB. Here is a brief description of the two; I'll follow up this post with a deeper comparison of the more advanced features.


Cassandra was originally created by Facebook and is written in Java; it is now an Apache project. Traditionally, Cassandra can be thought of as a column-oriented database or a row-oriented database, depending on how you use columns. Each row is uniquely identified by a row key, like a primary key in a relational database. Unlike a relational database, each row can have a different set of columns, and it is common to use both the column name and the column value to store data. Rows are contained by a column family, which can be thought of as a table.

Clients use the Thrift transport protocol and queries look like:

set Person['chbatey']['fname'] = 'Chris Batey';

Where Person is the column family, chbatey is the row key, fname is the column name and "Chris Batey" is the column value. Column names are dynamic so a client can store any key/value pairs. In this sense Cassandra is quite schemaless.
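To picture that layout, here is a minimal Python sketch that models a column family as a dictionary of rows, each row being its own dictionary of column names to values. It is only a toy in-memory illustration, not a Cassandra client; `set_column` and the second row (`jsmith`) are hypothetical names invented for the example.

```python
from collections import defaultdict

# Toy stand-in for a column family: row key -> {column name: column value}.
# This is an in-memory illustration only, not how Cassandra stores data on disk.
person = defaultdict(dict)


def set_column(column_family, row_key, column_name, value):
    """Mimic `set Person['chbatey']['fname'] = ...` on the toy structure."""
    column_family[row_key][column_name] = value


set_column(person, "chbatey", "fname", "Chris Batey")
# Rows need not share columns: this hypothetical row has a column the first lacks.
set_column(person, "jsmith", "email", "jsmith@example.com")

print(person["chbatey"])         # {'fname': 'Chris Batey'}
print(sorted(person["jsmith"]))  # ['email']
```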

Then came Cassandra 1.* and CQL 3. Cassandra Query Language (CQL) is a SQL-like language for Cassandra. Suddenly Cassandra, from a client's perspective, became much more like a relational database. Queries now look like this:

insert into Person(fname) values ('chbatey')

Using CQL3 there are no more dynamic column names and you create tables rather than column families (however, the map type basically gives the same functionality). It's all still column families under the covers; CQL3 is just a very nice abstraction (a simplification).

Cassandra appears to be moving away from the Thrift protocol towards its own binary protocol, referred to as the native protocol.

Overall Cassandra is quite a "rough around the edges" database to use (less so with CQL3) from a client perspective. Its real power comes from its horizontal scalability and tuneable eventual consistency. More on this in a future post.


MongoDB is a document database written in C++. Document databases are very intuitive as you simply store and retrieve documents! No crazy data model to learn, for MongoDB you simply store and retrieve JSON (BSON) objects.

Storing looks like this:

db.people.insert({_id: 'chbatey', fname: 'Chris Batey'})

Retrieving looks like this:

db.people.find({_id: 'chbatey'})
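To make the store-and-retrieve idea concrete, here is a toy Python sketch of query-by-example over plain dictionaries. It is not MongoDB or one of its drivers: `insert` and `find` are hypothetical helpers that only support exact-match queries, and the second document is made up for the example.

```python
# Toy document store: a list of dicts queried by example.
people = []


def insert(collection, document):
    """Store a copy of the document (hypothetical helper, not a driver call)."""
    collection.append(dict(document))


def find(collection, query):
    """Return the documents whose fields equal every key/value pair in the query."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


insert(people, {"_id": "chbatey", "fname": "Chris Batey"})
insert(people, {"_id": "jsmith", "fname": "John Smith", "city": "London"})

# Look up by _id, as in the shell example.
print(find(people, {"_id": "chbatey"}))
# Query on a field that only some documents have.
print(find(people, {"city": "London"}))
```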


MongoDB has a very rich JSON-based querying language and a fantastic aggregation framework. From a client's perspective MongoDB is a far more feature-rich database, with support for ad-hoc querying (in Cassandra you must index everything you want to search by).


This post was a very brief description of Cassandra and MongoDB. In future posts I will compare:
  • Fault tolerance - replication
  • Read and write consistency
  • Clients
  • Hadoop support
Particularly for Cassandra, the way your data centres and cluster are laid out determines which read and write consistency levels you need to get the desired behaviour.

Wednesday, July 24, 2013

21st July 2013: What's happening?

I am always looking to improve as a Software Engineer. To keep track of what I'm working on I've broken it down to the following categories:
  • Languages: My day job is primarily Java so I like to use other languages for everything else.
  • Frameworks: Usually tightly coupled to a language but becoming less so - especially for JVM based languages.
  • Databases: The world is changing. No longer can you get away with only relational database / SQL knowledge.
  • Craftsmanship: How do I go about producing better, more maintainable software as well as helping those around me to do the same?
  • General knowledge: Keeping up with the technological world takes some doing. I try to read a few articles a day and listen to podcasts.
I won't work on each category every week. Here's what I've been doing the last week:


At work I'm a complete Java head. Over the past few years I've primarily developed standalone multi-threaded server applications for financial companies. More recently I've been developing cloud-based applications, so I've been doing a lot more Java development that is deployed to a container, e.g. Tomcat.

For this reason, when not at work, I am completely avoiding Java. This week I've been learning to test Java applications using Groovy (ok ok so I didn't leave Java completely behind!) and been learning to unit test the logic in Gradle scripts using GroovyTest.

In addition to Groovy I've been working on Python this week. If you live in London you might be aware you can get Transport for London to send you your travel statements in CSV format. I've been writing a Python application to parse these and work out how much money I spend commuting to work. Blog posts coming about this but initial version on github:


Having worked with Cassandra a lot over the last six months I'm now exploring MongoDB. Leaving the relational world for the NoSQL world has been a great learning experience this year. I'll put up a comparison of Cassandra vs MongoDB soon. Cassandra is such a low-level, "developer must understand everything" database that MongoDB is quite refreshing!


I've started doing katas again the last few weeks. I've started with sorting algorithms. I'm doing this quite quickly and in Python to further solidify my knowledge of the language. Here's merge sort: quicksort coming!

General Knowledge

I've started going through the backlog of Programming Throwdown over the last few weeks. Not bad listening for the train, though I wish they spoke about games less!

Thursday, July 11, 2013

Mergesort in Python

My train was cancelled today and, as I am trying to cement my knowledge of Python, I decided to do mergesort in Python from memory. I find when adding new languages to my toolkit it is easy to forget how to set up a quick project with unit tests, so I find it useful to do regular little projects from scratch. Here's the code:
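A minimal sketch of what a top-down merge sort might look like in Python. The function names `merge_sort` and `merge` are illustrative assumptions rather than the original listing:

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Using <= keeps equal elements in their original order (stable sort).
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these slices is non-empty; append whatever is left over.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a new sorted list; the input is left unchanged."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    return merge(merge_sort(items[:middle]), merge_sort(items[middle:]))


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```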

And here are the unit tests (of course the unit tests came first!):

I really like Python and its unit testing framework. So simple to get going and for doing TDD.
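As a rough idea of what such tests can look like with the standard `unittest` module, here is a hedged sketch. The test names are assumptions, and the block carries its own compact `merge_sort` stand-in (built on `heapq.merge`) purely so that it runs on its own:

```python
import heapq
import unittest


def merge_sort(items):
    """Compact stand-in for the function under test; heapq.merge does the merge step."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    return list(heapq.merge(merge_sort(items[:middle]), merge_sort(items[middle:])))


class MergeSortTest(unittest.TestCase):
    def test_empty_list(self):
        self.assertEqual([], merge_sort([]))

    def test_single_element(self):
        self.assertEqual([7], merge_sort([7]))

    def test_unsorted_list_with_duplicates(self):
        self.assertEqual([1, 2, 5, 5, 6, 9], merge_sort([5, 2, 9, 1, 5, 6]))

    def test_input_is_not_modified(self):
        original = [3, 1, 2]
        merge_sort(original)
        self.assertEqual([3, 1, 2], original)
```

Saved to a file, the tests can be run with `python -m unittest` followed by the module name.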

Tuesday, July 9, 2013

Uncle Bob: Automated acceptance tests

Yesterday I went to see Uncle Bob at the Skills Matter Exchange in London. Having read and enjoyed Clean Code and Clean Coder it was great to see Uncle Bob in the flesh.

The talk was on automated acceptance tests. Such a simple topic - we all automate our acceptance tests, don't we?

A few points I took away from the talk:

  • Can we get our stakeholders to write our acceptance tests? If not is it at least business analysts or QAs? If it is developers you're in trouble!
  • Developers are brilliant at rationalising about concepts such as "is it done?". Don't trust a developer to tell you something is done!
  • Acceptance tests should be automated at the latest halfway through an iteration if your QAs are going to have time to do exploratory testing.
  • The QAs should be the smartest people in your team. They should be specifying the system with the business analyst, not verifying it after it has been developed.
  • Your functional tests are just another component of the system. If that part is brittle it means your project is badly designed! Go back to the drawing board.

A final point that stuck with me is that acceptance tests don't need to be black box tests. The language they are written in should be high level (it was your stakeholder who wrote it, right?). But the implementation could interact with a version of your system that has the database or HTTP layer mocked out. Think of it this way:
  • How many times do you need to test the login functionality of your application? Once!
  • How many times will you test it if all your tests go through the GUI/web front end? Hundreds!
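One way to picture this, as a minimal Python sketch: the acceptance step is phrased at a high level, but it talks to the application through a service whose database layer has been replaced by an in-memory fake. Every name here (`InMemoryUserStore`, `LoginService`, `user_can_log_in`) is hypothetical and exists only for illustration.

```python
class InMemoryUserStore:
    """Stands in for the real database layer during acceptance tests."""

    def __init__(self):
        self._passwords = {}

    def add_user(self, username, password):
        # Plain-text passwords keep the toy example short; a real system would hash them.
        self._passwords[username] = password

    def password_for(self, username):
        return self._passwords.get(username)


class LoginService:
    """Application logic under test; it only depends on the store's interface."""

    def __init__(self, user_store):
        self._user_store = user_store

    def login(self, username, password):
        stored = self._user_store.password_for(username)
        return stored is not None and stored == password


def user_can_log_in(service, username, password):
    """High-level step a stakeholder could read: 'the user can log in'."""
    return service.login(username, password)


store = InMemoryUserStore()
store.add_user("chbatey", "s3cret")
service = LoginService(store)

print(user_can_log_in(service, "chbatey", "s3cret"))  # True
print(user_can_log_in(service, "chbatey", "wrong"))   # False
```

Only one test needs to exercise login through the real GUI; the rest can start from a service wired up like this.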
Hearing Uncle Bob speak reminds me that even when I am working on a project I think is being developed in a fantastic way, with fantastic testing - I can still try and make it better.