Wednesday, October 2, 2013

Installing Cassandra 2.0 on Ubuntu

Update your apt source list with the following:

sudo vim /etc/apt/sources.list

#Add at the bottom
deb 20x main
deb-src 20x main

Run an apt-get update. 

sudo apt-get update

This will give you a warning about not being able to verify the signatures of the apache repos:

GPG error: unstable Release:
The following signatures couldn't be verified because the public key is not available:

Now do the following for that key:

gpg --keyserver --recv-keys 4BD736A82B5C1B00
gpg --export --armor 4BD736A82B5C1B00 | sudo apt-key add -

Also add this one:

gpg --keyserver --recv-keys 2B5C1B00
gpg --export --armor 2B5C1B00 | sudo apt-key add -
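
If you have more than one key to import, the two gpg steps above can be wrapped in a small function. This is only a sketch: import_apt_key is a made-up helper name, and the keyserver is passed in as an argument because it depends on which keyserver you use.

```shell
# import_apt_key KEYSERVER KEY_ID
# Hypothetical helper: fetches KEY_ID from KEYSERVER, then hands it to apt.
# The export step only runs if the fetch succeeded.
import_apt_key() {
    gpg --keyserver "$1" --recv-keys "$2" &&
        gpg --export --armor "$2" | sudo apt-key add -
}

# Usage (KEYSERVER is an assumption, set it to the keyserver you use):
# for key in 4BD736A82B5C1B00 2B5C1B00; do
#     import_apt_key "$KEYSERVER" "$key"
# done
```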

Now run apt-get update again:

sudo apt-get update

The error should be gone. Now check that all is working and Ubuntu can see Cassandra 2.0:

apt-cache showpkg cassandra
Package: cassandra

Great! Now install it:

sudo apt-get install cassandra

Now start it:

sudo service cassandra start
xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1001M -Xmx1001M -Xmn100M -XX:+HeapDumpOnOutOfMemoryError -Xss256k

Now check that you can connect using cqlsh:

Connected to Test Cluster at localhost:9160.
[cqlsh 4.0.1 | Cassandra 2.0.1 | CQL spec 3.1.1 | Thrift protocol 19.37.0]
Use HELP for help.

Where is everything?

  • Logs: /var/log/cassandra
  • Config: /etc/cassandra/
  • Data: /var/lib/cassandra
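
A quick way to confirm the package really put things where the list above says is to test each directory. This is a minimal sketch; check_cassandra_dirs is a hypothetical helper name, not something the package provides.

```shell
# Report whether each directory passed as an argument exists.
# check_cassandra_dirs is a hypothetical helper, not part of the Cassandra package.
check_cassandra_dirs() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "found:   $dir"
        else
            echo "missing: $dir"
        fi
    done
}

# Usage with the locations listed above:
# check_cassandra_dirs /var/log/cassandra /etc/cassandra /var/lib/cassandra
```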


Tuesday, October 1, 2013

Using Cassandra on Mac OSX

I posted some time ago about installing Cassandra on Mac OSX. Admittedly I generally use Linux when dealing with Cassandra, but I have recently been using it on Mac OSX again, so here are some tips for working with Cassandra on Mac OSX.

Install it with homebrew 

It's easy! The only reason for not using homebrew is if you want a specific version. I have an old blog post on installing it with homebrew here: install Cassandra on Mac OSX. If you want 1.2 rather than 2.0 read below first.

The default formula for Cassandra is now 2.0. If you aren't that cutting edge and want to stick to Cassandra 1.2 then you need to do some tinkering. First run brew update and tap homebrew/versions:

brew update
brew tap homebrew/versions

Now let's see what we get for cassandra:

brew search cassandra
cassandra      cassandra-0.6  cassandra12

Homebrew have kindly created three formulas you can work with: 0.6, 1.2 and the latest (currently 2.0). If you want 1.2 simply do:

brew install cassandra12 

Use this rather than brew install cassandra. By default the brew-installed Cassandra uses the same config/data locations for 1.2 and 2.0, so you can't (without work) use brew to manage multiple versions of Cassandra on your Mac - but if you want that you should probably use VMs instead.

Cassandra is installed: Where is everything?

All of this applies regardless of whether you're on Cassandra 1.2 or Cassandra 2.0. Package managers are great but sometimes they leave you baffled as to where they put everything!

Where's my Cassandra yaml and other property files? /usr/local/etc/cassandra

Where are my logs? /usr/local/var/log/cassandra/
  • This can be updated by modifying /usr/local/etc/cassandra/
Where are the data, commit log etc. (you may need to delete these when playing with different versions / partitioners)? /usr/local/var/lib/cassandra/data

How do I stop and start Cassandra?

If you're used to unix services/init.d etc. you'll want to know how to start/stop Cassandra without the kill command. On Mac this is handled by launchd, which you control with the launchctl utility. Assuming you installed Cassandra using homebrew, use the following commands:

launchctl start homebrew.mxcl.cassandra
launchctl stop homebrew.mxcl.cassandra

That's a lot of typing so I tend to alias these in my profile, e.g.:

alias stop_cassandra="launchctl stop homebrew.mxcl.cassandra"
alias start_cassandra="launchctl start homebrew.mxcl.cassandra"
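
An alternative to the two aliases is a single shell function that takes the action as an argument. This is just a sketch: cassandra_ctl is a made-up name, and it assumes the same homebrew.mxcl.cassandra job label used above.

```shell
# cassandra_ctl start|stop
# Hypothetical wrapper around the launchctl commands shown above.
cassandra_ctl() {
    case "$1" in
        start|stop)
            launchctl "$1" homebrew.mxcl.cassandra
            ;;
        *)
            echo "usage: cassandra_ctl start|stop" >&2
            return 1
            ;;
    esac
}
```

Put it in your profile next to (or instead of) the aliases, then run cassandra_ctl start or cassandra_ctl stop.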

Cassandra: Datastax Java driver retry policy

The Datastax Java Driver for Cassandra exposes its strategy for retrying via the RetryPolicy interface.

There are three scenarios you can control the retry policy for:
  1. Read timeout: The coordinator received the request and sent the read to the replica(s), but the replica(s) did not respond in time
  2. Write timeout: As above but for writes
  3. Unavailable: The coordinator knows there aren't enough replicas available, so it fails without sending the read/write request on

What is the default behaviour?

The DefaultRetryPolicy retries in the following cases:
  1. Read timeout: When enough replicas are available but the data did not come back within the configured read timeout
  2. Write timeout: Only if the initial phase of a batch write times out - see cassandra batch statement
  3. Unavailable: Never

How do I configure the value for the read and write timeout?

This is configured in cassandra.yaml on the Cassandra server. The default is 10 seconds; you can change it with the following properties:
# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 10000
# How long the coordinator should wait for writes to complete
write_request_timeout_in_ms: 10000
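
To see what a node is currently configured with, you can grep those two properties out of its yaml. A minimal sketch: show_timeouts is a hypothetical helper name, and you pass it the path to your own cassandra.yaml (the locations differ between the Ubuntu package and homebrew, as described in the posts above).

```shell
# show_timeouts PATH_TO_CASSANDRA_YAML
# Hypothetical helper: prints the active (uncommented) read/write timeout settings.
show_timeouts() {
    grep -E '^(read|write)_request_timeout_in_ms:' "$1"
}

# Usage, assuming the Ubuntu package location:
# show_timeouts /etc/cassandra/cassandra.yaml
```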

What are the other policies?


DowngradingConsistencyRetryPolicy is the most complicated retry policy and comes with a big warning: your read/write may be retried at a lower consistency. So if you have business requirements to not report success when you don't meet a certain level of consistency, use this with caution.

What does it do?
  1. Read: If at least one replica responded then the read is retried at a lower consistency
  2. Write: Retries for unlogged batch queries when at least one replica responded; for all other types of writes the timeout is just ignored if at least one replica acknowledged the write (essentially ignoring the consistency request)
  3. Unavailable: If at least one replica is available then the query is retried with a lower consistency


FallthroughRetryPolicy: No retrying! Any failure is re-thrown to the client.


LoggingRetryPolicy is just a decorator policy that you can wrap around any other policy; it logs ignored failures (no retry) and any actual retries. The driver uses SLF4J and logs at INFO level.

How do I use a different policy?

Simply add it when creating your Cluster, using the builder's withRetryPolicy method. The retry policies all have a singleton you can use, e.g. DowngradingConsistencyRetryPolicy.INSTANCE.


The Datastax driver is very open for extension as it exposes its strategies for retry, load balancing and reconnection. 

The retry policy is very easy to work with as all the current implementations are stateless. I'll follow this post up with how to implement your own retry policy.