Cassandra hardware setup

Scott Chacon
We're playing with Cassandra and would like to get a test cluster
set up for evaluation.  I've been playing with it on my laptop and EC2,
which are the resources easily available to me, but not that close to
what I would be using in a production environment.

What would be an ideal machine setup for a Cassandra node?  At least
two separate physical disks, one for the commit log and another for
the data, no RAID, 8-16G memory? (I think that's what Evan recommended
in his blog post)  Does anyone have any configuration stories/setups
they're willing to share on what has worked well and possibly what has
not?  Partitioning decisions and config?

Sans helpful pointers, I'll probably try setting it up on a few Dell
2950 IIs / 16G RAM / 2x300G 15k SAS drives.  Curious what others have
done and any lessons learned.

Thanks,
Scott

Re: Cassandra hardware setup

Jonathan Ellis-3
On Tue, Aug 25, 2009 at 7:07 PM, Scott Chacon<[hidden email]> wrote:
> We're playing with Cassandra and would like to get a test cluster
> set up for evaluation.  I've been playing with it on my laptop and EC2,
> which are the resources easily available to me, but not that close to
> what I would be using in a production environment.

Yeah, EC2 is I/O hell.

Jason at Slicehost thinks their VMs and cloud servers should post
better numbers, fwiw, but it's still going to be best to run on
non-virtualized hardware.

> What would be an ideal machine setup for a Cassandra node?  At least
> two separate physical disks, one for the commit log and another for
> the data, no RAID, 8-16G memory? (I think that's what Evan recommended
> in his blog post)

Right.  Right now you don't get a huge win on writes from going over
commitlog disk + 1 disk per heavily-written columnfamily, but if you
can get to 3 or 4 without much extra cost (e.g. staying in 1U) it will
help read seeks linearly, on average.  So more is better.

If you do have multiple data disks, I would expect JBOD to work better
than striping them with RAID 0.  (Mostly a wash on writes, but better read
performance.)  But I don't know if anyone has actually tested this.

Digg is running on 16GB machines with a 10 GB heap size, which you can
set in cassandra.in.sh.

Cassandra defaults are tuned so that you can test things out on a 1GB
heap without OOM-ing.  Look in the performance section; the main settings
are

MemtableSizeInMB
MemtableObjectCountInMillions

If you are also using a 10GB heap then you can just multiply these by
10 as a first step.

If you are using 8 cores you probably want to double ConcurrentReads too.
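
To make the arithmetic concrete, here is a rough sketch of that first-pass
scaling.  The baseline numbers are placeholders I made up for illustration,
not Cassandra's actual defaults, so read the real values out of your own
config before applying anything.  The helper name scale_for_heap and the
"8 or more cores" cutoff are also just assumptions for the example.

```python
# Sketch only: the baseline values below are placeholders, NOT Cassandra's
# shipped defaults. Substitute the values from your own config file.

def scale_for_heap(baseline, heap_gb, cores, baseline_heap_gb=1):
    """First-pass tuning: scale memtable thresholds with heap size and
    double ConcurrentReads on machines with 8 or more cores (assumed cutoff)."""
    factor = heap_gb / baseline_heap_gb
    tuned = dict(baseline)  # copy so the baseline is left untouched
    tuned["MemtableSizeInMB"] = baseline["MemtableSizeInMB"] * factor
    tuned["MemtableObjectCountInMillions"] = (
        baseline["MemtableObjectCountInMillions"] * factor
    )
    if cores >= 8:
        tuned["ConcurrentReads"] = baseline["ConcurrentReads"] * 2
    return tuned


# Hypothetical baseline, assumed to be tuned for a 1GB heap.
placeholder_baseline = {
    "MemtableSizeInMB": 32,
    "MemtableObjectCountInMillions": 0.5,
    "ConcurrentReads": 4,
}

tuned = scale_for_heap(placeholder_baseline, heap_gb=10, cores=8)
for name, value in tuned.items():
    print(f"{name}: {value:g}")
```

Treat the result as a starting point only and adjust from there based on
what you see under your own load.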

-Jonathan

Re: Cassandra hardware setup

Jonathan Ellis-3
In reply to this post by Scott Chacon
How did your testing go?  Any more questions?

-Jonathan
