Cassandra memory footprint

Cassandra memory footprint

Huming Wu
I am currently doing some testing on Cassandra (0.3.0-final): two
nodes, each with 8 GB of RAM and 8 CPU cores. Here are some settings
from my storage-conf.xml:

<ReplicationFactor>2</ReplicationFactor>
<ColumnIndexSizeInKB>256</ColumnIndexSizeInKB>
<MemtableSizeInMB>1024</MemtableSizeInMB>
<MemtableObjectCountInMillions>2</MemtableObjectCountInMillions>

The test data has about 880K unique keys, and my test program simply
inserts the same 5 columns into Table1.Standard1 using
thrift.batch_insert. For each key, the record size ranges from 21
bytes to 5K, with the average being 40 bytes. The program calls
batch_insert repeatedly (4 million times) over 50 concurrent Thrift
connections, so about 220 MB of data, excluding keys, is sent to
Cassandra.
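
For reference, the driver is just a fixed pool of 50 worker threads,
each looping over its share of the calls; here is a minimal sketch
(doBatchInsert is a placeholder I'm using for the actual Thrift
batch_insert call, not the real 0.3 client API):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InsertLoad {
    static final int CONNECTIONS = 50;      // concurrent Thrift connections
    static final long CALLS = 4_000_000L;   // total batch_insert calls
    static final int UNIQUE_KEYS = 880_000; // keys are reused across calls

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(CONNECTIONS);
        for (int c = 0; c < CONNECTIONS; c++) {
            final int worker = c;
            pool.submit(() -> {
                // each worker owns one connection and loops over its share
                for (long i = worker; i < CALLS; i += CONNECTIONS) {
                    doBatchInsert("key" + (i % UNIQUE_KEYS));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(7, TimeUnit.DAYS);
    }

    // Placeholder: the real program builds the 5-column record and sends
    // it to Table1.Standard1 via thrift.batch_insert.
    static void doBatchInsert(String key) {
        // Thrift call elided
    }
}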
What I see is basically that the Java resident memory grows to the
heap limit (6 GB) and everything just halts after that. If I restart
Cassandra, the footprint drops back to around 1.9 GB and I can insert
again, but the memory keeps growing, and so on. Here are my JVM
settings:

-Xmx6000m -Xms6000m -XX:+HeapDumpOnOutOfMemoryError -XX:NewSize=1000m
-XX:MaxNewSize=1000m -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode -verbose:gc -XX:+PrintHeapAtGC
-XX:+PrintGCDetails -Xloggc:gc.log

Here is the jmap output (top 10 objects):

num   #instances    #bytes  class name
--------------------------------------

 1:   1436005   658220304  [Ljava.lang.Object;
 2:  12100491   484019640  java.lang.String
 3:   9904511   437577600  [C
 4:   2709812   322398784  [I
 5:   5607988   224319520  java.util.concurrent.ConcurrentSkipListMap$Node
 6:   4469810   214550880  org.apache.cassandra.db.Column
 7:   3339219   213710016  org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT
 8:   3339230   191142200  [J
 9:   4506140   179147648  [B
 10:   2220024   106561152  java.util.concurrent.ConcurrentSkipListMap$HeadIndex
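
Summing the #bytes column, the top 10 classes alone account for about
3 GB, an order of magnitude more than the ~220 MB of raw column data
sent. A quick tally (my own arithmetic, just to check the magnitude):

public class JmapTally {
    public static void main(String[] args) {
        // the #bytes column from the histogram above, top 10 entries
        long[] bytes = {
            658220304L, 484019640L, 437577600L, 322398784L, 224319520L,
            214550880L, 213710016L, 191142200L, 179147648L, 106561152L
        };
        long total = 0;
        for (long b : bytes) total += b;
        System.out.printf("top 10 classes: %,d bytes (~%.2f GB)%n",
                total, total / 1e9);
        // ~3.03 GB: Column instances, skip-list nodes, and Strings
        // account for far more heap than the raw payload.
    }
}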

Does anyone have any idea why Cassandra uses so much memory? From
gc.log I can see that GC has kicked in many times (though no major
compaction has happened). I'd expect that with such a small data set
everything would work fine within the available memory (I mean, the
test should just keep going for weeks).

Any suggestion?

Thanks,
Huming

Re: Cassandra memory footprint

Jonathan Ellis
It turns out there were several bugs that made 0.3 run out of memory
during sustained inserts. These are fixed in trunk, which is almost
stable (#233 is the last disk format change, and it will be committed
as soon as review is done).

-Jonathan
