Anybody experience one Cassandra server locking up?

Brian Frank Cooper

Hi folks,

I have been loading a 6-server Cassandra cluster with 1KB records. After a few million inserts, the insert rate drops dramatically. On investigation, one of the Cassandra servers seems to be in a bad state, using 100% of one core on an 8-core machine and 0% of the other cores. Inserts to this box have completely stopped, and inserts to the other boxes have slowed way down (by more than a factor of 10). A “kill” or “kill -3” to the bad java process does nothing; I have to use “kill -9” to stop it. Has anybody experienced anything like this?

Additional info:

The servers are 8-core, 8GB machines. I am running 64-bit Java 1.6, and here are the JVM options:

 

# Arguments to pass to the JVM
JVM_OPTS=" \
        -ea \
        -Xdebug \
        -Xrunjdwp:transport=dt_socket,server=y,address=8888,suspend=n \
        -Xms128M \
        -Xmx6G \
        -XX:SurvivorRatio=8 \
        -XX:TargetSurvivorRatio=90 \
        -XX:+AggressiveOpts \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:CMSInitiatingOccupancyFraction=1 \
        -XX:+CMSParallelRemarkEnabled \
        -XX:+HeapDumpOnOutOfMemoryError \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"

 

(Standard options from the Cassandra distribution, except for the 6GB of heap space.)

Replication factor is 1 (this is just a test, not a production setup) and memtable size is set to 1GB.

Thanks…

brian


Re: Anybody experience one Cassandra server locking up?

Jun Rao

Do you see any exceptions in the Cassandra log?

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099

[hidden email]


From: Brian Frank Cooper <[hidden email]>
To: "[hidden email]" <[hidden email]>
Date: 08/18/2009 03:37 PM
Subject: Anybody experience one Cassandra server locking up?






Re: Anybody experience one Cassandra server locking up?

Jonathan Ellis-3
In reply to this post by Brian Frank Cooper
Sounds like you are exhausting the memory on that instance and it is
going into "GC swap" trying to free enough to continue.  This is very
easy to do on 0.3 -- try upgrading to the 0.4 beta if you are using
0.3.


RE: Anybody experience one Cassandra server locking up?

Brian Frank Cooper
You are probably right; after Jun's response I looked in the log and saw an out-of-memory exception. I'll try the 0.4 beta...

Thanks!

brian

-----Original Message-----
From: Jonathan Ellis [mailto:[hidden email]]
Sent: Wednesday, August 19, 2009 9:12 AM
To: [hidden email]
Subject: Re: Anybody experience one Cassandra server locking up?

sounds like you are exhausting the memory on that instance and it is
going into "GC swap" trying to free enough to continue.  this is very
easy to do on 0.3 -- try upgrading to the 0.4 beta if you are using
0.3.


Re: Anybody experience one Cassandra server locking up?

Sandeep Tata
Brian,

Are you guys planning to run workloads at Yahoo to compare Cassandra and PNUTS?
We'd be curious to see what you learn with the 0.4/trunk code.

Sandeep

On Wed, Aug 19, 2009 at 10:20 AM, Brian Frank
Cooper<[hidden email]> wrote:

> Probably you are right; after Jun's response I looked in the log and saw an out of memory exception. I'll try the 0.4 beta...
>
> Thanks!
>
> brian

RE: Anybody experience one Cassandra server locking up?

Brian Frank Cooper
We are trying to learn what we can about the performance of Cassandra. I hope to have some results to share publicly in the next couple of weeks.

The 0.4 version seems to have handled the insert load better, but is having trouble with a 50/50 read/write workload. One server again has a busy core with the other 7 cores (and the other servers) idle or near idle. Any ideas? The problem seems to come when we dial up the request rate made by the client; after a certain point, the achievable throughput slows way down, even lower than what we could have achieved with a lower request rate. (Incidentally, we are reading and writing 10 KB records; does the large data size have any impact?) And using top -H, it looks like it is one of the Java threads that is consistently busy. Maybe it is GC again.
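
One way to check whether the busy thread seen in top -H is a GC thread is to look it up in a thread dump. The following is a minimal sketch, assuming Linux and a HotSpot JVM, where a thread dump (for example from jstack) lists each thread's native id as a hexadecimal `nid=0x...` field while top -H shows the same id in decimal. The function name and the sample dump lines are hypothetical.

```python
def find_thread_in_dump(dump_text, tid):
    """Return the thread-dump header lines whose native id matches tid.

    tid is the decimal thread id as shown by `top -H`; thread dumps
    print the same id in hexadecimal as `nid=0x...`.
    """
    nid_field = "nid=%s" % hex(tid)
    return [line for line in dump_text.splitlines()
            if nid_field in line.split()]


# Hypothetical dump excerpt, for illustration only.
sample_dump = (
    '"Thread-1" prio=10 tid=0x00007f01 nid=0x3039 runnable\n'
    '"Thread-2" prio=10 tid=0x00007f02 nid=0x303 runnable\n'
)

for line in find_thread_in_dump(sample_dump, 12345):
    print(line)
```

If the matching line names a garbage-collection thread, that would support the GC explanation; if it names a request-handling thread, the cause is more likely elsewhere.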

I was hoping to chat with some of you Cassandra folks when we visited FB last week...perhaps we can grab coffee sometime and chat about these issues...

Thanks!

brian
________________________________________
From: Sandeep Tata [[hidden email]]
Sent: Wednesday, August 19, 2009 1:29 PM
To: [hidden email]
Subject: Re: Anybody experience one Cassandra server locking up?

Brian,

Are you guys planning to run workloads at Yahoo to compare Cassandra and PNUTS?
We'd be curious to see what you learn with the 0.4/trunk code.

Sandeep


Re: Anybody experience one Cassandra server locking up?

Jonathan Ellis-3
On Wed, Aug 19, 2009 at 5:19 PM, Brian Frank
Cooper<[hidden email]> wrote:
> We are trying to learn what we can about the performance of Cassandra. I hope to have some results to share publicly in the next couple of weeks.
>
> The 0.4 version seems to have handled the insert load better, but is having trouble with a 50/50 read/write workload. One server again has a busy core with the other 7 cores (and the other servers) idle or near idle. Any ideas?

Writes are serialized per columnfamily.  There are some ways we can
improve that but right now you may need multiple CFs to max write
throughput.  (Reads are not serialized like that though so I am a
little surprised that the idleness difference is so complete.)

If only one server is getting all the load something is wrong.  Is
that the server all your clients are connecting to?  It's designed to
have the clients spread around the cluster.

Or, are you using OrderPreservingPartitioner?  Load balancing won't be
in until 0.5 so unless you manually pick your tokens carefully and/or
do writes in a non-sequential manner one server will get all the keys.
 Or just use RandomPartitioner (which of course means giving up range
queries).
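
The difference between the two partitioners can be illustrated with a small simulation. This is only a sketch under stated assumptions: six nodes, an MD5-based token reduced to the range 0 to 2**127 (a simplification of what RandomPartitioner does), hypothetical string tokens for the order-preserving case, and sequential keys of the form `user0000001`. None of the names below are Cassandra APIs.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # assumed token range for the hashed case


def md5_token(key):
    """Hash a key to a ring position (simplified stand-in for RandomPartitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def owner(token, node_tokens):
    """Index of the first node whose token is >= the key token, wrapping around."""
    return bisect_left(node_tokens, token) % len(node_tokens)


def keys_per_node(keys, node_tokens, token_fn):
    """Count how many keys each node would own."""
    counts = [0] * len(node_tokens)
    for key in keys:
        counts[owner(token_fn(key), node_tokens)] += 1
    return counts


# Sequential keys, as a bulk load might generate them.
keys = ["user%07d" % i for i in range(10000)]

# Hashed placement: numeric tokens spaced evenly around the ring.
hashed_nodes = [i * RING_SIZE // 6 for i in range(6)]
random_counts = keys_per_node(keys, hashed_nodes, md5_token)

# Order-preserving placement: the key itself is the token.
# These node tokens are hypothetical.
ordered_nodes = ["d", "h", "m", "q", "u", "z"]
order_counts = keys_per_node(keys, ordered_nodes, lambda key: key)

print("hashed: ", random_counts)
print("ordered:", order_counts)
```

With hashing, the sequential keys scatter across all six nodes. With order-preserving placement, every key starting with "user" falls into the same token range, so one node owns all of them.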

> (Incidentally, we are reading and writing 10 KB records; does the large data size have any impact?)

Since unlike a traditional K/V store you can update and retrieve
individual columns separately, most column sizes are < 1KB but
10KB isn't totally unreasonable.

> I was hoping to chat with some of you Cassandra folks when we visited FB last week...perhaps we can grab coffee sometime and chat about these issues...

The FB guys haven't been involved with the OSS project for some time,
unfortunately.

-Jonathan

RE: Anybody experience one Cassandra server locking up?

Brian Frank Cooper
Thanks for the detailed response. It is really helpful to understand what is going on behind the covers.

We are using "RandomPartitioner." However, I have noticed that some of the boxes have significantly more data (in /var/cassandra/data and /var/cassandra/commitlog) than others (like 30 X more). I have 50 client processes doing the read/write workload (and had 50 doing the load) but had them round-robined between servers. E.g. the cassandra server to connect to was clientid % 6.
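
As a quick check that the client assignment itself is balanced, the `clientid % 6` rule can be tallied directly. This is a small sketch, assuming client ids run from 0 to 49; the function name is illustrative.

```python
from collections import Counter


def clients_per_server(client_count, server_count):
    """Tally how many clients connect to each server under clientid % server_count."""
    return Counter(client_id % server_count for client_id in range(client_count))


counts = clients_per_server(50, 6)
for server in sorted(counts):
    print("server %d: %d clients" % (server, counts[server]))
```

Under that assumption two servers get 9 clients and four get 8, so the connection spread alone would not account for a 30x difference in stored data.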

I have been reading and writing a single column family, so perhaps that's part of the issue.

Incidentally, the system is quite fun to play with, and the startup is very easy (just start the nodes and they all find each other.) Writing the client (e.g. dealing with thrift) was much harder. I wonder whether a lot of users had tried to write C++ clients; this was a little non-trivial as the documentation I could find favors the Java and PHP cases.

(I didn't realize the FB folks had disconnected from the OSS project.)

brian
________________________________________
From: Jonathan Ellis [[hidden email]]
Sent: Wednesday, August 19, 2009 5:32 PM
To: [hidden email]
Subject: Re: Anybody experience one Cassandra server locking up?


Re: Anybody experience one Cassandra server locking up?

Jonathan Ellis-3
On Wed, Aug 19, 2009 at 6:51 PM, Brian Frank
Cooper<[hidden email]> wrote:
> We are using "RandomPartitioner." However, I have noticed that some of the boxes have significantly more data (in /var/cassandra/data and /var/cassandra/commitlog) than others (like 30 X more).

Ah, with small numbers of nodes you should manually space tokens
around the ring instead of having them pick one randomly.  You can do
this w/ the InitialToken directive before starting the node *for the
first time* (afterwards it stores it in the system keyspace, under
data/).  The digg guys should have a utility done soon to set it
post-start but that is the only option for now.
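
The manual spacing can be computed ahead of time. Below is a minimal sketch, assuming RandomPartitioner's token range of 0 to 2**127 and a 6-node cluster like the one in this thread; `initial_tokens` is an illustrative name, not part of Cassandra.

```python
# Assumption: RandomPartitioner tokens fall in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def initial_tokens(node_count):
    """Return one token per node, spaced evenly around the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


tokens = initial_tokens(6)
for node, token in enumerate(tokens):
    print("node %d: InitialToken = %d" % (node, token))
```

Each printed value would go into the InitialToken directive of one node's configuration before that node is started for the first time.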

> Incidentally, the system is quite fun to play with, and the startup is very easy (just start the nodes and they all find each other.) Writing the client (e.g. dealing with thrift) was much harder.

Yeah, thrift is a pain.  It's the worst possible option except for all
the others. :)  (E.g. protocol buffers doesn't do RPC; avro only does
java/c/python, ...)  That's why you have more idiomatic clients for
python, ruby, scala, at the least.

> I wonder whether a lot of users had tried to write C++ clients

I think you're the first.  Why put yourself through that kind of pain
just to test things out? :)

-Jonathan

RE: Anybody experience one Cassandra server locking up?

Brian Frank Cooper
> Ah, with small numbers of nodes you should manually space tokens
> around the ring instead of having them pick one randomly.

Excellent, this seems to be helping quite a lot with the throughput.


> I think you're the first.  Why put yourself through that kind of pain
> just to test things out? :)

It's a good question :)

brian