Nodes frozen in GC


Nodes frozen in GC

ruslan usifov
Hello

I have a cluster of 3 Cassandra 0.7.3 nodes, and sometimes a node freezes. As I understand it, this is because of GC; at that moment I see:

INFO [ScheduledTasks:1] 2011-03-05 15:21:23,524 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 18052 ms, -997761672 reclaimed leaving 5796586088

How can I reduce this? I installed JNA (correctly, I believe: the "CLibrary.java (line 106) JNA mlockall successful" log record is present in system.log on all nodes) and set vm.swappiness = 0, but this doesn't help. Any suggestions?

PS: The nodes have 12GB of memory with a 6GB heap size.
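
For reference, those two checks amount to roughly the following (a hedged
sketch; the system.log path is a typical one and may differ per install):

    sysctl -w vm.swappiness=0                          # discourage the kernel from swapping
    grep -i "mlockall" /var/log/cassandra/system.log   # confirm the JNA mlockall record is present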

Re: Nodes frozen in GC

aaron morton
Your node is under memory pressure, after the GC there is still 5.7GB in use. In fact it looks like memory usage went up during the GC process.

Can you reduce the memtable size, caches or the number of CF's or increase the JVM size? Also is this happening under heavy load ?
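
A hedged way to see what those knobs are currently set to, assuming a stock
0.7 layout (paths and option names may differ on your install):

    # cassandra.yaml: emergency flush / cache-reduction thresholds and memtable defaults
    grep -E 'flush_largest_memtables_at|reduce_cache_sizes_at|memtable' conf/cassandra.yaml
    # cassandra-env.sh: JVM heap size
    grep MAX_HEAP_SIZE conf/cassandra-env.sh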

Aaron

On 6/03/2011, at 10:34 AM, ruslan usifov wrote:

> Hello
>
> I have cluster of 3 cassandra 0.7.3 nodes, and some times nodes a frozen. As i understand this because GC, in this moment i see
>
> INFO [ScheduledTasks:1] 2011-03-05 15:21:23,524 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 18052 ms, -997761672 reclaimed leaving 5796586088
>
> How can i reduce this? I installed JNA (i install it correctly: "CLibrary.java (line 106) JNA mlockall successful" log record  in system.log present on all nodes), and set vm.swappiness = 0, but this doesn't help. Any suggest?
>
> PS: Nodes have 12GB of memory with 6GB HeapSize


Re: Nodes frozen in GC

ruslan usifov


2011/3/6 aaron morton <[hidden email]>
Your node is under memory pressure, after the GC there is still 5.7GB in use. In fact it looks like memory usage went up during the GC process.

Can you reduce the memtable size, caches or the number of CF's or increase the JVM size? Also is this happening under heavy load ?

Yes, I do a bulk load into the cluster.


I reduced the memtable to 64MB and the situation became better, but all the same, a very rare GC of more than 15 seconds still happens. Can reducing flush_largest_memtables_at help? Or maybe some GC options?
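
For reference, lowering that threshold is a one-line change in cassandra.yaml.
A hedged sketch, assuming the 0.7.x option name and that the line is present
and uncommented (0.60 is only an example value to experiment with):

    sed -i 's/^flush_largest_memtables_at:.*/flush_largest_memtables_at: 0.60/' conf/cassandra.yaml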

Re: Nodes frozen in GC

Peter Schuller
Do you have row cache enabled? Disable it. If that fixes it and you want
it, re-enable it, but consider row sizes and the cap on the cache size.
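
A hedged sketch of doing that for one column family, using the 0.7-era
cassandra-cli syntax as I recall it (connect first with something like
"cassandra-cli -h localhost -p 9160"; MyKeyspace and MyCF are placeholders):

    use MyKeyspace;
    update column family MyCF with rows_cached = 0;

The key cache is controlled the same way via keys_cached.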
--
/ Peter Schuller

Re: Nodes frozen in GC

ruslan usifov
In reply to this post by aaron morton


2011/3/6 aaron morton <[hidden email]>
Your node is under memory pressure, after the GC there is still 5.7GB in use. In fact it looks like memory usage went up during the GC process.

Can you reduce the memtable size, caches or the number of CF's or increase the JVM size? Also is this happening under heavy load ?


I have set the memtable size, and I insert data into only one CF, with the biggest row size about 1K. How is it possible that after GC all the memory is still in use? Maybe this is a memory leak in Cassandra 0.7.3?


Re: Nodes frozen in GC

aaron morton
It's always possible to run out of memory. Can you provide...

- number cf's and their Memtable settings
- any row or key cache settings
- any other buffer or memory settings you may have changed in Cassandra.yaml.
- what load you are putting on the cluster, e.g. Inserting x rows/columns per second with avg size y

Aaron

On 8/03/2011, at 2:39 AM, ruslan usifov <[hidden email]> wrote:



2011/3/6 aaron morton <[hidden email]>
Your node is under memory pressure, after the GC there is still 5.7GB in use. In fact it looks like memory usage went up during the GC process.

Can you reduce the memtable size, caches or the number of CF's or increase the JVM size? Also is this happening under heavy load ?


I have memtable size, and insert data into one CF, with biggest rowsize 1K, how it is possible that after GC all memory is load? Meybe this is memory leak in cassandra 0.7.3?


Re: Nodes frozen in GC

ruslan usifov
Hello, thanks for the reply.

2011/3/7 Aaron Morton <[hidden email]>
It's always possible to run out of memory. Can you provide...

- number cf's and their Memtable settings
1 CF with memtable 64MB, other settings as default

 
- any row or key cache settings
It stays at the default, i.e. 200000, but I don't do any reads, only writes.

 
- any other buffer or memory settings you may have changed in Cassandra.yaml.
I changed only the binary_memtable_throughput_in_mb setting, and set it to 64MB.


 
- what load you are putting on the cluster, e.g. Inserting x rows/columns per second with avg size y


I do a bulk load of real data into the cluster. In my case this is done from 16 threads (8 processes per machine, on two machines), with an average insert speed of 1000 rows per second per thread, so the total is 16000 rows per second, with an average row size of 573 bytes.
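
A rough back-of-the-envelope on those numbers, assuming they are accurate:

    echo $((16000 * 573))    # 9168000 bytes/s, i.e. roughly 9 MB/s entering the cluster

so per-node 64MB memtables will fill and flush every few tens of seconds for
the duration of the bulk load, which keeps both the heap and compaction busy.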

Re: Nodes frozen in GC

Jonathan Ellis-3
In reply to this post by ruslan usifov
It sounds like you're complaining that the JVM sometimes does stop-the-world GC.

You can mitigate this but not (for most workloads) eliminate it with
GC option tuning.  That's simply the state of the art for Java garbage
collection right now.
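
For reference, these are the kinds of CMS-related flags that such tuning
usually involves; most of them ship as defaults in conf/cassandra-env.sh, and
the values below are illustrative rather than recommendations:

    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"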

On Sun, Mar 6, 2011 at 2:18 AM, ruslan usifov <[hidden email]> wrote:

>
>
> 2011/3/6 aaron morton <[hidden email]>
>>
>> Your node is under memory pressure, after the GC there is still 5.7GB in
>> use. In fact it looks like memory usage went up during the GC process.
>>
>> Can you reduce the memtable size, caches or the number of CF's or increase
>> the JVM size? Also is this happening under heavy load ?
>>
> Yes I do bulk load to cluster
>
>
> I reduce memtable to 64MB sutuation became better, but it is all the same
> very rare GC more than 15 Sec happens. Can reduce
> flush_largest_memtables_at helps? O may be some GC options?
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Nodes frozen in GC

ruslan usifov


2011/3/8 Jonathan Ellis <[hidden email]>
It sounds like you're complaining that the JVM sometimes does stop-the-world GC.

You can mitigate this but not (for most workloads) eliminate it with
GC option tuning.  That's simply the state of the art for Java garbage
collection right now.


Hm, but what should one do in these cases? In those moments the throughput of the cluster degrades, and I don't understand what workaround I should apply to prevent these situations.

Re: Nodes frozen in GC

Paul Pak
In reply to this post by Jonathan Ellis-3
So, are you saying this is normal and expected from Cassandra?  So,
under load, we can expect java garbage collection to stop the Cassandra
process on that server from time to time, essentially taking out the
node for short periods of time while it does garbage collection?

Also, why is there so much garbage collection to begin with?  Memcached
uses a slab allocator to reuse blocks to prevent allocation/deallocation
of blocks from consuming all the CPU time.  Are there any plans to reuse
blocks so the garbage collector doesn't have to work so hard?

Paul

On 3/7/2011 4:35 PM, Jonathan Ellis wrote:

> It sounds like you're complaining that the JVM sometimes does stop-the-world GC.
>
> You can mitigate this but not (for most workloads) eliminate it with
> GC option tuning.  That's simply the state of the art for Java garbage
> collection right now.
>
> On Sun, Mar 6, 2011 at 2:18 AM, ruslan usifov <[hidden email]> wrote:
>>
>> 2011/3/6 aaron morton <[hidden email]>
>>> Your node is under memory pressure, after the GC there is still 5.7GB in
>>> use. In fact it looks like memory usage went up during the GC process.
>>>
>>> Can you reduce the memtable size, caches or the number of CF's or increase
>>> the JVM size? Also is this happening under heavy load ?
>>>
>> Yes I do bulk load to cluster
>>
>>
>> I reduce memtable to 64MB sutuation became better, but it is all the same
>> very rare GC more than 15 Sec happens. Can reduce
>> flush_largest_memtables_at helps? O may be some GC options?
>>
>
>


Re: Nodes frozen in GC

Paul Pak
In reply to this post by ruslan usifov
Hi Ruslan,

It looks like Jonathan and Stu have already been working to reduce garbage collection in 0.8. The ticket is at https://issues.apache.org/jira/browse/CASSANDRA-2252

Jonathan, is there any way to apply the patch to 0.7.3 and have Ruslan test it to see if it fixes his issue with garbage collection? Given that he's already got a test case, it seems like a reasonable thing to do (with the caveat that he is a guinea pig for new code which may not be fully tested yet :) ).

Paul

On 3/7/2011 4:50 PM, ruslan usifov wrote:


2011/3/8 Jonathan Ellis <[hidden email]>
It sounds like you're complaining that the JVM sometimes does stop-the-world GC.

You can mitigate this but not (for most workloads) eliminate it with
GC option tuning.  That's simply the state of the art for Java garbage
collection right now.


Hm, but what to do in this cases?? In these moments throughput of cluster degrade, and I misunderstand what workaround I must do to prevent this situations?


Re: Nodes frozen in GC

Chris Goffinet
In reply to this post by ruslan usifov
Can you tell me how many SSTables on disk when you see GC pauses? In your 3 node cluster, what's the RF factor?

On Mon, Mar 7, 2011 at 1:50 PM, ruslan usifov <[hidden email]> wrote:


2011/3/8 Jonathan Ellis <[hidden email]>

It sounds like you're complaining that the JVM sometimes does stop-the-world GC.

You can mitigate this but not (for most workloads) eliminate it with
GC option tuning.  That's simply the state of the art for Java garbage
collection right now.


Hm, but what to do in this cases?? In these moments throughput of cluster degrade, and I misunderstand what workaround I must do to prevent this situations?


Re: Nodes frozen in GC

ruslan usifov


2011/3/8 Chris Goffinet <[hidden email]>
Can you tell me how many SSTables on disk when you see GC pauses? In your 3 node cluster, what's the RF factor?


About 30-40, and I use RF=2 and insert rows at QUORUM consistency level.

Re: Nodes frozen in GC

Chris Goffinet
The rows you are inserting, what is your update ratio to those rows?


On Mon, Mar 7, 2011 at 4:03 PM, ruslan usifov <[hidden email]> wrote:


2011/3/8 Chris Goffinet <[hidden email]>

Can you tell me how many SSTables on disk when you see GC pauses? In your 3 node cluster, what's the RF factor?


About 30-40, and i use RF=2, and insert rows with QUORUM consistency level


Re: Nodes frozen in GC

ruslan usifov


2011/3/8 Chris Goffinet <[hidden email]>
The rows you are inserting, what is your update ratio to those rows?

I don't update them, only insert, at a rate of 16000 per second.

Re: Nodes frozen in GC

Chris Goffinet
How large are your SSTables on disk? My thought was that because you have so many on disk, we have to store the bloom filter plus every 128th key from the index in memory.

On Mon, Mar 7, 2011 at 4:35 PM, ruslan usifov <[hidden email]> wrote:


2011/3/8 Chris Goffinet <[hidden email]>
The rows you are inserting, what is your update ratio to those rows?

I doesn't update them only insert, with speed 16000 per second


Re: Nodes frozen in GC

ruslan usifov


2011/3/8 Chris Goffinet <[hidden email]>
How large are your SSTables on disk? My thought was because you have so many on disk, we have to store the bloom filter + every 128 keys from index in memory.


0.5GB
But as I understand it, keeping those structures in memory only matters when reads happen, and I do only inserts. And I think memory isn't the problem, because heap usage looks like a saw (at the peak, heap usage gets to about 5.5GB, then it drops to 2GB).

Also, when I increased the heap size to 7GB the situation got much better, but node freezes still happen, and in gc.log I still see

Total time for which application threads were stopped: 20.0686307 seconds

lines (though not as often as before).

Re: Nodes frozen in GC

David Boxenhorn-2
If RF=2 and CL=QUORUM, you're getting no availability benefit from replication, because a quorum of 2 is both replicas. When a node is in GC it stops everything. Set RF=3, so when one node is busy the cluster will still work.
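
The quorum arithmetic behind that suggestion:

    for rf in 2 3; do echo "RF=$rf -> quorum = $(( rf / 2 + 1 ))"; done
    # RF=2 -> quorum = 2: every write needs both replicas, so one GC-frozen node stalls it
    # RF=3 -> quorum = 2: one replica can be paused and quorum operations still succeed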

On Tue, Mar 8, 2011 at 11:46 AM, ruslan usifov <[hidden email]> wrote:


2011/3/8 Chris Goffinet <[hidden email]>
How large are your SSTables on disk? My thought was because you have so many on disk, we have to store the bloom filter + every 128 keys from index in memory.


0.5GB
 But as I understand store in memory happens only when read happens, i do only inserts. And i think that memory doesn't problem, because heap allocations looks like saw (in max Heap allocations get about 5,5 GB then they reduce to 2GB)


Also when i increase Heap Size to 7GB, situation stay mach better, but nodes frozen still happens, and in gc.log I steel see:

Total time for which application threads were stopped: 20.0686307 seconds

lines (right not so often, like before)


Re: Nodes frozen in GC

ruslan usifov
In reply to this post by Paul Pak


2011/3/8 Paul Pak <[hidden email]>
Hi Ruslan,

It looks like Jonathan and Stu have already been working to reduce garbage collection on v.8  The ticket is at https://issues.apache.org/jira/browse/CASSANDRA-2252

Jonathan, is there any way to apply the patch to .73 and have ruslan test it to see if it fixes his issue with Garbage collection?  Given the fact that he's already got a test case, it seems like a reasonable thing to do (with the caveat that he is a guinea pig for new code which may not be fully tested yet :) .

Thanks, Paul, for the information, it is very informative. But how does this work for others? I see that the situation got better when I increased the heap size to 7GB; nodes still freeze, though not as frequently as before. Also, although I haven't tried it yet, I think that setting -Xmn to something bigger than 800M may give a benefit (http://java.sun.com/performance/reference/whitepapers/tuning.html).
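
A hedged sketch of what such an experiment would look like in
conf/cassandra-env.sh (1600M is only an example value, not a recommendation;
the right young-generation size depends on heap size and workload):

    JVM_OPTS="$JVM_OPTS -Xmn1600M"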


Re: Nodes frozen in GC

Peter Schuller
In reply to this post by Paul Pak
> So, are you saying this is normal and expected from Cassandra?  So,
> under load, we can expect java garbage collection to stop the Cassandra
> process on that server from time to time, essentially taking out the
> node for short periods of time while it does garbage collection?

This thread is getting out of hand and out into la-la-land. Original
poster: If you want to skip some rantings of mine, skip to the end and
do (1) and get the results to the list.

First of all, re-checking the history, it seems the only concrete
information is:

INFO [ScheduledTasks:1] 2011-03-05 15:21:23,524 GCInspector.java (line
128) GC for ConcurrentMarkSweep: 18052 ms, -997761672 reclaimed
leaving 5796586088

This indicates that a concurrent mark/sweep GC took 18 seconds. That
may or may not be a bit high for the heap size, but regardless, the
CMS is not a stop-the-world pause. It involves some stop-the-world
pauses, but those 18 seconds aren't it.

I still don't see anything that tells what's actually going on in the
OP's case. But the fact that the heap grew rather than shrunk as a
result of the GC cycle suggests that something is indeed wrong.
Probably the heap is too full, as has already been suggested, and the
question is just why. Probably *something* is tweaked incorrectly for
the heap size, be that row cache, memtable flush thresholds, etc. Or
there's a bug. But there seems to be a distinct lack of information
and a distinct non-lack of random speculating and GC blaming.

CASSANDRA-2252, which was linked above, is *not* a magic fix for this.

A lot can be said about garbage collection techniques, but the point
is that the point of the CMS collector is to avoid the need for long
stop-the-world pauses. Some are still required, but they are supposed
to normally be short. For some workloads, you eventually reach a point
where fragmentation in the old generation causes the need to do a full
stop-the-world pause while the entire heap is compacted. This *does*
result in a long uninterrupted pause, if/when it happens.

Usually, it happens because you actually have too much live data on
the heap. That is entirely different from having a reasonable workload
that is still not handled by the GC in a sensible fashion.

Is it possible that everything is configured correctly and the OP is
triggering a bug or just triggering a sufficiently poor behavior of
CMS such that the freezes are due to unavoidable periodic compactions?
Yes. Do we have nearly enough information to know? No. Should we
assume it is the GC/JVM:s fault before having such information, given
that lots of people run Cassandra without triggering this to the
extent this seems to imply? No.

I would suggest to the OP:

(1) I cannot stress this one enough: run with -XX:+PrintGC
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps and collect the output (a
sketch of setting these flags follows after this list).
(2) Attach to your process with jconsole or some similar tool.
(3) Observe the behavior of the heap over time. Preferably post
screenshots so others can look at them.
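
A hedged sketch of wiring (1) into conf/cassandra-env.sh (standard HotSpot
flags; the gc.log path is a placeholder):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"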

(1) in particular is very important. It's completely useless to be
speculating about details and making sweeping statements when all
indications so far indicate that there is too much live data on the
heap, when there is not even the results of (1) to go by.

(1) will give you output which shows when different stages of GC
trigger and information about heap sizes etc. It will also print the
reason for fallback to full GC, such as a promotion failure. One can
usually observe fairly well what led up to such a fallback and draw
conclusions. It will also show which stages took what amount of time
(and not all of them are stop-the-world).

--
/ Peter Schuller