OOM while reading key cache


OOM while reading key cache

olek.stasiak@gmail.com
Hello,
I'm facing an OOM while reading the key cache.
The cluster configuration is as follows:
- 6 machines with 8 GB RAM each and three 150 GB disks each
- default heap configuration
- default key cache configuration
- the biggest keyspace is about 500 GB in size (RF: 2, so in fact there is
250 GB of raw data).

After upgrading the first of the machines from 1.2.11 to 2.0.2, I received this error:
 INFO [main] 2013-11-08 10:53:16,716 AutoSavingCache.java (line 114)
reading saved cache
/home/synat/nosql_filesystem/cassandra/data/saved_caches/production_storage-METADATA-KeyCache-b.db
ERROR [main] 2013-11-08 10:53:16,895 CassandraDaemon.java (line 478)
Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:394)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
        at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:352)
        at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:264)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:409)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:381)
        at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:314)
        at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:268)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:274)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)


The error appears on every start, so I decided to disable the key cache
(this was not helpful) and temporarily moved the key cache file out of the
caches folder (the file was 13 MB in size). That lets the node start, but it
is only a workaround and not the desired configuration. Does anyone have any
idea what the real cause of the OOM problem is?
best regards
Aleksander
PS. I still have 5 nodes to upgrade; I'll report if the problem appears on the rest.

Re: OOM while reading key cache

aaron morton
> - 6 machines with 8 GB RAM each and three 150 GB disks each
> - default heap configuration
With 8GB of RAM the default heap is 2GB; try kicking that up to 4GB with a 600 to 800 MB new heap.

I would guess that for the data load you have, 2GB is not enough.

hope that helps. 
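As a sketch, those numbers would be set via the heap overrides in conf/cassandra-env.sh (variable names as in the 1.2/2.0-era config; verify against your own install):

```shell
# Sketch: heap overrides in conf/cassandra-env.sh (1.2/2.0-era config).
# The script expects both values to be set together when overriding.
MAX_HEAP_SIZE="4G"    # total JVM heap, up from the ~2GB auto-calculated default
HEAP_NEWSIZE="800M"   # young generation, per the 600-800 MB suggestion above
```

A restart of the node is needed for the new heap settings to take effect.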

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting



Re: OOM while reading key cache

Tom van den Berge
I'm having the same problem, after upgrading from 1.2.3 to 1.2.10. 

I can remember this was a bug that was solved in the 1.0 or 1.1 version some time ago, but apparently it has come back.
A workaround is to delete the contents of the saved_caches directory before starting up.
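In shell terms the workaround looks something like this (the path is an assumption; check saved_caches_directory in cassandra.yaml for the actual location):

```shell
# Assumed default path -- confirm via saved_caches_directory in cassandra.yaml.
sudo service cassandra stop
rm -f /var/lib/cassandra/saved_caches/*
sudo service cassandra start
```

The saved caches are only a warm-up optimisation, so deleting them is safe; the node simply starts with cold caches.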


Tom








Re: OOM while reading key cache

Robert Coli-3

On Wed, Nov 13, 2013 at 12:35 AM, Tom van den Berge <[hidden email]> wrote:
> I'm having the same problem, after upgrading from 1.2.3 to 1.2.10.
>
> I can remember this was a bug that was solved in the 1.0 or 1.1 version some time ago, but apparently it has come back.
> A workaround is to delete the contents of the saved_caches directory before starting up.

Yours is not the first report of this I've heard resulting from a 1.2.x-to-1.2.x upgrade. Reports are of the form "I had to nuke my saved_caches or <I couldn't start my node, it OOMed, etc.>".

https://issues.apache.org/jira/browse/CASSANDRA-6325

Exists, but doesn't seem to be the same issue.

https://issues.apache.org/jira/browse/CASSANDRA-5986

Similar, but doesn't seem to be an issue triggered by upgrade.

If I were one of the posters on this thread, I would strongly consider filing a JIRA on point.

@OP (olek) : did removing the saved_caches also fix your problem?

=Rob



Re: OOM while reading key cache

olek.stasiak@gmail.com
Yes, as I wrote in the first e-mail: when I removed the key cache file,
Cassandra started without further problems.
regards
Olek

2013/11/13 Robert Coli <[hidden email]>:

>
> On Wed, Nov 13, 2013 at 12:35 AM, Tom van den Berge <[hidden email]>
> wrote:
>>
>> I'm having the same problem, after upgrading from 1.2.3 to 1.2.10.
>>
>> I can remember this was a bug that was solved in the 1.0 or 1.1 version
>> some time ago, but apparently it got back.
>> A workaround is to delete the contents of the saved_caches directory
>> before starting up.
>
>
> Yours is not the first report of this I've heard resulting from a 1.2.x to
> 1.2.x upgrade. Reports are of the form "I had to nuke my saved_caches or <I
> couldn't start my node, it OOMED, etc.>".
>
> https://issues.apache.org/jira/browse/CASSANDRA-6325
>
> Exists, but doesn't seem  to be the same issue.
>
> https://issues.apache.org/jira/browse/CASSANDRA-5986
>
> Similar, doesn't seem to be an issue triggered by upgrade..
>
> If I were one of the posters on this thread, I would strongly consider
> filing a JIRA on point.
>
> @OP (olek) : did removing the saved_caches also fix your problem?
>
> =Rob
>
>

Re: OOM while reading key cache

Fabien Rousseau
A few months ago, we had a similar issue on 1.2.6:

It has since been fixed, and we haven't encountered this issue anymore (we're also on 1.2.10).



making sense of output from Eclipse Memory Analyzer tool taken from .hprof file

Mike Koh
I am investigating Java out-of-memory heap errors, so I created an .hprof
file and loaded it into the Eclipse Memory Analyzer Tool, which gave some
"Problem Suspects".

First one looks like:
----
One instance of "org.apache.cassandra.db.ColumnFamilyStore" loaded by
"sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8" occupies 984,094,664
(11.64%) bytes. The memory is accumulated in one instance of
"org.apache.cassandra.db.DataTracker$View" loaded by
"sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8".
----

If I click around into the verbiage, I believe I can pick out the name of
a column family but that is about it. Can someone explain what the above
means in more detail and if it is indicative of a problem?


Next one looks like:
-----
•java.lang.Thread @ 0x73e1f74c8 CompactionExecutor:158 - 839,225,000
(9.92%) bytes.
•java.lang.Thread @ 0x717f08178 MutationStage:31 - 809,909,192 (9.58%) bytes.
•java.lang.Thread @ 0x717f082c8 MutationStage:5 - 649,667,472 (7.68%) bytes.
•java.lang.Thread @ 0x717f083a8 MutationStage:21 - 498,081,544 (5.89%) bytes.
•java.lang.Thread @ 0x71b357e70 MutationStage:11 - 444,931,288 (5.26%) bytes.
------
If I click into the verbiage, the above Compaction and Mutation threads all
seem to be referencing the same column family. Are they related? Is there
a way I can tell more exactly what is being compacted and/or mutated, more
specifically than which column family?
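For completeness, an .hprof like the one analysed above can be captured from a running node with jmap (the pgrep pattern is an assumption about how the process shows up on your box):

```shell
# Dump the Cassandra JVM heap for analysis in Eclipse MAT.
# "live" triggers a full GC first, so only reachable objects are dumped.
pid=$(pgrep -f CassandraDaemon)   # assumes a single Cassandra process
jmap -dump:live,format=b,file=cassandra.hprof "$pid"
```

Running the JVM with -XX:+HeapDumpOnOutOfMemoryError also writes an .hprof automatically at the moment of the OOM, which is often more informative than a dump taken afterwards.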

Re: making sense of output from Eclipse Memory Analyzer tool taken from .hprof file

aaron morton
What version of Cassandra are you using?
What are the JVM settings? (check with ps aux | grep cassandra)



One instance of "org.apache.cassandra.db.ColumnFamilyStore" loaded by "sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8" occupies 984,094,664 (11.64%) bytes.
938MB is a fair bit of memory; the CFS and the DataTracker are dealing with the memtables. This may indicate things are not being flushed from memory correctly. 

•java.lang.Thread @ 0x73e1f74c8 CompactionExecutor:158 - 839,225,000 (9.92%) bytes.
•java.lang.Thread @ 0x717f08178 MutationStage:31 - 809,909,192 (9.58%) bytes.
•java.lang.Thread @ 0x717f082c8 MutationStage:5 - 649,667,472 (7.68%) bytes.
•java.lang.Thread @ 0x717f083a8 MutationStage:21 - 498,081,544 (5.89%) bytes.
•java.lang.Thread @ 0x71b357e70 MutationStage:11 - 444,931,288 (5.26%) bytes.
Maybe very big rows and/or very big mutations. 

hope that helps. 

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
