Heap memory usage while writing

Heap memory usage while writing

Anishek Agarwal
Hello, 

We have only one CF, defined as:

CREATE TABLE t1(id bigint, ts timestamp, definition text, primary key (id, ts))
with clustering order by (ts desc) and gc_grace_seconds=0 
and compaction = {'class': 'DateTieredCompactionStrategy', 'timestamp_resolution':'SECONDS', 'base_time_seconds':'20', 'max_sstable_age_days':'30'}
and compression={'sstable_compression' : ''}; 

on a single node, using the following in

cassandra.yaml:
memtable_total_space_in_mb: 2048
commitlog_total_space_in_mb: 4096
memtable_flush_writers: 2
memtable_flush_queue_size: 1

cassandra-env.sh:
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="5120M"
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=6"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
JVM_OPTS="$JVM_OPTS -XX:+CMSIncrementalMode"
JVM_OPTS="$JVM_OPTS -XX:+CMSIncrementalPacing"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps -verbose:gc"
JVM_OPTS="$JVM_OPTS -Xloggc:/home/anishek/apache-cassandra-2.0.13/logs/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"


I am writing to the table above continuously via 20 threads.
I see that some data keeps moving from the young generation to the old generation continuously.

I am wondering why this is happening. Given that I am writing constantly and my young generation is more than twice the max memtable space used, I would think only the young generation space would be used and nothing would ever go to the old generation.

** system.log shows no compactions happening.
** There are no read operations.
** Cassandra version 2.0.13 on CentOS with 16 cores and 16 GB RAM.

Thanks
Anishek

Re: Heap memory usage while writing

Serj Veras
Hi,

You should increase MaxTenuringThreshold and CMSWaitDuration to keep your data in the young generation longer (until the data is flushed to disk).
Depending on your load, balance the values of the following parameters: HEAP_NEWSIZE, memtable_total_space_in_mb, memtable_cleanup_threshold and your disk throughput.
Ideally, only the ParNew GC will run to collect the ephemeral objects, and it will incur only very short pauses.
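
Just as an illustration (the numbers below are placeholders to tune against your own load, not values I have tested on your setup), the relevant lines in cassandra-env.sh would look like:

JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=10000"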



-- 
Thanks,
Serj

Re: Heap memory usage while writing

Anishek Agarwal
Hey,

Any reason you think the MaxTenuringThreshold should be increased? I am pumping data at the full capacity a single node seems to take, so all the data becomes stale soon enough (when it's flushed); additionally, the whole memtable can fit in the young generation alone. There seems to be enough additional space to even hold the bloom filters for the respective SSTables, I would guess.

I will try CMSWaitDuration; that should help in reducing the CMS initial mark phase, I think.

Though I am not sure what is getting moved to the old generation continuously to fill it?

Thanks for the pointers.



Re: Heap memory usage while writing

graham sanderson
In reply to this post by Serj Veras
I wasn't involved, but I suspect MaxTenuringThreshold is set to 1 (and you should leave it that way) because you should have eden configured large enough that anything which is truly ephemeral survives at most one young GC.

Anything being promoted should ideally be cached items, but of course you can never have the ideal.

I'm surprised by your config, since CMSIncrementalMode should only be used if you have very few cores, and I don't see -XX:+CMSParallelInitialMarkEnabled and -XX:+CMSEdenChunksRecordAlways, which should be in there.
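
As a sketch of what I mean (my reading of those flags, not settings I have tested against your workload), the cassandra-env.sh block would change roughly like this:

JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
# and on a 16-core box I would drop the incremental-mode pair entirely:
# JVM_OPTS="$JVM_OPTS -XX:+CMSIncrementalMode"
# JVM_OPTS="$JVM_OPTS -XX:+CMSIncrementalPacing"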


Re: Heap memory usage while writing

Sebastian Estevez
In reply to this post by Anishek Agarwal

Did you check out CASSANDRA-8150?



Re: Heap memory usage while writing

Anishek Agarwal
Sorry, I forgot to update: I am not using "CMSIncrementalMode" anymore, as it overrides "UseCMSInitiatingOccupancyOnly".

@Graham: thanks for "CMSParallelInitialMarkEnabled" and "CMSEdenChunksRecordAlways"; I haven't used them, I will try them. My initial mark is only around 6 ms though.

With my current config (incorporating the changes above), I have been able to reduce the number of CMS runs significantly, and mostly ParNew GC is running; but when CMS triggers, it takes a lot of time in the remark phase, hence I started using -XX:+CMSParallelRemarkEnabled, which gave some improvement. Remark is still around 70 ms.

MaxTenuringThreshold is low, as I think most of the objects should be ephemeral with only writes.

@Sebastian: I started from that issue :), though I haven't tried the GC-affinity suggestions yet. Thanks for the link!

Thanks
anishek





Re: Heap memory usage while writing

Serj Veras

> MaxTenuringThreshold is low as I think most of the objects should be ephemeral with only writes.

You don't understand how MaxTenuringThreshold works. If you keep it low, then GC will move objects which are still "alive" to the old gen space.
Yes, they are ephemeral, but C* will keep them until they are flushed to disk. So, again, you should balance heap space, memtable_total_space_in_mb, memtable_cleanup_threshold and your disk throughput to get rid of memtables as soon as possible. If memtable_total_space_in_mb is large and the young gen is large too, then you have to increase MaxTenuringThreshold to keep the collector from moving that data to the old gen.
If you are sure that the young gen is not filling up too fast, you can increase CMSWaitDuration to avoid useless CMS runs.
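
To make that concrete with a rough back-of-envelope (the allocation rate below is an assumption for illustration, not a measurement of your node): with HEAP_NEWSIZE=5120M and SurvivorRatio=6, eden is about 5120 * 6/8 = 3840 MB. If on-heap allocation runs at a few hundred MB/s, eden fills and triggers a ParNew every few seconds, while a 2048 MB memtable needs many such cycles before it fills and flushes. With MaxTenuringThreshold=1, any object the memtable still references when its second ParNew arrives gets promoted, so a steady trickle into the old gen is exactly what you would expect.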





-- 
Thanks,
Serj

Re: Heap memory usage while writing

Anishek Agarwal
I do understand how MaxTenuringThreshold works, thanks for your evaluation though. 

I don't think you saw my complete post with the values I have used: the heap size, and memtable_total_space_in_mb=2048, which is less than half the young generation space I am using. Additionally, memtable_flush_queue_size=1, so there are not many memtables in memory; this, coupled with the fact that I am writing out to Cassandra with 20 threads, means it should pretty much just collect the objects via ParNew GC, which is what it is doing now.

There were only 2 CMS collections in 15 minutes when running at full capacity; what I am now concerned about is that the CMS remark phase is about 70 ms, and that is something I am looking to bring down. There still seem to be valuable pointers at CASSANDRA-8150 which I am going to try.
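
One thing I may also try, if I am reading the flag right (this is an assumption on my part, not something anyone has suggested in this thread), is forcing a young collection just before the remark, since remark also has to scan the young gen and an emptier eden usually means a shorter pause:

JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"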


