Slow writes in cassandra local cluster with 2000+ column families


sainathreddy

I have a use case where I need to listen continuously to a Kafka topic and, from a Spark Streaming app, write to 2000+ column families (15 columns each, time-series data), choosing the target column family based on a column value. I have a local Cassandra installation set up; creating these column families takes around 1.5 hours on a CentOS VM with 3 cores and 12 GB of RAM. In the Spark Streaming app I do some preprocessing before storing the stream events in Cassandra, and I am running into issues with how long the app takes to complete.

When I tried to save 300 events to multiple column families (roughly 200-250) based on key, the app took around 10 minutes to save them. This seems strange, because printing the same events to the screen, grouped by key, takes less than a minute; it is only when saving to Cassandra that it takes this long. I have had no issues saving records on the order of 3 million to Cassandra; that took less than 3 minutes, but it was to a single column family.
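The per-key grouping step I describe above is roughly this (field names like `sensor_id` are placeholders for illustration, not my actual schema):

```python
from collections import defaultdict

def group_events_by_key(events, key_field="sensor_id"):
    """Group incoming stream events by the column value that selects
    the target column family. One group per column family."""
    groups = defaultdict(list)
    for event in events:
        groups[event[key_field]].append(event)
    return dict(groups)

# Example batch of time-series events (hypothetical fields).
events = [
    {"sensor_id": "s1", "ts": 1, "value": 10.0},
    {"sensor_id": "s2", "ts": 1, "value": 20.0},
    {"sensor_id": "s1", "ts": 2, "value": 11.0},
]
grouped = group_events_by_key(events)
# Each group is then written to its own column family; that per-table
# write is where all the time goes.
```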

My requirement is to be as close to real time as possible, and this is nowhere near it. The production environment would see roughly 400 events every 3 seconds.

Is there any tuning I need to do in the cassandra.yaml file, or any changes to the cassandra-connector itself?
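For reference, the tombstone thresholds that the warnings below mention live in cassandra.yaml; these are the defaults from my 2.x install (I have not changed them, and I assume raising them would only silence the warnings rather than fix the underlying schema-table load):

```yaml
# cassandra.yaml (2.x defaults, unchanged on my node)
tombstone_warn_threshold: 1000       # warn when a read scans this many tombstones
tombstone_failure_threshold: 100000  # abort the read beyond this many tombstones
```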

This is a part of the log that I see on the Cassandra console:

***************************************************************************************
INFO  05:25:14 system_traces.events                      0,0
WARN  05:25:14 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:14 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO  05:25:16 ParNew GC in 340ms.  CMS Old Gen: 1308020680 -> 1454559048; Par Eden Space: 251658240 -> 0;
WARN  05:25:16 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:16 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:17 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:17 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:17 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:17 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO  05:25:17 ParNew GC in 370ms.  CMS Old Gen: 1498825040 -> 1669094840; Par Eden Space: 251658240 -> 0;
WARN  05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:18 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:19 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:19 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO  05:25:19 ParNew GC in 382ms.  CMS Old Gen: 1714792864 -> 1875460032; Par Eden Space: 251658240 -> 0;
[log truncated]

*******************************************************************************************************
I tried setting the heap size in the cassandra-env.sh file:

MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="300M"

That didn't help either. What I understand from the log is that ParNew GCs are happening very frequently, roughly every 500 ms, and I don't know how I can change that. Any suggestions?
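For completeness, this is the kind of change I was considering trying next in cassandra-env.sh (untested; the sizes are guesses based on the machine having 12 GB of RAM and 3 cores, on the assumption that a larger new generation would make young collections less frequent):

```sh
# cassandra-env.sh -- heap sizing I was considering next (untested guesses)
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"
```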