Re:Cassandra Read Timeout

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Re:Cassandra Read Timeout

Marcelo Valle (BLOOMBERG/ LONDON)
Yulian,

Maybe other people have other clues, but I think if you could monitor the behavior in tpstats after activity "Seeking to partition beginning in data file" it could help to find the problem. Which type of thread is getting stuck? Do you see any number increasing continuously during the request?

Best regards,
Marcelo.

Subject: Re:Cassandra Read Timeout
Hello to all
I have single node cassandra on amazon ec2.
Currently i am having a read timeout problem on single CF , single raw.

Raw size is aroung 190MB.There are bigger raws with similar structure ( its index raws , which actually stores keys ) and everything is working fine on them, everything is working also fine on this cf but on other raw.

Tables data from CFStats ( First table has bigger raws but works fine , where second has timeout ) :
Column Family: pendindexes
SSTable count: 5
Space used (live): 462298352
Space used (total): 462306752
SSTable Compression Ratio: 0.3511107495795905
Number of Keys (estimate): 640
Memtable Columns Count: 63339
Memtable Data Size: 12328802
Memtable Switch Count: 78
Read Count: 10
Read Latency: NaN ms.
Write Count: 1530113
Write Latency: 0.022 ms.
Pending Tasks: 0
Bloom Filter False Positives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 3920
Compacted row minimum size: 73
Compacted row maximum size: 223875792
Compacted row mean size: 42694982
Average live cells per slice (last five minutes): 21.0
Average tombstones per slice (last five minutes): 0.0

Column Family: statuspindexes
SSTable count: 1
Space used (live): 99602136
Space used (total): 99609360
SSTable Compression Ratio: 0.34278775390997873
Number of Keys (estimate): 128
Memtable Columns Count: 6250
Memtable Data Size: 6061097
Memtable Switch Count: 65
Read Count: 1000
Read Latency: NaN ms.
Write Count: 1193142
Write Latency: 3.616 ms.
Pending Tasks: 0
Bloom Filter False Positives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 656
Compacted row minimum size: 180
Compacted row maximum size: 186563160
Compacted row mean size: 63225562
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0

I have tried to debug it with cql , thats what i get:

 activity                                                                                        | timestamp    | source       | source_elapsed
-------------------------------------------------------------------------------------------------+--------------+--------------+----------------
                                                                              execute_cql3_query | 15:39:53,120 | 172.31.6.173 |              0
                                               Parsing Select * from statuspindexes LIMIT 10000; | 15:39:53,120 | 172.31.6.173 |            875
                                                                             Preparing statement | 15:39:53,121 | 172.31.6.173 |           1643
                                                                   Determining replicas to query | 15:39:53,121 | 172.31.6.173 |           1740
 Executing seq scan across 1 sstables for [min(-9223372036854775808), min(-9223372036854775808)] | 15:39:53,122 | 172.31.6.173 |           2581
                                                     Seeking to partition beginning in data file | 15:39:53,123 | 172.31.6.173 |           3118
                                           Timed out; received 0 of 1 responses for range 2 of 2 | 15:40:03,121 | 172.31.6.173 |       10001370
                                                                                Request complete | 15:40:03,121 | 172.31.6.173 |       10001513

I have executed compaction on that cf.
What could lead to that problem?
Best regards
Yulian Oifa


Reply | Threaded
Open this post in threaded view
|

Re: Cassandra Read Timeout

Yulian Oifa
Hello
TP STATS Before Request:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0        7592835         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0      215980736         0                 0
ReadRepairStage                   0         0              0         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0             28         0                 0
MemoryMeter                       0         0            474         0                 0
MemtablePostFlusher               0         0          32845         0                 0
FlushWriter                       0         0           4013         0              2239
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              1         0                 0
commitlog_archiver                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0

TP STATS After Request:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         1        7592942         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0      215983339         0                 0
ReadRepairStage                   0         0              0         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0             28         0                 0
MemoryMeter                       0         0            474         0                 0
MemtablePostFlusher               0         0          32845         0                 0
FlushWriter                       0         0           4013         0              2239
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              1         0                 0
commitlog_archiver                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0

The only items changed are : ReadStage increased by 107 + 1 Active/Pending and MutationStage which changed by 2603.
Please note that system is writing all the time in batches ( each second 2 servers write one batch each ) so i dont see anything special in this numbers.

Best regards
Yulian Oifa
Reply | Threaded
Open this post in threaded view
|

Re: Cassandra Read Timeout

Marcelo Valle (BLOOMBERG/ LONDON)
In reply to this post by Marcelo Valle (BLOOMBERG/ LONDON)
Indeed, I thought something odd could be happening to your cluster, but it seems it's working fine but the request is taking too long to complete. 

I noticed from your cfstats the read count was about 10 in the first CF and in the second one it was about 1000... Would you be doing much more reads in the second one? If the load is higher, it could justify the timeout.

Do both CFs have the same data model? Are you running exactly the same queries?

Best regards,
Marcelo.

Subject: Re: Cassandra Read Timeout
Hello
TP STATS Before Request:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0        7592835         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0      215980736         0                 0
ReadRepairStage                   0         0              0         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0             28         0                 0
MemoryMeter                       0         0            474         0                 0
MemtablePostFlusher               0         0          32845         0                 0
FlushWriter                       0         0           4013         0              2239
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              1         0                 0
commitlog_archiver                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0

TP STATS After Request:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         1        7592942         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0      215983339         0                 0
ReadRepairStage                   0         0              0         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0             28         0                 0
MemoryMeter                       0         0            474         0                 0
MemtablePostFlusher               0         0          32845         0                 0
FlushWriter                       0         0           4013         0              2239
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              1         0                 0
commitlog_archiver                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0

The only items changed are : ReadStage increased by 107 + 1 Active/Pending and MutationStage which changed by 2603.
Please note that system is writing all the time in batches ( each second 2 servers write one batch each ) so i dont see anything special in this numbers.

Best regards
Yulian Oifa

Reply | Threaded
Open this post in threaded view
|

Re: Cassandra Read Timeout

Yulian Oifa
Hello
The structure is the same , the CFs are super column CFs , where key is long  ( timestamp to partition the index , so each 11 days new row is created ) , super Column is int32 and columns / values are timeuuids.I am running same queries , getting reversed slice by raw key and super column.
The number of reads is relatively high on second CF since i am testing it several hours already , most of time there are no requests for read on both of them , only writes.There is at most 1 read request in 20-30 seconds so it should not create a load.There is also no reads ( 0 before and 1 after ) pending on tpstats.
Please also note that queries on different row same super column , and same row , different super column are working , and if i am not mistaken cassandra is loading complete raw including all super columns to memory ( either any request to this row should fail if this would be a memory problem , or none...).

Best regards
Yulian Oifa

Reply | Threaded
Open this post in threaded view
|

Re: Cassandra Read Timeout

Marcelo Valle (BLOOMBERG/ LONDON)
In reply to this post by Marcelo Valle (BLOOMBERG/ LONDON)
Super column? Out of curiosity, which Cassandra version are you running?


Subject: Re: Cassandra Read Timeout
Hello
The structure is the same , the CFs are super column CFs , where key is long  ( timestamp to partition the index , so each 11 days new row is created ) , super Column is int32 and columns / values are timeuuids.I am running same queries , getting reversed slice by raw key and super column.
The number of reads is relatively high on second CF since i am testing it several hours already , most of time there are no requests for read on both of them , only writes.There is at most 1 read request in 20-30 seconds so it should not create a load.There is also no reads ( 0 before and 1 after ) pending on tpstats.
Please also note that queries on different row same super column , and same row , different super column are working , and if i am not mistaken cassandra is loading complete raw including all super columns to memory ( either any request to this row should fail if this would be a memory problem , or none...).

Best regards
Yulian Oifa


Reply | Threaded
Open this post in threaded view
|

Re: Cassandra Read Timeout

Yulian Oifa
Hello
I am running 1.2.19
Best regards
Yulian Oifa

On Tue, Feb 24, 2015 at 6:57 PM, Marcelo Valle (BLOOMBERG/ LONDON) <[hidden email]> wrote:
Super column? Out of curiosity, which Cassandra version are you running?


Subject: Re: Cassandra Read Timeout
Hello
The structure is the same , the CFs are super column CFs , where key is long  ( timestamp to partition the index , so each 11 days new row is created ) , super Column is int32 and columns / values are timeuuids.I am running same queries , getting reversed slice by raw key and super column.
The number of reads is relatively high on second CF since i am testing it several hours already , most of time there are no requests for read on both of them , only writes.There is at most 1 read request in 20-30 seconds so it should not create a load.There is also no reads ( 0 before and 1 after ) pending on tpstats.
Please also note that queries on different row same super column , and same row , different super column are working , and if i am not mistaken cassandra is loading complete raw including all super columns to memory ( either any request to this row should fail if this would be a memory problem , or none...).

Best regards
Yulian Oifa



Reply | Threaded
Open this post in threaded view
|

Re: Cassandra Read Timeout

Marcelo Valle (BLOOMBERG/ LONDON)
In reply to this post by Marcelo Valle (BLOOMBERG/ LONDON)
I am sorry, not sure if I will be able to help you. I am not familiar with super columns, I would tell you to try to get rid of them as soon as possible. Maybe someone else in the list can help you.
Anyway, it really seems you have a cell with a very large amount of data and the request might be taking longer to complete because of the amount, but I also don't understand why the requests on same row different SC and same SC different row would work. 

[]s

Subject: Re: Cassandra Read Timeout
Hello
I am running 1.2.19
Best regards
Yulian Oifa

On Tue, Feb 24, 2015 at 6:57 PM, Marcelo Valle (BLOOMBERG/ LONDON) <[hidden email]> wrote:
Super column? Out of curiosity, which Cassandra version are you running?


Subject: Re: Cassandra Read Timeout
Hello
The structure is the same , the CFs are super column CFs , where key is long  ( timestamp to partition the index , so each 11 days new row is created ) , super Column is int32 and columns / values are timeuuids.I am running same queries , getting reversed slice by raw key and super column.
The number of reads is relatively high on second CF since i am testing it several hours already , most of time there are no requests for read on both of them , only writes.There is at most 1 read request in 20-30 seconds so it should not create a load.There is also no reads ( 0 before and 1 after ) pending on tpstats.
Please also note that queries on different row same super column , and same row , different super column are working , and if i am not mistaken cassandra is loading complete raw including all super columns to memory ( either any request to this row should fail if this would be a memory problem , or none...).

Best regards
Yulian Oifa




Reply | Threaded
Open this post in threaded view
|

Re: Cassandra Read Timeout

Robert Coli-3
In reply to this post by Yulian Oifa
On Tue, Feb 24, 2015 at 8:50 AM, Yulian Oifa <[hidden email]> wrote:
The structure is the same , the CFs are super column CFs , where key is long  ( timestamp to partition the index , so each 11 days new row is created ) , super Column is int32 and columns / values are timeuuids.I am running same queries , getting reversed slice by raw key and super column.

Obligatory notice that Super Columns are not really recommended for use. I have no idea if the performance problem you are seeing is related to the use of Super Columns.

=Rob
 
Jan
Reply | Threaded
Open this post in threaded view
|

Re: Cassandra Read Timeout

Jan
Yulian; 

Quote :
Raw size is aroung 190MB.There are bigger raws with similar structure ( its index raws , which actually stores keys ) 
and everything is working fine on them, everything is working also fine on this cf but on other raw.

Tables data from CFStats ( First table has bigger raws but works fine , where second has timeout ) :

---------------------------

You asked: There are bigger raws with similar structure 
Question:  Do you mean bigger rows  ?      What is the structure of the statuspindexes Keyspace & which table are you querying within it ?

you asked: its index raws , which actually stores keys
Question: do you mean Index rows ?     how are you creating Indexes  , what type of Indexes ?  

you asked: Tables data from CFStats, where second has timeout 
Question: What is the time out value set at  & whats different about both these tables ?     What are you querying from the second table ? 


Unfortunately,  I have more questions that answers;  however  despite the sacrilege of using super-columns (lol), there has got to be a logical answer to the Performance problem you are having.       Hopefully we could dig in and find an answer . 


Jan/  








On Tuesday, February 24, 2015 12:00 PM, Robert Coli <[hidden email]> wrote:


On Tue, Feb 24, 2015 at 8:50 AM, Yulian Oifa <[hidden email]> wrote:
The structure is the same , the CFs are super column CFs , where key is long  ( timestamp to partition the index , so each 11 days new row is created ) , super Column is int32 and columns / values are timeuuids.I am running same queries , getting reversed slice by raw key and super column.

Obligatory notice that Super Columns are not really recommended for use. I have no idea if the performance problem you are seeing is related to the use of Super Columns.

=Rob