Adjusting readahead for SSD disk seeks


Donald Smith

We’re using Cassandra as a key-value store; our values are small. So we’re thinking we don’t need much disk readahead (e.g., the value reported by “blockdev --getra /dev/sda”). We’re using SSDs.

 

When Cassandra does disk seeks to satisfy read requests, does it typically have to read the entire SSTable into memory (assuming the bloom filter said yes)? If Cassandra needs to read lots of blocks anyway, or if it needs to read the entire file during compaction, then I'd expect we might as well have a big readahead. Perhaps there’s a tradeoff between read latency and compaction time.
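Editorial sketch: since the question starts from the value that blockdev reports, it may help to pin down the units. On Linux, blockdev counts readahead in 512-byte sectors, while the sysfs attribute queue/read_ahead_kb uses KiB. The helper names below are illustrative, and "sda" is only an example device name.

```python
from pathlib import Path

SECTOR_BYTES = 512  # blockdev --getra / --setra count 512-byte sectors


def sectors_to_kib(sectors: int) -> float:
    """Convert a `blockdev --getra` value (512-byte sectors) to KiB."""
    return sectors * SECTOR_BYTES / 1024


def kib_to_sectors(kib: int) -> int:
    """Convert KiB to the sector count that `blockdev --setra` expects."""
    return kib * 1024 // SECTOR_BYTES


def read_ahead_kib(device: str, sys_root: str = "/sys/block"):
    """Return the device's current readahead in KiB, or None if unavailable.

    Reads the Linux sysfs attribute queue/read_ahead_kb. `device` is a bare
    name such as "sda" (an example, not a recommendation).
    """
    path = Path(sys_root) / device / "queue" / "read_ahead_kb"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

For example, sectors_to_kib(256) gives 128.0, so a reported value of 256 means 128 KiB of readahead. read_ahead_kib("sda") returns None on systems where that sysfs path does not exist.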

 

Any feedback welcome.


Thanks

 

Donald A. Smith | Senior Software Engineer


AudienceScience

 

Re: Adjusting readahead for SSD disk seeks

DuyHai Doan
"does it typically have to read in the entire SStable into memory (assuming the bloom filter said yes)?" --> No, that would be a performance killer.

 On the read path, after the Bloom filter, Cassandra checks the "Partition Key Cache" to see whether the partition it is looking for is present there.

 If yes, it gets the offset (from the beginning of the SSTable), which lets it skip a lot of data and seek directly to that position.
 If not, it relies on the "Partition sample" (the partition summary, a sampled view of the partition index) to seek to the nearest location of the sought partition.

 If compression is on (it is by default), there is one more step before hitting disk: the compression offset map. It's a translation table that maps uncompressed file offsets to compressed file offsets.
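Editorial sketch: the lookup order described above can be modelled with a toy in-memory structure. This is not Cassandra's code. The class, its fields, and the sampling interval are all made up for illustration, and a plain set stands in for the Bloom filter (a real one can return false positives).

```python
from bisect import bisect_right


class ToySSTable:
    """Toy model of the lookup order: filter, key cache, index sample, index scan."""

    def __init__(self, entries, sample_every=4):
        # entries: iterable of (partition_key, data_file_offset)
        self.index = sorted(entries)
        self.keys = [k for k, _ in self.index]
        self.maybe_present = set(self.keys)  # stand-in for a Bloom filter
        self.sample_every = sample_every
        # Sparse sample: every Nth key mapped to its position in the full index
        self.sample = [(self.keys[i], i) for i in range(0, len(self.keys), sample_every)]
        self.sample_keys = [k for k, _ in self.sample]
        self.key_cache = {}

    def locate(self, key):
        """Return (data offset or None, list of steps taken)."""
        steps = ["bloom_filter"]
        if key not in self.maybe_present:
            return None, steps
        steps.append("key_cache")
        if key in self.key_cache:
            return self.key_cache[key], steps
        steps.append("index_sample")
        i = bisect_right(self.sample_keys, key) - 1
        if i < 0:
            return None, steps
        start = self.sample[i][1]
        steps.append("index_scan")
        # Scan only the short index range that the sample points at
        for k, offset in self.index[start:start + self.sample_every]:
            if k == key:
                self.key_cache[key] = offset
                return offset, steps
        return None, steps


def compressed_position(uncompressed_offset, chunk_size, chunk_offsets):
    """Map an uncompressed offset to (compressed chunk start, offset inside chunk)."""
    chunk = uncompressed_offset // chunk_size
    return chunk_offsets[chunk], uncompressed_offset % chunk_size
```

In this model a first lookup walks all four steps, and a repeat lookup for the same key stops at the key cache. Either way, only a small, targeted region of the file is touched, never the whole SSTable.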


Re: Adjusting readahead for SSD disk seeks

Daniel Chia
Cassandra only reads a small part of each SSTable during normal operation (not compaction). In fact, DataStax recommends lowering readahead: http://www.datastax.com/documentation/cassandra/2.1/cassandra/install/installRecommendSettings.html

There are also blog posts where people report improving their read latency by reducing readahead.

Thanks,
Daniel
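Editorial sketch: a back-of-the-envelope way to see why a large readahead can hurt small point reads. The model is deliberately crude and rests on an assumption, namely that every cold read pulls in the full readahead window. Real kernel behaviour is adaptive, so treat the numbers as a rough upper bound. The function names and example sizes are illustrative.

```python
def bytes_fetched(value_bytes: int, readahead_kib: int, page_bytes: int = 4096) -> int:
    """Rough upper bound on bytes pulled from disk by one cold point read.

    Assumes the kernel reads at least whole pages, and (pessimistically)
    the full readahead window on every miss.
    """
    pages = -(-value_bytes // page_bytes)  # ceiling division
    needed = pages * page_bytes
    return max(needed, readahead_kib * 1024)


def amplification(value_bytes: int, readahead_kib: int) -> float:
    """Bytes fetched per byte of useful value under the model above."""
    return bytes_fetched(value_bytes, readahead_kib) / value_bytes
```

Under these assumptions, a 200-byte value read with a 128 KiB readahead window fetches 131072 bytes, which is about 655 times the useful data. With an 8 KiB window the same read fetches 8192 bytes.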

Re: Adjusting readahead for SSD disk seeks

Kevin Burton
I’d advise keeping readahead low, or turning it off, on SSDs. Also, the noop I/O scheduler might help you on that disk.

Even if Cassandra does perform a contiguous read, readahead won’t be helpful.

It’s essentially obsolete now on SSDs.
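Editorial sketch: before changing the scheduler, it can be useful to check what a device is currently using. On Linux, /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets, and queue/rotational reports 0 for non-rotational devices such as SSDs. The helper names are illustrative. On newer multi-queue kernels the equivalent of noop is listed as "none".

```python
from pathlib import Path


def parse_active_scheduler(text: str):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def active_scheduler(device: str, sys_root: str = "/sys/block"):
    """Read and parse the active I/O scheduler for a device such as "sda"."""
    path = Path(sys_root) / device / "queue" / "scheduler"
    try:
        return parse_active_scheduler(path.read_text())
    except OSError:
        return None


def is_rotational(device: str, sys_root: str = "/sys/block"):
    """Return True for spinning disks, False for SSDs, None if unknown."""
    path = Path(sys_root) / device / "queue" / "rotational"
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None
```

For example, parse_active_scheduler("noop deadline [cfq]") returns "cfq", meaning the device is not yet using noop.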

--

Founder/CEO Spinn3r.com
Location: San Francisco, CA