We’re using Cassandra as a key-value store, and our values are small, so we think we don’t need much disk readahead (as reported by “blockdev --getra /dev/sda”). We’re using SSDs.
When Cassandra does disk seeks to satisfy read requests, does it typically have to read the entire SSTable into memory (assuming the Bloom filter said yes)? If Cassandra needs to read lots of blocks anyway, or reads entire files during compaction, then I'd expect we might as well use a large readahead. Perhaps there’s a tradeoff between read latency and compaction time.
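To put rough numbers on the random-read side of that tradeoff, here is a minimal back-of-the-envelope sketch. It assumes (a simplification, not how the kernel's readahead heuristics actually behave in every case) that each cold random read pulls in the full readahead window, and that pages are 4 KiB. `blockdev --getra` reports readahead in 512-byte sectors, so the sketch uses that unit; the function names are hypothetical.

```python
SECTOR_BYTES = 512  # blockdev --getra / --setra count 512-byte sectors
PAGE_BYTES = 4096   # assumed page size

def readahead_bytes(ra_sectors: int) -> int:
    """Convert a readahead setting in sectors to bytes."""
    return ra_sectors * SECTOR_BYTES

def bytes_fetched(value_bytes: int, ra_sectors: int, page_bytes: int = PAGE_BYTES) -> int:
    """Worst case for one cold random read (simplified model): at least the
    whole pages holding the value (assumed to start on a page boundary), and
    the full readahead window if that is larger."""
    needed = -(-value_bytes // page_bytes) * page_bytes  # round up to whole pages
    return max(needed, readahead_bytes(ra_sectors))

def read_amplification(value_bytes: int, ra_sectors: int) -> float:
    """Bytes pulled from disk per byte the client actually asked for."""
    return bytes_fetched(value_bytes, ra_sectors) / value_bytes

# Compare a few readahead settings for a hypothetical 200-byte value.
for ra in (256, 32, 8, 0):
    print(f"ra={ra:>3} sectors: {bytes_fetched(200, ra):>6} bytes fetched, "
          f"{read_amplification(200, ra):.1f}x for a 200-byte value")
```

This only models the cost of small random reads; it says nothing about compaction throughput, which is the other half of the tradeoff raised above.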
Any feedback welcome.
Donald A. Smith
Senior Software Engineer
"does it typically have to read in the entire SSTable into memory (assuming the bloom filter said yes)?" --> No, that would be a performance killer.
On the read path, after the Bloom filter check, Cassandra looks in the partition key cache to see whether the partition it is looking for is there.
If it is, the cache gives the partition's offset from the beginning of the SSTable, so Cassandra can skip most of the data and seek directly to it.
If not, it uses the partition summary (a sample of the partition index) to seek to the nearest sampled position in the index and scan forward from there to the sought partition.
If compression is enabled (it is by default), there is one more step before hitting disk: the compression offset map, a translation table that maps uncompressed file offsets to compressed file offsets.
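The lookup order above can be sketched with a toy model. This is not Cassandra's code: it assumes partitions are ordered by their raw string key (real Cassandra orders by token), it skips the partition index scan between the summary and the data file, and it assumes a 64 KiB compression chunk length. All names are hypothetical.

```python
from bisect import bisect_right
from typing import Optional

def nearest_sample(summary_keys: list[str], summary_positions: list[int],
                   key: str) -> Optional[int]:
    """Return the index position of the last sampled key <= key, or None if
    key sorts before the first sample (so it cannot be in this SSTable)."""
    i = bisect_right(summary_keys, key) - 1
    return summary_positions[i] if i >= 0 else None

def locate_partition(key: str, key_cache: dict[str, int],
                     summary_keys: list[str], summary_positions: list[int]):
    """Key cache first (exact offset), then fall back to the summary."""
    if key in key_cache:
        return ("key_cache", key_cache[key])
    return ("summary", nearest_sample(summary_keys, summary_positions, key))

def compressed_chunk_for(uncompressed_offset: int, chunk_offsets: list[int],
                         chunk_length: int = 64 * 1024) -> tuple[int, int]:
    """Translate an uncompressed offset into (compressed start offset of the
    chunk that holds it, offset inside that chunk once decompressed)."""
    chunk = uncompressed_offset // chunk_length
    if chunk >= len(chunk_offsets):
        raise ValueError("offset beyond end of file")
    return chunk_offsets[chunk], uncompressed_offset % chunk_length

keys, positions = ["apple", "mango", "zebra"], [0, 4096, 8192]
print(locate_partition("mango", {"mango": 9000}, keys, positions))  # ('key_cache', 9000)
print(locate_partition("nectarine", {}, keys, positions))           # ('summary', 4096)
print(compressed_chunk_for(150_000, [0, 30_000, 61_000]))           # (61000, 18928)
```

The point the sketch tries to make is that every step narrows the read to a known offset, so only one compressed chunk needs to be fetched and decompressed rather than the whole SSTable.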
On Wed, Sep 24, 2014 at 10:07 PM, Donald Smith wrote:
Cassandra only reads a small part of each SSTable during normal operation (not compaction); in fact, DataStax recommends lowering readahead: http://www.datastax.com/documentation/cassandra/2.1/cassandra/install/installRecommendSettings.html
There are also blog posts in which people report lower read latency after reducing readahead.
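Before lowering readahead, it can help to check the current value. On Linux the same setting that `blockdev --getra` reports (in 512-byte sectors) is exposed in KiB at `/sys/block/<device>/queue/read_ahead_kb`. A minimal sketch, assuming a Linux host; the function names are hypothetical, and `sysfs_block` is a parameter only so the code can be pointed at a test directory.

```python
from pathlib import Path

SECTOR_BYTES = 512  # unit used by blockdev --getra / --setra

def read_ahead_kb(device: str, sysfs_block: str = "/sys/block") -> int:
    """Read the current readahead for a block device from sysfs (Linux only)."""
    path = Path(sysfs_block) / device / "queue" / "read_ahead_kb"
    return int(path.read_text().strip())

def kb_to_sectors(kb: int) -> int:
    """Convert KiB to the 512-byte sectors that blockdev --getra reports."""
    return kb * 1024 // SECTOR_BYTES

# Only runs where the device actually exists (e.g. a Linux box with /dev/sda).
if Path("/sys/block/sda/queue/read_ahead_kb").exists():
    kb = read_ahead_kb("sda")
    print(f"sda readahead: {kb} KiB = {kb_to_sectors(kb)} sectors")
```

Changing the value is a root operation, e.g. `blockdev --setra <sectors> /dev/sda`; this sketch deliberately only reads it.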
On Wed, Sep 24, 2014 at 4:15 PM, DuyHai Doan wrote:
I’d advise keeping readahead low, or turning it off entirely, on SSDs. The noop I/O scheduler might also help on that disk.
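To see which I/O scheduler a disk is using, Linux exposes `/sys/block/<device>/queue/scheduler`, which lists the available schedulers with the active one in square brackets (e.g. `noop deadline [cfq]`; on newer multi-queue kernels the no-op choice is named `none`). A minimal parsing sketch, assuming that file format; the function names are hypothetical.

```python
from pathlib import Path

def parse_schedulers(text: str) -> tuple[str, list[str]]:
    """Parse a sysfs scheduler line such as 'noop deadline [cfq]'.
    Returns (active scheduler, all available schedulers)."""
    names, active = [], None
    for token in text.split():
        name = token.strip("[]")
        names.append(name)
        if token.startswith("["):  # the kernel brackets the active one
            active = name
    if active is None:
        raise ValueError(f"no active scheduler in {text!r}")
    return active, names

def device_scheduler(device: str, sysfs_block: str = "/sys/block") -> tuple[str, list[str]]:
    """Read and parse the scheduler file for one block device (Linux only)."""
    path = Path(sysfs_block) / device / "queue" / "scheduler"
    return parse_schedulers(path.read_text())
```

Switching schedulers means writing one of the listed names into that same file as root; the sketch only inspects it.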
Even if Cassandra does perform a contiguous read, readahead won’t be helpful; it’s essentially obsolete now on SSDs.
On Wed, Sep 24, 2014 at 1:20 PM, Daniel Chia wrote: