Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?


Donald Smith

Question about the read path in Cassandra: if a partition/row is in the Memtable and is being actively written to by other clients, will a READ of that partition also have to hit SSTables on disk (or in the page cache)? Or can it be served entirely from the Memtable?

If you select all columns (e.g., “select * from ….”), then I can imagine that Cassandra would need to merge whatever columns are in the Memtable with what’s in the SSTables on disk.

But if you select a single column (e.g., “select Name from …. where id= ….”) and that column is in the Memtable, I’d hope Cassandra could skip checking the disk. Can it do this optimization?

Thanks, Don

 

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
[hidden email]


AudienceScience

 


Re: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

Jonathan Haddad
No. Consider a scenario where you supply a timestamp a week in the future, flush that write to an SSTable, and then do another write to the same column with the current timestamp. The record on disk will have a greater timestamp than the one in the memtable, so the on-disk value wins and the read cannot skip the SSTables.
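To make the reasoning concrete, here is a minimal Python sketch of last-write-wins reconciliation. It assumes a simplified model in which the memtable and each SSTable are plain dicts mapping a column name to a hypothetical `Cell(value, timestamp)`; it illustrates the rule, not Cassandra's actual code. The winner is chosen by timestamp, not by where the cell is stored, so finding the column in the memtable does not settle the read.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical model: one column value plus the write timestamp that came with it."""
    value: str
    timestamp: int


def reconcile(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Last-write-wins: keep whichever cell has the larger timestamp."""
    if a is None:
        return b
    if b is None:
        return a
    # Ties keep `a` in this sketch; Cassandra's real tie-break rule is not modeled.
    return a if a.timestamp >= b.timestamp else b


def read_column(memtable: Dict[str, Cell], sstables: List[Dict[str, Cell]],
                column: str) -> Optional[Cell]:
    """Merge the memtable with every SSTable; a memtable hit alone is not enough."""
    result = memtable.get(column)
    for sstable in sstables:
        result = reconcile(result, sstable.get(column))
    return result


NOW = 1_000_000  # an arbitrary "current" timestamp, in microseconds
WEEK = 7 * 24 * 60 * 60 * 1_000_000

# A write stamped a week in the future was flushed to an SSTable earlier...
sstables = [{"name": Cell("from the future", NOW + WEEK)}]
# ...then a later write, stamped with the current time, landed in the memtable.
memtable = {"name": Cell("current", NOW)}

print(read_column(memtable, sstables, "name").value)  # from the future
```

The memtable's value loses even though it was written later in wall-clock terms, which is why the memtable alone cannot answer the read.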

On Wed, Oct 22, 2014 at 9:18 AM, Donald Smith <[hidden email]> wrote:

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

RE: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

Donald Smith

On the Cassandra IRC channel I discussed this question. I learned that the timestamp in the Memtable may be OLDER than the timestamp in some SSTable (e.g., due to hints or retries), so there’s no guarantee that the Memtable has the most recent version.

 

But there may be cases, they say, in which the timestamps in the SSTables can be used to skip over SSTables that contain only older data (via metadata on the SSTables, I presume).
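A minimal Python sketch of that skipping idea, assuming (as Don presumes) that each SSTable carries metadata recording the newest write timestamp it contains. `SSTable`, `max_timestamp`, and `read_named_columns` are hypothetical names for illustration, not Cassandra's API, and deletions are not modeled. For a query that names specific columns, SSTables are visited newest-first, and the read stops as soon as every requested column already holds a value newer than anything the next SSTable (and therefore any remaining one) could contain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Cell = Tuple[str, int]  # (value, write timestamp)


@dataclass
class SSTable:
    """Hypothetical SSTable: cells plus metadata holding the newest timestamp in the file."""
    cells: Dict[str, Cell]
    max_timestamp: int = field(init=False)

    def __post_init__(self) -> None:
        # Computed once, as if stored in the file's metadata when it was written.
        self.max_timestamp = max((ts for _, ts in self.cells.values()), default=-1)


def read_named_columns(memtable: Dict[str, Cell], sstables: List[SSTable],
                       columns: List[str]) -> Tuple[Dict[str, Cell], int]:
    """Return the winning cell per requested column and how many SSTables were read."""
    result = {c: memtable[c] for c in columns if c in memtable}
    files_read = 0
    # Newest-first: every later file has a max_timestamp <= the current one.
    for sstable in sorted(sstables, key=lambda s: s.max_timestamp, reverse=True):
        if all(c in result and result[c][1] > sstable.max_timestamp for c in columns):
            break  # Nothing in this or any remaining file can win; skip them all.
        files_read += 1
        for c in columns:
            cell = sstable.cells.get(c)
            # Ties keep the value already found (a simplification).
            if cell is not None and (c not in result or cell[1] > result[c][1]):
                result[c] = cell
    return result, files_read


memtable = {"name": ("v3", 300)}
sstables = [SSTable({"name": ("v1", 100)}), SSTable({"name": ("v2", 200)})]
print(read_named_columns(memtable, sstables, ["name"]))
# ({'name': ('v3', 300)}, 0)

# Jon's scenario: an earlier flush carried a far-future timestamp.
future = SSTable({"name": ("future", 900)})
print(read_named_columns(memtable, sstables + [future], ["name"]))
# ({'name': ('future', 900)}, 1)
```

With the future-dated SSTable from Jon's example added, the read has to open that file, but it can still stop before the two older ones.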

 

Memtables are like write-through caches; they do NOT correspond to SSTables loaded from disk.

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Jonathan Haddad
Sent: Wednesday, October 22, 2014 9:24 AM
To: [hidden email]
Subject: Re: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

 
