Understanding Read after update

Understanding Read after update

Anishek Agarwal
Hello,

I am wondering how a read actually happens in Cassandra when updates over time leave a row's data spread across multiple SSTables.

As far as I can gather from the documentation, SSTable-level bloom filters record which partition keys are in that SSTable. So, to check my understanding: if I insert a row and then update the same row some time later (assuming the two writes go to different SSTables), will Cassandra read the data from both SSTables during a read and merge them in time order, with the data in the second SSTable taking precedence over the first, and return the result?

Does it mark the old column as a tombstone in the previous SSTable, or does it wait for compaction to remove the old data?

Thanks
Anishek
Re: Understanding Read after update

Anishek Agarwal
Additional thought,

As far as I understand, for a bloom filter to reliably meet a target error rate, we need some estimate of the key space in order to decide how many bits to allocate and how many hash functions to use per key. Since users don't provide that, does Cassandra have some internal mechanism that starts with defaults and creates new bloom filters for SSTables? Or, while the data is in the memtable, does it also keep track of the unique keys there, so that when it writes to disk it can derive the right bloom filter size for that SSTable?



On Fri, Apr 10, 2015 at 5:16 PM, Anishek Agarwal <[hidden email]> wrote:
Re: Understanding Read after update

Tyler Hobbs-2


> SSTable-level bloom filters record which partition keys are in that SSTable. So, to check my understanding: if I insert a row and then update the same row some time later (assuming the two writes go to different SSTables), will Cassandra read the data from both SSTables during a read and merge them in time order, with the data in the second SSTable taking precedence over the first, and return the result?

That's approximately correct. The only part that's incorrect is how merging works. One SSTable doesn't have precedence over another. Instead, when the same cell exists in both SSTables, the one with the higher write timestamp wins.
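
To make the timestamp rule concrete, here is a minimal sketch in Python. It is not Cassandra's code: the `Cell` type and the `reconcile` function are hypothetical names chosen for illustration, and tie-breaking for equal timestamps is simplified to "keep the first argument".

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    """Hypothetical stand-in for one stored cell (a column value plus metadata)."""
    value: str
    write_ts: int  # write timestamp, e.g. microseconds since the epoch


def reconcile(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Return the cell with the higher write timestamp.

    Which SSTable a cell came from plays no role in the decision.
    """
    if a is None:
        return b
    if b is None:
        return a
    # Simplification: on equal timestamps this sketch just keeps `a`.
    return a if a.write_ts >= b.write_ts else b


# The same cell found in two different SSTables.
older = Cell("alice@old.example", write_ts=1000)
newer = Cell("alice@new.example", write_ts=2000)

# The argument order (i.e. the order the SSTables are visited) does not matter.
winner_1 = reconcile(older, newer)
winner_2 = reconcile(newer, older)
```

In both calls the cell written at timestamp 2000 is returned, regardless of which SSTable is consulted first.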
 
> Does it mark the old column as a tombstone in the previous SSTable, or does it wait for compaction to remove the old data?

It just waits for compaction to remove the old data; there's no tombstone.


> While the data is in the memtable, does it also keep track of the unique keys there, so that when it writes to disk it can derive the right bloom filter size for that SSTable?

That's correct, it knows the number of keys before the bloom filter is created.
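
Once the number of keys n is known, a filter can be sized for a target false-positive rate p with the standard bloom filter formulas: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below only illustrates that arithmetic. The function name `bloom_parameters` and the 1% target are assumptions made for the example, and Cassandra's own sizing code may differ in its details.

```python
import math
from typing import Tuple


def bloom_parameters(num_keys: int, false_positive_rate: float) -> Tuple[int, int]:
    """Return (number of bits, number of hash functions) for a bloom filter.

    Uses the textbook formulas for the optimal size given a known key count.
    """
    if num_keys <= 0:
        raise ValueError("num_keys must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")

    # m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-num_keys * math.log(false_positive_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to a whole number of hash functions
    hashes = max(1, round(bits / num_keys * math.log(2)))
    return bits, hashes


# Example: one million keys counted at flush time, 1% target false-positive rate.
bits, hashes = bloom_parameters(1_000_000, 0.01)
bits_per_key = bits / 1_000_000
```

For a 1% target this works out to roughly 9.6 bits per key and 7 hash functions, which is why knowing the key count up front matters: the same bit budget with ten times as many keys would give a much worse error rate.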

--
Tyler Hobbs
DataStax
Re: Understanding Read after update

Anishek Agarwal
Thanks Tyler for the validations, 

I have a follow-up question.

> One SSTable doesn't have precedence over another. Instead, when the same cell exists in both SSTables, the one with the higher write timestamp wins.

If my table has 5 non-partition-key columns and I update only 1 of them, then the new SSTable should contain only that one entry. Does that mean that if I query everything for that partition key, Cassandra has to compare timestamps per column across SSTables to get me the data?


On Fri, Apr 10, 2015 at 10:52 PM, Tyler Hobbs <[hidden email]> wrote:


Re: Understanding Read after update

graham sanderson
Yes, it will look in each SSTable that, according to its bloom filter, may have data for that partition key, and use timestamps to figure out the latest version (or none, in the case of a newer tombstone) to return for each clustering key.
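
A rough sketch of that read path for the "update 1 of 5 columns" case, in Python. Everything here is a simplified assumption for illustration: SSTables are modeled as plain dictionaries, `might_contain` stands in for a real bloom filter, a cell value of `None` stands in for a tombstone, and clustering keys are left out so each partition holds a single row.

```python
from typing import Dict, List, NamedTuple, Optional


class Cell(NamedTuple):
    value: Optional[str]  # None models a tombstone (deleted cell)
    write_ts: int


# Hypothetical in-memory SSTable: partition key -> column name -> Cell
SSTable = Dict[str, Dict[str, Cell]]


def might_contain(sstable: SSTable, key: str) -> bool:
    """Stand-in for a bloom filter check.

    A real filter may return false positives but never false negatives.
    """
    return key in sstable


def read_partition(sstables: List[SSTable], key: str) -> Dict[str, str]:
    """Merge one partition across SSTables, keeping the newest cell per column."""
    latest: Dict[str, Cell] = {}
    for sstable in sstables:
        if not might_contain(sstable, key):
            continue  # skip SSTables that cannot hold this partition
        for column, cell in sstable.get(key, {}).items():
            current = latest.get(column)
            if current is None or cell.write_ts > current.write_ts:
                latest[column] = cell
    # A newer tombstone hides the column from the result.
    return {col: c.value for col, c in latest.items() if c.value is not None}


# Initial insert: all five columns written at timestamp 100.
first = {"user1": {
    "name": Cell("Alice", 100), "email": Cell("a@old.example", 100),
    "city": Cell("Pune", 100), "phone": Cell("555-0100", 100),
    "plan": Cell("free", 100),
}}
# Later update of one column only, flushed to a different SSTable.
second = {"user1": {"email": Cell("a@new.example", 200)}}
# Later delete of one column, modeled as a tombstone.
third = {"user1": {"phone": Cell(None, 300)}}
# An unrelated SSTable that the filter check lets us skip.
other = {"user2": {"name": Cell("Bob", 150)}}

row = read_partition([second, other, first, third], "user1")
```

Here `row` contains the original `name`, `city`, and `plan`, the updated `email`, and no `phone`, even though the SSTables were visited out of write order.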


On Apr 12, 2015, at 11:18 PM, Anishek Agarwal <[hidden email]> wrote:
