How does Cassandra store data physically?

How does Cassandra store data physically?

Ivan Chang
I am wondering how Cassandra stores its columns and super columns in the database files.
 
A supercolumn logically groups a set of related columns together. When the supercolumn is written to file, are its columns also stored in adjacent blocks, so that IO cost is minimized for related data?  What about individual columns not associated with any supercolumn, but related only through a given key?
 
Thanks,
Ivan

RE: How does Cassandra store data physically?

Stu Hood-2-3
There is no such thing as a column or supercolumn that is not contained in a ColumnFamily. The ColumnFamily is the structure that is stored together on disk.

A supercolumn is not what you think it is: supercolumns are like regular columns, except they contain other columns, and you can have an almost infinite number of supercolumns within a SuperColumnFamily.
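
To make the nesting concrete, here is a rough in-memory picture using plain Python dicts. This is only a sketch of the logical model, not Cassandra's API or file format; the row key, column names, values, and the helper get_subcolumn are all made up for illustration.

```python
# Hypothetical picture of the logical data model; all names and values
# below are illustrative and not part of Cassandra itself.

# A regular ColumnFamily: key -> {column name -> value}
standard_cf = {
    "ivan": {"email": "ivan@example.com", "city": "Boston"},
}

# A SuperColumnFamily: key -> {supercolumn name -> {subcolumn name -> value}}
super_cf = {
    "ivan": {
        "home": {"street": "1 Main St", "zip": "02101"},
        "work": {"street": "9 Elm St", "zip": "02139"},
    },
}


def get_subcolumn(cf, key, supercolumn, subcolumn):
    """Walk key -> supercolumn -> subcolumn and return the stored value."""
    return cf[key][supercolumn][subcolumn]
```

Either way, every column lives inside some ColumnFamily under a key; a supercolumn just adds one more level of naming between the key and the columns.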

A ColumnFamily is laid out on disk as a sequence of values sorted by key, then by (super)column name (or column timestamp), then by subcolumn name/timestamp. Therefore, it is very fast to get contiguous keys from the ColumnFamily, but to get a single column name from multiple keys, Cassandra still needs to seek to the next interesting column on disk.
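
The difference between the two access patterns can be sketched with a sorted list standing in for the on-disk sequence. This is a toy model under stated assumptions: the entries, key_range, and column_across_keys are hypothetical names, and a binary search stands in for a disk seek.

```python
from bisect import bisect_left

# Flattened (key, column name, value) entries, sorted by key and then by
# column name, standing in for the sorted sequence on disk.
entries = sorted([
    ("alice", "age", "30"),
    ("alice", "email", "a@example.com"),
    ("bob", "age", "25"),
    ("bob", "email", "b@example.com"),
    ("carol", "age", "41"),
    ("carol", "email", "c@example.com"),
])


def key_range(start_key, end_key):
    """Contiguous keys in [start_key, end_key): locate the start once,
    then read sequentially until the end of the range."""
    lo = bisect_left(entries, (start_key,))
    hi = bisect_left(entries, (end_key,))
    return entries[lo:hi]


def column_across_keys(keys, column):
    """One column from many keys: a separate search ("seek") per key."""
    found, seeks = [], 0
    for key in keys:
        i = bisect_left(entries, (key, column))
        seeks += 1
        if i < len(entries) and entries[i][:2] == (key, column):
            found.append(entries[i])
    return found, seeks
```

In this model key_range touches one contiguous slice, while column_across_keys has to jump once per key, which is the cost described above.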

There is no concept of 'blocks' in the Cassandra representation, because it does not use a B-Tree to store data. There is an index for each ColumnFamily on disk that allows Cassandra to seek directly to a key in the sorted file.
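
As a rough illustration of seeking through an index into a sorted file, the sketch below keeps a list of keys and byte offsets next to an in-memory buffer. The tab-separated row format and the functions write_sorted and read_row are assumptions for the example, not Cassandra's actual file layout.

```python
import io
from bisect import bisect_left


def write_sorted(rows):
    """Write rows in key order; return the buffer and a (keys, offsets) index."""
    data = io.BytesIO()
    keys, offsets = [], []
    for key in sorted(rows):
        keys.append(key)
        offsets.append(data.tell())  # byte offset where this row starts
        data.write(f"{key}\t{rows[key]}\n".encode("utf-8"))
    return data, (keys, offsets)


def read_row(data, index, key):
    """Binary-search the index for the key, then seek directly to its row."""
    keys, offsets = index
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None  # key is not present in the file
    data.seek(offsets[i])
    line = data.readline().decode("utf-8").rstrip("\n")
    return line.split("\t", 1)[1]


data, index = write_sorted({"carol": "age=41", "alice": "age=30", "bob": "age=25"})
```

Here read_row(data, index, "bob") jumps straight to the row for "bob" without scanning the rows before it.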

Please see http://wiki.apache.org/cassandra/DataModel

Thanks,
Stu

-----Original Message-----
From: "Ivan Chang" <[hidden email]>
Sent: Wednesday, July 1, 2009 3:00pm
To: [hidden email]
Subject: How does Cassandra store data physically?

I am wondering how Cassandra stores its columns, super columns in the
database files?

A supercolumn logically groups a set of related columns together, when the
supercolumn is written to file, are the columns also stored in adjacent
blocks to each other so IO cost is minimized for related data?  What about
individual columns not associated with any supercolumn, but related only
through a given key?

Thanks,
Ivan

