SSTable structure


SSTable structure

Pierre
Hi,

Does anyone know of more complete and up-to-date documentation on the SSTable file
structure (data, index, stats, etc.) than this page: http://wiki.apache.org/cassandra/ArchitectureSSTable ?

I'm looking for a full specification, with a schema of the structure if possible.

Thanks.

Re: SSTable structure

Serj Veras
+1


--
Thanks,
Serj


Re: SSTable structure

Robert Coli-3
In reply to this post by Pierre
On Mon, Mar 30, 2015 at 1:38 AM, Pierre <[hidden email]> wrote:

> Does anyone know if there is a more complete and up to date documentation about the sstable files structure (data, index, stats etc.) than this one : http://wiki.apache.org/cassandra/ArchitectureSSTable

No, there isn't. Unfortunately you will have to read the source.

> I'm looking for a full specification, with schema of the structure if possible.

It would be nice if such fundamental things were documented, wouldn't it?

=Rob
 

Re: SSTable structure

daemeon reiydelle

Why? Then there are two places to maintain, or to get JIRA'ed for a discrepancy.

 

Re: SSTable structure

Robert Coli-3
On Mon, Mar 30, 2015 at 5:07 PM, daemeon reiydelle <[hidden email]> wrote:

> why? Then there are 2 places 2 maintain or get jira'ed for a discrepancy.

If you are asserting that code is capable of documenting itself, we will just have to agree to disagree.

=Rob
 

Re: SSTable structure

Kirk True
The tricky thing with documenting SSTables is that there are a lot of conditionals in the structure, so it makes for twisty reading. Just for fun, here's a terrible start I made once:
 
 
 
 
 
 

Re: SSTable structure

Jacob Rhoden
In reply to this post by daemeon reiydelle
Yes, updating both code and documentation can sometimes be annoying; you would only ever maintain both if it were important. It comes down to this: is having the format of the data files documented for everyone to understand an important thing?

 

Re: SSTable structure

Bharatendra Boddu
Some time back I created a blog article about the SSTable storage format with some code references. 


- bharat

 


Re: SSTable structure

Serega Sheypak
Hi Bharat,
Your article covers Cassandra 1.2.5. Does it also apply to Cassandra 2.1?
Were there any significant changes to the SSTable format and layout?
Thank you, the article is interesting.

It would be great to have the general ideas written down. They could help with understanding schema design problems: you start to understand better how Cassandra scans data and how you can utilize its power.

 



Re: SSTable structure

Bharatendra Boddu
Hi Serega,

Most of the content in the blog article is still relevant. After 1.2.5 (ic), there are only three new versions of the SSTable format (ja, jb, ka). The changes in these versions are as follows.

        // ja (2.0.0): super columns are serialized as composites (note that there is no real format change,
        //               this is mostly a marker to know if we should expect super columns or not. We do need
        //               a major version bump however, because we should not allow streaming of super columns
        //               into this new format)
        //             tracks max local deletiontime in sstable metadata
        //             records bloom_filter_fp_chance in metadata component
        //             remove data size and column count from data file (CASSANDRA-4180)
        //             tracks max/min column values (according to comparator)
        // jb (2.0.1): switch from crc32 to adler32 for compression checksums
        //             checksum the compressed data
        // ka (2.1.0): new Statistics.db file format
        //             index summaries can be downsampled and the sampling level is persisted
        //             switch uncompressed checksums to adler32
        //             tracks presence of legacy (local and remote) counter shards
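
As a rough illustration of how the version list above could be used, here is a minimal Python sketch that reads the format version out of an SSTable file name and reports which of the listed changes apply. It is not taken from the Cassandra source. The file name layout (`<keyspace>-<table>-<version>-<generation>-<component>`), the idea that two-letter versions sort in release order, and all function names are assumptions made for the example.

```python
import re

# Format versions and the releases that introduced them, as listed above.
FORMAT_VERSIONS = {"ic": "1.2.5", "ja": "2.0.0", "jb": "2.0.1", "ka": "2.1.0"}

# ASSUMPTION: files are named
#   <keyspace>-<table>-<version>-<generation>-<component>
# e.g. "ks1-users-ka-5-Data.db". This layout is not stated in the thread.
_NAME = re.compile(
    r"^(?P<keyspace>\w+)-(?P<table>\w+)-(?P<version>[a-z]{2})"
    r"-(?P<generation>\d+)-(?P<component>\w+\.\w+)$"
)


def parse_sstable_name(filename):
    """Split a file name into its parts, or return None if it does not match."""
    match = _NAME.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    parts["generation"] = int(parts["generation"])
    return parts


def at_least(version, minimum):
    """True if `version` is the same as or newer than `minimum`.

    ASSUMPTION: two-letter version strings sort in release order.
    """
    return version >= minimum


def describe(filename):
    """Hypothetical helper: summarize which listed changes apply to a file."""
    parts = parse_sstable_name(filename)
    if parts is None:
        return None
    version = parts["version"]
    return {
        "version": version,
        "introduced_in": FORMAT_VERSIONS.get(version, "unknown"),
        # jb switched compression checksums from crc32 to adler32.
        "adler32_compression_checksums": at_least(version, "jb"),
        # ka introduced the new Statistics.db file format.
        "new_statistics_format": at_least(version, "ka"),
    }


if __name__ == "__main__":
    print(describe("ks1-users-ka-5-Data.db"))
```

Anything that depends on the real on-disk layout still has to be checked against the source for the Cassandra release in question.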
- bharat

 




Re: SSTable structure

Serega Sheypak
Thank you, great to know that.
