
Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists


Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Anuj
Recently we faced an issue where every repair operation caused the addition of hundreds of sstables (CASSANDRA-9146). To bring the situation under control and make sure reads were not impacted, we were left with no option but to run a major compaction so that the thousands of tiny sstables got compacted.

Queries:
Does major compaction have any drawbacks now that automatic tombstone compaction has been implemented in 1.2 via the tombstone_threshold sub-property (CASSANDRA-3442)?
I understand that the huge SSTable created after a major compaction won't be compacted with new data any time soon, but is that a problem if purged data is removed via automatic tombstone compaction? If a major compaction results in a huge file, say 500 GB, what are the drawbacks?

If one big sstable is a problem, is there any way of solving it? We tried running sstablesplit after the major compaction to split the big sstable, but as the new sstables were all of the same size, they were compacted back into a single huge sstable once Cassandra was restarted after running sstablesplit.
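For concreteness, a minimal shell sketch of the kind of sequence described above (keyspace/table names and data paths are placeholders, not our actual schema):

    # count data files before and after, to confirm the tiny sstables were merged
    ls /var/lib/cassandra/data/my_keyspace/my_cf/*-Data.db | wc -l
    # trigger a major compaction of a single column family
    nodetool -h localhost compact my_keyspace my_cf
    ls /var/lib/cassandra/data/my_keyspace/my_cf/*-Data.db | wc -l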


Thanks
Anuj Wadehra

Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Sebastian Estevez

Have you tried user defined compactions via JMX?
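For reference, user defined compaction is exposed as an operation on the CompactionManager MBean; a rough sketch using the jmxterm CLI (the jar name, sstable file names and the operation's exact arguments are assumptions here and differ between Cassandra versions, so verify against your build):

    # connect to the node's JMX port (7199 by default)
    java -jar jmxterm-1.0-alpha-4-uber.jar --url localhost:7199
    # inside the jmxterm shell:
    bean org.apache.cassandra.db:type=CompactionManager
    # older versions take a keyspace plus a comma-separated list of data files;
    # newer versions take only the file list
    run forceUserDefinedCompaction my_keyspace my_keyspace-my_cf-jb-101-Data.db,my_keyspace-my_cf-jb-102-Data.db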


Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Anuj
No.

Anuj Wadehra



On Monday, 13 April 2015 12:23 AM, Sebastian Estevez <[hidden email]> wrote:


Have you tried user defined compactions via JMX?



Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Anuj
Any comments on the side effects of major compaction, especially when the generated sstable is 100+ GB?

After Cassandra 1.2, automatic tombstone compaction occurs even on a single sstable if the tombstone percentage exceeds the tombstone_threshold sub-property specified in the compaction strategy. So even if the huge sstable is never compacted with any other sstable, tombstones will still be collected. Is there any other disadvantage of having a giant sstable of hundreds of GB? I understand that sstables have a summary and an index which help locate the correct data blocks directly in a large data file. Still, are there any disadvantages?
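As a side note, one way to check whether the single-sstable tombstone compaction is keeping up is to look at the droppable tombstone estimate per sstable; a small sketch, assuming the sstablemetadata tool that ships with recent Cassandra builds and placeholder data paths:

    # print the estimated droppable tombstone ratio for each data file of the CF
    for f in /var/lib/cassandra/data/my_keyspace/my_cf/*-Data.db; do
      echo "$f"
      sstablemetadata "$f" | grep -i 'droppable tombstones'
    done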

Thanks
Anuj Wadehra




Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Robert Coli-3
On Mon, Apr 13, 2015 at 10:52 AM, Anuj Wadehra <[hidden email]> wrote:
Any comments on the side effects of major compaction, especially when the generated sstable is 100+ GB?

I have no idea how this interacts with the automatic compaction stuff; if you find out, let us know?

But if you want to do a major and don't want to deal with One Big SSTable afterwards, stop the node and then run the sstablesplit utility.
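A rough sketch of that sequence (the service name, data path and split size are placeholders; sstablesplit must be run while the node is down):

    sudo service cassandra stop
    # split the post-major sstable into ~50 MB chunks (the tool's default size);
    # note that equally sized output files may be re-bucketed together by STCS on restart
    sstablesplit --no-snapshot --size 50 \
      /var/lib/cassandra/data/my_keyspace/my_cf/my_keyspace-my_cf-jb-1-Data.db
    sudo service cassandra start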

=Rob


Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Rahul Neelakantan
Rob,
Does that mean once you split it back into small ones, automatic compaction will continue to happen on a more frequent basis now that it's no longer a single large monolith?

Rahul



Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Robert Coli-3
On Mon, Apr 13, 2015 at 12:26 PM, Rahul Neelakantan <[hidden email]> wrote:
Does that mean once you split it back into small ones, automatic compaction will continue to happen on a more frequent basis now that it's no longer a single large monolith?

That's what the word "size tiered" means in the phrase "size tiered compaction," yes.
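For reference, the size tiered bucketing behind this is tunable via compaction sub-properties; a minimal sketch via cqlsh (keyspace/table names are placeholders and the values shown are the documented defaults; if your cqlsh lacks -e, run the statement interactively):

    # STCS groups sstables into buckets of roughly similar size (bucket_low/bucket_high
    # around the bucket average) and compacts a bucket once it holds min_threshold tables
    cqlsh -e "ALTER TABLE my_keyspace.my_cf WITH compaction = {
      'class': 'SizeTieredCompactionStrategy',
      'min_threshold': '4',
      'bucket_low': '0.5',
      'bucket_high': '1.5' };"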

=Rob
 

Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Anuj
In reply to this post by Robert Coli-3
Hi Robert,

By automatic tombstone compaction, I am referring to the tombstone_threshold sub-property under the compaction strategy in CQL. It is 0.2 by default. What I understand from the DataStax documentation is that even if an sstable does not find sstables of similar size (STCS), an automatic tombstone compaction will be triggered on that sstable once 20% of its data is tombstones. This compaction works on a single sstable only.
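A minimal sketch of how these sub-properties are set, assuming placeholder keyspace/table names and a cqlsh that supports -e (otherwise run the statement interactively):

    # tombstone_threshold: ratio of droppable tombstones that makes a single sstable
    # eligible for compaction on its own; tombstone_compaction_interval: minimum sstable
    # age in seconds before such a compaction is considered (defaults shown)
    cqlsh -e "ALTER TABLE my_keyspace.my_cf WITH compaction = {
      'class': 'SizeTieredCompactionStrategy',
      'tombstone_threshold': '0.2',
      'tombstone_compaction_interval': '86400' };"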

How is major compaction related to automatic tombstone compaction?
Earlier we used to say that major compaction is not recommended because the one huge sstable formed after a major compaction will not find any similar-size sstables unless a huge amount of new data is written, and thus tombstones would stay in the huge sstable unnecessarily for a long time. My understanding is that automatic tombstone compaction will allow tombstone collection on the huge sstable formed after a major compaction, so that should no longer be considered a drawback. Please confirm my understanding. I also want to know whether there are any other side effects or inefficiencies of, say, a 100+ GB sstable.

Please refer to my first email on the issue. We tried splitting the sstable using sstablesplit, but because all the small sstables generated were of the same size, STCS compacted them back into a single giant sstable as soon as we started Cassandra. Any other alternatives?

The JIRA for the issue of numerous tiny sstables being generated after repair is still open, and we want confirmation that if we face such an issue in production we can go ahead with a one-time major compaction.

Thanks
Anuj Wadehra





Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Anuj
Hi Robert,

Any comments or suggestions?

Thanks
Anuj Wadehra



Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Robert Coli-3
In reply to this post by Anuj
On Tue, Apr 14, 2015 at 8:29 PM, Anuj Wadehra <[hidden email]> wrote:
By automatic tombstone compaction, I am referring to the tombstone_threshold sub-property under the compaction strategy in CQL. It is 0.2 by default. What I understand from the DataStax documentation is that even if an sstable does not find sstables of similar size (STCS), an automatic tombstone compaction will be triggered on that sstable once 20% of its data is tombstones. This compaction works on a single sstable only.

Overall system behavior is discussed here:

They are talking about LCS, but the principles apply, with an overlay of how STCS behaves.

=Rob


Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Anuj
Thanks Robert!!

The JIRA was very helpful in understanding how the tombstone threshold is implemented. The ticket also says that running a weekly major compaction is an alternative. What I actually want to understand is: if I run a major compaction on a CF with 500 GB of data and a single giant file is created, do you see any problems with Cassandra processing such a huge file? Is there any maximum sstable size beyond which performance degrades? What are the implications?


Thanks
Anuj Wadehra



Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Andrei Ivanov
Just in case it helps - we are running C* with sstable sizes of something like 2.5 TB and ~4TB/node. No evident problems except the time it takes to compact.

Andrei.




Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Anuj
Great!!! Thanks Andrei!!! That's the answer I was looking for :)


Thanks
Anuj Wadehra


