Cassandra tombstones being created by updating rows with TTL's

Cassandra tombstones being created by updating rows with TTL's

Walsh, Stephen

We were chatting with Jon Haddad about a week ago about our tombstone issue using Cassandra 2.0.14.

To summarize:

 

We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered

We use 1 keyspace with 1 table

Each row has about 40 columns

Each row has a TTL of 10 seconds

 

We insert about 500 rows per second in a prepared batch** (about 3 MB of network overhead)

We query the entire table once per second

 

**This is to enable consistent data, i.e. the batch is transactional, so we get all queried data from one insert and not a mix of two or more.
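For concreteness, the write pattern above looks roughly like this in CQL (a sketch only; the keyspace, table, and column names are invented, since the real schema isn't shown in this thread):

```sql
-- Hypothetical stand-in for the real ~40-column table
CREATE TABLE demo_ks.demo_table (
    id   text PRIMARY KEY,
    col1 text,
    col2 text
    -- ... remaining columns elided
);

-- Batched inserts, each row expiring after 10 seconds.
-- Every expired cell becomes a tombstone once its TTL passes,
-- which is expected behavior, not a bug.
BEGIN BATCH
    INSERT INTO demo_ks.demo_table (id, col1, col2) VALUES ('row1', 'a', 'b') USING TTL 10;
    INSERT INTO demo_ks.demo_table (id, col1, col2) VALUES ('row2', 'c', 'd') USING TTL 10;
APPLY BATCH;
```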

 

 

It seemed that the rows we insert every second were never being deleted by the TTL, or so we thought.

After some time we got this message on the query side:

 

 

#######################################

ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in keyspace.table; query aborted (see tombstone_failure_threshold)

ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:91,5,main]

java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException

                at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException

#######################################

 

 

So we know tombstones are in fact being created.

Our solution was to change the table schema and set gc_grace_seconds to 60 seconds.

This worked for 20 seconds, then we saw this:

 

 

#######################################

Read 500 live and 30000 tombstoned cells in keyspace.table (see tombstone_warn_threshold). 10000 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}

#######################################

 

So the warning fires every 20 seconds (500 inserts/second x 20 seconds = 10,000 tombstones).

So now we have gc_grace_seconds set to 10 seconds.

But it feels very wrong to have it at such a low number, especially if we move to a larger cluster. This just won't fly.
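As a sanity check on the interval above, the arithmetic can be sketched in a few lines (the rate and tombstone count are the ones reported in this thread, not authoritative defaults):

```python
# Tombstone accumulation sketch for this workload (numbers from the thread).
rows_per_second = 500   # reported insert rate
warn_trigger = 10_000   # tombstones accumulated at each warning

# Time for expired rows alone to accumulate enough tombstones to warn:
seconds_between_warnings = warn_trigger / rows_per_second
print(seconds_between_warnings)  # 20.0, matching the observed 20-second interval
```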

What are we doing wrong?

 

We shouldn’t increase the tombstone threshold as that is extremely dangerous.

 

 

Best Regards

Stephen Walsh

 

 

 

 

 

 

This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. If you have received this message in error, please do not read, copy or forward this message. Please notify the sender immediately, delete it from your system and destroy any copies. You may not further disclose or distribute this email or its attachments.

Re: Cassandra tombstones being created by updating rows with TTL's

Laing, Michael
If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0.

That's what we do. There have been discussions on the list over the last few years re this topic.

ml
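Expressed against the table (a sketch; the keyspace/table name is a placeholder), the change Michael describes would be:

```sql
-- Safe only under the stated conditions: deletions happen exclusively
-- via TTL expiry, and the TTL for a given row never decreases.
ALTER TABLE demo_ks.demo_table WITH gc_grace_seconds = 0;
```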

On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen <[hidden email]> wrote:



RE: Cassandra tombstones being created by updating rows with TTL's

Walsh, Stephen

Many thanks Michael,

I will give these settings a go.

How do you do your periodic nodetool repairs in this situation? From what I read, we need to start doing this also.

 

https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
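For what it's worth, one common pattern (a sketch, assuming nodetool is on the PATH and the keyspace name is a placeholder) is a weekly primary-range repair per node via cron, staggered so nodes don't repair simultaneously:

```
# crontab sketch: weekly repair of this node's primary ranges only (-pr),
# scheduled at a different hour on each node to spread the load
0 3 * * 0  nodetool repair -pr demo_ks
```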

 

 


Re: Cassandra tombstones being created by updating rows with TTL's

Laing, Michael
Previous discussions on the list explain in much more detail why this is not a problem.

If something changes in your cluster: node down, new node, etc - you run repair for sure.

We also run periodic repairs prophylactically.

But if you never delete and always ttl by the same amount, you do not have to worry about zombie data being resurrected - the main reason for running repair within gc_grace_seconds.



On Tue, Apr 21, 2015 at 11:49 AM, Walsh, Stephen <[hidden email]> wrote:



RE: Cassandra tombstones being created by updating rows with TTL's

Walsh, Stephen

Thanks for all your help Michael,

 

Our data will change through the day, so data with a TTL will eventually get dropped, and new data will appear.

I’d imagine the entire table expires and starts over 7-10 times a day.

 

 

 

But on the gc_grace_seconds topic, the Java driver now gives this error on the query.

I also get “Request did not complete within rpc_timeout.” in cqlsh.

 

#################################

com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

        at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Responses$Error.asException(Responses.java:100) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) ~[cassandra-driver-core-2.1.4.jar:na]

Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[cassandra-driver-core-2.1.4.jar:na]

#################################

 

 

These queries were taking about 1 second to run when gc_grace_seconds was at 10 seconds (the same duration as the TTL).

 

We are also seeing a lot of this stuff in the log file:

 

#################################

ERROR [ReadStage:71] 2015-04-21 17:11:07,597 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:71,5,main]

java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db (No such file or directory)

                at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db

################################

 

 

Maybe this is a one step back, two steps forward approach?

Any ideas?

 

 

 

 


Re: Cassandra tombstones being created by updating rows with TTL's

Laing, Michael
Hmm - we always read/write with LOCAL_QUORUM - I'd recommend that, as it is your 'consistency' defense.

We use python, so I am not familiar with the java driver - but 'file not found' indicates something is inconsistent. 
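In cqlsh, the LOCAL_QUORUM suggestion can be tried like this (a sketch; the table name is a placeholder, and CONSISTENCY is a cqlsh shell command, not CQL):

```sql
-- Applies to subsequent statements in this cqlsh session
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM demo_ks.demo_table;
```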

On Tue, Apr 21, 2015 at 12:22 PM, Walsh, Stephen <[hidden email]> wrote:

Thanks for all your help Michael,

 

Our data will change through the day, so data with a TTL will eventually get dropped, and new data will appear.

I’d imagine the entire table maybe expire and start over 7-10 times a day.

 

 

 

But on the GC topic, now java Driver now gives this error on the query

I also get “Request did not complete within rpc_timeout.” In cqlsh.

 

#################################

com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

        at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Responses$Error.asException(Responses.java:100) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) ~[cassandra-driver-core-2.1.4.jar:na]

Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[cassandra-driver-core-2.1.4.jar:na]

#################################

 

 

These queries where taking about 1 second to run when the gc was at 10 seconds (same duration as the TTL).

 

Also seeing a lot of this this stuff in the log file

 

#################################

ERROR [ReadStage:71] 2015-04-21 17:11:07,597 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:71,5,main]

java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db (No such file or directory)

                at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db

################################

 

 

Maybe this is a 1 step back 2 steps forward approach?

Any ideas?

 

 

 

 

From: Laing, Michael [mailto:[hidden email]]
Sent: 21 April 2015 17:09


To: [hidden email]
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

 

Discussions previously on the list show why this is not a problem in much more detail.

 

If something changes in your cluster: node down, new node, etc - you run repair for sure.

 

We also run periodic repairs prophylactically.

 

But if you never delete and always ttl by the same amount, you do not have to worry about zombie data being resurrected - the main reason for running repair within gc_grace_seconds.

 

 

 

On Tue, Apr 21, 2015 at 11:49 AM, Walsh, Stephen <[hidden email]> wrote:

Maybe thanks Michael,

I will give these setting a go,

How do you do you periodic node-tool repairs in the situation, for what I read we need to start doing this also.

 

https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair

 

 

From: Laing, Michael [mailto:[hidden email]]
Sent: 21 April 2015 16:26
To:
[hidden email]
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

 

If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0.

 

That's what we do. There have been discussions on the list over the last few years re this topic.

 

ml

 

On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen <[hidden email]> wrote:

We were chatting to Jon Haddena about a week ago about our tombstone issue using Cassandra 2.0.14

To Summarize

 

We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered

We use 1 keyspace with 1 table

Each row have about 40 columns

Each row has a TTL of 10 seconds

 

We insert about 500 rows per second in a prepared batch** (about 3mb in network overhead)

We query the entire table once per second

 

**This is too enable consistent data, E.G batch in transactional, so we get all queried data from one insert and not a mix of 2 or more.

 

 

Seems every second we insert, the rows are never deleted by the TTL, or so we thought.

After some time we got this message on the query side

 

 

#######################################

ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in keyspace.table; query aborted (see tombstone_failure_threshold)

ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:91,5,main]

java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException

                at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException

#######################################

 

 

So we know tombstones are infact being created.

Solution was to change the table schema and set gc_grace_seconds to run every 60 seconds.

This worked for 20 seconds, then we saw this

 

 

#######################################

Read 500 live and 30000 tombstoned cells in keyspace.table (see tombstone_warn_threshold). 10000 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=<a href="tel:2147483647" target="_blank">2147483647}

#######################################

 

So every 20 seconds (500 inserts x 20 seconds = 10,000 tombstones)

So now we have the gc_grace_seconds set to 10 seoncds.

But its feels very wrong to have it at a low number, especially if we move to a larger cluster. This just wont fly.

What are we doing wrong?

 

We shouldn’t increase the tombstone threshold as that is extremely dangerous.

 

 

Best Regards

Stephen Walsh

 

 

 

 

 

 

This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. If you have received this message in error, please do not read, copy or forward this message. Please notify the sender immediately, delete it from your system and destroy any copies. You may not further disclose or distribute this email or its attachments.

 



Re: Cassandra tombstones being created by updating rows with TTL's

Anuj
What's your SSTable count for the CF? I hope compactions are working fine. Also check the full stack trace of the FileNotFoundException; if it's related to compaction, you can try cleaning the compactions_in_progress folder in the system folder in the data directory. There are JIRA issues relating to that.

Thanks
Anuj Wadehra

Sent from Yahoo Mail on Android


From:"Laing, Michael" <[hidden email]>
Date:Tue, 21 Apr, 2015 at 10:21 pm
Subject:Re: Cassandra tombstones being created by updating rows with TTL's

Hmm - we read/write with Local Quorum always - I'd recommend that as that is your 'consistency' defense.

We use python, so I am not familiar with the java driver - but 'file not found' indicates something is inconsistent. 

On Tue, Apr 21, 2015 at 12:22 PM, Walsh, Stephen <Stephen.Walsh@...> wrote:

Thanks for all your help Michael,

 

Our data will change through the day, so data with a TTL will eventually get dropped, and new data will appear.

I’d imagine the entire table may expire and start over 7-10 times a day.

 

 

 

But on the GC topic, the Java driver now gives this error on the query.

I also get “Request did not complete within rpc_timeout.” in cqlsh.

 

#################################

com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

        at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Responses$Error.asException(Responses.java:100) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) ~[cassandra-driver-core-2.1.4.jar:na]

Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[cassandra-driver-core-2.1.4.jar:na]

#################################

 

 

These queries were taking about 1 second to run when gc_grace_seconds was at 10 seconds (same duration as the TTL).

Also seeing a lot of this stuff in the log file:

 

#################################

ERROR [ReadStage:71] 2015-04-21 17:11:07,597 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:71,5,main]

java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db (No such file or directory)

                at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db

################################

 

 

Maybe this is a one step back, two steps forward approach?

Any ideas?

 

 

 

 

From: Laing, Michael [mailto:michael.laing@...]
Sent: 21 April 2015 17:09


To: user@...
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

 

Discussions previously on the list show why this is not a problem in much more detail.

 

If something changes in your cluster: node down, new node, etc - you run repair for sure.

 

We also run periodic repairs prophylactically.

 

But if you never delete and always ttl by the same amount, you do not have to worry about zombie data being resurrected - the main reason for running repair within gc_grace_seconds.

 

 

 

On Tue, Apr 21, 2015 at 11:49 AM, Walsh, Stephen <Stephen.Walsh@...> wrote:

Many thanks Michael,

I will give these settings a go.

How do you do your periodic nodetool repairs in this situation? From what I read, we need to start doing this also.

 

https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair

 

 

From: Laing, Michael [mailto:michael.laing@...]
Sent: 21 April 2015 16:26
To: user@...
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

 

If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0.

 

That's what we do. There have been discussions on the list over the last few years re this topic.
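A minimal sketch of a table set up this way (the names and columns are hypothetical; default_time_to_live applies the TTL server-side, so every write gets the same expiry):

```sql
-- Hypothetical schema for illustration only.
CREATE TABLE my_keyspace.my_table (
    id      text PRIMARY KEY,
    payload text
) WITH default_time_to_live = 10   -- every row expires after 10 seconds
  AND gc_grace_seconds = 0;        -- safe only for this TTL-only pattern
```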

 

ml

 



RE: Cassandra tombstones being created by updating rows with TTL's

Walsh, Stephen

Hey Anuj,

 

I think this might be related to me quickly dropping the tables and re-creating them to set gc_grace_seconds to 0, instead of doing an ALTER TABLE command.

This might have caused the FileNotFound issue.

 

I might just drop the keyspace, do a nodetool cleanup on each node, then re-add it and see what happens.

I no longer have the log file, but I think it was related to the compaction.

 

However, nodetool cfstats also gives this on one of the tables (the only table I’m writing to):

 

Table : table_name

                SSTable count: 35

Exception in thread "main" java.lang.AssertionError

        at org.apache.cassandra.io.compress.CompressionParameters.setLiveMetadata(CompressionParameters.java:111)

        at org.apache.cassandra.io.sstable.SSTableReader.getCompressionMetadata(SSTableReader.java:634)

        at org.apache.cassandra.io.sstable.SSTableReader.getCompressionMetadataOffHeapSize(SSTableReader.java:648)

        at org.apache.cassandra.metrics.ColumnFamilyMetrics$26.value(ColumnFamilyMetrics.java:464)

        at org.apache.cassandra.metrics.ColumnFamilyMetrics$26.value(ColumnFamilyMetrics.java:459)

        at org.apache.cassandra.db.ColumnFamilyStore.getCompressionMetadataOffHeapMemoryUsed(ColumnFamilyStore.java:2226)

        at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:483)

        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)

        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:483)

        at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)

        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)

        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)

        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)

        at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)

        at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)

        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)

        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)

        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1443)

        at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)

        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)

        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)

        at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:637)

        at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:483)

        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)

        at sun.rmi.transport.Transport$1.run(Transport.java:200)

        at sun.rmi.transport.Transport$1.run(Transport.java:197)

        at java.security.AccessController.doPrivileged(Native Method)

        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)

        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$87(TCPTransport.java:683)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$2/1019453949.run(Unknown Source)

        at java.security.AccessController.doPrivileged(Native Method)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

 

 

 

From: Anuj Wadehra [mailto:[hidden email]]
Sent: 21 April 2015 19:04
To: [hidden email]
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

 


 


Re: RE: Cassandra tombstones being created by updating rows with TTL's

Anuj
Hi Stephen,

Dropping the CF or keyspace and recreating it looks undesirable here.

What I understood is that your rows survive 10 seconds, but if you set gc_grace_seconds to 10 you still find a lot of tombstones in queries. Please correct me if needed.

I think the problem is that auto compaction is not getting triggered, so the tombstones never get removed. I would suggest keeping gc_grace_seconds at 10 and making sure you configure the CQL compaction subproperties: set tombstone_compaction_interval to a lower value and unchecked_tombstone_compaction to true (available after 2.0.9).

You can also trigger major compactions frequently.

The tombstone threshold in cassandra.yaml may be increased, keeping read latency needs in mind.
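A sketch of what those subproperties might look like (the table name is a placeholder; subproperty values go inside the compaction map as strings):

```sql
-- Placeholder names. Keeps SizeTiered but makes single-SSTable tombstone
-- compactions more aggressive: consider an SSTable for tombstone compaction
-- every 60s instead of the default one day, without the overlap check.
ALTER TABLE my_keyspace.my_table WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_compaction_interval': '60',
    'unchecked_tombstone_compaction': 'true'
};
```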

Thanks
Anuj Wadehra

Sent from Yahoo Mail on Android


From:"Walsh, Stephen" <[hidden email]>
Date:Wed, 22 Apr, 2015 at 7:56 pm
Subject:RE: Cassandra tombstones being created by updating rows with TTL's

Hey Anuj,

 

I think this might be related to me quickly dropping the tables and re-creating then to add in the gc_grace_seconds to 0, instead of doing a ALTER TABLE command.

This might have caused the FileNotFound Issue.

 

I might just drop the keyspace do a nodetool clean up on each node, then re-add and see what happens.

I no longer have the log file , but I think it was related to the compaction.

 

However nodetool cfstats also give this on the one of the tables….. (only table I’m writing too)

 

Table : table_name

                SSTable count: 35

Exception in thread "main" java.lang.AssertionError

        at org.apache.cassandra.io.compress.CompressionParameters.setLiveMetadata(CompressionParameters.java:111)

        at org.apache.cassandra.io.sstable.SSTableReader.getCompressionMetadata(SSTableReader.java:634)

        at org.apache.cassandra.io.sstable.SSTableReader.getCompressionMetadataOffHeapSize(SSTableReader.java:648)

        at org.apache.cassandra.metrics.ColumnFamilyMetrics$26.value(ColumnFamilyMetrics.java:464)

        at org.apache.cassandra.metrics.ColumnFamilyMetrics$26.value(ColumnFamilyMetrics.java:459)

        at org.apache.cassandra.db.ColumnFamilyStore.getCompressionMetadataOffHeapMemoryUsed(ColumnFamilyStore.java:2226)

        at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:483)

        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)

        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:483)

        at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)

        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)

        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)

        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)

        at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)

        at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)

        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)

        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)

        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1443)

        at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)

        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)

        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)

        at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:637)

        at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:483)

        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)

        at sun.rmi.transport.Transport$1.run(Transport.java:200)

        at sun.rmi.transport.Transport$1.run(Transport.java:197)

        at java.security.AccessController.doPrivileged(Native Method)

        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)

        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$87(TCPTransport.java:683)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$2/1019453949.run(Unknown Source)

        at java.security.AccessController.doPrivileged(Native Method)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

 

 

 

From: Anuj Wadehra [mailto:[hidden email]]
Sent: 21 April 2015 19:04
To: [hidden email]
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

 

Whats ur sstable count for the CF? I hope compactions are working fine. Also check the full stacktrace of FileNotFoundException ..if its related to compaction....you can try cleaning compactions_in_progress folder in system folder in data directory..there are JIRA issues relating to that.

 

Thanks

Anuj Wadehra

 

Sent from Yahoo Mail on Android


From:"Laing, Michael" <<a rel="nofollow" shape="rect" ymailto="mailto:michael.laing@nytimes.com" target="_blank" href="javascript:return">michael.laing@...>
Date:Tue, 21 Apr, 2015 at 10:21 pm
Subject:Re: Cassandra tombstones being created by updating rows with TTL's

Hmm - we read/write with Local Quorum always - I'd recommend that as that is your 'consistency' defense.

 

We use python, so I am not familiar with the java driver - but 'file not found' indicates something is inconsistent. 

 

On Tue, Apr 21, 2015 at 12:22 PM, Walsh, Stephen <[hidden email]> wrote:

Thanks for all your help Michael,

 

Our data will change through the day, so data with a TTL will eventually get dropped, and new data will appear.

I’d imagine the entire table maybe expire and start over 7-10 times a day.

 

 

 

But on the GC topic, the Java driver now gives this error on the query.

I also get “Request did not complete within rpc_timeout.” in cqlsh.

 

#################################

com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

        at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Responses$Error.asException(Responses.java:100) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) ~[cassandra-driver-core-2.1.4.jar:na]

Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.1.4.jar:na]

        at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[cassandra-driver-core-2.1.4.jar:na]

#################################

 

 

These queries were taking about 1 second to run when the gc_grace was at 10 seconds (same duration as the TTL).

 

Also seeing a lot of this stuff in the log file.

 

#################################

ERROR [ReadStage:71] 2015-04-21 17:11:07,597 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:71,5,main]

java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db (No such file or directory)

                at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db

################################

 

 

Maybe this is a one step back, two steps forward approach?

Any ideas?

 

 

 

 

From: Laing, Michael [mailto:[hidden email]]
Sent: 21 April 2015 17:09


To: [hidden email]
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

 

Discussions previously on the list show why this is not a problem in much more detail.

 

If something changes in your cluster: node down, new node, etc - you run repair for sure.

 

We also run periodic repairs prophylactically.

 

But if you never delete and always ttl by the same amount, you do not have to worry about zombie data being resurrected - the main reason for running repair within gc_grace_seconds.

 

 

 

On Tue, Apr 21, 2015 at 11:49 AM, Walsh, Stephen <[hidden email]> wrote:

Many thanks Michael,

I will give these setting a go,

How do you do your periodic nodetool repairs in this situation? From what I read, we need to start doing this also.

 

https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair

 

 

From: Laing, Michael [mailto:[hidden email]]
Sent: 21 April 2015 16:26
To: [hidden email]
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

 

If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0.
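For reference, a sketch of what that change looks like in CQL (keyspace/table names are placeholders):

```sql
ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 0;
```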

 

That's what we do. There have been discussions on the list over the last few years re this topic.

 

ml

 

On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen <[hidden email]> wrote:

We were chatting to Jon Haddena about a week ago about our tombstone issue using Cassandra 2.0.14

To Summarize

 

We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered

We use 1 keyspace with 1 table

Each row has about 40 columns

Each row has a TTL of 10 seconds

 

We insert about 500 rows per second in a prepared batch** (about 3 MB in network overhead)

We query the entire table once per second

 

**This is to enable consistent data, i.e. the batch is transactional, so we get all queried data from one insert and not a mix of two or more.
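For illustration, the per-second write described above could look something like the following CQL sketch (table and column names are placeholders, not the actual schema):

```sql
BEGIN BATCH
  INSERT INTO my_keyspace.my_table (id, col1) VALUES (1, 'a') USING TTL 10;
  INSERT INTO my_keyspace.my_table (id, col2) VALUES (2, 'b') USING TTL 10;
APPLY BATCH;
```

Alternatively, the TTL can be applied once at the batch level with `BEGIN BATCH USING TTL 10`, which keeps all rows in the batch expiring together.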

 

 

It seems that although we insert every second, the rows are never deleted by the TTL, or so we thought.

After some time we got this message on the query side

 

 

#######################################

ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in keyspace.table; query aborted (see tombstone_failure_threshold)

ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:91,5,main]

java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException

                at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException

#######################################

 

 

So we know tombstones are in fact being created.

The solution was to change the table schema and set gc_grace_seconds to 60 seconds.

This worked for 20 seconds, then we saw this

 

 

#######################################

Read 500 live and 30000 tombstoned cells in keyspace.table (see tombstone_warn_threshold). 10000 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}

#######################################

 

So that's every 20 seconds (500 inserts/second x 20 seconds = 10,000 tombstones).

So now we have gc_grace_seconds set to 10 seconds.

But it feels very wrong to have it at such a low number, especially if we move to a larger cluster. This just won't fly.

What are we doing wrong?

 

We shouldn’t increase the tombstone threshold, as that is extremely dangerous.

 

 

Best Regards

Stephen Walsh

 

 

 

 

 

 

This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. If you have received this message in error, please do not read, copy or forward this message. Please notify the sender immediately, delete it from your system and destroy any copies. You may not further disclose or distribute this email or its attachments.

 


 

RE: RE: Cassandra tombstones being created by updating rows with TTL's

Walsh, Stephen

Thanks Anij,

 

You are correct in your understanding of our setup. However, when we set the gc_grace to 10 seconds it manages our tombstone count; any higher than 10 seconds and we start getting tombstone warnings.

I think you're right: when I set gc_grace to 0, I don't believe the compaction kicked in quickly enough, hence causing the below issues.

In fact, I wasn't able to drop that keyspace, and restarting Cassandra didn't work either. I actually ended up rebooting the machine to remove it.

 

I wouldn't be a fan of upping the tombstone threshold; it could be detrimental and cause more issues later on. Think we just need to deal with them faster.

 

I’ve been running a system over night with GC=10 and so far so good.

SSTable count is at 1-3, no inconsistency messages

Dataflow rate hasn’t changed

Compaction is working as it should

Only got 2 pending flushes during the run.

 

Let’s run this setup for a while and see what happens

 

Steve

 

From: Anuj Wadehra [mailto:[hidden email]]
Sent: 22 April 2015 19:07
To: [hidden email]
Subject: Re: RE: Cassandra tombstones being created by updating rows with TTL's

 

Hi Stephen,

 

Dropping cf or keyspace and recreating it looks undesirable here.

 

What I understood is that your rows survive 10 sec, but if you set gc_grace_seconds to 10 you find a lot of tombstones in the query. Please correct me if needed.

 

I think the problem is that auto compaction is not getting triggered, so tombstones don't get removed. I would suggest you keep gc_grace_seconds at 10 and make sure that you configure the CQL compaction subproperties: set tombstone_compaction_interval to a lower value and unchecked_tombstone_compaction to true (available after 2.0.9).
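As a sketch, that suggestion could be applied with an ALTER like the one below (keyspace/table names are placeholders, and 60 is an example interval; when altering compaction, the full map including 'class' must be supplied):

```sql
ALTER TABLE my_keyspace.my_table
  WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_compaction_interval': '60',
    'unchecked_tombstone_compaction': 'true'
  };
```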

 

You can also go for triggering major compactions frequently.

 

The tombstone threshold in cassandra.yaml may be increased, keeping read latency needs in mind.
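For reference, these are the relevant cassandra.yaml settings with their 2.0.x defaults (raising the failure threshold is what this thread cautions against):

```yaml
# Warn when a read scans more than this many tombstones
tombstone_warn_threshold: 1000
# Abort the read (TombstoneOverwhelmingException) beyond this many
tombstone_failure_threshold: 100000
```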

 

Thanks

Anuj Wadehra

 

Sent from Yahoo Mail on Android


From:"Walsh, Stephen" <[hidden email]>
Date:Wed, 22 Apr, 2015 at 7:56 pm
Subject:RE: Cassandra tombstones being created by updating rows with TTL's

Hey Anuj,

 

I think this might be related to me quickly dropping the tables and re-creating them to set gc_grace_seconds to 0, instead of doing an ALTER TABLE command.

This might have caused the FileNotFound Issue.

 

I might just drop the keyspace, do a nodetool cleanup on each node, then re-add it and see what happens.

I no longer have the log file, but I think it was related to the compaction.

 

However, nodetool cfstats also gives this on one of the tables… (only table I'm writing to)

 

Table : table_name

                SSTable count: 35

Exception in thread "main" java.lang.AssertionError

        at org.apache.cassandra.io.compress.CompressionParameters.setLiveMetadata(CompressionParameters.java:111)

        at org.apache.cassandra.io.sstable.SSTableReader.getCompressionMetadata(SSTableReader.java:634)

        at org.apache.cassandra.io.sstable.SSTableReader.getCompressionMetadataOffHeapSize(SSTableReader.java:648)

        at org.apache.cassandra.metrics.ColumnFamilyMetrics$26.value(ColumnFamilyMetrics.java:464)

        at org.apache.cassandra.metrics.ColumnFamilyMetrics$26.value(ColumnFamilyMetrics.java:459)

        at org.apache.cassandra.db.ColumnFamilyStore.getCompressionMetadataOffHeapMemoryUsed(ColumnFamilyStore.java:2226)

        at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:483)

        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)

        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:483)

        at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)

        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)

        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)

        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)

        at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)

        at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)

        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)

        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)

        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1443)

        at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)

        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)

        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)

        at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:637)

        at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:483)

        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)

        at sun.rmi.transport.Transport$1.run(Transport.java:200)

        at sun.rmi.transport.Transport$1.run(Transport.java:197)

        at java.security.AccessController.doPrivileged(Native Method)

        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)

        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$87(TCPTransport.java:683)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$2/1019453949.run(Unknown Source)

        at java.security.AccessController.doPrivileged(Native Method)

        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

 

 

 


 

 
