Deleted columns reappear after "repair"

Roman Tkachenko
Hey guys,

We're having a very strange issue: deleted columns get resurrected when "repair" is run on a node.

Some info about the setup: Cassandra 2.0.13, multi-datacenter, with 12 nodes in one datacenter and 6 in the other. Schema:

cqlsh> describe keyspace blackbook;

CREATE KEYSPACE blackbook WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'IAD': '3',
  'ORD': '3'
};

USE blackbook;

CREATE TABLE bounces (
  domainid text,
  address text,
  message text,
  "timestamp" bigint,
  PRIMARY KEY (domainid, address)
) WITH
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

We're using wide rows for the "bounces" table: a single "domainid" row can store hundreds of thousands of addresses (in practice it's usually much less, but some rows may contain up to several million columns).

All queries are done using LOCAL_QUORUM consistency. Sometimes bounces are deleted from the table using the following CQL3 statement:

delete from bounces where domainid = 'domain.com' and address = '[hidden email]';

But the thing is, after "repair" is run on any node that owns the "domain.com" key, the column gets resurrected on all nodes, as if the tombstone had disappeared. We checked this multiple times using cqlsh: issue a delete statement and verify that the data is not returned; then run "repair", and the deleted data is returned again.
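
For reference, the repro amounts to something like this in cqlsh (user@example.com stands in for the real address):

cqlsh> DELETE FROM blackbook.bounces WHERE domainid = 'domain.com' AND address = 'user@example.com';
cqlsh> SELECT message FROM blackbook.bounces WHERE domainid = 'domain.com' AND address = 'user@example.com';

(0 rows)

$ nodetool repair blackbook bounces    # full repair on any replica owning the key

cqlsh> SELECT message FROM blackbook.bounces WHERE domainid = 'domain.com' AND address = 'user@example.com';
-- the previously deleted column is returned again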

Our gc_grace_seconds is at the default value, and no node has ever been down for anywhere close to 10 days, so that doesn't look related. We also made sure all our servers are running ntpd, so time synchronization should not be an issue either.
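
Roughly, the clock checks look like this on each node (ntpq assumes the stock ntpd tooling):

$ ntpq -p    # confirm the node is actually syncing against its NTP peers
$ date -u    # eyeball that UTC time matches across nodes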

Have you guys ever seen anything like this, or do you have any idea what may be causing this behavior? What could make a tombstone disappear during a "repair" operation?

Thanks for your help. Let me know if I can provide more information.

Roman

Re: Deleted columns reappear after "repair"

Roman Tkachenko
Hey guys,

Has anyone seen behavior like this, or does anyone have an explanation for it? If not, I think I'm going to file a bug report.

Thanks!

Roman

Re: Deleted columns reappear after "repair"

Duncan Sands
Hi Roman,

On 24/03/15 17:32, Roman Tkachenko wrote:
> Hey guys,
>
> Has anyone seen behavior like this, or does anyone have an explanation for it?
> If not, I think I'm going to file a bug report.

This can happen if repair is run after the tombstone's gc_grace_seconds has expired. I suggest you increase gc_grace_seconds.
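
Something like this should do it (2592000 below is just an illustrative 30 days; pick a value based on how often you repair):

ALTER TABLE blackbook.bounces WITH gc_grace_seconds = 2592000;  -- illustrative: 30 days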

Ciao, Duncan.

Re: Deleted columns reappear after "repair"

Roman Tkachenko
Hi Duncan,

Thanks for the response!

I can try increasing gc_grace_seconds and running repair on all nodes. What doesn't make sense, though, is why all *new* deletes I issue (for the same column that resurrects after repair) are forgotten after repair as well. Doesn't Cassandra insert a new tombstone every time a delete happens?

Also, how do I figure out what value to set gc_grace_seconds to?

Thanks.

Re: Deleted columns reappear after "repair"

Duncan Sands
Hi Roman,

On 24/03/15 18:05, Roman Tkachenko wrote:
> Hi Duncan,
>
> Thanks for the response!
>
> I can try increasing gc_grace_seconds and running repair on all nodes. What
> doesn't make sense, though, is why all *new* deletes I issue (for the same
> column that resurrects after repair) are forgotten after repair as well.
> Doesn't Cassandra insert a new tombstone every time a delete happens?

It does. Maybe the data you are trying to delete has a timestamp (writetime) in the future, for example because clocks aren't synchronized between your nodes.
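
You can check with WRITETIME, and if the stored timestamp really is in the future, a delete issued at an even higher timestamp should stick. A sketch, with a placeholder address and an illustrative timestamp (writetime values are microseconds since the epoch):

-- compare the stored write time against the current time in microseconds
SELECT address, WRITETIME(message) FROM blackbook.bounces
  WHERE domainid = 'domain.com' AND address = 'user@example.com';

-- if WRITETIME is in the future, a tombstone with a higher timestamp wins
DELETE FROM blackbook.bounces USING TIMESTAMP 1500000000000000
  WHERE domainid = 'domain.com' AND address = 'user@example.com';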

>
> Also, how do I figure out what value to set gc_grace_seconds to?

It needs to be big enough that you are sure to repair your entire cluster in less than that time. For example, observe how long repairing the entire cluster takes and multiply by 3 or 4 (in case a repair fails or is interrupted one day): if a full cluster repair takes a week, set gc_grace_seconds to at least three or four weeks.

Once incremental repair is solid, maybe the whole gc_grace thing will eventually go away, e.g. by modifying C* to only drop known-repaired tombstones.

Ciao, Duncan.

Re: Deleted columns reappear after "repair"

Roman Tkachenko
Well, as I mentioned in my original email, all machines running Cassandra are running NTP. This was one of the first things I verified, and I triple-checked that they all show the same time. Is this sufficient to ensure clocks are in sync between the nodes?

I have increased gc_grace to 100 days for now and am running repair on the affected keyspace; it should be done today. In the meantime, if you (or anyone else) have other ideas or suggestions on how to debug this, they're much appreciated.
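
Concretely, the change amounts to something like this (8640000 seconds = 100 days), followed by a full repair node by node:

ALTER TABLE blackbook.bounces WITH gc_grace_seconds = 8640000;  -- 100 days

$ nodetool repair blackbook    # repeated on each node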

Thanks for your help!

Roman

Re: Deleted columns reappear after "repair"

Roman Tkachenko
Okay, so I'm positively going crazy :)

Increasing gc_grace + repair + decreasing gc_grace didn't help: the columns still reappear after repair. I checked in cassandra-cli, and the timestamps for these columns are old, not in the future, so that shouldn't be the reason.

I also did a test: I updated one of the columns, and it was indeed updated. Then I deleted it (and it was deleted), ran repair, and its "updated" version reappeared again! Why won't these columns just go away? Is there any way to force their deletion permanently?

I also see this log entry on the node I'm running repair on; it mentions the row that contains the reappearing columns:

INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 CompactionController.java (line 192) Compacting large row blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally

Could this be related to the issue?


Re: Deleted columns reappear after "repair"

Robert Coli
On Wed, Mar 25, 2015 at 1:57 PM, Roman Tkachenko <[hidden email]> wrote:
> Okay, so I'm positively going crazy :)
>
> Increasing gc_grace + repair + decreasing gc_grace didn't help: the columns still reappear after repair. I checked in cassandra-cli, and the timestamps for these columns are old, not in the future, so that shouldn't be the reason.
>
> I also did a test: I updated one of the columns, and it was indeed updated. Then I deleted it (and it was deleted), ran repair, and its "updated" version reappeared again! Why won't these columns just go away? Is there any way to force their deletion permanently?

It sounds like you have done enough sanity checking of your use of Cassandra to justify filing this as an issue in the issues.apache.org JIRA.

The fact that it seems to only affect a row that is being compacted incrementally is an interesting datapoint...

=Rob
 

Re: Deleted columns reappear after "repair"

Roman Tkachenko
Thanks Robert.

Yup, I increased "in_memory_compaction_limit_in_mb" to 512MB so the row in question fits into it, and ran repair on a couple of nodes owning its key. The log entries about this particular row went away, and those columns haven't reappeared yet. If that was the cause, it's unfortunate, because we have rows much larger than 512MB, and it would effectively mean nothing can be deleted from them... We can't increase this parameter forever.
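
For anyone following along, the setting lives in cassandra.yaml; the change was along these lines:

# cassandra.yaml (2.0.x): rows larger than this threshold are compacted
# in multiple passes on disk ("incrementally") instead of entirely in memory
in_memory_compaction_limit_in_mb: 512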

I'm going to go ahead and file a JIRA report.

Roman

Re: Deleted columns reappear after "repair"

Robert Coli
On Wed, Mar 25, 2015 at 6:53 PM, Roman Tkachenko <[hidden email]> wrote:
> Yup, I increased "in_memory_compaction_limit_in_mb" to 512MB so the row in question fits into it, and ran repair on a couple of nodes owning its key. The log entries about this particular row went away, and those columns haven't reappeared yet. If that was the cause, it's unfortunate, because we have rows much larger than 512MB, and it would effectively mean nothing can be deleted from them... We can't increase this parameter forever.
>
> I'm going to go ahead and file a JIRA report.

It would be greatly appreciated by future searchers if you informed the thread of the JIRA URL assigned to this issue when you file it. :D

=Rob
 