Do I need to run repair and compaction every node?

Benyi Wang
I have read the documentation several times, but I am still not quite sure how to run repair and compaction.

To my understanding, 
  • I need to run compaction on each node, 
  • To repair a table (column family), I only need to run repair on any one of the nodes.
Am I right? 

Thanks.

Re: Do I need to run repair and compaction every node?

Robert Coli-3
On Mon, Apr 13, 2015 at 1:36 PM, Benyi Wang <[hidden email]> wrote:
  • I need to run compaction on each node, 
In general, there is no requirement to run compaction manually. Minor compaction occurs automatically in the background. 
  • To repair a table (column family), I only need to run repair on any one of the nodes.
It depends on whether you are doing -pr or non -pr repair.

If you are doing -pr repair, you run repair on all nodes. If you do non -pr repair, you have to figure out what set of nodes to run it on. That's why -pr exists, to simplify this. 

=Rob

Re: Do I need to run repair and compaction every node?

Benyi Wang
What about "incremental repair" and "sequential repair"?

I ran "nodetool repair -- keyspace table" on one node. I found the repair sessions running on different nodes. Will this command repair the whole table?


Using the nodetool repair -pr (--partitioner-range) option repairs only the first range returned by the partitioner for a node. Other replicas for that range still have to perform the Merkle tree calculation, causing a validation compaction.

Does that mean -pr runs on only one node?
I still don't understand what "the first range returned by the partitioner for a node" means.

Re: Do I need to run repair and compaction every node?

Jeff Ferland
nodetool repair: The default sequential repair covers all nodes and computes Merkle trees in sequence, one node at a time. You only need to run the command on one node.
nodetool repair -par: Covers all nodes and computes Merkle trees for each node at the same time. The IO load is much higher because every copy of a key range is scanned at once. This can be totally OK with SSDs and throughput limits. You only need to run the command on one node.
nodetool repair -pr: Only covers the ranges owned by the node's token(s). It must be run on each node because each node owns a partial share of the ring.

Incremental repair: Only considers changes since the last repair. It can probably be combined with the other flags.
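
To make "the ranges owned by the node's token(s)" concrete, here is a minimal Python sketch of how primary ranges fall out of a token ring. It assumes one token per node (no vnodes); the node names and token values are made up for illustration. Each node's primary range runs from the previous node's token (exclusive) to its own token (inclusive), and the lowest token wraps around to the highest.

```python
from typing import Dict, Tuple


def primary_ranges(tokens: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each node to its primary range (previous_token, own_token]."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    ranges = {}
    for i, (node, token) in enumerate(ordered):
        # Index -1 wraps around, so the first node's range starts
        # at the last node's token.
        prev_token = ordered[i - 1][1]
        ranges[node] = (prev_token, token)
    return ranges


# Hypothetical three-node ring with made-up tokens.
ring = {"A": 0, "B": 100, "C": 200}
for node, (start, end) in primary_ranges(ring).items():
    print(f"{node}: ({start}, {end}]")
```

With -pr, a node repairs only the range this function returns for it, which is why every node has to take a turn before the whole ring is covered.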

Re: Do I need to run repair and compaction every node?

Robert Coli-3
On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland <[hidden email]> wrote:
Nodetool repair -par: covers all nodes, computes merkle trees for each node at the same time. Much higher IO load as every copy of a key range is scanned at once. Can be totally OK with SSDs and throughput limits. Only need to run the command on one node.

No? -par is just a performance (of repair) de-optimization, intended to improve service time during repair. Doing -par without -pr on a single node doesn't repair your entire cluster.

Consider the following 7-node cluster, without vnodes:

A B C D E F G
RF=3

You run a repair on node D, without -pr.

D is repaired against B's tertiary replicas.
D is repaired against C's secondary replicas.
E is repaired against D's secondary replicas.
F is repaired against D's tertiary replicas.
Nodes A and G are completely unaffected and unrepaired, because D does not share any ranges with them.
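
The walk-through above can be reproduced with a small Python model. This is only a sketch under simplifying assumptions: one token per node, and each range replicated on its primary owner plus the next RF-1 nodes around the ring. The helper names are hypothetical and not part of any Cassandra API.

```python
from typing import List, Set


def replicas_of(owner: int, n: int, rf: int) -> List[int]:
    """Indices of the nodes holding the range whose primary owner is `owner`."""
    return [(owner + k) % n for k in range(rf)]


def ranges_held(node: int, n: int, rf: int) -> List[int]:
    """Primary owners of every range that `node` stores a replica of."""
    return [(node - k) % n for k in range(rf)]


def nodes_touched(node: int, n: int, rf: int) -> Set[int]:
    """Nodes that take part when `node` runs a repair without -pr."""
    touched: Set[int] = set()
    for owner in ranges_held(node, n, rf):
        touched.update(replicas_of(owner, n, rf))
    return touched


nodes = list("ABCDEFG")
touched = nodes_touched(nodes.index("D"), len(nodes), rf=3)
print("involved:", sorted(nodes[i] for i in touched))
print("untouched:", [name for i, name in enumerate(nodes) if i not in touched])
```

In this model a repair on D involves B, C, D, E and F and leaves A and G out, matching the list above.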

Repair, with or without -par, only covers the *replica* nodes of the node you run it on. Even with vnodes, you still have to run it on almost all nodes in most cases, which is why most users should save themselves the complexity and just do a rolling -par -pr on all nodes, one by one.
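
A rolling -par -pr repair can be scripted in a few lines. The sketch below is one possible way to do it, not an official procedure: the host addresses and keyspace name are placeholders, and it assumes nodetool is on the PATH and can reach each node with -h. The flags are the ones named in this thread; check them against your Cassandra version before relying on them.

```python
import subprocess
from typing import List


def rolling_repair_commands(hosts: List[str], keyspace: str) -> List[List[str]]:
    """Build one `nodetool repair -par -pr` command per host, in order."""
    return [
        ["nodetool", "-h", host, "repair", "-par", "-pr", keyspace]
        for host in hosts
    ]


def run_rolling_repair(hosts: List[str], keyspace: str) -> None:
    """Run the repairs one node at a time, stopping at the first failure."""
    for cmd in rolling_repair_commands(hosts, keyspace):
        # check=True raises CalledProcessError if nodetool exits non-zero,
        # so a failed node is not silently skipped.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder hosts and keyspace; this only prints the commands.
    for cmd in rolling_repair_commands(["10.0.0.1", "10.0.0.2", "10.0.0.3"], "my_keyspace"):
        print(" ".join(cmd))
```

Running the nodes strictly one after another keeps only one repair session in flight at a time, which is the "one by one" part of the advice.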

=Rob

Re: Do I need to run repair and compaction every node?

Jonathan Haddad
Or use Spotify's Reaper and forget about it: https://github.com/spotify/cassandra-reaper

Re: Do I need to run repair and compaction every node?

Jeff Ferland
In reply to this post by Robert Coli-3
Just read the source and well… yup. I’m guessing now that the options are indeed only rolling repair on each node (with -pr stopping the duplicate work) or -st -9223372036854775808 -et 9223372036854775807 to actually cover all ranges. I didn’t walk through to test that, though.
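
For reference, the two numbers passed to -st and -et are the smallest and largest signed 64-bit integers, which are the bounds of the token space when the cluster uses Murmur3Partitioner (an assumption here; other partitioners use different token types). A short Python sketch shows where they come from. The keyspace name is a placeholder, and whether a single invocation with these bounds really repairs every range is left untested, as in the post above.

```python
# Bounds of a signed 64-bit integer, assumed to match the token space
# of Murmur3Partitioner.
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def full_range_repair_args(keyspace: str) -> list:
    """Build the nodetool arguments for a repair spanning the whole token space."""
    return ["nodetool", "repair", "-st", str(MIN_TOKEN), "-et", str(MAX_TOKEN), keyspace]


print(" ".join(full_range_repair_args("my_keyspace")))
```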

Glad 3.0 is getting a little bit of love on improving repairs and communications / logging about them.

-Jeff

Re: Do I need to run repair and compaction every node?

Robert Coli-3
On Mon, Apr 13, 2015 at 11:43 PM, Jeff Ferland <[hidden email]> wrote:
Just read the source and well… yup. I’m guessing now that the options are indeed only rolling repair on each node (with -pr stopping the duplicate work) or -st -9223372036854775808 -et 9223372036854775807 to actually cover all ranges. I didn’t walk through to test that, though.

Technically speaking, in the non-vnode world, you can just run a non-pr repair on certain nodes and repair 100% of the cluster.

A B C D E F G H I
N=9
RF=3

Without -pr, if you repair C (repairs A,B,D,E) and G (repairs E,F,H,I)... you're done.

Vnodes make this sort of thing too complex to bother with, as the chance that all physical nodes share at least one range with other nodes is quite high.
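
A plan like this can be checked by counting repaired ranges as well as participating nodes. The Python sketch below does both for the nodes A through I listed above. It rests on a simplified model that is an assumption, not something taken from the Cassandra source: one token per node, each range stored on its primary owner plus the next RF-1 nodes, and a non -pr repair on a node repairing every range that node stores. The function names are hypothetical.

```python
from typing import Iterable, Set


def ranges_repaired(ring: str, repaired_on: Iterable[str], rf: int) -> Set[str]:
    """Primary owners of the ranges repaired by non -pr repairs on `repaired_on`."""
    n = len(ring)
    covered: Set[str] = set()
    for name in repaired_on:
        i = ring.index(name)
        # A node stores its own range and those of the rf-1 nodes before it.
        covered.update(ring[(i - k) % n] for k in range(rf))
    return covered


def nodes_involved(ring: str, repaired_on: Iterable[str], rf: int) -> Set[str]:
    """Every node that holds a replica of at least one repaired range."""
    n = len(ring)
    involved: Set[str] = set()
    for owner in ranges_repaired(ring, repaired_on, rf):
        i = ring.index(owner)
        involved.update(ring[(i + k) % n] for k in range(rf))
    return involved


ring = "ABCDEFGHI"
for plan in ("CG", "CFI"):
    covered = ranges_repaired(ring, plan, rf=3)
    print(
        plan,
        "nodes involved:", "".join(sorted(nodes_involved(ring, plan, 3))),
        "ranges not repaired:", "".join(sorted(set(ring) - covered)),
    )
```

Under these assumptions, repairing C and G involves all nine nodes, but the ranges whose primary owners are D, H and I are not among those repaired, while repairing every third node (C, F and I) covers all nine ranges. Treat that as a property of this simplified model and verify against a real cluster before trimming a repair schedule.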
 
Glad 3.0 is getting a little bit of love on improving repairs and communications / logging about them.

+1

=Rob