best way to measure repair times?


best way to measure repair times?

Ian Rose
Howdy -

I'd like to (a) monitor how long my repairs are taking, and (b) know when a repair is finished so that I can take some kind of followup action.  What's the best way to tackle either or both of these?

Some potentially relevant details:

- running community apache cassandra (not DSE)
- version 2.0.13
- we currently trigger repairs via an external timer that calls forceRepairAsync on the StorageService MBean via JMX

Thanks!
- Ian


Re: best way to measure repair times?

Ali Akhtar
Just wondering - why do you have to trigger the repairs? Is that necessary in Cassandra?

(Sorry for the off topic question)




Re: best way to measure repair times?

Robert Coli-3
In reply to this post by Ian Rose
On Thu, Mar 19, 2015 at 10:30 AM, Ian Rose <[hidden email]> wrote:
I'd like to (a) monitor how long my repairs are taking, and (b) know when a repair is finished so that I can take some kind of followup action.  What's the best way to tackle either or both of these?


Also consider increasing your gc_grace_seconds to 34 days (the new default proposed in CASSANDRA-5850) to decrease the frequency of repair.

=Rob
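
For reference, gc_grace_seconds is a per-table setting; 34 days is 34 * 86400 = 2,937,600 seconds. A hypothetical example (the keyspace and table names are placeholders, not from this thread):

```sql
-- Raise the tombstone GC grace period to 34 days on one table.
ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 2937600;
```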
 

Re: best way to measure repair times?

Robert Coli-3
In reply to this post by Ali Akhtar
On Thu, Mar 19, 2015 at 10:32 AM, Ali Akhtar <[hidden email]> wrote:
Just wondering - why do you have to trigger the repairs? Is that necessary in Cassandra?

Manual repair is the only mechanism in Cassandra which guarantees consistency. 

A repair must be run once per gc_grace_seconds in every column family that does DELETE-like[1] operations.

=Rob
[1] including some forms of CQL UPDATE, etc.


Re: best way to measure repair times?

Ali Akhtar
Cassandra doesn't guarantee eventual consistency? 




Re: best way to measure repair times?

Robert Coli-3
On Thu, Mar 19, 2015 at 12:13 PM, Ali Akhtar <[hidden email]> wrote:
Cassandra doesn't guarantee eventual consistency? 

If you run regularly scheduled repair, it does. If you do not run repair, it does not.

Hinted handoff, for example, is considered an optimization for repair, and does not assert that it provides a consistency guarantee.

=Rob 

Re: best way to measure repair times?

Paulo Motta
From: http://www.datastax.com/dev/blog/modern-hinted-handoff

Repair and the fine print

At first glance, it may appear that Hinted Handoff lets you safely get away without needing repair. This is only true if you never have hardware failure. Hardware failure means that

  1. We lose “historical” data for which the write has already finished, so there is nothing to tell the rest of the cluster exactly what data has gone missing
  2. We can also lose hints-not-yet-replayed from requests the failed node coordinated

With sufficient dedication, you can get by with “only run repair after hardware failure and rely on hinted handoff the rest of the time,” but as your clusters grow (and hardware failure becomes more common) performing repair as a one-off special case will become increasingly difficult to do perfectly. Thus, we continue to recommend running a full repair weekly.




--
Paulo Ricardo

--
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST

Re: best way to measure repair times?

Robert Coli-3


On Thu, Mar 19, 2015 at 12:53 PM, Paulo Motta <[hidden email]> wrote:

This is only true if you never have hardware failure. Hardware failure means that

For the record, I hate this formulation for being a little too clever.

" This is never true, because we live in a world where hardware fails. "

Would be a better phrasing.

=Rob

Re: best way to measure repair times?

Jan
In reply to this post by Paulo Motta
Ian; 

to respond to your specific question:

You could pipe the output of your repair into a file and subsequently determine the time taken.    
example: 
nodetool repair -dc DC1
[2014-07-24 21:59:55,326] Nothing to repair for keyspace 'system'
[2014-07-24 21:59:55,617] Starting repair command #2, repairing 490 ranges 
  for keyspace system_traces (seq=true, full=true)
[2014-07-24 22:23:14,299] Repair session 323b9490-137e-11e4-88e3-c972e09793ca 
  for range (820981369067266915,822627736366088177] finished
[2014-07-24 22:23:14,320] Repair session 38496a61-137e-11e4-88e3-c972e09793ca 
  for range (2506042417712465541,2515941262699962473] finished

What to look for:
a)  Look for 'Starting repair' together with the specific keyspace name.
b)  Look for the word 'finished'.
c)  Compute the average time per keyspace and you will have a rough idea of how long your repairs take on a regular basis.  This applies only to continual operational repair, not the first time it's done.

hope this helps
Jan/
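
Jan's approach can be sketched as a short script. This is a minimal sketch rather than a robust parser: it assumes the bracketed timestamp format shown in the example output above, and it measures from the 'Starting repair command' line to the last line ending in 'finished'.

```python
import re
from datetime import datetime

# Sample nodetool repair output, in the format shown above.
LOG = """\
[2014-07-24 21:59:55,617] Starting repair command #2, repairing 490 ranges for keyspace system_traces (seq=true, full=true)
[2014-07-24 22:23:14,299] Repair session 323b9490-137e-11e4-88e3-c972e09793ca for range (820981369067266915,822627736366088177] finished
[2014-07-24 22:23:14,320] Repair session 38496a61-137e-11e4-88e3-c972e09793ca for range (2506042417712465541,2515941262699962473] finished
"""

TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")

def parse_ts(line):
    """Extract the leading [YYYY-MM-DD HH:MM:SS,mmm] timestamp, if any."""
    m = TS.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f") if m else None

def repair_duration(log_text):
    """Seconds from the first 'Starting repair command' line to the last
    line ending in 'finished'; None if either is missing."""
    start = end = None
    for line in log_text.splitlines():
        ts = parse_ts(line)
        if ts is None:
            continue
        if "Starting repair command" in line and start is None:
            start = ts
        elif line.rstrip().endswith("finished"):
            end = ts
    return (end - start).total_seconds() if start and end else None

print(repair_duration(LOG))  # → 1398.703
```

Note that a repair that hangs never produces a final 'finished' line, so in practice you would want to pair this with some kind of timeout.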








Re: best way to measure repair times?

Robert Coli-3
On Thu, Mar 19, 2015 at 1:03 PM, Jan <[hidden email]> wrote:
to respond to your specific question:

You could pipe the output of your repair into a file and subsequently determine the time taken.    

By this method, what is the duration of a repair which will never complete?

=Rob
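
One way to account for repairs that never complete is to pair the log-scraping with a deadline. A sketch, where the 24-hour threshold is an arbitrary assumption, not a recommendation from this thread:

```python
from datetime import datetime, timedelta

def is_stuck(started_at, finished_at=None, now=None, limit_hours=24):
    """Flag a repair as stuck if it started more than limit_hours ago
    and has not logged a finish; the threshold is arbitrary."""
    now = now or datetime.now()
    if finished_at is not None:
        return False
    return now - started_at > timedelta(hours=limit_hours)

start = datetime(2014, 7, 24, 21, 59, 55)
print(is_stuck(start, now=datetime(2014, 7, 26, 0, 0, 0)))  # → True
```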
 

Re: best way to measure repair times?

Ian Rose
In reply to this post by Jan
Thanks Jan, although I'm a bit unsure of the details.  It looks like when you run a repair it actually occurs over several "sessions"; e.g. in your example above there are 2 different "Repair session [...] finished" lines.  So does it make sense that I would want to measure from when I first see the "Starting repair command..." line until the last "Repair session [...] finished" line?  If so, how do I know when I have seen the last session finish?  Is there a way to know how many sessions there will be (perhaps 1 per range)?  And how do I correlate session logs to the repair, since the session logs identify the repair with an id like "#22f77ad0-cad0-11e4-8f34-77e1731d15ff" whereas the "starting repair" log identifies it with a much smaller number (e.g. "repair command #2")?

- Ian
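
One way to approach the correlation question, sketched under the assumption (consistent with the log excerpts in this thread) that each session logs both a "new session" line and a "session completed" line tagged with the same "[repair #<uuid>]" marker: group lines by that UUID and treat a session as done when its completed line appears. The log lines below are abbreviated stand-ins, not real output.

```python
import re
from collections import defaultdict

# Abbreviated, hypothetical system.log lines for two repair sessions.
LINES = [
    "[repair #323b9490-137e-11e4-88e3-c972e09793ca] new session: will sync ...",
    "[repair #38496a61-137e-11e4-88e3-c972e09793ca] new session: will sync ...",
    "[repair #323b9490-137e-11e4-88e3-c972e09793ca] session completed successfully",
    "[repair #38496a61-137e-11e4-88e3-c972e09793ca] session completed successfully",
]

SESSION_ID = re.compile(r"\[repair #([0-9a-f-]+)\]")

def sessions(lines):
    """Group log lines by the session UUID in their '[repair #...]' tag,
    so each 'new session' can be paired with its 'session completed' line."""
    groups = defaultdict(list)
    for line in lines:
        m = SESSION_ID.search(line)
        if m:
            groups[m.group(1)].append(line)
    return groups

done = {sid for sid, ls in sessions(LINES).items()
        if any("session completed" in l for l in ls)}
print(len(done))  # → 2
```

This tells you when every session you have *seen* is finished; it cannot, by itself, tell you that no further sessions are coming.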




RE: best way to measure repair times?

Jason Kushmaul | WDA
In reply to this post by Jan

Ian,

In my experience I don't get any output from repair (2.0.7) that is useful until the keyspace is finished.  Perhaps this has been solved, but we do something much more painful:

We tail the log on the node having repair run on it, watching for the first repair session, and then count each "session completed" line.  Each keyspace being repaired will produce num_tokens worth of messages.

Find the start time:

$ grep AntiEntropy /var/log/cassandra/system.log | grep -m 1 "new session"
INFO [AntiEntropySessions:1] 2015-01-06 08:00:01,817 RepairSession.java (line 244) [repair #1c1023c0-95b0-11e4-abc7-9d8c76a06ae7] new session: will sync /10.x.y.z, /10.x.y.z on range (2770269247941187446,2771538486312712323] for menomena.[x, y, z]

Note: you have to catch the *first* message; there will be more to follow.  It would be great if there were a differentiator in the log output to distinguish the initial start of a repair from a new range.

So start_time = 2015-01-06 08:00:01,817

From there you count "session completed" messages:

$ grep AntiEntropy /var/log/cassandra/system.log | grep "session completed" | wc -l
INFO [AntiEntropySessions:192] 2015-01-06 14:35:13,874 RepairSession.java (line 282) [repair #1c1023c0-95b0-11e4-abc7-9d8c76a06ae7] session completed successfully

Since I have num_tokens=256, if I see a count of 412 I know that OpsCenter (256) is finished and menomena (256) is about 60% finished.

As Jan said, you could then use this to calculate remaining time from the start time and the remainder of the ranges.

I've found this to give immediate indication of progress, rather than having to wait for the keyspace to be finished.  We are running 2.0.7; maybe some of this has since been exposed through nodetool repair (which would be sweet).  This seems to be more or less accurate, but anyone correct me if I am wrong please.  We use this more for automatically detecting long-running repairs than to simply watch progress; our internal Zabbix server will whine about them to my team.

Jason Kushmaul | V.P. Mobile Engineering
4050 Hunsaker Drive | East Lansing, MI 48823 USA
517-337-2701 x 5225 | 517-337-2754 (fax)
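
Jason's counting method reduces to simple arithmetic once you have the "session completed" count. A sketch using the numbers from his message (num_tokens=256, two keyspaces, 412 completed sessions); the assumption, per his description, is that each repaired keyspace produces num_tokens sessions, one per vnode range:

```python
def repair_progress(completed_sessions, num_keyspaces, num_tokens=256):
    """Fraction of expected repair sessions that have completed, assuming
    each repaired keyspace yields num_tokens sessions (one per vnode range)."""
    return completed_sessions / (num_keyspaces * num_tokens)

# 412 'session completed' lines while repairing OpsCenter and menomena:
print(repair_progress(412, 2))        # → 0.8046875 overall
print(repair_progress(412 - 256, 1))  # → 0.609375: menomena alone, once OpsCenter is done
```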

 


Re: best way to measure repair times?

Rahul Neelakantan
In reply to this post by Robert Coli-3
Wouldn't GC Grace set to 34 days increase the bloat in the DB?

Rahul


Re: best way to measure repair times?

Robert Coli-3
On Thu, Mar 19, 2015 at 4:56 PM, Rahul Neelakantan <[hidden email]> wrote:
Wouldn't GC Grace set to 34 days increase the bloat in the DB?

Yes, but as I say in the ticket, my belief is that the fixed cost of repair, combined with the fact that it frequently doesn't work at all (hangs forever, etc.), is much more expensive than the on-disk bloat. With incremental and/or snapshot repair that actually works (arriving Real Soon Now), the inputs into the cost/benefit analysis change.

On the ticket, Jonathan Ellis waves his hands and asserts that current costs are likely to be equal in the typical user's case. This suggests that the typical user is doing a bunch of DELETEs in a log-structured database with immutable data files, which seems rather unlikely to me...

=Rob