Moving SSTables from one disk to another

Roman Tkachenko
Hey guys,

We're running Cassandra with two data directories, let's say /data/sstables1 and /data/sstables2, which are in fact two separate (but identical) disks. The problem is that the disk where "sstables2" is mounted is running out of space and large SSTables stored there cannot be compacted.

So I have two questions:

* Can I just move some SSTables data files from "sstables2" to "sstables1" which has much more free disk space? Will Cassandra start fine after that and not lose any data?

* Given multiple data dirs, should Cassandra distribute data equally between them? From what I'm observing, this is almost never the case. On the node I mentioned above the difference is huge: 4% occupied disk space for "sstables1" and 87% for "sstables2"; on other nodes the situation is a little better but still not 50/50.

Thanks!

Roman

Re: Moving SSTables from one disk to another

Robert Coli-3
On Fri, Apr 10, 2015 at 4:00 PM, Roman Tkachenko <[hidden email]> wrote:
> * Can I just move some SSTables data files from "sstables2" to "sstables1" which has much more free disk space? Will Cassandra start fine after that and not lose any data?

Cassandra generally discovers files in its data directories and treats them as legitimate files. I do not have specific knowledge of JBOD behavior here, but I would presume it would be the same.
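
As a rough sketch of what a manual move could look like: an SSTable is made up of several component files (Data, Index, Filter and so on), so they should presumably travel together, and only while the node is stopped. The code below assumes the components of one SSTable share the filename prefix before the last hyphen, and that the destination is the matching keyspace/table subdirectory under the other data directory. The function names and the example prefix are hypothetical, not anything shipped with Cassandra.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def group_sstable_components(table_dir):
    """Group files in table_dir by the name prefix before the last '-'."""
    groups = defaultdict(list)
    for path in sorted(Path(table_dir).iterdir()):
        if path.is_file() and "-" in path.name:
            prefix = path.name.rsplit("-", 1)[0]
            groups[prefix].append(path)
    return dict(groups)


def move_sstable(prefix, src_table_dir, dst_table_dir):
    """Move every component of one SSTable. Intended for a stopped node only."""
    components = group_sstable_components(src_table_dir).get(prefix, [])
    if not components:
        raise FileNotFoundError(f"no SSTable components with prefix {prefix!r}")
    dst = Path(dst_table_dir)
    dst.mkdir(parents=True, exist_ok=True)
    # Refuse to start if any target name is taken, to avoid a half-moved SSTable.
    clashes = [p.name for p in components if (dst / p.name).exists()]
    if clashes:
        raise FileExistsError(f"already present in destination: {clashes}")
    return [Path(shutil.move(str(p), str(dst / p.name))) for p in components]
```

For example, `move_sstable("ks-tbl-ka-1", "/data/sstables2/ks/tbl", "/data/sstables1/ks/tbl")` would relocate all files named `ks-tbl-ka-1-*`; the keyspace, table and prefix here are made up for illustration.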
 
> * Given multiple data dirs, should Cassandra distribute data equally between them? From what I'm observing, this is almost never the case. On the node I mentioned above the difference is huge: 4% occupied disk space for "sstables1" and 87% for "sstables2"; on other nodes the situation is a little better but still not 50/50.

No, and especially not when using Size Tiered Compaction.
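
To put a number on the imbalance, one could sum the bytes stored under each data directory and compare the shares. This is only a sketch: the helper name is hypothetical, and it counts every file under each directory rather than only live SSTables.

```python
from pathlib import Path


def data_dir_shares(data_dirs):
    """Return each directory's fraction of the total bytes stored across all of them."""
    sizes = {
        str(d): sum(p.stat().st_size for p in Path(d).rglob("*") if p.is_file())
        for d in data_dirs
    }
    total = sum(sizes.values())
    # With no data at all, report 0.0 for every directory instead of dividing by zero.
    return {d: (size / total if total else 0.0) for d, size in sizes.items()}
```

Calling `data_dir_shares(["/data/sstables1", "/data/sstables2"])` on a node would show how far the split is from 50/50.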

I honestly wonder why people think JBOD is a useful feature for Cassandra. You don't really want to continue to operate a node that has lost half of its data, and managing multiple data directories seems relatively likely to be more trouble than it's worth. You have a distributed, replicated database... just replace nodes when they fail. Anyone care to set me straight about the amazing benefits they see which make the costs worthwhile?

=Rob


Re: Moving SSTables from one disk to another

Jonathan Haddad
I submitted this issue, which could (in theory) bring a serious
performance benefit when using JBOD:
https://issues.apache.org/jira/browse/CASSANDRA-8868

However, it was pointed out to me that
https://issues.apache.org/jira/browse/CASSANDRA-6696 will be a better
solution in a lot of cases.

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: Moving SSTables from one disk to another

Robert Coli-3
On Fri, Apr 10, 2015 at 4:30 PM, Jonathan Haddad <[hidden email]> wrote:
> However, it was pointed out to me that
> https://issues.apache.org/jira/browse/CASSANDRA-6696 will be a better
> solution in a lot of cases.

Thank you for the interesting link about a theoretical usage which would make JBOD worth using.

But I really don't understand why we consider using the current JBOD implementation OK, when:

"In JBOD, when someone gets a bad drive, the bad drive is replaced with a new empty one and repair is run. This can cause deleted data to come back in some cases." 

This class of issue is permanently fatal to consistency for the affected data.

Why are we encouraging people to expose themselves to this class of issue? What benefit do they get from current JBOD implementation that is worth this risk to consistency?

Yes, it's true that if an operator in this case never creates tombstones or never runs repair after losing only one disk, they're not exposed to the risk. But when they configure JBOD, the entire point is that they hope to run repair after losing only one disk, instead of rebuilding the entire node. The status quo seems to set up operators for failure when they attempt to do what the feature claims to be useful for.

I don't get "features" like this: questionable benefit, measurable risk, known serious issues, and yet they sit there in the product for years on end, daring someone to use them...

=Rob