Cassandra on iSCSI?

Mick Semb Wever
Does anyone have any experiences with Cassandra on iSCSI?

I'm currently testing a (soon-to-be) production server using both local
raid-5 and iSCSI disks. Our hosting provider is pushing us hard towards
the iSCSI disks because it is easier for them to run (and to meet our
needs for increasing disk capacity over time).

I'm worried that iSCSI is a non-scalable solution for an otherwise
scalable application (all Cassandra nodes will have separate partitions
on the same iSCSI storage).

To go with raid-5 disks our hosting provider requires proof that iSCSI
won't work. I tried various things (e.g. `nodetool cleanup` on a 12 GB load
giving 5k IOPS) but iSCSI seems to keep up with the performance of the
local raid-5 disks...

Should I be worried about using iSCSI?
Are there better tests I should be running?

~mck

--
"The turtle only makes progress when its neck is stuck out" Rollo May
| http://semb.wever.org | http://sesat.no
| http://finn.no       | Java XSS Filter

Re: Cassandra on iSCSI?

Jonathan Ellis-3
On Thu, Jan 20, 2011 at 2:13 PM, Mick Semb Wever <[hidden email]> wrote:
> To go with raid-5 disks our hosting provider requires proof that iSCSI
> won't work. I tried various things (e.g. `nodetool cleanup` on a 12 GB load
> giving 5k IOPS) but iSCSI seems to keep up with the performance of the
> local raid-5 disks...
>
> Should I be worried about using iSCSI?

It should work fine; the main reason to go with local storage is the
huge cost advantage.

Of course with a SAN you'd want RF=1 since it's replicating internally.

> Are there better tests I should be running?

I would test write scalability going from 1 machine, to half your
planned cluster size, to your full cluster size, or as close as is
feasible, using enough client machines running contrib/stress* (much
faster than contrib/py_stress) that you saturate it.

Writes should be CPU bound, so you expect those to scale roughly
linearly as you add Cassandra nodes.

Reads (once your data set can't be cached in RAM) will be i/o bound,
so I imagine with a SAN you'll be able to max that out at some number
of machines and adding more Cassandra nodes won't help.  What that
limit is depends on your SAN iops and how much of it is being consumed
by other applications.
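
A rough way to picture the scaling argument above is the back-of-envelope model below. It is only a sketch under stated assumptions: the 5k IOPS per node comes from the tests mentioned in this thread, while the SAN budget and the per-node write rate are made-up numbers, and the function names are purely illustrative.

```python
def write_throughput(nodes, writes_per_node):
    """Writes are CPU bound, so model them as scaling linearly with node count."""
    return nodes * writes_per_node


def read_iops(nodes, iops_per_node, san_iops_budget):
    """Uncached reads are I/O bound, so the shared SAN caps the aggregate."""
    return min(nodes * iops_per_node, san_iops_budget)


def saturation_point(iops_per_node, san_iops_budget):
    """Smallest node count at which the SAN budget is fully consumed."""
    return -(-san_iops_budget // iops_per_node)  # ceiling division


if __name__ == "__main__":
    IOPS_PER_NODE = 5_000     # figure quoted in this thread
    SAN_IOPS_BUDGET = 30_000  # hypothetical: what the SAN leaves for Cassandra
    WRITES_PER_NODE = 10_000  # hypothetical per-node write rate

    for n in (1, 3, 6, 12):
        print(n,
              write_throughput(n, WRITES_PER_NODE),
              read_iops(n, IOPS_PER_NODE, SAN_IOPS_BUDGET))
```

In this model, adding nodes past `saturation_point(...)` still increases write throughput but no longer increases read IOPS, which is the limit a stress test would be trying to find. The real limit also depends on what other applications are consuming on the SAN.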

*I just committed a README for contrib/stress to the 0.7 svn branch

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Cassandra on iSCSI?

Mick Semb Wever
> It should work fine; the main reason to go with local storage is the
> huge cost advantage.

[OT] They're quoting roughly the same price for both (claiming that the
extra cost goes into having for each node a separate disk cabinet to run
local raid-5).

> *I just committed a README for contrib/stress to the 0.7 svn branch

Thanks! I'll check it out.

~mck

--
“An invasion of armies can be resisted, but not an idea whose time has
come.” - Victor Hugo
| www.semb.wever.org | www.sesat.no
| www.finn.no | http://xss-http-filter.sf.net

Re: Cassandra on iSCSI?

Zhu Han

On Fri, Jan 21, 2011 at 3:00 PM, Mick Semb Wever <[hidden email]> wrote:
>> It should work fine; the main reason to go with local storage is the
>> huge cost advantage.
>
> [OT] They're quoting roughly the same price for both (claiming that the
> extra cost goes into having for each node a separate disk cabinet to run
> local raid-5).

You might not need raid-5 for locally attached storage. See [1] for more information.

[1]  http://wiki.apache.org/cassandra/CassandraHardware

Re: Cassandra on iSCSI?

Mick Semb Wever
In reply to this post by Jonathan Ellis-3

> Of course with a SAN you'd want RF=1 since it's replicating
> internally.

Isn't this the same case for raid-5 as well?

And we want RF=2 if we need to keep reading while doing rolling
restarts?

~mck

--
“Anyone who lives within their means suffers from a lack of
imagination.” - Oscar Wilde
| http://semb.wever.org | http://sesat.no
| http://finn.no       | Java XSS Filter

Re: Cassandra on iSCSI?

Mick Semb Wever
In reply to this post by Zhu Han
>> [OT] They're quoting roughly the same price for both (claiming that the
>> extra cost goes into having for each node a separate disk cabinet to run
>> local raid-5).
>
> You might not need raid-5 for locally attached storage.

Yes, we did ask. But raid-5 is the minimum being offered by our hosting
provider... We could go to raid 10, but raid 0 is out of the question...

~mck

--
"To be young, really young, takes a very long time." Picasso
| http://semb.wever.org | http://sesat.no
| http://finn.no       | Java XSS Filter

Re: Cassandra on iSCSI?

Jonathan Ellis-3
In reply to this post by Mick Semb Wever
On Fri, Jan 21, 2011 at 2:19 AM, Mick Semb Wever <[hidden email]> wrote:
>
>> Of course with a SAN you'd want RF=1 since it's replicating
>> internally.
>
> Isn't this the same case for raid-5 as well?

No, because the replication is (mainly) to protect you from machine
failures; if the SAN is a SPOF then putting more replicas on it
doesn't help.

> And we want RF=2 if we need to keep reading while doing rolling
> restarts?

Yes.
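
To illustrate the rolling-restart point, here is a deliberately simplified sketch. It assumes one token range per node and a placement where each range lives on its owner plus the next RF-1 nodes around the ring (no racks, no datacenters); the function names are hypothetical and this is not Cassandra's actual code.

```python
def replicas(range_index, node_count, rf):
    """Nodes holding a token range: its owner plus the next rf-1 nodes on the ring."""
    return [(range_index + i) % node_count for i in range(rf)]


def readable_ranges(node_count, rf, down):
    """Ranges with at least one live replica, i.e. still readable at CL=ONE."""
    return [r for r in range(node_count)
            if any(n not in down for n in replicas(r, node_count, rf))]


if __name__ == "__main__":
    # Four nodes, node 2 is being restarted.
    print(readable_ranges(4, rf=1, down={2}))  # range 2 is missing
    print(readable_ranges(4, rf=2, down={2}))  # every range still has a live replica
```

With RF=1 the restarted node's range is unreadable until it comes back; with RF=2 one node at a time can be restarted without losing reads. None of this helps if the SAN itself goes down, since then every node is effectively in `down` at once.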

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Cassandra on iSCSI?

Edward Capriolo
On Fri, Jan 21, 2011 at 12:07 PM, Jonathan Ellis <[hidden email]> wrote:

> On Fri, Jan 21, 2011 at 2:19 AM, Mick Semb Wever <[hidden email]> wrote:
>>
>>> Of course with a SAN you'd want RF=1 since it's replicating
>>> internally.
>>
>> Isn't this the same case for raid-5 as well?
>
> No, because the replication is (mainly) to protect you from machine
> failures; if the SAN is a SPOF then putting more replicas on it
> doesn't help.
>
>> And we want RF=2 if we need to keep reading while doing rolling
>> restarts?
>
> Yes.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

If you are using Cassandra with a SAN, RF=1 makes sense because we are
assuming the SAN is already replicating your data. RF=2 makes good
sense if you want to be unaffected by outages. Another alternative is
something like Linux-HA, managing each Cassandra instance as a
resource. This way, if a head goes down, Linux-HA on another node would
detect the failure and bring up that instance on another physical
piece of hardware.

Using Linux-HA + SAN + Cassandra would actually bring Cassandra closer to
the HBase model, in which you have a distributed file system and the
front-end Cassandra acts like a region server.

Re: Cassandra on iSCSI?

Anthony John
Sort of - do not agree!!

This is the shared-nothing vs. shared-disk debate. There are many mainstream RDBMS products that pretend to do horizontal scalability with shared disks. They have the kinds of problems that Cassandra is specifically architected to avoid!

The original question here has 2 aspects to it:-
1. Is an iSCSI SAN good enough? My take is that it is still the poor man's SAN compared to FC-based SANs. Having said that, iSCSI SANs have found increasing adoption and the performance penalty is really marginal. Couple that with the fact that Cassandra is architected to reduce the need for high-performance storage systems, for example by minimizing random writes. So net net - a reasonable iSCSI SAN should work.
2. Does it make sense to use a SPOF SAN? Again, this militates against the architectural underpinnings of Cassandra, which relies on the shared-nothing idea to ensure that problems - say a bad disk - are easily isolated to a particular node. On a SAN, depending on RAID configs, how LUNs are carved out and so on, a few disk outages could affect multiple nodes. A performance problem with the SAN could now affect your entire Cassandra cluster, and so on. Cassandra is not meant to be set up this way!

But but but...in the real world today, large storage volumes are available only with SANs. Rackable machines typically do not leave a lot of space for a bunch of HDDs. On top of that, SANs provide all kinds of admin capabilities that supposedly help with uptime, performance guarantees and so on. So a colo DC might not have any other option but shared storage!

So if one is forced to use a SAN, the interesting question - to me - is how you should set up Cassandra! Here are some thoughts:-
1. Ensure that each node gets dedicated - not shared - LUNs
2. Ensure that these LUNs do not share spindles, or nodes will cease to be isolatable (this will be tough to get, given how SAN administrators think about this)
3. Most SANs deliver performance by striping (RAID 0) - sacrifice striping for isolation if push comes to shove
4. Do not share data directories from multiple nodes onto a single location via NFS or CFS, for example. They are cool in shared resource environments, but break the premise behind Cassandra. All data storage should be private to the Cassandra node, even when on shared storage
5. Do not change any assumption around Replication Factor (RF) or Consistency Level (CL) due to the shared storage - in fact, if anything, increase your replication factor because you now have potential SPOF storage.
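
As a sketch of how points 1 and 2 could be checked, assuming the SAN administrator can provide a mapping from each node's LUN to the physical disks behind it (the node and disk names below are invented for illustration):

```python
from itertools import combinations


def shared_spindles(lun_spindles):
    """Return {(node_a, node_b): spindles that both nodes' LUNs sit on}."""
    conflicts = {}
    for a, b in combinations(sorted(lun_spindles), 2):
        common = set(lun_spindles[a]) & set(lun_spindles[b])
        if common:
            conflicts[(a, b)] = common
    return conflicts


if __name__ == "__main__":
    # Hypothetical layout: node -> physical disks backing its LUN.
    layout = {
        "node1": {"disk01", "disk02"},
        "node2": {"disk03", "disk04"},
        "node3": {"disk02", "disk05"},
    }
    for pair, disks in shared_spindles(layout).items():
        print(pair, "share", sorted(disks))
```

An empty result means a single disk failure can only touch one Cassandra node. Any pair reported here would be affected together by one bad disk.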

My two - or maybe more - cents on the issue,

HTH,

-JA

Re: Cassandra on iSCSI?

Mick Semb Wever
> So if one is forced to use a SAN, how should you set up Cassandra is
> the interesting question - to me! Here are some thoughts:-
> 1. Ensure that each node gets dedicated - not shared - LUNs
> 2. Ensure that these LUNs do not share spindles, or nodes will cease to be
> isolatable (this will be tough to get, given how SAN administrators
> think about this)
> 3. Most SANs deliver performance by striping (RAID 0) - sacrifice
> striping for isolation if push comes to shove
> 4. Do not share data directories from multiple nodes onto a single
> location via NFS or CFS for example. They are cool in shared resource
> environments, but break the premise behind Cassandra. All data
> storage should be private to the cassandra node, even when on shared
> storage
> 5. Do not change any assumption around Replication Factor (RF) or
> Consistency Level (CL) due to the shared storage - in fact if
> anything, increase your replication factor because you now have
> potential SPOF storage.

That was gold, and led to a direct conversation between provider and
developer. Various tests showed IOPS will often be at 5k per node.
Therefore the iSCSI solution would need to be tailored to handle it.

As mentioned above, our provider simply couldn't provide us that much
disk per server. But after a good discussion it became obvious (doh!)
that the application can actually save a lot of disk by using different
keyspaces with different RF. We have raw data that needs to be
collected, but can be temporarily unavailable for reading, hence RF=1
makes sense. This raw data is the vast bulk of the data so this saves
lots of disk space. The aggregated data, which is relatively small in
comparison, is critical for the application to read, so we can keep it in a
separate keyspace with higher RF...
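
The saving from per-keyspace RF is simple arithmetic. The sketch below uses made-up sizes (1000 GB raw, 50 GB aggregated) and hypothetical keyspace names purely to show the shape of it, and it ignores compaction and snapshot overhead.

```python
def disk_footprint(keyspaces):
    """Total on-disk size: each keyspace's unique data multiplied by its RF."""
    return sum(size_gb * rf for size_gb, rf in keyspaces.values())


if __name__ == "__main__":
    # keyspace name -> (unique data in GB, replication factor); all hypothetical
    uniform = {"raw": (1000, 2), "aggregated": (50, 2)}
    mixed = {"raw": (1000, 1), "aggregated": (50, 3)}

    print("RF=2 everywhere:", disk_footprint(uniform), "GB")
    print("raw RF=1, aggregated RF=3:", disk_footprint(mixed), "GB")
```

Because the raw data dominates, dropping it to RF=1 nearly halves the total even while the small aggregated keyspace gets a higher RF than before.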

~mck

--
“Anyone who lives within their means suffers from a lack of
imagination.” - Oscar Wilde
| http://semb.wever.org | http://sesat.no
| http://finn.no       | Java XSS Filter

