Viability of running on EC2

Anthony Molinaro
Hi,

  I was wondering how viable it is to run Cassandra on EC2.
I believe that it currently runs on some pretty hefty hardware at
Facebook, so I'm wondering what the minimum hardware config is
(in other words, can I run it on a cluster of 2-core, 4GB machines)?
Also, running on Amazon means no multicast, plus network partitions and
machines just disappearing.  How does Cassandra deal with these
constraints/failures?

Thanks for any information,

-Anthony

--
------------------------------------------------------------------------
Anthony Molinaro                           <[hidden email]>

Re: Viability of running on EC2

Jonathan Ellis
IMO the biggest downside to running on EC2 is that IO is terrible.  I
haven't done benchmarks, but anecdotally disk performance in
particular seems like an order of magnitude slower than you'd get on
non-virtual disks.  So that is worth investigating before assuming
that the price/performance on EC2 is what you think it is.
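
One rough way to investigate is to time a sequential write with an fsync on
the instance's disk and compare the figure against a physical machine. The
sketch below is only an illustration: the function name, file size, and chunk
size are assumptions chosen here, and a single sequential write is not a
substitute for a proper benchmark of a real workload.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, total_mb=64, chunk_kb=1024):
    """Write total_mb of random data into a temp file in `directory`,
    fsync it, and return the observed throughput in MB/s."""
    chunk = os.urandom(chunk_kb * 1024)
    chunks = (total_mb * 1024) // chunk_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            # Force the data to the device rather than the OS page cache.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return total_mb / elapsed
```

Calling `sequential_write_mb_per_s("/path/to/data/dir")` on both an EC2
instance and a non-virtual box gives a first, crude comparison point.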

Other than that, Cassandra is designed to emphasize availability so it
should work fine in the situations you describe.  Hinted handoff in
particular will get writes to the right nodes quickly when machines
come back online.  (However, Cassandra is not yet good at dealing with
machines becoming permanently dead.)

Of course, if _all_ of some keys' replicas are temporarily partitioned
off from you, you won't be able to read that data until they are
visible again.
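
To make the behaviour described above concrete, here is a toy model of hinted
handoff. It is not Cassandra's implementation: the `ToyCluster` class, its
method names, and the choice to park hints on the first live replica are all
simplifying assumptions made for illustration.

```python
class ToyCluster:
    """Toy model of hinted handoff; not Cassandra's actual implementation."""

    def __init__(self, nodes):
        self.data = {n: {} for n in nodes}   # node -> {key: value}
        self.up = {n: True for n in nodes}   # node -> currently reachable?
        self.hints = {n: [] for n in nodes}  # holder -> [(target, key, value)]

    def write(self, key, value, replicas):
        live = [n for n in replicas if self.up[n]]
        if not live:
            # Simplification: reject the write if no replica is reachable.
            raise RuntimeError("no live replica for this key")
        for n in replicas:
            if self.up[n]:
                self.data[n][key] = value
            else:
                # Park a hint on a live replica for the unreachable one.
                self.hints[live[0]].append((n, key, value))

    def read(self, key, replicas):
        for n in replicas:
            if self.up[n] and key in self.data[n]:
                return self.data[n][key]
        # All replicas are partitioned off: the data is unreadable for now.
        raise RuntimeError("no reachable replica holds this key")

    def recover(self, node):
        """Mark a node as back online and replay the hints destined for it."""
        self.up[node] = True
        for holder, pending in self.hints.items():
            remaining = []
            for target, key, value in pending:
                if target == node:
                    self.data[node][key] = value
                else:
                    remaining.append((target, key, value))
            self.hints[holder] = remaining
```

In this model a write made while one replica is down succeeds, the missed
write is delivered when `recover` is called for that node, and a read fails
only while every replica of the key is unreachable.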

-Jonathan

On Sat, Jun 13, 2009 at 11:20 AM, Anthony
Molinaro<[hidden email]> wrote:

> Hi,
>
>  I was wondering how viable it is to run Cassandra on EC2.
> I believe that it currently runs on some pretty hefty hardware at
> Facebook, so I'm wondering what the minimum hardware config is
> (in other words, can I run it on a cluster of 2-core, 4GB machines)?
> Also, running on Amazon means no multicast, plus network partitions and
> machines just disappearing.  How does Cassandra deal with these
> constraints/failures?
>
> Thanks for any information,
>
> -Anthony
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro                           <[hidden email]>
>

Re: Viability of running on EC2

Anthony Molinaro
Are there any problems with small-memory boxes?  I see some chatter on the
Cassandra development list about OOM errors.  Are they more prevalent
on smaller-footprint boxes?

Thanks again,

-Anthony

On Sat, Jun 13, 2009 at 11:33:21AM -0500, Jonathan Ellis wrote:

> IMO the biggest downside to running on EC2 is that IO is terrible.  I
> haven't done benchmarks, but anecdotally disk performance in
> particular seems like an order of magnitude slower than you'd get on
> non-virtual disks.  So that is worth investigating before assuming
> that the price/performance on EC2 is what you think it is.
>
> Other than that, Cassandra is designed to emphasize availability so it
> should work fine in the situations you describe.  Hinted handoff in
> particular will get writes to the right nodes quickly when machines
> come back online.  (However, Cassandra is not yet good at dealing with
> machines becoming permanently dead.)
>
> Of course, if _all_ of some keys' replicas are temporarily partitioned
> off from you, you won't be able to read that data until they are
> visible again.
>
> -Jonathan

--
------------------------------------------------------------------------
Anthony Molinaro                           <[hidden email]>

Re: Viability of running on EC2

Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-208 is probably the
issue you are referring to.  It is fixed in trunk.

Our goal is to run most workloads fine with 1GB of heap out of the
box, which should be fine even on a small EC2 instance iirc.

See http://wiki.apache.org/cassandra/MemtableThresholds for tuning memory use.

-Jonathan

On Sat, Jun 13, 2009 at 3:10 PM, Anthony
Molinaro<[hidden email]> wrote:

> Are there any problems with small-memory boxes?  I see some chatter on the
> Cassandra development list about OOM errors.  Are they more prevalent
> on smaller-footprint boxes?
>
> Thanks again,
>
> -Anthony