one server or more servers?


one server or more servers?

mobiledreamers
I have 3 production servers. Is it better to

A. start Cassandra on one node and add the other seeds later,
or
B. start Cassandra on all 3 nodes?

If I do A, when I later add 2 nodes, will Cassandra pick up the other two nodes and start distributing the load fairly?

Rephrasing the question: does Cassandra handle dynamic networks properly?

Thanks for the input


Re: one server or more servers?

Mark Robson


2009/7/14 <[hidden email]>
I have 3 production servers. Is it better to

A. start Cassandra on one node and add the other seeds later,
or
B. start Cassandra on all 3 nodes?

If I do A, when I later add 2 nodes, will Cassandra pick up the other two nodes and start distributing the load fairly?

My guess would be:

1. If you only have 3 production servers, Cassandra may not do much for you. You will probably only care if you have lots more servers. 3 servers is a reasonable minimum for a test / dev environment.

2. All of your servers should have static IPs. You should make sure that at least 2-3 of them are unlikely to go away, and put those in as seeds; the other servers can come and go, change IP address, etc.

I would set up 2-3 servers which I expected to be unlikely to go away (i.e. they won't be taken out any time soon), and code their IPs into the seeds. The other servers can use those to find each other.

Your ops team should also be aware that if they get rid of those "seed" servers, new boxes should at some point be deployed to take over those IPs, so that at least two seeds are always actively running Cassandra. That way your other nodes can find one another.

Having only one seed server would create a single point of failure, which you don't want.

If you have a segmented network (e.g. routed, different racks, different datacentres with VPN between them etc), you could put two seeds in each segment, which would make discovery tolerant of a partition.

But having said that, it's relatively inconvenient to have a large number of seeds as you'd need to keep deploying new config files to all your nodes.

Mark

Scaling from 1 to x (was: one server or more servers?)

Johan Stuyts-4
> 1. If you only have 3 production servers, Cassandra may not do much for  
> you.
> You will probably only care if you have lots more servers. 3 servers is a
> reasonable minimum for a test / dev environment.

I suspect I will use 1 Cassandra-server when I start deploying my  
application. I want to prevent having to change my application code when  
the need arises to scale to x servers.

Is it unwise to use Cassandra in production if you use less than n  
servers? I.e. is it better to use another solution for <n servers and  
switch to Cassandra once n is reached?

Regards,

Johan Stuyts

Re: Scaling from 1 to x (was: one server or more servers?)

Mark Robson

2009/7/14 Johan Stuyts <[hidden email]>
Is it unwise to use Cassandra in production if you use less than n servers? I.e. is it better to use another solution for <n servers and switch to Cassandra once n is reached?

If you are not sure whether N will ever be reached, then you don't need to deploy Cassandra until you reach a point where you're sure it will be reached.

If your application's scale is planned (i.e. by management who do planning-type things) to exceed what you can reasonably get out of a conventional database (with or without various types of scale-out solution), then Cassandra might be the right solution for you.

I feel that developing an application for Cassandra is a lot more difficult than developing one for a "traditional" database, and it's also a fairly immature product. There are many drawbacks, such as lack of visibility of storage usage.

I think that Cassandra is likely to become a very compelling system for large scale problems quite soon, but I don't think it's there yet.

Mark

Re: Scaling from 1 to x (was: one server or more servers?)

Johan Stuyts-4
> If you are not sure whether N will ever be reached, then you don't need  
> to deploy Cassandra until you reach a point where you're sure it will be  
> reached.

No, I am not sure I will even get past one, just hopeful.

> If your application's scale is planned (i.e. by management who do  
> planning-type things) to exceed what you can reasonably get out of a  
> conventional database (with or without various types of scale-out  
> solution), then Cassandra might be the right solution for you.

I am not only interested in Cassandra because of its load balancing,  
scaling and failover properties. If I understand correctly it is also an  
extremely fast datastore. Wouldn't I save a lot of effort to design and  
build a data access layer (sharding, replication and caching) by using  
Cassandra?

One of the purposes I want to use Cassandra for is custom HTTP session  
replication. Instead of storing the values in the session of the servlet  
container I want to store them individually using unique keys in  
Cassandra. I was hoping Cassandra would be fast enough for this.

> I feel that developing an application for Cassandra is a lot more  
> difficult than a "traditional" database, ...

What is a lot more difficult to do using Cassandra? I intend to use a SQL  
database for all the really important stuff (credentials and stuff  
involving money), and use Cassandra for less important information. I  
understand I have to think about doing things without transactions and  
designing things to be idempotent.

Regards,

Johan
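
As an illustration of what "idempotent" can mean here, below is a minimal Python sketch, not Cassandra code. It assumes each write carries a timestamp and that the newest timestamp wins; `apply_write`, the key format and the in-memory dict are hypothetical stand-ins for a real client and store. Replaying a write, or delivering an older one late, leaves the stored value unchanged.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for a key/value store: key -> (timestamp, value).
Store = Dict[str, Tuple[int, str]]


def apply_write(store: Store, key: str, value: str, timestamp: int) -> None:
    """Keep the value with the newest timestamp (assumed last-write-wins rule)."""
    current = store.get(key)
    if current is None or timestamp > current[0]:
        store[key] = (timestamp, value)


if __name__ == "__main__":
    sessions: Store = {}
    apply_write(sessions, "session42:cart", "1 item", 100)
    apply_write(sessions, "session42:cart", "2 items", 200)
    apply_write(sessions, "session42:cart", "2 items", 200)  # retry: no effect
    apply_write(sessions, "session42:cart", "1 item", 100)   # late duplicate: ignored
    print(sessions["session42:cart"])  # (200, '2 items')
```

Because the outcome depends only on the set of writes and not on how often or in what order they arrive, a client can safely retry after a timeout without a transaction.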

Re: Scaling from 1 to x (was: one server or more servers?)

Mark Robson


2009/7/14 Johan Stuyts <[hidden email]>
One of the purposes I want to use Cassandra for is custom HTTP session replication. Instead of storing the values in the session of the servlet container I want to store them individually using unique keys in Cassandra. I was hoping Cassandra would be fast enough for this.

Fast enough - yes. Consistent enough - probably not.

As it's an HTTP session application, you really, really want writes to be visible from anywhere immediately. If you're using it for authentication (for example a "keep me logged in for today" type of function), then you'll need to get the absolute latest value from the session.

The user will log on and expect to be logged on immediately, and be able to switch to different web servers transparently without being asked to log on again.

Cassandra doesn't provide the guarantees about the latest changes being available from any given node, so you can't really use it in such an application.

I don't know if the "blocking" variants of the write operations make any more guarantees; if they do, then it might be suitable.

Mark

Re: Scaling from 1 to x (was: one server or more servers?)

Johan Stuyts-4
> Cassandra doesn't provide the guarantees about the latest changes being
> available from any given node, so you can't really use it in such an
> application.
>
> I don't know if the "blocking" variants of the write operations make any
> more guarantees, if they do then it might be suitable.

I will look into this. Thanks for your help.

Regards,

Johan

Re: Scaling from 1 to x (was: one server or more servers?)

Jonathan Ellis-3
In reply to this post by Mark Robson
On Tue, Jul 14, 2009 at 8:33 AM, Mark Robson<[hidden email]> wrote:
> Cassandra doesn't provide the guarantees about the latest changes being
> available from any given node, so you can't really use it in such an
> application.
>
> I don't know if the "blocking" variants of the write operations make any
> more guarantees, if they do then it might be suitable.

Yes, quorum write/read would work just fine here.

-Jonathan
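
The reason quorum reads and writes fit this case can be checked by brute force. The sketch below is plain Python, not Cassandra's API: for N replicas it enumerates every possible set of W replicas that acknowledged a write and every set of R replicas answering a read, and reports whether they always share at least one node. `always_overlaps` is a made-up helper name.

```python
from itertools import combinations


def always_overlaps(n: int, w: int, r: int) -> bool:
    """True if every w-replica write set intersects every r-replica read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    # 3 replicas, quorum (2) on both write and read: some replica in every
    # read set has always seen the latest acknowledged write.
    print(always_overlaps(3, 2, 2))  # True
    # 3 replicas, a single replica on both sides: a read can miss the write.
    print(always_overlaps(3, 1, 1))  # False
```

For the session use case this means a login written at quorum is visible to a quorum read issued through any web server, as long as the reader picks the newest value among the replies.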

Re: Scaling from 1 to x (was: one server or more servers?)

Mark Robson


2009/7/14 Jonathan Ellis <[hidden email]>
On Tue, Jul 14, 2009 at 8:33 AM, Mark Robson<[hidden email]> wrote:
> Cassandra doesn't provide the guarantees about the latest changes being
> available from any given node, so you can't really use it in such an
> application.
>
> I don't know if the "blocking" variants of the write operations make any
> more guarantees, if they do then it might be suitable.

Yes, quorum write/read would work just fine here.

Are those the type of writes which you get by setting the "block" parameter to 1?

Mark

Re: Scaling from 1 to x (was: one server or more servers?)

Jonathan Ellis-3
There are several interesting values you can pass to block_for:

0: fire-and-forget. Minimizes latency when that is more important than robustness.

1: wait for at least one node to fully ack the write before returning (the other replicas will be finished in the background).

N/2 + 1, where N is the number of replicas: this is a quorum write; combined with quorum reads, it means you can tolerate up to N - (N/2 + 1) nodes failing before you can get inconsistent results (which is usually better than no results at all).

N: guarantees consistent reads without having to wait for a quorum, so you trade write latency and availability (since the write will fail if one of the target nodes is down) for 100% consistency and reduced read latency.

-Jonathan
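
To make the arithmetic concrete, here is a small Python sketch that tabulates those block_for values for a few replica counts. The function name and dictionary keys are illustrative only; it assumes N/2 means integer division, which is how the quorum formula is normally read.

```python
def block_for_values(n_replicas: int) -> dict:
    """Tabulate the block_for settings described above for n_replicas replicas."""
    quorum = n_replicas // 2 + 1  # assumption: N/2 is integer division
    return {
        "fire_and_forget": 0,
        "one": 1,
        "quorum": quorum,
        "all": n_replicas,
        # N - (N/2 + 1): replicas that may be down while quorum still works
        "failures_tolerated_at_quorum": n_replicas - quorum,
    }


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, block_for_values(n))
```

With 3 replicas a quorum is 2, so one replica can be down; with 2 replicas a quorum is also 2, so none can.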


Re: Scaling from 1 to x (was: one server or more servers?)

Matt Revelle-2
Is that documented anywhere?  I've been wondering how block_for should  
be used...


Re: Scaling from 1 to x (was: one server or more servers?)

Michael Greene
Cassandra borrows many concepts from Dynamo, and the Dynamo paper describes
this well: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

http://wiki.apache.org/cassandra/DataModelAndOperations contains some
documentation that references block_for, but this documentation needs
to be expanded.

Michael


Re: one server or more servers?

mobiledreamers
In reply to this post by Mark Robson
> 2. All of your servers should have static IPs. You should make sure that at least 2-3 of them are unlikely to go away, and put those in as seeds, the other servers can come and go and change IP address etc.
>
> I would set up 2-3 servers which I expected to be unlikely to go away (i.e. they won't be taken out any time soon), and code their IPs into the seeds. The other servers can use those to find each other.


Hey Mark,
thanks for the detailed reply explaining the example of seeds.

How do we add servers other than seeds, as there is no such place in the conf file?

thanks

--
Bidegg worlds best auction site
http://bidegg.com

Re: one server or more servers?

Jonathan Ellis-3
Gossip distributes the cluster status.

The seeds are there to be an initial contact point.

On Tue, Jul 14, 2009 at 10:04 AM, <[hidden email]> wrote:
> Hey mark
> thanks for the detailed reply explaining the example of Seeds
>
> How do we add servers other than Seeds as there is no such place in conf
> file
>
> thanks

Re: one server or more servers?

Mark Robson
In reply to this post by mobiledreamers


2009/7/14 <[hidden email]>
How do we add servers other than Seeds as there is no such place in conf file

Servers other than seeds are automatically picked up by the cluster when they start up; the nodes talk amongst themselves to figure out who's there.

Only the seeds need to be explicitly configured.

This is a Good Thing :)

Mark
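
A toy model may help show why only the seeds need configuring. The Python sketch below is not Cassandra's gossip protocol (real gossip contacts peers periodically and exchanges versioned state); it is a deterministic simplification in which every node starts out knowing only itself and the seeds, then repeatedly swaps its membership view with each peer it knows. The function name `discover` and the IP addresses are made up for the example.

```python
from typing import Dict, List, Set


def discover(seeds: List[str], nodes: List[str], rounds: int) -> Dict[str, Set[str]]:
    """Return each node's view of the cluster after `rounds` rounds of exchanges."""
    # Every node starts out knowing only itself and the configured seeds.
    known = {node: {node, *seeds} for node in nodes}
    for _ in range(rounds):
        for node in nodes:
            # Contact every peer currently known and merge the two views.
            for peer in sorted(known[node] - {node}):
                if peer not in known:
                    continue  # peer is not running; nothing to exchange
                merged = known[node] | known[peer]
                known[node] = set(merged)
                known[peer] = set(merged)
    return known


if __name__ == "__main__":
    seeds = ["10.0.0.1", "10.0.0.2"]
    nodes = seeds + ["10.0.0.3", "10.0.0.4", "10.0.0.5"]
    for node, view in discover(seeds, nodes, rounds=2).items():
        print(node, sorted(view))
```

In this model the non-seed nodes never list each other anywhere, yet after two rounds every node's view contains all five addresses, because the seeds relay what they have learned.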

Re: one server or more servers?

mobiledreamers
Mark and Jonathan,
I'm lost here.

Don't we need to specify at least the server IP addresses in the conf file? How would Cassandra know which IPs the other servers are running on?

I can see there is a way to specify seeds, but how would the seeds pick up the other servers if they do not know their IP addresses?

Also, given the unlimited number of IPs, it cannot just go through each one of them and ping port 7001.

--
Bidegg worlds best auction site
http://bidegg.com

Re: one server or more servers?

mobiledreamers
I think I get it.

The other servers specify the seeds, so they join the cluster.

--
Bidegg worlds best auction site
http://bidegg.com

Re: one server or more servers?

Jonathan Ellis-3
In reply to this post by mobiledreamers
The new servers contact the seeds, not the other way around.


Re: one server or more servers?

Matt Revelle-2
In reply to this post by mobiledreamers
Your new node will contact a seed.


Re: one server or more servers?

mobiledreamers
In reply to this post by mobiledreamers
But since the other servers join the cluster,

is there a limitation on where reads/writes can go? I.e.:

reads can go to all servers (seeds + non-seeds)

writes can go only to seeds

--
Bidegg worlds best auction site
http://bidegg.com