Best practice: Multiple clusters vs multiple tables in a single cluster?

Ian Rose
Hi all -

We currently have a single Cassandra cluster that is dedicated to a relatively narrow purpose, with just 2 tables.  Soon we will need Cassandra for another, unrelated, system, and my debate is whether to just add the new tables to our existing Cassandra cluster or whether to spin up an entirely new, separate cluster for this new system.

Does anyone have pros/cons to share on this?  It appears from watching talks and such online that the big users (e.g. Netflix, Spotify) tend to favor multiple, single-purpose clusters, and thus that was my initial preference.  But we are (for now) nowhere close to them in traffic, so I'm wondering if running an entirely separate cluster would be a premature optimization which wouldn't pay for the (nontrivial) overhead in configuration management and ops.  While we are still small it might be much smarter to reuse our existing cluster so that I can get it done faster...

Thanks!
- Ian

Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

Jack Krupansky-2
There is an old saying in the software industry: the structure of a system follows from the structure of the organization that created it (Conway's Law). Seriously, the first question to ask on your end is who owns the applications at the executive level: if management makes a decision that dramatically changes one app's impact on the cluster, is it likely to have done so with the agreement of the management that owns the other app? Trust me, you do not want to be in the middle when two managers are in dispute over whose app is more important. IOW, if one manager owns both apps, you are probably safe, but if two different managers might have differing views of each other's priorities, tread with caution.

In any case, be prepared to move one of the apps to a different cluster if and when usage patterns cause them to conflict.

There is also the concept of DevOps, where the app developers also own operations. You really can't have two separate development teams administering operations for one set of hardware.

If you are dedicated to operations for both app teams and the teams seem to be reasonably compatible, then it could be fine.

In short, sure, technically a single cluster can support any number of keyspaces, but mostly it will come down to whether the apps would contend too heavily for the cluster's capacity and for operational control of it in production.

And then there are little things like software upgrades: one app might really need a disruptive or risky upgrade, or need the entire cluster bounced, and the other app would be impacted even though it had no need for the upgrade or the bounce.

Are the apps synergistic in some way, such that there is an architectural benefit from running on the same hardware?

In the end, the simplest solution is typically the better solution, unless any of these other factors loom too large.


-- Jack Krupansky

Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

Carlos Rolo
In reply to this post by Ian Rose
Adding a new keyspace should be perfectly fine, unless you have completely distinct workloads for the different keyspaces. Even so, you can balance some things at the keyspace/table level. But I would go with a new keyspace, not a new cluster, given the small size you say you have.
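
For illustration only, here is a minimal sketch of what "a new keyspace" amounts to in practice. It just builds the CQL text in Python and does not talk to a cluster. The helper name create_keyspace_cql, the keyspace name new_system, the use of SimpleStrategy, and the replication factor of 3 are all assumptions made for the example, not details from this thread.

```python
import re


def create_keyspace_cql(name, replication_factor=3):
    """Return a CREATE KEYSPACE statement for a single-data-center setup.

    Hypothetical helper: SimpleStrategy and the default replication
    factor are assumptions for this sketch, not advice from the thread.
    """
    # Unquoted CQL identifiers: a letter, then letters, digits or underscores.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]{0,47}", name):
        raise ValueError(f"not a valid unquoted keyspace name: {name!r}")
    if not isinstance(replication_factor, int) or replication_factor < 1:
        raise ValueError("replication_factor must be a positive integer")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


if __name__ == "__main__":
    # The new, unrelated system gets its own keyspace in the existing cluster.
    print(create_keyspace_cql("new_system"))
```

The generated statement could then be run through cqlsh or a driver. Keeping the new system's tables in their own keyspace should also make it easier to move them to a separate cluster later if the workloads start to conflict.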

Regards,

Carlos Juzarte Rolo
Cassandra Consultant
 
Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649

Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

Ian Rose
Thanks for the input, folks!

As a startup, we don't really have different dev teams / apps - everything is in service of "the product", so given these responses, I think putting both into the same cluster is the best idea.  And if we want to split them out in the future we are still small enough that it would be a pain but not the end of the world...

Cheers,
Ian


Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

Jack Krupansky-2
Sounds very appropriate for your situation.

Also... you have the option of creating separate data centers within one cluster, so that a single cluster can service multiple workloads and you get the best of both worlds. For your use case, though, that would mean separate nodes for the different keyspaces, so it would probably not be a big benefit until you reach the stage of having many nodes. It would let you manage the load between the various apps without requiring separate clusters per se. Typically multi-DC operation is about sharing the same data, so if you are really talking about disjoint keyspaces, the benefit is not so great.
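
To make the data-center idea concrete, here is a rough sketch that builds a CREATE KEYSPACE statement using NetworkTopologyStrategy, with replicas placed only in the named data centers. It only produces the CQL text. The helper name keyspace_per_dc_cql, the keyspace names, and the data center names dc_app and dc_new are hypothetical; real data center names come from the cluster's snitch configuration.

```python
import re


def keyspace_per_dc_cql(keyspace, dc_replicas):
    """Return a CREATE KEYSPACE statement that places replicas per data center.

    dc_replicas maps a data center name to its replica count. A keyspace
    that lists only one data center keeps its data on that data center's
    nodes. All names used with this helper are illustrative assumptions.
    """
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]{0,47}", keyspace):
        raise ValueError(f"not a valid unquoted keyspace name: {keyspace!r}")
    if not dc_replicas:
        raise ValueError("at least one data center is required")
    for dc, count in dc_replicas.items():
        # Restrict names so they can be embedded in a quoted CQL string safely.
        if not re.fullmatch(r"[A-Za-z0-9_-]+", dc):
            raise ValueError(f"unexpected data center name: {dc!r}")
        if not isinstance(count, int) or count < 1:
            raise ValueError(f"replica count for {dc!r} must be a positive integer")
    # Sort for a stable, reproducible statement.
    options = ", ".join(f"'{dc}': {n}" for dc, n in sorted(dc_replicas.items()))
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {options}}};"
    )


if __name__ == "__main__":
    # Disjoint keyspaces, each pinned to its own (hypothetical) data center.
    print(keyspace_per_dc_cql("existing_app", {"dc_app": 3}))
    print(keyspace_per_dc_cql("new_system", {"dc_new": 3}))
```

With disjoint keyspaces like these, each app's data would sit on its own set of nodes while the cluster is still upgraded and operated as one unit, which is the trade-off described above. Clients would presumably also need to be pointed at the data center that holds their keyspace.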

-- Jack Krupansky

Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

daemeon reiydelle
In reply to this post by Jack Krupansky-2
Jack did a superb job of explaining all of your issues, and his last sentence seems to fit your needs (and my experience) very well. The only other point I would add is to ascertain whether the usage patterns recommend microservices to abstract away data locality, even if the initial deployment is a no-op over a single cluster. This depends on whether you foresee a rapid stream of special-purpose business functions. A second question is about data access: does Pig support your data access response times? Many clients find Hadoop ideally suited to a sophisticated ECTL (extract, cleanup, transformation, and load) model feeding fast, schema-oriented repositories such as MySQL. It all depends on the use case, growth and fragmentation expectations for your business model(s), etc.

Good luck.

PS: Jack, thanks for your succinct comment.



