Should I use Cassandra for general purpose DB?

Soichi Hayashi
Hi.

So, I am interested in using Cassandra not because of a large amount of data, but for the following reasons.

1) It's easy to administer and handles fail-over (and scaling, of course).
2) It's easy to write an application that makes sense to developers (developers are fully in control of how data is orchestrated: indexed, queried, etc.).
3) It's easy to extend an application to some extent, as long as the changes only involve adding or removing columns (not column families).

Are these good enough reasons to start experimenting with Cassandra as a general purpose data store? Or does Cassandra, or any NoSQL solution, really make no sense if you don't have or expect to have terabytes of data?

Regarding bullet 3) above: if I have 100 nodes running Cassandra and want to add a new table (i.e. a ColumnFamily), does that mean I have to update storage-conf.xml on all 100 nodes and restart them? For example, if a user wants me to add the ability to sort "stuff" in a way that I don't support yet, I might have to do the following.

1. Create a new ColumnFamily that orders "stuff" by a new foreign key currently stored inside one of the columns of "stuff".
2. Populate this new ColumnFamily from all the "stuff" records that currently exist.
3. Update the application to read this new ColumnFamily for the new sort options.
4. Update the application so that every time "stuff" is added or removed, this new ColumnFamily is updated as well.
5. Update storage-conf.xml on ALL nodes in the cluster and restart them!

With a regular DB, I would only have to do step 3. Does this mean that, unless I have a *very* stable application where no such user requirement could come up, I should stick to a regular DB? If so, Cassandra only makes sense in the special case where the data is simply too large for a regular DB (meaning: if data size is not an issue, stick to a regular DB).
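
As a rough illustration of steps 1 to 4 above, here is a minimal sketch that uses plain Python dicts as stand-ins for the two ColumnFamilies. It is not Cassandra client code: the names `stuff`, `stuff_by_owner`, and the `owner` column (playing the role of the foreign key) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two ColumnFamilies (not a Cassandra client).
stuff = {}                          # row key -> columns of one "stuff" record
stuff_by_owner = defaultdict(dict)  # index row per owner -> {stuff key: ""}


def add_stuff(key, columns):
    """Step 4: write the record and its index entry together."""
    stuff[key] = columns
    stuff_by_owner[columns["owner"]][key] = ""


def remove_stuff(key):
    """Step 4: remove the record and its index entry together."""
    columns = stuff.pop(key)
    stuff_by_owner[columns["owner"]].pop(key, None)


def backfill_index():
    """Step 2: populate the index from the records that already exist."""
    for key, columns in stuff.items():
        stuff_by_owner[columns["owner"]][key] = ""


def list_by_owner(owner):
    """Step 3: the new sort option reads the index row, not the main rows."""
    return sorted(stuff_by_owner.get(owner, {}))
```

The sketch does not handle a record whose `owner` changes; in that case the old index entry would have to be removed before the new one is written.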

Thanks,
Soichi

Re: Should I use Cassandra for general purpose DB?

Miguel Verde
On Wed, Apr 21, 2010 at 12:56 PM, Soichi Hayashi <[hidden email]> wrote:
> So, I am interested in using Cassandra not because of a large amount of data, but for the following reasons.
>
> 1) It's easy to administer and handles fail-over (and scaling, of course).
> 2) It's easy to write an application that makes sense to developers (developers are fully in control of how data is orchestrated: indexed, queried, etc.).
> 3) It's easy to extend an application to some extent, as long as the changes only involve adding or removing columns (not column families).
>
> Are these good enough reasons to start experimenting with Cassandra as a general purpose data store? Or does Cassandra, or any NoSQL solution, really make no sense if you don't have or expect to have terabytes of data?
You don't need a good reason to experiment; go for it!  Those are all accurate points in Cassandra's favor. There are many potential arguments about actually adopting such a solution for production use, but personally, if I didn't have or foresee scalability or availability problems, Cassandra would not be my choice.
 
> Regarding bullet 3) above: if I have 100 nodes running Cassandra and want to add a new table (i.e. a ColumnFamily), does that mean I have to update storage-conf.xml on all 100 nodes and restart them?
 
Currently, yes.  You can do a rolling restart, so the cluster remains up the whole time, but the nodes would need to be restarted.  However, 0.7 will include https://issues.apache.org/jira/browse/CASSANDRA-44 (live schema updates), and this problem will finally go away.
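
For illustration only, a rolling restart can be sketched as a loop that restarts one node at a time and waits for it to come back before moving on. The `restart` and `is_up` callables below are hypothetical and supplied by the caller (they might wrap an init script and a port check, for example); nothing here is a Cassandra tool or API.

```python
import time


def rolling_restart(nodes, restart, is_up, poll_seconds=5.0, timeout_seconds=300.0):
    """Restart nodes one at a time, waiting for each to come back first.

    `restart` and `is_up` are caller-supplied callables (hypothetical here).
    """
    for node in nodes:
        restart(node)
        deadline = time.monotonic() + timeout_seconds
        while not is_up(node):
            if time.monotonic() >= deadline:
                # Stop here so that at most one node is down at any time.
                raise RuntimeError(f"{node} did not come back; aborting")
            time.sleep(poll_seconds)
```

Aborting on the first node that fails to return keeps the rest of the cluster untouched, which is the point of restarting nodes one by one.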