random n00b question

16 messages

random n00b question

Eric Bowman
Hi,

I'm just getting started looking at Cassandra, and wondering if it is a
possible fit for the shape of the problem we think we need to solve. :)

In a nutshell, what I'm wondering is whether it would be possible to use
Cassandra as a kind of front-end session database, with commits eventually
trickled serially (offline?) into a legacy SQL database.

Could we achieve this kind of thing using the existing API, or would we
need to integrate somewhat more deeply?

One thing I wouldn't want to do is limit the kinds of range queries we
could do, just to make this kind of thing work.

Kind of vague, but any suggestions much appreciated.

Thanks,
Eric

--
Eric Bowman
Boboco Ltd
[hidden email]
http://www.boboco.ie/ebowman/pubkey.pgp
+35318394189/+353872801532


Re: random n00b question

Jonathan Ellis-3
On Mon, Sep 14, 2009 at 5:05 AM, Eric Bowman <[hidden email]> wrote:
> Hi,
>
> I'm just getting started looking at Cassandra, and wondering if is a
> possible fit around the shape of the problem we think we need to solve. :)
>
> In a nutshell, what I'm wondering is whether it would be possible to use
> Cassandra as front-end kind of session database, with commits eventually
> (offline?) trickled serially into a legacy SQL database.

I'd probably use some kind of queue-based approach so that you're not
blocking on the SQL db and you don't have to repeatedly scan
Cassandra looking for stuff to pull over.

Other than "if your SQL db can't handle write + read load now, it
probably won't be able to handle write load soon enough" I don't see
any fundamental problems here. :)

-Jonathan
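A minimal sketch of the queue-based approach Jonathan describes, in Python. It assumes a plain dict as a stand-in for the Cassandra session store and an in-memory SQLite database as a stand-in for the legacy SQL database; neither is a real Cassandra API. Writes land in the fast store immediately and are enqueued, and a single background worker drains the queue serially into SQL, so nothing has to scan the front-end store looking for changes.

```python
import queue
import sqlite3
import threading

# Stand-ins (assumptions): a dict for the Cassandra session store and an
# in-memory SQLite database for the legacy SQL database.
session_store = {}
pending = queue.Queue()  # (session_id, data) tuples, or None to stop


def save_session(session_id, data):
    """Fast path: write to the front-end store, then enqueue for SQL."""
    session_store[session_id] = data
    pending.put((session_id, data))


def sql_writer(conn):
    """Background worker: apply queued writes to SQL one at a time, in order."""
    while True:
        item = pending.get()
        if item is None:  # sentinel: everything queued before it is done
            break
        session_id, data = item
        conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, data),
        )
        conn.commit()


conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, data TEXT)")
worker = threading.Thread(target=sql_writer, args=(conn,))
worker.start()

save_session("s1", "basket=[]")
save_session("s1", "basket=[42]")
save_session("s2", "basket=[7]")
pending.put(None)  # stop the worker after everything queued so far
worker.join()

rows = dict(conn.execute("SELECT id, data FROM sessions"))
print(sorted(rows.items()))  # [('s1', 'basket=[42]'), ('s2', 'basket=[7]')]
```

Because a single worker applies changes in order, a later write to the same session overwrites the earlier one in SQL; the `None` sentinel is just one way to stop the worker cleanly.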

Re: random n00b question

Matt Kydd
I'm looking at a similar use for Cass: storing sessions and some
denormalised data for fast front-end use.

The app will be Rails and was going to use Memcached for caching
partials, but I'm looking at ways I can get those into Cass too,
eliminating Memcached from the architecture altogether.

MK

Re: random n00b question

Joe Stump
I'd recommend still using Memcached for sessions. The reason is that
Memcached has built-in garbage collection of zombie sessions
(via LRU eviction) and Cassandra does not.

--Joe

On Sep 14, 2009, at 5:09 PM, Matt Kydd wrote:

> I'm looking at a similar use for Cass - storing sessions and some
> denormalised data for fast frontend use.
>
> The app will be Rails and was planned to be using Memcache for
> partials, but I'm looking at ways I can get those in to Cass too -
> eliminating Memcached altogether from the architecture.
>
> MK
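To make Joe's point concrete, here is a toy Python model of LRU eviction. It is a sketch, not memcached's implementation (memcached keeps separate LRUs per slab class and also honours per-item expiry times). When the cache is full, the least recently used session is dropped, so abandoned sessions eventually disappear without any purge job; the flip side is that a live session can be evicted too.

```python
from __future__ import annotations

from collections import OrderedDict


class LRUSessionCache:
    """Toy bounded session store that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: OrderedDict[str, str] = OrderedDict()

    def get(self, session_id: str) -> str | None:
        if session_id not in self._items:
            return None
        self._items.move_to_end(session_id)  # mark as most recently used
        return self._items[session_id]

    def set(self, session_id: str, data: str) -> None:
        self._items[session_id] = data
        self._items.move_to_end(session_id)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUSessionCache(capacity=2)
cache.set("a", "basket=[1]")
cache.set("b", "basket=[2]")
cache.get("a")                # touching "a" leaves "b" as the oldest entry
cache.set("c", "basket=[3]")  # over capacity, so "b" is evicted
print(cache.get("b"))         # None
```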


Re: random n00b question

Matt Kydd
We need to persist the sessions and associated shopping baskets /
activity summaries somewhere, and Cass seems like a good fit: without
the restrictions imposed by SQL, there would be less need to purge
old sessions.

I take the point, though, and have made a note to do some sanity
checking on a session before persisting it.

MK

2009/9/15 Joe Stump <[hidden email]>:

> I'd recommend still using Memcached for sessions. The reason is because
> Memcached has built in garbage collection of zombie sessions (via LRU) and
> Cassandra does not.
>
> --Joe

Re: random n00b question

Mark Robson


2009/9/15 Matt Kydd <[hidden email]>
> We need to persist the sessions and associated shopping baskets /
> activity summaries somewhere and Cass seems like a good fit, without
> the restrictions imposed by SQL there would be less necessity to purge
> old sessions.

Purging the old sessions in Cassandra would be nontrivial. Moreover, as Cassandra doesn't give you consistency, it's a very bad session store.

Also, since session data are typically very small (if they're not, you have more problems), there is little motivation for storing them in Cassandra.

Why not use a conventional database with some redundancy solution? You'll get consistency, and the volume of data that a web site, even a very busy one, keeps in its sessions won't be a problem.

Mark
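Mark's point about purging can be illustrated with a small Python sketch. Per Joe's message, Cassandra had no built-in expiry of sessions at the time, so the application has to find stale sessions itself, and scanning every session to do so is expensive. One approach, assumed here rather than taken from the thread, is to keep an index of session ids grouped into last-access time buckets and purge whole buckets. Plain dicts stand in for the stored data, and `BUCKET_SECONDS`, `touch` and `purge_older_than` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict

BUCKET_SECONDS = 3600  # hypothetical bucket width: one hour

sessions: dict[str, tuple[int, str]] = {}  # id -> (last_access, data)
by_bucket: defaultdict[int, set[str]] = defaultdict(set)  # bucket -> ids


def bucket_of(ts: int) -> int:
    return ts // BUCKET_SECONDS


def touch(session_id: str, data: str, now: int) -> None:
    """Store a session and index it under the bucket of its last access."""
    if session_id in sessions:
        old_ts, _ = sessions[session_id]
        by_bucket[bucket_of(old_ts)].discard(session_id)
    sessions[session_id] = (now, data)
    by_bucket[bucket_of(now)].add(session_id)


def purge_older_than(cutoff: int) -> int:
    """Delete every session in buckets that end at or before `cutoff`."""
    stale = [b for b in by_bucket if (b + 1) * BUCKET_SECONDS <= cutoff]
    removed = 0
    for b in stale:
        for session_id in by_bucket.pop(b):
            del sessions[session_id]
            removed += 1
    return removed


touch("old", "basket=[]", now=1_000)
touch("new", "basket=[9]", now=10_000)
print(purge_older_than(cutoff=7_200))  # 1: only "old" (bucket 0) is removed
print(sorted(sessions))                # ['new']
```

Sessions in a bucket that straddles the cutoff are kept until a later sweep, which errs on the side of not deleting live sessions.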

Re: random n00b question

Eric Bowman
Mark Robson wrote:

> 2009/9/15 Matt Kydd <[hidden email] <mailto:[hidden email]>>
>
>     We need to persist the sessions and associated shopping baskets /
>     activity summaries somewhere and Cass seems like a good fit, without
>     the restrictions imposed by SQL there would be less necessity to purge
>     old sessions.
>
>
> Purging the old sessions in Cassandra would be nontrivial. Moreover,
> as Cassandra doesn't give you consistency, it's a very bad session store.
>
> Also seeing as session data are typically very small (If they're not,
> you have more problems), the motivation for storing them in Cassandra
> would be little.
>
> Why not use a conventional database with some redundancy solution -
> you'll get consistency and for the volumes of data that a web site -
> even a very busy one - has in its sessions, it won't be a problem.

With regard to consistency, is it not possible with Cassandra to achieve
"Session consistency" as described here:
http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Or, if not possible, not worth it?

Thanks,
Eric


Re: random n00b question

Jonathan Ellis-3
We don't currently have any optimizations to provide "lightweight"
session consistency (see #132), but if you do quorum reads + quorum
writes then you are guaranteed to read the most recent write which
should be fine for most apps.
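The reason quorum reads plus quorum writes return the most recent write is overlap: with N replicas, any R replicas read and any W replicas written must share at least one node whenever R + W > N, and a majority quorum for both gives 2 * (N // 2 + 1) > N. The brute-force Python check below illustrates this for an assumed replication factor of 3; it is a counting argument only, not Cassandra code.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Majority of n replicas."""
    return n // 2 + 1


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every r-replica read set intersects every w-replica write set."""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


N = 3  # assumed replication factor
print(quorum(N))                                 # 2
print(quorums_overlap(N, quorum(N), quorum(N)))  # True, since 2 + 2 > 3
print(quorums_overlap(N, 1, 1))                  # False, since 1 + 1 <= 3
```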

On Tue, Sep 15, 2009 at 5:30 AM, Eric Bowman <[hidden email]> wrote:

> With regard to consistency, is it not possible with Cassandra to achieve
> "Session consistency" as described here:
> http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
>
> Or, if not possible, not worth it?
>
> Thanks,
> Eric

Re: random n00b question

Mark Robson


2009/9/15 Jonathan Ellis <[hidden email]>
> We don't currently have any optimizations to provide "lightweight"
> session consistency (see #132), but if you do quorum reads + quorum
> writes then you are guaranteed to read the most recent write which
> should be fine for most apps.

Quorum read / write would be required, yes.

But the typical model used by web page session handlers also requires locking, otherwise you can lose data by concurrent updates.

Consider the typical read / modify / write scenario; a traditional database might do:

BEGIN TRANSACTION;
SELECT sessiondata FROM sessions WHERE id='my session id' FOR UPDATE;
-- with the session row locked, execute the page code, modifying sessiondata in memory
UPDATE sessions SET sessiondata='modified session data' WHERE id='my session id';
COMMIT;

Cassandra has no way to emulate this behaviour, therefore functionality would be lost if you moved from a traditional database session handler to Cassandra.

Even using quorum reads and writes, if a user in the same session has two pages active at once, session data would be trashed.

Mark
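Mark's lost-update scenario can be replayed deterministically in Python. A dict stands in for the session store, and two requests from the same session interleave as read, read, write, write, so the second write-back silently discards the first request's change. Running the same read/modify/write steps one after the other, which is what the `SELECT ... FOR UPDATE` transaction enforces, keeps both changes. The names here are illustrative only.

```python
import copy

store = {"sess-1": {"basket": []}}


def read(session_id):
    # Each request works on its own copy, as a session handler would.
    return copy.deepcopy(store[session_id])


def write(session_id, data):
    store[session_id] = data  # blind overwrite: no lock, no version check


# Interleaved: both requests read before either writes.
req_a = read("sess-1")
req_b = read("sess-1")
req_a["basket"].append("book")
req_b["basket"].append("lamp")
write("sess-1", req_a)
write("sess-1", req_b)
lost = store["sess-1"]
print(lost)  # {'basket': ['lamp']}: the "book" change is gone

# Serialized: each read/modify/write completes before the next begins.
store = {"sess-1": {"basket": []}}
for item in ("book", "lamp"):
    session = read("sess-1")
    session["basket"].append(item)
    write("sess-1", session)
print(store["sess-1"])  # {'basket': ['book', 'lamp']}
```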

Re: random n00b question

Jonathan Ellis-3
On Tue, Sep 15, 2009 at 10:09 AM, Mark Robson <[hidden email]> wrote:
> Even using quorum reads and writes, if a user in the same session has two
> pages active at once, session data would be trashed.

True.  But for most web apps I've seen, last-write-wins is just fine.  YMMV. :)

-Jonathan
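For reference, last-write-wins resolution can be sketched in a few lines of Python: each replica holds a value tagged with the timestamp supplied by the writer, and reconciliation keeps the version with the highest timestamp. Cassandra reconciles conflicting column versions by timestamp in this spirit, but this is a generic sketch; `Versioned` and `reconcile` are hypothetical names, and tie-breaking between equal timestamps is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    timestamp: int  # supplied by the writing client, e.g. in microseconds
    value: str


def reconcile(*versions: Versioned) -> Versioned:
    """Last-write-wins: keep the version with the highest timestamp."""
    return max(versions, key=lambda v: v.timestamp)


from_tab_a = Versioned(timestamp=1_000, value="basket=[book]")
from_tab_b = Versioned(timestamp=1_005, value="basket=[lamp]")
print(reconcile(from_tab_a, from_tab_b).value)  # basket=[lamp]
```

Whichever request wrote last wins outright; nothing from the other write is merged, which is exactly the trade-off being debated here.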

Re: random n00b question

Chris Goffinet-2
In reply to this post by Mark Robson

On Sep 15, 2009, at 8:09 AM, Mark Robson wrote:



> 2009/9/15 Jonathan Ellis <[hidden email]>
>> We don't currently have any optimizations to provide "lightweight"
>> session consistency (see #132), but if you do quorum reads + quorum
>> writes then you are guaranteed to read the most recent write which
>> should be fine for most apps.
>
> Quorum read / write would be required, yes.
>
> But the typical model used by web page session handlers also requires locking, otherwise you can lose data by concurrent updates.


Do you really expect a user to open up multiple tabs and start clicking concurrently? Is the use case for bots? Remember, if you're trying to capture a user's activity and think they might open up many windows, I wouldn't be saving that into a session in general. 

> Consider the read / modify / write scenario typically used, a traditional database might do:
>
> BEGIN TRANSACTION;
> SELECT sessiondata FROM sessions WHERE id='my session id' FOR UPDATE;
> ... with session in place, execute the page code, modifying sessiondata in memory
> UPDATE sessions SET sessiondata='modified session data' WHERE id='my session id';
> COMMIT;


That's doable, but as a matter of best practice it's a very bad idea. You're adding overhead to what you're trying to accomplish. If you're sticking all your data into the session, that might be a bad idea in general. I worked at a company where the previous programmer tried to get very clever and added memcache locks for sessions. Cleverness is almost always a _bad_ idea.

> Cassandra has no way to emulate this behaviour, therefore functionality would be lost if you moved from a traditional database session handler to Cassandra.
>
> Even using quorum reads and writes, if a user in the same session has two pages active at once, session data would be trashed.
>
> Mark


Re: random n00b question

Mark Robson


2009/9/15 Chris Goffinet <[hidden email]>

> Do you really expect a user to open up multiple tabs and start clicking concurrently? Is the use case for bots? Remember, if you're trying to capture a user's activity and think they might open up many windows, I wouldn't be saving that into a session in general.

No, but what I often see is web developers creating pages with a frame set (or iframes, AJAX calls, etc.) that try to share session state across multiple simultaneous requests coming from the same browser in the same window. They then race to update the session; sometimes this works, sometimes it doesn't.

The session locks avoid this. I'm not sure whether they are a good idea in general.

Mark
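The session locks Mark mentions can be sketched in Python as one lock per session id, so that concurrent requests from frames or AJAX calls in the same session run their read/modify/write one at a time. This is a single-process sketch under stated assumptions: `handle_request` and the in-memory store are hypothetical, and sharing locks between several app servers would need an external lock service, much like the memcache locks Chris warns against.

```python
from __future__ import annotations

import threading
from collections import defaultdict

# Hypothetical in-process session store and one lock per session id.
store: defaultdict[str, list[str]] = defaultdict(list)
_locks: defaultdict[str, threading.Lock] = defaultdict(threading.Lock)
_locks_guard = threading.Lock()  # protects creation of per-session locks


def session_lock(session_id: str) -> threading.Lock:
    with _locks_guard:
        return _locks[session_id]


def handle_request(session_id: str, item: str) -> None:
    """Read/modify/write one session while holding that session's lock."""
    with session_lock(session_id):
        data = list(store[session_id])  # read a private copy
        data.append(item)               # the "page code" modifies it
        store[session_id] = data        # write it back


# Simulate frames or AJAX calls hitting the same session at once.
threads = [
    threading.Thread(target=handle_request, args=("sess-1", f"frame-{i}"))
    for i in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(store["sess-1"]))  # 20: no update was lost
```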


Re: random n00b question

contact-15
In reply to this post by Jonathan Ellis-3
Is there a specific reason to store the session data in the database? For my web-app, I use a memcached cluster, which alleviates the database load.

Shahan

On Tue, 15 Sep 2009 10:13:41 -0500, Jonathan Ellis wrote:
> On Tue, Sep 15, 2009 at 10:09 AM, Mark Robson wrote:
>> Even using quorum reads and writes, if a user in the same session has two
>> pages active at once, session data would be trashed.
> 
> True.  But for most web apps I've seen, last-write-wins is just fine. 
> YMMV. :)
> 
> -Jonathan

Re: random n00b question

Chris Goffinet-2
Using memcached as a write-back cache is good; using it as the sole
store for sessions is, IMHO, a bad idea. Data can easily be pushed out
if you're thrashing the more common slabs (unless you've tuned it properly).
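Chris's distinction can be sketched in Python. In a write-back arrangement the cache sits in front of a durable store: writes are marked dirty and flushed later, and a dirty entry is persisted before it is evicted, so eviction costs a cache miss rather than lost session data. `WriteBackCache` is a hypothetical single-process model with dicts standing in for memcached and the database; memcached itself does not notify clients when it evicts an item, so a real setup would have to track and flush dirty keys on its own.

```python
from __future__ import annotations

from collections import OrderedDict


class WriteBackCache:
    """Bounded cache in front of a durable store; writes are flushed lazily."""

    def __init__(self, backing: dict[str, str], capacity: int) -> None:
        self.backing = backing  # stands in for the database
        self.capacity = capacity
        self.cache: OrderedDict[str, str] = OrderedDict()
        self.dirty: set[str] = set()  # keys not yet persisted

    def set(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        self.dirty.add(key)
        while len(self.cache) > self.capacity:
            old_key, old_value = self.cache.popitem(last=False)
            if old_key in self.dirty:  # persist before dropping it
                self.backing[old_key] = old_value
                self.dirty.discard(old_key)

    def get(self, key: str) -> str | None:
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        return self.backing.get(key)  # cache miss: fall back to the store

    def flush(self) -> None:
        for key in self.dirty:
            self.backing[key] = self.cache[key]
        self.dirty.clear()


db: dict[str, str] = {}
cache = WriteBackCache(db, capacity=1)
cache.set("s1", "basket=[1]")
cache.set("s2", "basket=[2]")  # evicts s1, which is written to db first
print(db)                      # {'s1': 'basket=[1]'}
print(cache.get("s1"))         # basket=[1], served from db
cache.flush()
print(db)                      # {'s1': 'basket=[1]', 's2': 'basket=[2]'}
```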

On Sep 15, 2009, at 10:56 PM, <[hidden email]> <[hidden email]> wrote:

> Is there a specific reason to store the session data in the  
> database? For my web-app, I use a memcached cluster, which  
> alleviates the database load.
>
> Shahan

RE: random n00b question

daniel.granat
You mentioned earlier that you also need persistence.
I think memcacheDB or Tokyo Tyrant could be a good alternative. Both will give you persistence and the ability to set an expiration.
Tokyo Tyrant has better performance.

Daniel.
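Daniel's requirement (persistence plus expiration) can be illustrated without relying on memcacheDB's or Tokyo Tyrant's actual APIs. The Python sketch below uses the standard `sqlite3` module as an assumed stand-in: each session row carries an `expires_at` time, reads ignore expired rows, and a periodic sweep deletes them. Table and function names are hypothetical, and `now` is passed in explicitly to keep the example deterministic.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would make this durable
conn.execute(
    "CREATE TABLE sessions (id TEXT PRIMARY KEY, data TEXT, expires_at INTEGER)"
)


def put(session_id: str, data: str, ttl: int, now: int) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO sessions (id, data, expires_at) VALUES (?, ?, ?)",
        (session_id, data, now + ttl),
    )
    conn.commit()


def get(session_id: str, now: int) -> str | None:
    row = conn.execute(
        "SELECT data FROM sessions WHERE id = ? AND expires_at > ?",
        (session_id, now),
    ).fetchone()
    return row[0] if row else None


def sweep(now: int) -> int:
    """Delete expired rows and return how many were removed."""
    cur = conn.execute("DELETE FROM sessions WHERE expires_at <= ?", (now,))
    conn.commit()
    return cur.rowcount


put("s1", "basket=[1]", ttl=60, now=0)
put("s2", "basket=[2]", ttl=600, now=0)
print(get("s1", now=120))  # None: expired, although the row still exists
print(sweep(now=120))      # 1
print(get("s2", now=120))  # basket=[2]
```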

-----Original Message-----
From: Chris Goffinet [mailto:[hidden email]]
Sent: Wednesday, September 16, 2009 9:00 AM
To: [hidden email]
Subject: Re: random n00b question

Using memcached as a write-back cache is good, using it solely for  
sessions IMHO is bad idea. Data can be easily pushed out if your  
thrashing the more common slabs (unless you really tuned it properly).


Re: random n00b question

contact-15
In reply to this post by Chris Goffinet-2
You're right! At the moment, I don't have enough users to make that
happen, but I do plan to switch to Redis, as its protocol is compatible
with memcached.

Shahan

On Tue, 15 Sep 2009 22:59:56 -0700, Chris Goffinet <[hidden email]> wrote:

> Using memcached as a write-back cache is good, using it solely for  
> sessions IMHO is bad idea. Data can be easily pushed out if your  
> thrashing the more common slabs (unless you really tuned it properly).