MVCC

Ivan Chang
Does Cassandra support MVCC?
 
I am building an application with concurrent updates (add, update, delete), and one of the requirements is to be able to run audits that reproduce the full update history and every version of the data objects.  What's the best way to go about this in Cassandra, as long as histories and versions are maintained?  Does Cassandra support MVCC?
 
-Ivan

Re: MVCC

Jun Rao

Ivan,

The original cassandra keeps multiple versions of the column data. However, that support has been removed in the apache code. Right now, only the latest version is kept. In the future, we could add the versioning support back.

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099


Re: MVCC

Chris Goffinet-2
How was it used in the original? 


Re: MVCC

Jonathan Ellis-3
In reply to this post by Jun Rao
On Mon, Aug 3, 2009 at 10:49 AM, Jun Rao<[hidden email]> wrote:
> Ivan,
>
> The original cassandra keeps multiple versions of the column data.

No, it didn't.  (It had versioning-related bugs but multiple versions
a la Bigtable was never part of the design.)

-Jonathan

Re: MVCC

mobiledreamers
I always thought Cassandra kept multiple versions for free and that we needed to delete the older versions manually.


--
Bidegg worlds best auction site
http://bidegg.com

Re: MVCC

Mark McBride
If this is the case, what does the timestamp passed in to the remove
call do?  I assumed you had to have it match up with a specific
version...


Re: MVCC

Jonathan Ellis-3
It's there for the same reason as the other timestamps: it lets
cassandra ignore obsolete operations.  So if you do a delete at time X
and an insert at time Y where X < Y, the insert will not be deleted by
mistake even if a node is down temporarily and gets the delete later.

-Jonathan
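
A minimal sketch of the rule described above, assuming a toy in-memory store in which each column keeps only its newest timestamped operation. The `Replica` and `Cell` names and the tie-breaking choice are hypothetical; this is not Cassandra's actual code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """Newest known operation for one column: a value or a tombstone."""
    timestamp: int
    value: Optional[str]  # None marks a delete (tombstone)


class Replica:
    """Toy single-node store (hypothetical model, not Cassandra code)."""

    def __init__(self):
        self.cells = {}  # column name -> Cell

    def apply(self, key, timestamp, value):
        """Apply an insert (value) or a delete (None).

        Returns True if the operation took effect, False if it was obsolete.
        """
        current = self.cells.get(key)
        # Assumption: ties are ignored; a real system needs a defined tie-break.
        if current is not None and current.timestamp >= timestamp:
            return False
        self.cells[key] = Cell(timestamp, value)
        return True

    def get(self, key):
        cell = self.cells.get(key)
        return None if cell is None else cell.value


replica = Replica()
X, Y = 100, 200                          # delete at X, insert at Y, X < Y
replica.apply("name", Y, "ivan")         # the insert arrives first
late = replica.apply("name", X, None)    # the older delete arrives late
```

Here the late delete is ignored because its timestamp is older than the insert's, so `replica.get("name")` still returns the inserted value.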


Re: MVCC

Wilson Mar
So if different servers are not synchronized in time (to a Tier 1 time
server), then updates from a slower server will not be applied on faster
servers?

Re: MVCC

Mark McBride
Thanks, that makes sense.  Is it an ok general rule that the
timestamps should be set to

1) The time that the data to be mutated was generated
2) The current system time if the time the data was mutated isn't available

Looking around at code it seems like time 0 is used a lot, which seems
pretty dangerous.

   ---Mark


Re: MVCC

Jonathan Ellis-3
In reply to this post by Wilson Mar
Strictly speaking, no; timestamp is client-provided.

But in the sense that "you'd better use ntpd on your clients," yes.


Re: MVCC

Jonathan Ellis-3
In reply to this post by Mark McBride
On Mon, Aug 3, 2009 at 12:12 PM, Mark McBride<[hidden email]> wrote:
> Thanks, that makes sense.  Is it an ok general rule that the
> timestamps should be set to
>
> 1) The time that the data to be mutated was generated
> 2) The current system time if the time the data was mutated isn't available

Yes.

> Looking around at code it seems like time 0 is used a lot, which seems
> pretty dangerous.

We do this in test code to make it obviously clock-independent, yes.

-Jonathan
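
The two-part rule above could be sketched on the client side roughly as follows. The helper name `mutation_timestamp` and the choice of microseconds since the epoch are assumptions for illustration; the thread does not specify a unit:

```python
import time
from typing import Optional


def mutation_timestamp(generated_at_us: Optional[int] = None) -> int:
    """Pick the timestamp to send with a write or remove.

    1) Use the time the data was generated, when the caller knows it.
    2) Otherwise fall back to the current system time.
    Values are microseconds since the Unix epoch (an assumed convention).
    """
    if generated_at_us is not None:
        return generated_at_us
    return time.time_ns() // 1_000


# Data carrying its own generation time keeps that time.
from_source = mutation_timestamp(1_249_315_200_000_000)
# Data without one gets the client's clock, so clients should run ntpd.
from_clock = mutation_timestamp()
```

The explicit `is not None` check means a deliberate timestamp of 0, as used in clock-independent test code, is passed through rather than replaced by the system time.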

Re: MVCC

Mark McBride
Cool.  There are a few things I've found out recently that should
probably go into the wiki (this, the fact that get_columns_since
silently returns no results if your column family isn't ordered by
time)... is it moderated at all?  Should I run changes by the mailing
list?


Re: MVCC

Jonathan Ellis-3
It's not moderated (click the login link to get to a signup form).
Changes are sent to the -commits list where anyone interested (like me
:) can review them.

-Jonathan

P.S. sorry for the signup captcha questions -- someone apparently
thought they were cute, but they typically take a bit of googling to
answer.  Not our fault! :)


Re: MVCC

Ivan Chang
In reply to this post by Jonathan Ellis-3
Is this going to be an inherent limitation of Cassandra?
 
There is no doubt many applications would benefit from a db with built-in support for multiple versions of the same data: features that allow reversal of operations, and applications that require historical data to be maintained (e.g. a credit/debit application) for an indefinite amount of time or number of versions.
 
It would be nice to be able to configure column families with versioning attributes.
Will we ever get that, or do we have to implement our own version stack in Cassandra?
 
-Ivan


Re: MVCC

Evan Weaver
You can support this at the domain level with custom comparators, I
think. It doesn't need to be in Cassandra itself as a first-class
operation.

Evan
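
One way to picture versioning at the domain level is to store every version as its own entry, kept sorted by version timestamp, instead of overwriting. The sketch below is a plain in-memory model under that assumption; `VersionedRow` and its methods are hypothetical names, and mapping the sorted order onto a column comparator is left as an assumption rather than shown:

```python
import bisect


class VersionedRow:
    """Keeps every version of a value, ordered by version timestamp."""

    def __init__(self):
        self._stamps = []  # sorted version timestamps
        self._values = []  # values, parallel to self._stamps

    def write(self, timestamp, value):
        """Record a new version; older versions are never overwritten."""
        i = bisect.bisect_right(self._stamps, timestamp)
        self._stamps.insert(i, timestamp)
        self._values.insert(i, value)

    def latest(self):
        """Return the newest version, or None if nothing was written."""
        return self._values[-1] if self._values else None

    def as_of(self, timestamp):
        """Return the version visible at `timestamp`, or None if none existed."""
        i = bisect.bisect_right(self._stamps, timestamp)
        return self._values[i - 1] if i else None

    def history(self):
        """Return all (timestamp, value) pairs, oldest first, for audits."""
        return list(zip(self._stamps, self._values))


row = VersionedRow()
row.write(10, "draft")
row.write(30, "final")
row.write(20, "review")   # out-of-order arrival still lands in order
```

With this layout an audit can replay `row.history()`, while ordinary reads use `row.latest()` or `row.as_of(t)`. The trade-off is that the application, not the database, is responsible for trimming old versions.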


Re: MVCC

Jonathan Ellis-3
In reply to this post by Ivan Chang
On Mon, Aug 3, 2009 at 3:39 PM, Ivan Chang<[hidden email]> wrote:
> Is this going to be an inherent limitation of Cassandra?

If someone writes a patch that adds multi-version support without
compromising single-version performance then I don't see any reasons
to turn it down.

-Jonathan