row cache vs frequent row updates vs write through row cache

row cache vs frequent row updates vs write through row cache

Jeremy Davis-3
I saw in the Riptano "Tuning Cassandra" slide deck that the row cache can be detrimental if there are a lot of updates to the cached row. Is this because the cache is not write-through, so every update necessitates the creation of a new row?
I see there is an open issue, https://issues.apache.org/jira/browse/CASSANDRA-860, for implementing write-through in 0.8.


-JD

Re: row cache vs frequent row updates vs write through row cache

Brandon Williams
On Fri, Nov 5, 2010 at 1:41 PM, Jeremy Davis <[hidden email]> wrote:
> I saw in the Riptano "Tuning Cassandra" slide deck that the row cache can be detrimental if there are a lot of updates to the cached row. Is this because the cache is not write through, and every update necessitates creation of a new row?
> I see there is an open issue: https://issues.apache.org/jira/browse/CASSANDRA-860  for implementing write through in 0.8.

The problem is that if the row is being updated a lot, the cache turns over quickly, which puts GC pressure on the JVM. Even if it were write-through, row cache is probably a bad match for this kind of row; it's much better suited to mostly static rows. Rely on the key cache and the OS file cache for these instead.

-Brandon 
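
To make the "turning over" point concrete, here is a minimal toy sketch in Python. It is not Cassandra's implementation: the class `ToyRowCache`, the loader `load_row`, and the choice to build a fresh row snapshot on every update to a cached row are all assumptions made for illustration. The `discarded` counter is only a crude proxy for the garbage a JVM collector would have to clean up.

```python
from collections import OrderedDict


class ToyRowCache:
    """Toy LRU cache of row snapshots (hypothetical, not Cassandra code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()  # key -> row dict, oldest entry first
        self.discarded = 0         # snapshots dropped: a rough proxy for garbage

    def read(self, key, load_row):
        """Return the cached row, loading and caching it on a miss."""
        if key in self.rows:
            self.rows.move_to_end(key)  # mark as most recently used
            return self.rows[key]
        row = load_row(key)
        self.rows[key] = row
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict the least recently used row
            self.discarded += 1
        return row

    def write(self, key, column, value):
        """Assumption: updating a cached row builds a new snapshot object."""
        if key in self.rows:
            old = self.rows[key]
            self.rows[key] = {**old, column: value}  # old snapshot becomes garbage
            self.rows.move_to_end(key)
            self.discarded += 1


def load_row(key):
    # Hypothetical loader standing in for a read from disk.
    return {"id": key}


cache = ToyRowCache(capacity=100)
cache.read("hot", load_row)
for i in range(10_000):
    cache.write("hot", "counter", i)
print("discarded snapshots:", cache.discarded)
```

In this model every update to a cached row leaves one discarded snapshot behind, while a row that is only read leaves none, which is the sense in which a frequently updated row keeps the cache churning.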
Re: row cache vs frequent row updates vs write through row cache

Jeremy Davis-3
What do you mean by "turning over quickly"? What is turning over? If an update needs to create an entirely new row, that would certainly create GC pressure. But if you are just updating a column in a row that is already in the cache, I would think that would be the optimal situation.

OTOH, you may be talking about continuously evicting rows from the cache (because the cache is too small). Assuming that is not the case, should I turn on row cache?
In short, it seems the general advice is: unless you have a set of nearly static rows AND they all fit in the cache, row cache is not recommended.

-JD

Re: row cache vs frequent row updates vs write through row cache

Dave Gardner
> In short, it seems like the general advice is unless you have a set of nearly static rows, AND they all fit in the cache, then rowcache is not recommended.

That's been our experience. Leave the memory for the OS cache instead.

Dave

Re: row cache vs frequent row updates vs write through row cache

Brandon Williams
On Fri, Nov 5, 2010 at 2:41 PM, Jeremy Davis <[hidden email]> wrote:
> What do you mean by "Turning Over quickly"? What is Turning over? If it needs to create an entirely new row, then that would create GC pressure for sure... But if you are just updating a column in a row that is already in the cache, then I would think that would be the optimal situation.

That would be cheaper, yes, but ultimately it's still extra GC if the row is being mutated often.

> OTOH, you may be talking about continuously evicting rows from the cache (because the cache is too small )... Assuming that is not the case, should I turn on Row Cache?

This is a problem too. You can't make the cache huge because of GC pressure, and if your read pattern is largely random, the resulting evictions will cause GC pressure as well.

> In short, it seems like the general advice is unless you have a set of nearly static rows, AND they all fit in the cache, then rowcache is not recommended.

Row cache is good for a small number of mostly static rows that are read very often (and aren't enormous themselves). They don't all have to fit in the cache, but the working set should, to avoid the eviction problem. If that's not your use case, don't use row cache.

-Brandon
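
The working-set point can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not a model of Cassandra itself: the function `simulate_reads`, the plain LRU policy, the uniform random read pattern, and the cache and key-space sizes are all made up for the example.

```python
import random
from collections import OrderedDict


def simulate_reads(capacity, keys, n_reads, seed=0):
    """Return (hits, evictions) for an LRU cache under uniform random reads."""
    rng = random.Random(seed)
    cache = OrderedDict()  # key -> cached row, oldest entry first
    hits = 0
    evictions = 0
    for _ in range(n_reads):
        key = rng.choice(keys)
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
            hits += 1
        else:
            cache[key] = object()  # stand-in for a row loaded from disk
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used row
                evictions += 1
    return hits, evictions


if __name__ == "__main__":
    n_reads = 50_000
    scenarios = [
        ("working set fits", list(range(500))),         # smaller than the cache
        ("largely random reads", list(range(100_000))),  # far larger than the cache
    ]
    for name, keys in scenarios:
        hits, evictions = simulate_reads(1000, keys, n_reads)
        print(f"{name}: hit rate {hits / n_reads:.3f}, evictions {evictions}")
```

When the keys being read fit inside the cache, nothing is ever evicted once the cache is warm. When reads are spread over a key space much larger than the cache, nearly every read is a miss that also evicts another row, so the cache does a lot of allocation work for very few hits.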