On Fri, Jul 31, 2009 at 5:42 PM, Colin Mollenhour<[hidden email]> wrote:
> This reply keeps getting blocked as spam, so I am just sending it to you directly.
> Jonathan, thank you very much for the excellent response. If I may, a few
> more questions (inline):
> > One caveat is that the subcolumns of supercolumns are not indexed.
> > When you query those, Cassandra reads the entire Supercolumn into
> > memory. So they are best suited for small bunches of attributes, not
> > up to 60k events.
> Given that subcolumns of SCs are not indexed it seems that the only time it
> makes sense to use them is when some or most of the subcolumns will be
> needed within the same request, otherwise you could just have a separate
> simple CF for each sub-group of data. Is there any other reason to use a SC?
Most generally, it's useful when you want a dynamic "container", since
supercolumns can come into existence as needed, but CFs are more
static: they have to be defined in the configuration up front.
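To make the indexing caveat above concrete, here is a toy model (not Cassandra code, just an illustration of the read path): because subcolumns of a supercolumn are not indexed, fetching any one subcolumn deserializes the whole supercolumn into memory first, whereas a simple CF can seek to a single column.

```python
# Toy illustration only: a supercolumn behaves like a nested dict whose
# entire inner dict must be loaded before one subcolumn can be read.
def read_subcolumn(row, supercolumn_name, subcolumn_name):
    sc = row[supercolumn_name]   # whole supercolumn materialized in memory
    return sc[subcolumn_name]    # only then is the one subcolumn picked out

# A row with one supercolumn holding many event subcolumns.
row = {"events": {"e1": "a", "e2": "b"}}

print(read_subcolumn(row, "events", "e1"))
```

This is why a supercolumn with tens of thousands of subcolumns (like 60k events) is a poor fit: every read pays for the whole bunch.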
> For example, on Evan Weaver's blog post he gives this diagram:
> http://blog.evanweaver.com/files/cassandra/twitter.jpg with subcolumns
> user_timeline and home_timeline of the UserRelationships SC. But, because
> they will never be requested simultaneously, these would be better off if
> they were each their own simple CF, right?
That's what it looks like to me.
> > If the event names cannot clash with user names then you might just
> > put all of the data / event / permissions data in the same row without
> > extra namespacing. Otherwise, you will have to put each of those
> > types of data in a single row. Which is better depends on your query
> > needs. (My initial impression is the 2nd is a better fit for you
> > here.)
> I'm not sure I follow you here but the reason I had them as SC:CF is that
> pending_events is something I need to be able to add/remove from easily and
> permissions will always be retrieved as a full list. In many cases I think
> these will need to be fetched to serve the same request. What is the
> drawback of this approach that I am failing to see?
My impression was that pending_events is likely to be large, in which
case per the above it is a bad fit for a SC. Otherwise it is fine.
> > There's a related problem with your type index: Cassandra still
> > materializes entire rows in memory at compaction time (see
> > CASSANDRA-16). So for now you might want to split those across rows
> > as $type|$journalid, in a simple columnfamily with each row only about
> > that one journal. Then you can do range queries to get the journals
> > needed, then slice for the events as needed.
> Cool. Will it ever be possible to retrieve the actual columns from a range
> query rather than just the keys within the range?
Yes. The only question is when someone will need it enough to code it. :)
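The $type|$journalid scheme above can be sketched as follows. This is a hedged illustration, not client-library code: the key format and the "~" upper bound are assumptions that work because, under an order-preserving partitioner, keys sharing a type prefix sort contiguously.

```python
def journal_row_key(journal_type, journal_id):
    # Compose a row key of the form "$type|$journalid" so that all
    # journals of one type sort together for range queries.
    return "%s|%s" % (journal_type, journal_id)

keys = sorted([
    journal_row_key("audit", "0001"),
    journal_row_key("error", "0002"),
    journal_row_key("audit", "0003"),
])

# A range scan from "audit|" up to (but not including) "audit|~"
# covers exactly the "audit" journals; "~" sorts after the id chars.
audit_keys = [k for k in keys if "audit|" <= k < "audit|~"]
print(audit_keys)
```

Each row then holds only that one journal's events, sidestepping the compaction-time row materialization issue.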
> > One other suggestion would be that it generally simplifies things to
> > use natural keys, rather than surrogate (_id keys). And if you do use
> > surrogate keys, use UUIDs rather than numeric counters.
> I am having trouble finding anything on how to use UUIDs. Even a search on
> the wiki for UUID has no results and all of the examples set the id
> explicitly.. How do I do this using the Thrift interface?
Column names are bytes now, and a UUID is just 16 bytes laid out the
right way. How you generate the UUID in the first place and serialize
it to bytes is going to be client-language dependent. (For Python,
the tests in test/system/test_server.py have an example.)
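For the Python case, a minimal sketch using only the standard library: `uuid.uuid1()` produces a time-based (version 1) UUID, and its `.bytes` attribute is the 16-byte big-endian layout, which is the sort of raw value you would pass as a column name over Thrift.

```python
import uuid

# Generate a time-based (version 1) UUID.
u = uuid.uuid1()

# .bytes gives the 16-byte big-endian serialization, usable directly
# as a raw column name.
column_name = u.bytes

# Round-trip: the stored bytes reconstruct the same UUID.
restored = uuid.UUID(bytes=column_name)
```

Time-based UUIDs have the nice property that they sort roughly chronologically, which matters if you want time-ordered column names.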
> > No. If anything, you may not be denormalizing enough. Having CFs
> > like the event details off by themselves when they don't directly
> > need to be queried looks fishy.
> The take-away seems to be, "Design your schema as if you are using a
> key/value hash, and then group CFs together under a SC only if they are
> frequently retrieved in full by the same app request." Is there a point at
> which this wouldn't be true because your data was so denormalized that you
> had too many indexes, or does that just mean that Cassandra is not a good
> fit for the application?
In general, Cassandra is a poor fit where you need to do lots of
ad-hoc queries. But I don't think that's what you have here.