Cassandra + Hadoop + BMT

Cassandra + Hadoop + BMT

Chris Goffinet-2
Hi Guys

This is long overdue, but I have posted a very rough example
(with Digg stuff removed) for getting BMT working with Cassandra.
Patches for the JIRA tickets are coming up next. I'll try to get a
more generic map/reduce job that integrates Hive output finished by
the end of the week.

http://github.com/lenn0x/Cassandra-Hadoop-BMT/tree/master

-Chris

Re: Cassandra + Hadoop + BMT

Jonathan Ellis-3
Thanks, Chris!

On Mon, Aug 24, 2009 at 9:44 PM, Chris Goffinet<[hidden email]> wrote:

> Hi Guys
>
> This is long overdue but I have posted a very rough rough example (with Digg
> stuff removed) for getting BMT working with Cassandra. Patches are coming
> next up for the JIRA tickets. I'll try to get a more generic map/reduce job
> finished by end of the week that integrates Hive output.
>
> http://github.com/lenn0x/Cassandra-Hadoop-BMT/tree/master
>
> -Chris
>

Re: Cassandra + Hadoop + BMT

Johan Oskarsson
In reply to this post by Chris Goffinet-2
I have slapped together a basic Hadoop 0.18 CassandraOutputFormat based
on the code Chris put up.

Usage:
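// conf is the job's JobConf (old org.apache.hadoop.mapred API)

// Output key and value types used with CassandraOutputFormat
conf.setOutputKeyClass(RowColumn.class);
conf.setOutputValueClass(BytesWritable.class);

// Write job output through CassandraOutputFormat into the target
// column family and keyspace
conf.setOutputFormat(CassandraOutputFormat.class);
conf.set(CassandraOutputFormat.CONF_COLUMN_FAMILY_NAME, "columnfamilyname");
conf.set(CassandraOutputFormat.CONF_KEYSPACE, "keyspacename");

// Ship storage-conf.xml to the tasks via the distributed cache
DistributedCache.addCacheFile(new URI("uri_to_storage-conf.xml"), conf);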

Plus your job-specific settings.

Then, once the job has finished, call this method: CassandraOutputFormat.forceFlush
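
For the "uri_to_storage-conf.xml" placeholder in the usage above, here is a minimal plain-Java sketch of building such a URI. It assumes the file lives on HDFS. The host, port, and path are made-up placeholders, and the StorageConfUri class is hypothetical, not part of CassandraOutputFormat.

```java
import java.net.URI;

public class StorageConfUri {
    // Builds a URI of the kind passed to DistributedCache.addCacheFile.
    // Assumption: storage-conf.xml has been uploaded to HDFS beforehand.
    static URI storageConfUri(String host, int port, String path) {
        // URI.create throws an unchecked IllegalArgumentException on bad
        // input, unlike new URI(String), which throws the checked
        // URISyntaxException.
        return URI.create("hdfs://" + host + ":" + port + path);
    }

    public static void main(String[] args) {
        // Placeholder namenode address and path, for illustration only.
        URI uri = storageConfUri("namenode.example.com", 9000,
                "/cassandra/conf/storage-conf.xml");
        System.out.println(uri);
    }
}
```

If you keep new URI(...) as in the usage, the surrounding method has to catch or declare URISyntaxException.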

Source code here:
http://github.com/johanoskarsson/cassandraoutputformat/tree/master

Big thanks to Chris for figuring out the mystery that is BinaryMemtable.

/Johan
