Thoughts on adding complex queries to Cassandra


Thoughts on adding complex queries to Cassandra

Jeremy Davis-3

Are there any thoughts on adding more complex queries to Cassandra?

At a high level, what I'm wondering is: would it be possible/desirable/in keeping with the Cassandra plan to add something like a JavaScript blob to a get range slice (etc.) that does some further filtering on the results before returning them? The goal being to trade some CPU on the Cassandra nodes for network bandwidth.

-JD
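
To make the proposal concrete, here is a rough sketch (purely hypothetical; the callback shape is my own assumption and nothing like this exists in Cassandra's API) of the kind of JavaScript blob being described: a filter function the node would run over each row's slice before shipping results back, so only matching data crosses the network.

function filter(key, columns) {
  // Hypothetical server-side filter attached to a range-slice call.
  // Input shape assumed: row key plus [{ name: ..., value: ... }] columns.
  var byName = {};
  for (var i = 0; i < columns.length; i++) {
    byName[columns[i].name] = columns[i].value;
  }
  // Keep only rows whose "status" column is "active" and whose "score"
  // exceeds a threshold; everything else is dropped on the Cassandra
  // node instead of being sent over the wire.
  return byName["status"] === "active" && parseInt(byName["score"], 10) > 100;
}

Because the filter runs where the data lives, the trade-off is exactly the one described above: extra CPU on the node in exchange for less traffic on the network.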

RE: Thoughts on adding complex queries to Cassandra

Nicholas Sun

I'm very curious about this topic as well. Mainly, I'd like to know: is this functionality handled through Hadoop Map/Reduce operations?

 

Nick

 


Re: Thoughts on adding complex queries to Cassandra

Jonathan Ellis-3
In reply to this post by Jeremy Davis-3
There definitely seems to be demand for something like this.  Maybe for 0.8?




--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Thoughts on adding complex queries to Cassandra

Vick Khera
On Thu, May 27, 2010 at 9:50 AM, Jonathan Ellis <[hidden email]> wrote:
> There definitely seems to be demand for something like this.  Maybe for 0.8?
>

The Riak data store has something like this: you can submit queries
(and map/reduce jobs) written in JavaScript that run on the data nodes,
using data local to that node. It is a very compelling feature.

Re: Thoughts on adding complex queries to Cassandra

Steve Lihn
Mongo has it too. It could save a lot of development time if one could figure out how to port Mongo's query API and stored JavaScript to Cassandra.
It would also be great if Scala's for-comprehensions could be used to write query-like code against a Cassandra schema.


Re: Thoughts on adding complex queries to Cassandra

Jake Luciani
I've secretly started working on this, but have nothing to show yet :( I'm calling it SliceDiceReduce, or SliceReduce.

The plan is to use the JS Thrift bindings I've added for the 0.3 release of Thrift (out very soon?).

This will allow the supplied JS to access the results like any other Thrift client.

The idea is to add a new verb handler and SEDA stage that executes on the local node and passes that node's slice data into the supplied JS "dice" function via the Thrift JS bindings.

The result from each node would then be passed into another supplied JS "reduce" function on the starting node.

The final output would be a single JSON or string result. The reason I'm keeping the results in JSON is that you can do more than filter: you can do things like word counts, etc.

Anyway, this is little more than an idea right now. But if people like this approach, maybe I'll get motivated!

Jake
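
As a rough illustration of what a supplied "dice"/"reduce" pair might look like under this plan (function names, input shapes, and the availability of JSON.parse/JSON.stringify in the embedded JS engine are all my own assumptions), here is a word-count example: each node's dice function turns its local slice data into a partial count, and the reduce function on the starting node merges the per-node JSON results into one.

function dice(rows) {
  // Runs on each node over its local slice data.
  // Assumed input shape: [{ key: ..., columns: [{ name: ..., value: ... }] }]
  var counts = {};
  for (var i = 0; i < rows.length; i++) {
    var columns = rows[i].columns;
    for (var j = 0; j < columns.length; j++) {
      var words = String(columns[j].value).split(/\s+/);
      for (var k = 0; k < words.length; k++) {
        if (words[k].length === 0) continue;
        counts[words[k]] = (counts[words[k]] || 0) + 1;
      }
    }
  }
  return JSON.stringify(counts); // one JSON blob per node
}

function reduce(nodeResults) {
  // Runs once on the starting node, merging the per-node JSON blobs.
  var total = {};
  for (var i = 0; i < nodeResults.length; i++) {
    var counts = JSON.parse(nodeResults[i]);
    for (var word in counts) {
      total[word] = (total[word] || 0) + counts[word];
    }
  }
  return JSON.stringify(total); // single JSON/string result for the client
}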



 


Re: Thoughts on adding complex queries to Cassandra

Jeremy Davis-3

I agree; I had more than filtering the results in mind.
I had envisioned the results continuing to use List<ColumnOrSuperColumn> (rather than JSON), though. You could still create new result columns that don't exist anywhere in Cassandra, and you could still stuff JSON into any of the result columns.

I had envisioned:
list<ColumnOrSuperColumn> get_slice(keyspace, key, column_parent, predicate, consistency_level, javascript_blob)

-JD
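
For comparison with the JSON approach, here is a sketch of a blob that keeps the results column-shaped, as suggested above (hypothetical; the column object layout and how synthetic columns would be marshalled back into ColumnOrSuperColumn are assumptions): real columns are passed through, and a computed JSON summary is stuffed into a synthetic result column that exists only in the response.

function javascript_blob(key, columns) {
  // Runs server-side on the slice for one key and returns a list of
  // column-like objects, which need not correspond to stored columns.
  var result = [];
  var total = 0;
  for (var i = 0; i < columns.length; i++) {
    result.push(columns[i]); // pass the real columns through untouched
    total += parseInt(columns[i].value, 10) || 0;
  }
  // A synthetic column that exists only in the response, with JSON
  // stuffed into its value.
  result.push({ name: "summary",
                value: JSON.stringify({ count: columns.length, total: total }) });
  return result;
}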





Re: Thoughts on adding complex queries to Cassandra

Jake Luciani
I had this:  


string slice_dice_reduce(1:required list<binary> key,
                         2:required ColumnParent column_parent,
                         3:required SlicePredicate predicate,
                         4:required ConsistencyLevel consistency_level=ONE,
                         5:required string dice_js,
                         6:required string reduce_js)
    throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),
  
I guess it could use a union of sorts and return either one (a string or a list<ColumnOrSuperColumn>).




Re: Thoughts on adding complex queries to Cassandra

Jeremy Davis-3
I wonder if any of the main project committers would like to weigh in on what a desired API would look like, or perhaps we should start an unscheduled Jira ticket?
