Adhoc querying in Cassandra?

Matthew Johnson

Hi all,

We are currently setting up a “big” data cluster. We will only have a couple of servers to start with, but we need to be able to scale out quickly when usage ramps up. Previously we have used Hadoop/HBase for our big data cluster, but since we are starting this one on only two nodes I think Cassandra will be a much better fit, as Hadoop and HBase really need at least three nodes to achieve any sort of resilience (ZooKeeper quorum etc.).

My question is this:

I have used Apache Phoenix as a JDBC layer on top of HBase, which allows me to issue ad-hoc SQL-style queries (e.g. count the number of times users have clicked on a certain button after clicking a different button in the last 3 weeks). My understanding is that CQL does not support this style of ad-hoc aggregate querying out of the box. Is there a recommended way to do count, sum, average etc. without writing client code (in my case Java) every time I want to run one? I have been looking at projects like Drill and Spark that could potentially sit on top of Cassandra, but without actually setting everything up and testing them it is difficult to figure out what they would give us.
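
To make the example query concrete, here is a minimal sketch of what it computes, written in Python for brevity and run client-side over in-memory rows. The event layout `(user, button, timestamp)`, the button names and the function name are assumptions for illustration only, not an existing schema. This is the kind of per-query client code I would like to avoid writing each time.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_clicks_after(events, first, second, now, window=timedelta(weeks=3)):
    """Count clicks on `second` made by a user who had already clicked
    `first`, looking only at events in the `window` leading up to `now`."""
    cutoff = now - window

    # Group the in-window clicks by user.
    by_user = defaultdict(list)
    for user, button, ts in events:
        if cutoff <= ts <= now:
            by_user[user].append((ts, button))

    total = 0
    for clicks in by_user.values():
        seen_first = False
        # Walk each user's clicks in time order.
        for _ts, button in sorted(clicks):
            if button == first:
                seen_first = True
            elif button == second and seen_first:
                total += 1
    return total


# Hypothetical sample data.
now = datetime(2015, 4, 22)
events = [
    ("alice", "search", now - timedelta(days=2)),
    ("alice", "buy", now - timedelta(days=1)),     # counted
    ("bob", "buy", now - timedelta(days=3)),       # no earlier "search"
    ("carol", "search", now - timedelta(days=30)), # outside the window
    ("carol", "buy", now - timedelta(days=1)),     # so this is not counted
]
print(count_clicks_after(events, "search", "buy", now))
```

With Phoenix the same question is a single SQL statement typed at a prompt, which is the experience I am hoping to keep.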

Does anyone else interactively issue ad-hoc aggregate queries to Cassandra, and if so, what stack do you use?

Thanks!

Matt

Re: Adhoc querying in Cassandra?

Ali Akhtar

You might find it better to use Elasticsearch for your aggregate queries and analytics. Cassandra is more of just a data store.

On Apr 22, 2015 4:42 PM, "Matthew Johnson" <[hidden email]> wrote:

Re: Adhoc querying in Cassandra?

Brian O'Neill

+1, I think many organizations (including ours) pair Elasticsearch with Cassandra.
Use Cassandra as your system of record, then index the data with ES.
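
As a rough illustration of the query side once the data is indexed, here is a sketch that builds an Elasticsearch search body with a `terms` aggregation, counting click events per button over the last three weeks. It assumes one indexed document per click event; the field names `clicked_at` and `button_id` and the helper name are hypothetical, so adjust them to whatever your mapping looks like.

```python
import json


def clicks_per_button_query(days=21, ts_field="clicked_at", button_field="button_id"):
    """Build a search body that counts click documents per button
    for the last `days` days (field names are illustrative)."""
    return {
        # No individual hits are needed, only the aggregation buckets.
        "size": 0,
        # Restrict to recent events using Elasticsearch date math.
        "query": {"range": {ts_field: {"gte": f"now-{days}d/d"}}},
        # One bucket per distinct button, each with a document count.
        "aggs": {"clicks_per_button": {"terms": {"field": button_field}}},
    }


print(json.dumps(clicks_per_button_query(), indent=2))
```

The resulting JSON would be sent to the index's `_search` endpoint. Sum and average work the same way, using `sum` and `avg` aggregations on a numeric field.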

-brian

---

Brian O'Neill 

Chief Technology Officer

Health Market Science, a LexisNexis Company

215.588.6024 Mobile @boneill42 


This information transmitted in this email message is for the intended recipient only and may contain confidential and/or privileged material. If you received this email in error and are not the intended recipient, or the person responsible to deliver it to the intended recipient, please contact the sender at the email above and delete this email and any attachments and destroy any copies thereof. Any review, retransmission, dissemination, copying or other use of, or taking any action in reliance upon, this information by persons or entities other than the intended recipient is strictly prohibited.

From: Ali Akhtar <[hidden email]>
Reply-To: <[hidden email]>
Date: Wednesday, April 22, 2015 at 7:52 AM
To: <[hidden email]>
Subject: Re: Adhoc querying in Cassandra?

RE: Adhoc querying in Cassandra?

Matthew Johnson

Hi Ali, Brian,

Thanks for the suggestion. We have previously used Solr (SolrCloud for distribution) for a lot of other products; presumably this will do the same job as Elasticsearch? Or does Elasticsearch have specifically better integration with Cassandra, or better support for aggregate queries?

Would it be an OK architecture to have a Cassandra node and a Solr/ES instance on each box, so they scale together? Or is it better to have separate servers for storage and search?

Cheers,

Matt

From: Brian O'Neill [mailto:[hidden email]] On Behalf Of Brian O'Neill
Sent: 22 April 2015 12:56
To: [hidden email]
Subject: Re: Adhoc querying in Cassandra?

Re: Adhoc querying in Cassandra?

Ali Akhtar

I believe Elasticsearch has better support for scaling horizontally (by adding nodes) than Solr does. Some benchmarks that I've looked at also show it performing better under high load.

I probably wouldn't run them both on the same node, or you might see low performance as they compete for resources.

What type of usage do you expect: mostly read, or mostly write?

On Wed, Apr 22, 2015 at 5:06 PM, Matthew Johnson <[hidden email]> wrote:

Re: Adhoc querying in Cassandra?

Brian O'Neill

Again, agreed.

They have different usage patterns (C* is write-heavy, ES is read-heavy), so I would separate them.
Solr should be sufficient. I believe DSE provides a tight integration between Solr and C*.

-brian

From: Ali Akhtar <[hidden email]>
Reply-To: <[hidden email]>
Date: Wednesday, April 22, 2015 at 8:10 AM
To: <[hidden email]>
Subject: Re: Adhoc querying in Cassandra?

RE: Adhoc querying in Cassandra?

Matthew Johnson

Our requirements are somewhat in flux at the moment, but initially it will be mostly writes with periodic read spikes (probably overnight etc.) for various analytics. Going forward, however, as our application usage scales up, we may end up using it as a read/write replacement for MySQL in some cases.

Thanks for the ideas. I’ll take a look at both Solr and ES, and see how DSE has done it as well.

Cheers,

Matt

From: Brian O'Neill [mailto:[hidden email]] On Behalf Of Brian O'Neill
Sent: 22 April 2015 13:17
To: [hidden email]
Subject: Re: Adhoc querying in Cassandra?

Re: Adhoc querying in Cassandra?

Nathan Bijnens

For analytics workloads, combining Spark and Cassandra will bring you lots of flexibility and performance. However, you will have to set up and learn Spark. The Spark Cassandra connector is very performant and a joy to work with.
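
To give a feel for it, here is a minimal PySpark sketch of an aggregate over a Cassandra table read through the connector's DataFrame source. It assumes a connector version that provides the `org.apache.spark.sql.cassandra` data source and that the connector is on Spark's classpath; the keyspace, table and column names are hypothetical.

```python
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def clicks_per_button(spark, keyspace, table, column="button_id"):
    """Return a DataFrame with one row per distinct `column` value and
    its row count, read from the given Cassandra table.

    `spark` is an already configured SparkSession (or SQLContext)."""
    df = (
        spark.read.format(CASSANDRA_FORMAT)
        .options(keyspace=keyspace, table=table)
        .load()
    )
    # The grouping and counting run in Spark, not in client code.
    return df.groupBy(column).count()
```

From an interactive `pyspark` shell you would then call something like `clicks_per_button(spark, "analytics", "clicks").show()`, which is about as close to an ad-hoc prompt as this stack gets.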

N.

On Wed, Apr 22, 2015 at 4:09 PM Matthew Johnson <[hidden email]> wrote:
