New Features - Future releases

New Features - Future releases

Vijay-19-3
Hi Guys,

I am doing a presentation at my company with a slide dedicated to future releases of Cassandra. I would like your input on which features you think would be useful and what you have in mind, so we can work on them. I just want to see the direction we are headed in.

BTW: I understand this is not a promise, and that these discussions are just forward-looking statements about what can and cannot be done and what may or may not appear in future releases. I just want to hear your opinions, and we can all benefit from the discussion.

Was also wondering where we are with multiple data center replication: multiple rings working together and exchanging data, not only 2 rings?

Regards,
</VJ>



Re: New Features - Future releases

contact-15

Great features that would make Cassandra a killer DB:

  1. ACL
  2. Search features, or maybe integration with Sphinx
  3. Multiple data center replication in the background, maybe a multi-master type of thing

Shahan

On Fri, 18 Sep 2009 16:50:30 -0700, Vijay <[hidden email]> wrote:

> Hi Guys,
>
> I am doing a presentation at my company with a slide dedicated to future releases of Cassandra. I would like your input on which features you think would be useful and what you have in mind, so we can work on them. I just want to see the direction we are headed in.
>
> BTW: I understand this is not a promise, and that these discussions are just forward-looking statements about what can and cannot be done and what may or may not appear in future releases. I just want to hear your opinions, and we can all benefit from the discussion.
>
> Was also wondering where we are with multiple data center replication: multiple rings working together and exchanging data, not only 2 rings?
>
> Regards,

Re: New Features - Future releases

Joe Stump

On Sep 18, 2009, at 9:33 PM, <[hidden email]> wrote:

> • ACL

I'm strongly against ACL. Cassandra was built for highly scalable and  
highly distributed environments, which always sit behind firewalls.  
ACLs can easily be implemented in a service layer in front of
Cassandra.
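
To make the service-layer idea concrete, here is a minimal sketch of an ACL check that sits in front of the store. Everything in it is hypothetical: `AclGateway`, `AccessDenied`, and the shape of the `acl` table are made up for illustration, and a plain dict stands in for Cassandra. It is not an existing API.

```python
# Hypothetical service-layer ACL check; none of these names are Cassandra APIs.


class AccessDenied(Exception):
    """Raised when a user lacks the permission for an operation."""


class AclGateway:
    """Checks permissions, then forwards the call to the backing store."""

    def __init__(self, backend, acl):
        self.backend = backend  # a dict standing in for the real store
        self.acl = acl          # {(user, keyspace): {"read", "write"}}

    def _check(self, user, keyspace, action):
        if action not in self.acl.get((user, keyspace), set()):
            raise AccessDenied(f"{user} may not {action} {keyspace}")

    def get(self, user, keyspace, key):
        self._check(user, keyspace, "read")
        return self.backend.get((keyspace, key))

    def put(self, user, keyspace, key, value):
        self._check(user, keyspace, "write")
        self.backend[(keyspace, key)] = value
```

The store itself never sees the permission logic, which is the appeal of doing this in a separate layer.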

> • Multiple data center replication in the background. maybe a multi  
> master type thing

It already has this. It was built from the ground up for this. It's  
highly tolerant to partitioning and has always available writes. All  
replication is done in the background (unless you specifically set a  
write to a high consistency level).
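
A toy model of that trade-off, assuming a write blocks only until a chosen number of replicas have acknowledged it and the rest are updated later. The names (`write`, `flush_pending`, `required_acks`, `Unavailable`) and the dict-based replicas are hypothetical and only illustrate the idea; this is not how Cassandra is implemented.

```python
# Toy model: the caller waits for `required_acks` replicas, the rest are deferred.


class Unavailable(Exception):
    """Raised when too few replicas are up to satisfy the requested level."""


def write(replicas, key, value, required_acks):
    """Write to `required_acks` live replicas now; return the ones left for later."""
    live = [r for r in replicas if r["up"]]
    if len(live) < required_acks:
        raise Unavailable(f"need {required_acks} replicas, {len(live)} are up")
    for replica in live[:required_acks]:
        replica["data"][key] = value      # the caller blocks on these
    return live[required_acks:]           # to be updated in the background


def flush_pending(pending, key, value):
    """Stand-in for the background replication step."""
    for replica in pending:
        replica["data"][key] = value
```

With `required_acks=0` the write succeeds even if no replica is reachable at that moment; raising it trades availability for stronger guarantees.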

--Joe

Re: New Features - Future releases

contact-15
On Fri, 18 Sep 2009 21:41:48 -0400, Joe Stump <[hidden email]> wrote:
> On Sep 18, 2009, at 9:33 PM, <[hidden email]> wrote:
>
>> • ACL
>
> I'm strongly against ACL. Cassandra was built for highly scalable and
> highly distributed environments, which always sit behind firewalls.
> ACLs can easily be implemented in a service layer in front of
> Cassandra.

Your idea is not bad: having a service layer in front of Cassandra. How
about a separate open-source project or a standard/spec for ACL in the
service layer?
>
>> • Multiple data center replication in the background. maybe a multi  
>> master type thing
>
> It already has this. It was built from the ground up for this. It's  
> highly tolerant to partitioning and has always available writes. All  
> replication is done in the background (unless you specifically set a  
> write to a high consistency level).

I'm not an expert in Cassandra, so thank you for pointing this out.

Shahan
>
> --Joe

Re: New Features - Future releases

Joe Stump

On Sep 18, 2009, at 9:46 PM, <[hidden email]> wrote:

> Your idea is not bad: having a service layer in front of Cassandra.  
> How
> about a separate opensource project or a standard/spec for ACL in the
> service layer?

Sure. SOLR is kind of like this for Lucene.

--Joe

Re: New Features - Future releases

Jeffrey Damick
Speaking of Lucene, has anyone done any integration of Lucene with
Cassandra, or are there plans to provide full-text search within Cassandra?

Thanks
-jeff


On 9/18/09 9:49 PM, "Joe Stump" <[hidden email]> wrote:

>
> On Sep 18, 2009, at 9:46 PM, <[hidden email]> wrote:
>
>> Your idea is not bad: having a service layer in front of Cassandra.
>> How
>> about a separate opensource project or a standard/spec for ACL in the
>> service layer?
>
> Sure. SOLR is kind of like this for Lucene.
>
> --Joe


Re: New Features - Future releases

Ian Holsman-3
Lucene integration was mentioned in the initial Facebook release.

On Sep 18, 2009, at 9:59 PM, Jeffrey Damick wrote:

> Speaking of lucene, has anyone done any integration with lucene for
> cassandra or are there plans to provide full-text searches within  
> cassandra?
>
> Thanks
> -jeff

--
Ian Holsman
[hidden email]




Re: New Features - Future releases

Jonathan Mischo
In reply to this post by Joe Stump
On Sep 18, 2009, at 8:41 PM, Joe Stump wrote:

>
> On Sep 18, 2009, at 9:33 PM, <[hidden email]> wrote:
>
>> • ACL
>
> I'm strongly against ACL. Cassandra was built for highly scalable  
> and highly distributed environments, which always sit behind  
> firewalls. ACLs can easily be implemented in a service layer in  
> front of Cassandra.
>
ACLs could also be implemented as a pluggable model that defaults to  
off, if you really needed a per-CF or per-keyspace ACL.  Honestly, for  
what Cassandra does best, I think it'd have to be as lightweight as  
possible.
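
A rough sketch of what "pluggable and off by default" could look like, assuming the store takes an authorizer object and falls back to one that allows everything. `AllowAll`, `TableAcl`, `Store`, and the `allowed(...)` signature are invented for this example and are not part of Cassandra.

```python
# Hypothetical pluggable authorizer; the default does no checking at all.


class AllowAll:
    """Default authorizer: ACLs are effectively off."""

    def allowed(self, user, keyspace, column_family, action):
        return True


class TableAcl:
    """Grants keyed per column family, or per keyspace when the CF is None."""

    def __init__(self, grants):
        self.grants = grants  # {(user, keyspace, cf_or_None): {"read", "write"}}

    def allowed(self, user, keyspace, column_family, action):
        per_cf = self.grants.get((user, keyspace, column_family), set())
        per_ks = self.grants.get((user, keyspace, None), set())
        return action in per_cf or action in per_ks


class Store:
    def __init__(self, authorizer=None):
        self.authorizer = authorizer or AllowAll()
        self.rows = {}

    def insert(self, user, keyspace, column_family, key, value):
        if not self.authorizer.allowed(user, keyspace, column_family, "write"):
            raise PermissionError(f"{user} may not write {keyspace}.{column_family}")
        self.rows[(keyspace, column_family, key)] = value
```

With the default authorizer the extra cost on each call is one method call that always returns True, which is about as lightweight as a hook like this can get.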

>> • Multiple data center replication in the background. maybe a  
>> multi master type thing
>
> It already has this. It was built from the ground up for this. It's  
> highly tolerant to partitioning and has always available writes. All  
> replication is done in the background (unless you specifically set a  
> write to a high consistency level).
>

You know, it does and it doesn't.  RackAwareStrategy isn't a true N+1  
scaling solution.  Currently, RackAwareStrategy only guarantees that  
it will try to replicate data to one other data center and/or one  
other rack, depending on the number of replicas specified.  It's just  
a problem with the logic used: once the partitioner has found a  
node in another data center, it stops caring about whether additional  
replicas go to another data center.  The same applies to racks: once  
it has found a node in another rack, it stops trying to ensure that  
additional replicas go to different racks.  I'd go into more detail on  
this, but it gets into code, so it's really more appropriate for the  
dev list, or you can open a JIRA ticket and I'll comment on it in more  
detail.
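
For readers who want to see the behaviour described above, here is a small sketch that follows the description in this message only; it is not taken from the RackAwareStrategy source. The `Node` tuple, the function name, and the ring layout are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical ring member: a name plus its data center and rack.
Node = namedtuple("Node", "name dc rack")


def rack_aware_replicas(ring, primary_index, n):
    """Pick n replicas: primary, one other DC, one other rack, then ring order."""
    primary = ring[primary_index]
    others = [ring[(primary_index + i) % len(ring)] for i in range(1, len(ring))]
    replicas = [primary]
    found_dc = found_rack = False
    for node in others:
        if len(replicas) >= n:
            break
        if not found_dc and node.dc != primary.dc:
            replicas.append(node)
            found_dc = True
        elif not found_rack and node.dc == primary.dc and node.rack != primary.rack:
            replicas.append(node)
            found_rack = True
        if found_dc and found_rack:
            break  # from here on, placement no longer looks at DC or rack
    for node in others:  # fill any remaining slots purely in ring order
        if len(replicas) >= n:
            break
        if node not in replicas:
            replicas.append(node)
    return replicas
```

On a ring where most neighbours of the primary are in its own data center, asking for four or five replicas leaves most of them in that data center, which is the imbalance being pointed out.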

This is something I'm considering working on, after I finish my work  
on mapped and classless EndPoint snitches.

Re: New Features - Future releases

Jonathan Ellis-3
On Fri, Sep 18, 2009 at 9:09 PM, Jonathan Mischo <[hidden email]> wrote:

>>>        • Multiple data center replication in the background. maybe a
>>> multi master type thing
>>
>> It already has this. It was built from the ground up for this. It's highly
>> tolerant to partitioning and has always available writes. All replication is
>> done in the background (unless you specifically set a write to a high
>> consistency level).
>
> You know, it does and it doesn't.  RackAwareStrategy isn't a true N+1
> scaling solution. Currently, RackAwareStrategy only guarantees that it will
> try to replicate data to one other data center and/or one other rack,
> depending on the number of replicas specified.

Yes; that's what it's supposed to do, and it's satisfying a very real
use case: "I want my data's primary data center to be DC A, but I want
one replica in DC B in case A is completely unavailable."

Other use cases can use different Strategies.  That's why they're
pluggable.  It's not one-size-fits-all and it's not supposed to be.
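
As an illustration of what "pluggable" means here, a sketch in which a strategy is just a function handed to the keyspace. The names `ring_order`, `one_remote_copy`, and `Keyspace`, and the `(name, dc)` node tuples, are hypothetical and not Cassandra's actual classes.

```python
# Hypothetical pluggable placement: a strategy is any function
# (ring, primary_index, n) -> list of nodes. Nodes are (name, dc) tuples.


def ring_order(ring, primary_index, n):
    """Simplest strategy: the primary and the next nodes on the ring."""
    return [ring[(primary_index + i) % len(ring)] for i in range(min(n, len(ring)))]


def one_remote_copy(ring, primary_index, n):
    """Keep data in the primary's DC, plus one replica in some other DC."""
    walk = [ring[(primary_index + i) % len(ring)] for i in range(len(ring))]
    primary = walk[0]
    remote = next((node for node in walk if node[1] != primary[1]), None)
    if remote is None or n < 2:
        return walk[:n]
    local = [node for node in walk if node[1] == primary[1]]
    return local[: n - 1] + [remote]


class Keyspace:
    def __init__(self, ring, replicas, strategy=ring_order):
        self.ring = ring
        self.replicas = replicas
        self.strategy = strategy  # swapped per use case

    def replicas_for(self, primary_index):
        return self.strategy(self.ring, primary_index, self.replicas)
```

Swapping the `strategy` argument changes placement without touching anything else, so each use case can bring its own rule.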

-Jonathan

Re: New Features - Future releases

Jake Luciani
In reply to this post by Ian Holsman-3
I've been working on integrating Lucene with Cassandra.  I'll put what  
I've got on GitHub Sunday if people are interested.

Sent from my iPhone

On Sep 18, 2009, at 10:02 PM, Ian Holsman <[hidden email]> wrote:

> There was mention of lucene integration in the initial FB release.

Re: New Features - Future releases

contact-15
That sounds very good!
Shahan

On Sat, 19 Sep 2009 07:24:33 -0400, Jake Luciani <[hidden email]> wrote:

> I've been working on integrating lucene with Cassandra.  I'll put what  
> I've got on github Sunday if people are interested.

Replication Strategies WAS: New Features - Future releases

Jonathan Mischo
In reply to this post by Jonathan Ellis-3

On Sep 18, 2009, at 9:55 PM, Jonathan Ellis wrote:

> On Fri, Sep 18, 2009 at 9:09 PM, Jonathan Mischo <[hidden email]> wrote:
>
>>>>        • Multiple data center replication in the background. maybe a
>>>> multi master type thing
>>>
>>> It already has this. It was built from the ground up for this. It's highly
>>> tolerant to partitioning and has always available writes. All replication is
>>> done in the background (unless you specifically set a write to a high
>>> consistency level).
>>
>> You know, it does and it doesn't.  RackAwareStrategy isn't a true N+1
>> scaling solution. Currently, RackAwareStrategy only guarantees that it will
>> try to replicate data to one other data center and/or one other rack,
>> depending on the number of replicas specified.
>
> Yes; that's what it's supposed to do, and it's satisfying a very real
> use case: "I want my data's primary data center to be DC A, but I want
> one replica in DC B in case A is completely unavailable."
>
> Other use cases can use different Strategies.  That's why they're
> pluggable.  It's not one-size-fits-all and it's not supposed to be.

Yeah, you're right. If N+1 is a concern, it should probably be a separate strategy, unless we can keep the complexity virtually the same, because of how heavily it's called. RackAwareStrategy is perfectly fine for what it does: guarantee a replica in a different DC and/or a replica in a different rack after that, if you configure it to store more than 1 replica. Above 3 replicas, though, it can start to get unbalanced, since it's just iterating through the node list, which really has no value.  We could probably just document that for RackAwareStrategy.
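
One possible shape for such a separate strategy, assuming "N+1" here means that each additional replica keeps the copies spread across data centers instead of following plain ring order. That reading is an assumption, and `dc_balanced_replicas` and the `Node` tuple are hypothetical names for this sketch.

```python
from collections import namedtuple

# Hypothetical ring member: a name plus its data center and rack.
Node = namedtuple("Node", "name dc rack")


def dc_balanced_replicas(ring, primary_index, n):
    """Take replicas round-robin across data centers, starting at the primary."""
    walk = [ring[(primary_index + i) % len(ring)] for i in range(len(ring))]
    by_dc = {}
    for node in walk:
        # dicts keep insertion order, so the primary's DC comes first
        by_dc.setdefault(node.dc, []).append(node)
    queues = list(by_dc.values())
    replicas = []
    while len(replicas) < n and any(queues):
        for queue in queues:
            if queue and len(replicas) < n:
                replicas.append(queue.pop(0))
    return replicas
```

This version regroups the whole ring on every call, so it does not address the complexity concern above; a real strategy would need to cache the grouping or compute it incrementally.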

I know we're trying to solve for the biggest wins for effort, but, as the Cassandra user base grows (and it will, because it fills a niche that no other KVS or RDBMS quite fills), I think N+1 capability is going to be something that will need to be solved for fairly soon for widespread adoption.

-Jon