2d or multi dimension range query in cassandra CQL

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

2d or multi dimension range query in cassandra CQL

Mehak Mehta
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak
Reply | Threaded
Open this post in threaded view
|

Re: 2d or multi dimension range query in cassandra CQL

Jack Krupansky-2
1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak

Reply | Threaded
Open this post in threaded view
|

Re: 2d or multi dimension range query in cassandra CQL

健峰 高
Lucene or Solr is better. I’m using lucene
在 2015年3月18日,上午2:11,Jack Krupansky <[hidden email]> 写道:

1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak


Reply | Threaded
Open this post in threaded view
|

Re: 2d or multi dimension range query in cassandra CQL

Mehak Mehta
Hi,

In my case I am just trying to do range queries on 2 dimensions i.e. x and y.

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY (image_caseid));

With secondary indexes on x and y:
cqlsh:images> create index results1_y on results1(y);
cqlsh:images> create index results1_x on results1(x);
 
But when I try the following the query it is not allowed with an error. 

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

Can you tell me why this is not working. I can't understand why Cassandra is not allowing me to use Non EQ range query on secondary indexes even when I have restricted my primary key by EQ as specified in error message.

Also please elaborate on how to use Lucene/Solr in this case with Cassandra.

Thanks,
Mehak

On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <[hidden email]> wrote:
Lucene or Solr is better. I’m using lucene
在 2015年3月18日,上午2:11,Jack Krupansky <[hidden email]> 写道:

1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak



Reply | Threaded
Open this post in threaded view
|

Re: 2d or multi dimension range query in cassandra CQL

Jack Krupansky-2
Yeah, you may have to add a dummy column populated with a constant, or just pick some other additional column to index that you can apply = to. It's just a requirement to assure that a filtered select isn't too-too slow.

Uh.... if you are applying = to your primary key then there is no need for any filtering. What did you really mean to do? It makes no sense the way you have it!

Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".

-- Jack Krupansky

On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

In my case I am just trying to do range queries on 2 dimensions i.e. x and y.

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY (image_caseid));

With secondary indexes on x and y:
cqlsh:images> create index results1_y on results1(y);
cqlsh:images> create index results1_x on results1(x);
 
But when I try the following the query it is not allowed with an error. 

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

Can you tell me why this is not working. I can't understand why Cassandra is not allowing me to use Non EQ range query on secondary indexes even when I have restricted my primary key by EQ as specified in error message.

Also please elaborate on how to use Lucene/Solr in this case with Cassandra.

Thanks,
Mehak

On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <[hidden email]> wrote:
Lucene or Solr is better. I’m using lucene
在 2015年3月18日,上午2:11,Jack Krupansky <[hidden email]> 写道:

1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak




Reply | Threaded
Open this post in threaded view
|

Re: 2d or multi dimension range query in cassandra CQL

Mehak Mehta
Sorry I gave you wrong table definition for query. Here a composite key of image_caseid, x and uuid which is unique. I have used x in clustering columns to query it. And used secondary index on y column. 

1. Example
cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,uuid));
cqlsh:images> create index results1_y on results1(y);

In the below query you can see I have image_caseid as primary key which is filtered. Even then it is giving error that "No indexed columns present

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

2. Example
I also tried including both x and y columns as composite key even then query gives following error:

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,y,uuid));

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be restricted (preceding column "ColumnDefinition{name=x, type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}" is either not restricted or by a non-EQ relation)"

Thanks,
Mehak 


On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <[hidden email]> wrote:
Yeah, you may have to add a dummy column populated with a constant, or just pick some other additional column to index that you can apply = to. It's just a requirement to assure that a filtered select isn't too-too slow.

Uh.... if you are applying = to your primary key then there is no need for any filtering. What did you really mean to do? It makes no sense the way you have it!

Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".

-- Jack Krupansky

On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

In my case I am just trying to do range queries on 2 dimensions i.e. x and y.

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY (image_caseid));

With secondary indexes on x and y:
cqlsh:images> create index results1_y on results1(y);
cqlsh:images> create index results1_x on results1(x);
 
But when I try the following the query it is not allowed with an error. 

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

Can you tell me why this is not working. I can't understand why Cassandra is not allowing me to use Non EQ range query on secondary indexes even when I have restricted my primary key by EQ as specified in error message.

Also please elaborate on how to use Lucene/Solr in this case with Cassandra.

Thanks,
Mehak

On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <[hidden email]> wrote:
Lucene or Solr is better. I’m using lucene
在 2015年3月18日,上午2:11,Jack Krupansky <[hidden email]> 写道:

1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak





Reply | Threaded
Open this post in threaded view
|

Re: 2d or multi dimension range query in cassandra CQL

Andres de la Peña
Hi,

With Stratio Cassandra you can create Lucene based indexes for multidimensional queries this way:

ALTER TABLE images.results1 ADD lucene text ;

CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene) 
USING 'com.stratio.cassandra.index.RowIndex' 
WITH OPTIONS = {
 'refresh_seconds':'1', 
 'schema':'{
  fields:{
  image_caseid:{type:"string"},
x:{type:"double"}, 
y:{type:"double"} } } '};

Then you can perform the query using the dummy column:

SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean", must:[
{field:"image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

However, you can take advantage of partition key to route the query only to the nodes owning the data:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{ filter:{type:"boolean", must:[
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

Or, even better:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND lucene='{ filter:{field:"y", type:"range", lower:100}}';

Additionally, if your data are geospatial (latitude and longitude), soon you will can use the incoming spatial features.



2015-03-17 23:01 GMT+01:00 Mehak Mehta <[hidden email]>:
Sorry I gave you wrong table definition for query. Here a composite key of image_caseid, x and uuid which is unique. I have used x in clustering columns to query it. And used secondary index on y column. 

1. Example
cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,uuid));
cqlsh:images> create index results1_y on results1(y);

In the below query you can see I have image_caseid as primary key which is filtered. Even then it is giving error that "No indexed columns present

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

2. Example
I also tried including both x and y columns as composite key even then query gives following error:

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,y,uuid));

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be restricted (preceding column "ColumnDefinition{name=x, type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}" is either not restricted or by a non-EQ relation)"

Thanks,
Mehak 


On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <[hidden email]> wrote:
Yeah, you may have to add a dummy column populated with a constant, or just pick some other additional column to index that you can apply = to. It's just a requirement to assure that a filtered select isn't too-too slow.

Uh.... if you are applying = to your primary key then there is no need for any filtering. What did you really mean to do? It makes no sense the way you have it!

Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".

-- Jack Krupansky

On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

In my case I am just trying to do range queries on 2 dimensions i.e. x and y.

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY (image_caseid));

With secondary indexes on x and y:
cqlsh:images> create index results1_y on results1(y);
cqlsh:images> create index results1_x on results1(x);
 
But when I try the following the query it is not allowed with an error. 

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

Can you tell me why this is not working. I can't understand why Cassandra is not allowing me to use Non EQ range query on secondary indexes even when I have restricted my primary key by EQ as specified in error message.

Also please elaborate on how to use Lucene/Solr in this case with Cassandra.

Thanks,
Mehak

On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <[hidden email]> wrote:
Lucene or Solr is better. I’m using lucene
在 2015年3月18日,上午2:11,Jack Krupansky <[hidden email]> 写道:

1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak








--

Andrés de la Peña


Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // @stratiobd
Reply | Threaded
Open this post in threaded view
|

Re: 2d or multi dimension range query in cassandra CQL

Mehak Mehta
Hi,

On the basis of some suggestions, I tried using tuplejump for multidimensional queries. Since other mostly needed root permissions (for building ) which I don't have on my cluster account. 

I found a major problem in tuplejump (stargate-core). When I am using it with a list type field in my table. It stops working.
For e.g.

create table person (
    id int primary key,
    isActive boolean,
    age int,
    eyeColor varchar,
    name text,
    gender varchar,
    company varchar,
    email varchar,
    phone varchar,
    address text,
    points list<double>,
    stargate text
);

with indexing as: 
CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some 
'com.tuplejump.stargate.RowIndex' WITH options =
{
        'sg_options':'{
                "fields":{
                "eyeColor":{},
                        "age":{},
                        "phone":{}
                        }
        }'
};

If I insert data in the table along with points list. The following query won't give any results (0 rows):

SELECT * FROM RESULTS1 WHERE stargate ='{
    filter: {
        type: "range",
        field: "x",
        lower: 0
    }
}';

I tried removing points list<double> from the table and it works i.e. same query will return results.
Can somebody help me with this problem as I couldn't find much support from Stargate.

Please note that I am using Cassandra 2.0.9 compatible with Stargate-core as given in link (http://stargate-core.readthedocs.org/en/latest/quickstart.html).

Thanks,
Mehak


On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña <[hidden email]> wrote:
Hi,

With Stratio Cassandra you can create Lucene based indexes for multidimensional queries this way:

ALTER TABLE images.results1 ADD lucene text ;

CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene) 
USING 'com.stratio.cassandra.index.RowIndex' 
WITH OPTIONS = {
 'refresh_seconds':'1', 
 'schema':'{
  fields:{
  image_caseid:{type:"string"},
x:{type:"double"}, 
y:{type:"double"} } } '};

Then you can perform the query using the dummy column:

SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean", must:[
{field:"image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

However, you can take advantage of partition key to route the query only to the nodes owning the data:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{ filter:{type:"boolean", must:[
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

Or, even better:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND lucene='{ filter:{field:"y", type:"range", lower:100}}';

Additionally, if your data are geospatial (latitude and longitude), soon you will can use the incoming spatial features.



2015-03-17 23:01 GMT+01:00 Mehak Mehta <[hidden email]>:
Sorry I gave you wrong table definition for query. Here a composite key of image_caseid, x and uuid which is unique. I have used x in clustering columns to query it. And used secondary index on y column. 

1. Example
cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,uuid));
cqlsh:images> create index results1_y on results1(y);

In the below query you can see I have image_caseid as primary key which is filtered. Even then it is giving error that "No indexed columns present

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

2. Example
I also tried including both x and y columns as composite key even then query gives following error:

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,y,uuid));

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be restricted (preceding column "ColumnDefinition{name=x, type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}" is either not restricted or by a non-EQ relation)"

Thanks,
Mehak 


On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <[hidden email]> wrote:
Yeah, you may have to add a dummy column populated with a constant, or just pick some other additional column to index that you can apply = to. It's just a requirement to assure that a filtered select isn't too-too slow.

Uh.... if you are applying = to your primary key then there is no need for any filtering. What did you really mean to do? It makes no sense the way you have it!

Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".

-- Jack Krupansky

On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

In my case I am just trying to do range queries on 2 dimensions i.e. x and y.

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY (image_caseid));

With secondary indexes on x and y:
cqlsh:images> create index results1_y on results1(y);
cqlsh:images> create index results1_x on results1(x);
 
But when I try the following the query it is not allowed with an error. 

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

Can you tell me why this is not working. I can't understand why Cassandra is not allowing me to use Non EQ range query on secondary indexes even when I have restricted my primary key by EQ as specified in error message.

Also please elaborate on how to use Lucene/Solr in this case with Cassandra.

Thanks,
Mehak

On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <[hidden email]> wrote:
Lucene or Solr is better. I’m using lucene
在 2015年3月18日,上午2:11,Jack Krupansky <[hidden email]> 写道:

1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak








--

Andrés de la Peña


Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: <a href="tel:%2B34%2091%20352%2059%2042" value="+34913525942" target="_blank">+34 91 352 59 42 // @stratiobd

Reply | Threaded
Open this post in threaded view
|

Re: 2d or multi dimension range query in cassandra CQL

Asit KAUSHIK
i am using Startio Cassandra it way better than stargate as it works on the latest release of Cassandra and is better on my performance.

we are using it for fulltext search use case

Regards
Asit

On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

On the basis of some suggestions, I tried using tuplejump for multidimensional queries. Since other mostly needed root permissions (for building ) which I don't have on my cluster account. 

I found a major problem in tuplejump (stargate-core). When I am using it with a list type field in my table. It stops working.
For e.g.

create table person (
    id int primary key,
    isActive boolean,
    age int,
    eyeColor varchar,
    name text,
    gender varchar,
    company varchar,
    email varchar,
    phone varchar,
    address text,
    points list<double>,
    stargate text
);

with indexing as: 
CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some 
'com.tuplejump.stargate.RowIndex' WITH options =
{
        'sg_options':'{
                "fields":{
                "eyeColor":{},
                        "age":{},
                        "phone":{}
                        }
        }'
};

If I insert data in the table along with points list. The following query won't give any results (0 rows):

SELECT * FROM RESULTS1 WHERE stargate ='{
    filter: {
        type: "range",
        field: "x",
        lower: 0
    }
}';

I tried removing points list<double> from the table and it works i.e. same query will return results.
Can somebody help me with this problem as I couldn't find much support from Stargate.

Please note that I am using Cassandra 2.0.9 compatible with Stargate-core as given in link (http://stargate-core.readthedocs.org/en/latest/quickstart.html).

Thanks,
Mehak


On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña <[hidden email]> wrote:
Hi,

With Stratio Cassandra you can create Lucene based indexes for multidimensional queries this way:

ALTER TABLE images.results1 ADD lucene text ;

CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene) 
USING 'com.stratio.cassandra.index.RowIndex' 
WITH OPTIONS = {
 'refresh_seconds':'1', 
 'schema':'{
  fields:{
  image_caseid:{type:"string"},
x:{type:"double"}, 
y:{type:"double"} } } '};

Then you can perform the query using the dummy column:

SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean", must:[
{field:"image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

However, you can take advantage of partition key to route the query only to the nodes owning the data:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{ filter:{type:"boolean", must:[
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

Or, even better:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND lucene='{ filter:{field:"y", type:"range", lower:100}}';

Additionally, if your data are geospatial (latitude and longitude), soon you will can use the incoming spatial features.



2015-03-17 23:01 GMT+01:00 Mehak Mehta <[hidden email]>:
Sorry I gave you wrong table definition for query. Here a composite key of image_caseid, x and uuid which is unique. I have used x in clustering columns to query it. And used secondary index on y column. 

1. Example
cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,uuid));
cqlsh:images> create index results1_y on results1(y);

In the below query you can see I have image_caseid as primary key which is filtered. Even then it is giving error that "No indexed columns present

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

2. Example
I also tried including both x and y columns as composite key even then query gives following error:

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,y,uuid));

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be restricted (preceding column "ColumnDefinition{name=x, type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}" is either not restricted or by a non-EQ relation)"

Thanks,
Mehak 


On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <[hidden email]> wrote:
Yeah, you may have to add a dummy column populated with a constant, or just pick some other additional column to index that you can apply = to. It's just a requirement to assure that a filtered select isn't too-too slow.

Uh.... if you are applying = to your primary key then there is no need for any filtering. What did you really mean to do? It makes no sense the way you have it!

Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".

-- Jack Krupansky

On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

In my case I am just trying to do range queries on 2 dimensions i.e. x and y.

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY (image_caseid));

With secondary indexes on x and y:
cqlsh:images> create index results1_y on results1(y);
cqlsh:images> create index results1_x on results1(x);
 
But when I try the following the query it is not allowed with an error. 

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

Can you tell me why this is not working. I can't understand why Cassandra is not allowing me to use Non EQ range query on secondary indexes even when I have restricted my primary key by EQ as specified in error message.

Also please elaborate on how to use Lucene/Solr in this case with Cassandra.

Thanks,
Mehak

On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <[hidden email]> wrote:
Lucene or Solr is better. I’m using lucene
在 2015年3月18日,上午2:11,Jack Krupansky <[hidden email]> 写道:

1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak








--

Andrés de la Peña


Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: <a href="tel:%2B34%2091%20352%2059%2042" value="+34913525942" target="_blank">+34 91 352 59 42 // @stratiobd


Reply | Threaded
Open this post in threaded view
|

Re: 2d or multi dimension range query in cassandra CQL

Mehak Mehta
Hi,

I checked Startio Cassandra but couldn't get any good documentation for the same.
Can you give me some pointers on how to use it. 
Do I have to build it from the source or I can use it directly with jar files as in case of Stargate. 
Since I was looking for solution which I don't need a full build and can be used with existing tar of cassandra because I have some restrictions on installing stuff on my server.

Thanks,
Mehak

On Mon, Mar 23, 2015 at 8:17 AM, Asit KAUSHIK <[hidden email]> wrote:
i am using Startio Cassandra it way better than stargate as it works on the latest release of Cassandra and is better on my performance.

we are using it for fulltext search use case

Regards
Asit

On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

On the basis of some suggestions, I tried using tuplejump for multidimensional queries. Since other mostly needed root permissions (for building ) which I don't have on my cluster account. 

I found a major problem in tuplejump (stargate-core). When I am using it with a list type field in my table. It stops working.
For e.g.

create table person (
    id int primary key,
    isActive boolean,
    age int,
    eyeColor varchar,
    name text,
    gender varchar,
    company varchar,
    email varchar,
    phone varchar,
    address text,
    points list<double>,
    stargate text
);

with indexing as: 
CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some 
'com.tuplejump.stargate.RowIndex' WITH options =
{
        'sg_options':'{
                "fields":{
                "eyeColor":{},
                        "age":{},
                        "phone":{}
                        }
        }'
};

If I insert data in the table along with points list. The following query won't give any results (0 rows):

SELECT * FROM RESULTS1 WHERE stargate ='{
    filter: {
        type: "range",
        field: "x",
        lower: 0
    }
}';

I tried removing points list<double> from the table and it works i.e. same query will return results.
Can somebody help me with this problem as I couldn't find much support from Stargate.

Please note that I am using Cassandra 2.0.9 compatible with Stargate-core as given in link (http://stargate-core.readthedocs.org/en/latest/quickstart.html).

Thanks,
Mehak


On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña <[hidden email]> wrote:
Hi,

With Stratio Cassandra you can create Lucene based indexes for multidimensional queries this way:

ALTER TABLE images.results1 ADD lucene text ;

CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene) 
USING 'com.stratio.cassandra.index.RowIndex' 
WITH OPTIONS = {
 'refresh_seconds':'1', 
 'schema':'{
  fields:{
  image_caseid:{type:"string"},
x:{type:"double"}, 
y:{type:"double"} } } '};

Then you can perform the query using the dummy column:

SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean", must:[
{field:"image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

However, you can take advantage of partition key to route the query only to the nodes owning the data:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{ filter:{type:"boolean", must:[
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

Or, even better:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND lucene='{ filter:{field:"y", type:"range", lower:100}}';

Additionally, if your data are geospatial (latitude and longitude), soon you will can use the incoming spatial features.



2015-03-17 23:01 GMT+01:00 Mehak Mehta <[hidden email]>:
Sorry I gave you wrong table definition for query. Here a composite key of image_caseid, x and uuid which is unique. I have used x in clustering columns to query it. And used secondary index on y column. 

1. Example
cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,uuid));
cqlsh:images> create index results1_y on results1(y);

In the below query you can see I have image_caseid as primary key which is filtered. Even then it is giving error that "No indexed columns present

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

2. Example
I also tried including both x and y columns as composite key even then query gives following error:

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,y,uuid));

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be restricted (preceding column "ColumnDefinition{name=x, type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}" is either not restricted or by a non-EQ relation)"

Thanks,
Mehak 


On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <[hidden email]> wrote:
Yeah, you may have to add a dummy column populated with a constant, or just pick some other additional column to index that you can apply = to. It's just a requirement to assure that a filtered select isn't too-too slow.

Uh.... if you are applying = to your primary key then there is no need for any filtering. What did you really mean to do? It makes no sense the way you have it!

Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".

-- Jack Krupansky

On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

In my case I am just trying to do range queries on 2 dimensions i.e. x and y.

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY (image_caseid));

With secondary indexes on x and y:
cqlsh:images> create index results1_y on results1(y);
cqlsh:images> create index results1_x on results1(x);
 
But when I try the following the query it is not allowed with an error. 

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

Can you tell me why this is not working. I can't understand why Cassandra is not allowing me to use Non EQ range query on secondary indexes even when I have restricted my primary key by EQ as specified in error message.

Also please elaborate on how to use Lucene/Solr in this case with Cassandra.

Thanks,
Mehak

On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <[hidden email]> wrote:
Lucene or Solr is better. I’m using lucene
在 2015年3月18日,上午2:11,Jack Krupansky <[hidden email]> 写道:

1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak








--

Andrés de la Peña


Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: <a href="tel:%2B34%2091%20352%2059%2042" value="+34913525942" target="_blank">+34 91 352 59 42 // @stratiobd



Reply | Threaded
Open this post in threaded view
|

Re: 2d or multi dimension range query in cassandra CQL

Andres de la Peña
Hi,


You can get info about how to build and getting started at its README file. More detailed info can be found at https://github.com/Stratio/stratio-cassandra/blob/master/doc/stratio/extended-search-in-cassandra.md

Regards,

2015-03-23 18:07 GMT+01:00 Mehak Mehta <[hidden email]>:
Hi,

I checked Startio Cassandra but couldn't get any good documentation for the same.
Can you give me some pointers on how to use it. 
Do I have to build it from the source or I can use it directly with jar files as in case of Stargate. 
Since I was looking for solution which I don't need a full build and can be used with existing tar of cassandra because I have some restrictions on installing stuff on my server.

Thanks,
Mehak

On Mon, Mar 23, 2015 at 8:17 AM, Asit KAUSHIK <[hidden email]> wrote:
i am using Startio Cassandra it way better than stargate as it works on the latest release of Cassandra and is better on my performance.

we are using it for fulltext search use case

Regards
Asit

On Sun, Mar 22, 2015 at 12:14 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

On the basis of some suggestions, I tried using tuplejump for multidimensional queries. Since other mostly needed root permissions (for building ) which I don't have on my cluster account. 

I found a major problem in tuplejump (stargate-core). When I am using it with a list type field in my table. It stops working.
For e.g.

create table person (
    id int primary key,
    isActive boolean,
    age int,
    eyeColor varchar,
    name text,
    gender varchar,
    company varchar,
    email varchar,
    phone varchar,
    address text,
    points list<double>,
    stargate text
);

with indexing as: 
CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some 
'com.tuplejump.stargate.RowIndex' WITH options =
{
        'sg_options':'{
                "fields":{
                "eyeColor":{},
                        "age":{},
                        "phone":{}
                        }
        }'
};

If I insert data in the table along with points list. The following query won't give any results (0 rows):

SELECT * FROM RESULTS1 WHERE stargate ='{
    filter: {
        type: "range",
        field: "x",
        lower: 0
    }
}';

I tried removing points list<double> from the table and it works i.e. same query will return results.
Can somebody help me with this problem as I couldn't find much support from Stargate.

Please note that I am using Cassandra 2.0.9 compatible with Stargate-core as given in link (http://stargate-core.readthedocs.org/en/latest/quickstart.html).

Thanks,
Mehak


On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña <[hidden email]> wrote:
Hi,

With Stratio Cassandra you can create Lucene based indexes for multidimensional queries this way:

ALTER TABLE images.results1 ADD lucene text ;

CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene) 
USING 'com.stratio.cassandra.index.RowIndex' 
WITH OPTIONS = {
 'refresh_seconds':'1', 
 'schema':'{
  fields:{
  image_caseid:{type:"string"},
x:{type:"double"}, 
y:{type:"double"} } } '};

Then you can perform the query using the dummy column:

SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean", must:[
{field:"image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

However, you can take advantage of partition key to route the query only to the nodes owning the data:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{ filter:{type:"boolean", must:[
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

Or, even better:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND lucene='{ filter:{field:"y", type:"range", lower:100}}';

Additionally, if your data are geospatial (latitude and longitude), soon you will can use the incoming spatial features.



2015-03-17 23:01 GMT+01:00 Mehak Mehta <[hidden email]>:
Sorry I gave you wrong table definition for query. Here a composite key of image_caseid, x and uuid which is unique. I have used x in clustering columns to query it. And used secondary index on y column. 

1. Example
cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,uuid));
cqlsh:images> create index results1_y on results1(y);

In the below query you can see I have image_caseid as primary key which is filtered. Even then it is giving error that "No indexed columns present

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

2. Example
I also tried including both x and y columns as composite key even then query gives following error:

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x,y,uuid));

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y<100 order by image_caseid asc;
code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be restricted (preceding column "ColumnDefinition{name=x, type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}" is either not restricted or by a non-EQ relation)"

Thanks,
Mehak 


On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <[hidden email]> wrote:
Yeah, you may have to add a dummy column populated with a constant, or just pick some other additional column to index that you can apply = to. It's just a requirement to assure that a filtered select isn't too-too slow.

Uh.... if you are applying = to your primary key then there is no need for any filtering. What did you really mean to do? It makes no sense the way you have it!

Either go with DSE Search/Solr, or google "Tuplejump Stargate" or "Stratio".

-- Jack Krupansky

On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

In my case I am just trying to do range queries on 2 dimensions i.e. x and y.

cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, submit_date timestamp, points list<double>,  PRIMARY KEY (image_caseid));

With secondary indexes on x and y:
cqlsh:images> create index results1_y on results1(y);
cqlsh:images> create index results1_x on results1(x);
 
But when I try the following the query it is not allowed with an error. 

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

Can you tell me why this is not working. I can't understand why Cassandra is not allowing me to use Non EQ range query on secondary indexes even when I have restricted my primary key by EQ as specified in error message.

Also please elaborate on how to use Lucene/Solr in this case with Cassandra.

Thanks,
Mehak

On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <[hidden email]> wrote:
Lucene or Solr is better. I’m using lucene
在 2015年3月18日,上午2:11,Jack Krupansky <[hidden email]> 写道:

1. Create multiple secondary indexes, one for each non-key column you need to index on. Not recommended. Considered an anti-pattern for Cassandra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <[hidden email]> wrote:
Hi,

I want to perform range queries (as in x and y ranges) on a large data billions of rows. 
CQL allows me to put Non EQ restrictions on only one of the clustering columns. 
Its not allowing me to filter the data using any other column even with use of Allow Filtering option.

cqlsh:images> select * from results1 where image_caseid='mehak' and x > 100 and y <100 order by x allow filtering;
code=2200 [Invalid query] message="No indexed columns present in by-columns clause with Equal operator"

My table definition is :

CREATE TABLE images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar,  Area float, submit_date timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));

Kindly suggest me how to model is data and perform 2d or in general multi dimension range queries in Cassandra efficiently.


Thanks,
Mehak








--

Andrés de la Peña


Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: <a href="tel:%2B34%2091%20352%2059%2042" value="+34913525942" target="_blank">+34 91 352 59 42 // @stratiobd






--

Andrés de la Peña


Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // @stratiobd