
High latencies for simple queries

14 messages

High latencies for simple queries

Artur Siekielski
I'm running Cassandra locally and I see that the execution time for the
simplest queries is 1-2 milliseconds. By a simple query I mean either
INSERT or SELECT from a small table with short keys.

While this number is not high, it's about 10-20 times slower than
PostgreSQL (even when the INSERTs are wrapped in transactions). I know
that Cassandra and PostgreSQL are different by nature, but in some
scenarios this difference can matter.

The question is: is it normal for Cassandra to have a minimum latency of
1 millisecond?

I'm using Cassandra 2.1.2, python-driver.



Re: High latencies for simple queries

Tyler Hobbs-2
Just to check, are you concerned about minimizing that latency or maximizing throughput?

I'll assume that latency is what you're actually concerned about. A fair amount of that latency is probably spent in the python driver. Although it can easily execute ~8k operations per second (using CPython), in some scenarios it can be difficult to guarantee sub-ms latency for an individual query because of how some of the internals work. In particular, it uses Python's Condition objects for cross-thread signalling (from the event loop thread to the application thread). Unfortunately, in Python 2, Condition.wait() with a timeout polls in a loop whose first sleep is 1ms if the result isn't already available when you start the wait() call. This is why, with a single application thread, you will typically see a minimum latency of 1ms.
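To illustrate the 1ms floor, here is a small simulation of the polling loop that Python 2.7's threading.Condition.wait() runs when it is given a timeout: it starts with a delay of 0.0005 s, doubles it before every sleep (so the first sleep is 1 ms), and caps it at 50 ms or the time remaining. This is a sketch under those assumptions that adds up sleep intervals instead of actually sleeping; py2_wait_wakeup is a hypothetical name, not part of Python or the driver. Python 3's Condition.wait() blocks on the lock with a native timeout and does not poll like this.

```python
def py2_wait_wakeup(notify_at, timeout=10.0):
    """Simulate when a Python 2.7-style Condition.wait(timeout) sees a notify().

    notify_at: seconds after wait() starts at which another thread notifies.
    Returns the simulated wake-up time in seconds, or None if the wait times out.
    Nothing actually sleeps; the loop just adds up the sleep intervals.
    """
    now = 0.0
    delay = 0.0005  # Python 2.7's starting value; doubled before the first sleep
    while True:
        # Non-blocking acquire attempt: succeeds once notify() has happened.
        if now >= notify_at:
            return now
        remaining = timeout - now
        if remaining <= 0:
            return None
        # Sleep briefly at first, then longer, never over 50 ms or the time left.
        delay = min(delay * 2, remaining, 0.05)
        now += delay


if __name__ == "__main__":
    for notify_ms in (0.0, 0.2, 0.9, 1.5):
        woke = py2_wait_wakeup(notify_ms / 1000.0)
        print("notify at %.1f ms -> waiter wakes at %.1f ms" % (notify_ms, woke * 1000.0))
```

In this model a notify() at 0.2 ms or 0.9 ms is noticed at 1.0 ms, and one at 1.5 ms is noticed at 3.0 ms, which is consistent with the minimum latency described above for a single application thread.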

Another source of similar latency in the python driver is the asyncore event loop, which is used when libev isn't available. I would make sure that you're able to use the LibevConnection class with the driver to avoid this.
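A minimal sketch for checking whether LibevConnection is usable in the current interpreter, assuming the 2.x python-driver layout where the class lives in cassandra.io.libevreactor and a Cluster has a connection_class attribute. Importing that module is expected to raise ImportError when the driver is missing or its libev C extension cannot be loaded. choose_connection_class and make_cluster are hypothetical helper names.

```python
def choose_connection_class():
    """Return python-driver's LibevConnection if it can be imported, else None.

    The import fails with ImportError when the driver is not installed or its
    libev C extension is unavailable in this interpreter.
    """
    try:
        from cassandra.io.libevreactor import LibevConnection
    except ImportError:
        return None
    return LibevConnection


def make_cluster(contact_points=("127.0.0.1",)):
    """Create (but do not connect) a Cluster that uses libev when possible."""
    from cassandra.cluster import Cluster

    cluster = Cluster(list(contact_points))
    connection_class = choose_connection_class()
    if connection_class is not None:
        cluster.connection_class = connection_class
    return cluster


if __name__ == "__main__":
    cls = choose_connection_class()
    print("LibevConnection available: %s" % (cls is not None))
```

As far as I know the driver already prefers libev when this import succeeds, so the explicit assignment mainly makes the choice visible; if choose_connection_class() returns None, the driver is running on its fallback (asyncore) event loop.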

In reply to Artur Siekielski (Fri, Mar 27, 2015 at 6:24 AM).

--
Tyler Hobbs
DataStax

Re: High latencies for simple queries

Ben Bromhead
Latency can be so variable even when testing things locally. I quickly fired up postgres and did the following with psql:

ben=# CREATE TABLE foo(i int, j text, PRIMARY KEY(i));
CREATE TABLE
ben=# \timing
Timing is on.
ben=# INSERT INTO foo VALUES(2, 'yay');
INSERT 0 1
Time: 1.162 ms
ben=# INSERT INTO foo VALUES(3, 'yay');
INSERT 0 1
Time: 1.108 ms

I then fired up a local copy of Cassandra (2.0.12)

cqlsh> CREATE KEYSPACE foo WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE foo;
cqlsh:foo> CREATE TABLE foo(i int PRIMARY KEY, j text);
cqlsh:foo> TRACING ON;
Now tracing requests.
cqlsh:foo> INSERT INTO foo (i, j) VALUES (1, 'yay');

Tracing session: 7a7dced0-d4b2-11e4-b950-85c3c9bd91a0

 activity                                          | timestamp    | source    | source_elapsed
---------------------------------------------------+--------------+-----------+----------------
                                execute_cql3_query | 11:52:55,229 | 127.0.0.1 |              0
 Parsing INSERT INTO foo (i, j) VALUES (1, 'yay'); | 11:52:55,229 | 127.0.0.1 |             43
                               Preparing statement | 11:52:55,229 | 127.0.0.1 |            141
                 Determining replicas for mutation | 11:52:55,229 | 127.0.0.1 |            291
                    Acquiring switchLock read lock | 11:52:55,229 | 127.0.0.1 |            403
                            Appending to commitlog | 11:52:55,229 | 127.0.0.1 |            413
                            Adding to foo memtable | 11:52:55,229 | 127.0.0.1 |            432
                                  Request complete | 11:52:55,229 | 127.0.0.1 |            541

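All this on a MacBook Pro with 16 GB of memory and an SSD. The source_elapsed column is in microseconds, so on the server side the INSERT took about 0.54 ms.

So, YMMV?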
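Because source_elapsed is cumulative, subtracting consecutive values shows how long the coordinator spent between trace events. A small sketch of that arithmetic, using only the numbers from the trace above; step_durations is a hypothetical helper, and each difference covers everything between two trace points rather than only the named activity.

```python
# source_elapsed values (microseconds) copied from the cqlsh trace above.
TRACE = [
    ("execute_cql3_query", 0),
    ("Parsing INSERT INTO foo (i, j) VALUES (1, 'yay');", 43),
    ("Preparing statement", 141),
    ("Determining replicas for mutation", 291),
    ("Acquiring switchLock read lock", 403),
    ("Appending to commitlog", 413),
    ("Adding to foo memtable", 432),
    ("Request complete", 541),
]


def step_durations(trace):
    """Return (activity, microseconds since the previous trace event) pairs."""
    return [
        (activity, elapsed - prev_elapsed)
        for (_, prev_elapsed), (activity, elapsed) in zip(trace, trace[1:])
    ]


if __name__ == "__main__":
    for activity, micros in step_durations(TRACE):
        print("%5d us  %s" % (micros, activity))
    print("total: %.3f ms" % (TRACE[-1][1] / 1000.0))
```

The total of 541 µs is measured inside Cassandra only; it does not include the client's round trip or any time spent in the driver, which is where the thread locates most of the 1 ms seen from Python.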

In reply to Tyler Hobbs (27 March 2015 at 08:28).

--

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692


Re: High latencies for simple queries

Artur Siekielski
In reply to this post by Tyler Hobbs-2
Yes, I'm concerned about the latency. Throughput can be high even when
using Python: http://datastax.github.io/python-driver/performance.html.
But in my scenarios I need to run queries sequentially, so latency
matters. And Cassandra requires issuing more queries than SQL databases
do, so these latencies can add up to a significant amount.

I was running the asyncore event loop, because libev doesn't seem to be
supported on PyPy, which I'm using. I briefly switched to CPython and
LibevConnection and didn't notice a major speedup; the minimum latency
is still 1 ms.

Overall, it looks to me like the issue is not that important, because
multi-master, multi-DC databases always involve higher and somewhat
unpredictable latencies, so relying on sub-millisecond latencies on
production clusters is not very realistic.



Re: High latencies for simple queries

Artur Siekielski
In reply to this post by Ben Bromhead
I think that in your example Postgres spends most of its time waiting for
fsync() to complete. On Linux, with a battery-backed RAID controller,
it's safe to mount an ext4 filesystem with the "barrier=0" option, which
improves fsync() performance a lot. I have partitions mounted with this
option, and I ran a test from Python using the psycopg2 driver. I got
the following latencies, in milliseconds:
- INSERT without COMMIT: 0.04
- INSERT with COMMIT: 0.12
- SELECT: 0.05
I also repeated the benchmark runs multiple times (using Python's
"timeit" module).
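A small sketch of this kind of timeit measurement, so that numbers from different drivers can be computed the same way. per_call_latency_ms is a hypothetical helper; it reports the mean per-call time of the fastest repetition, which filters out some scheduling noise, and because the calls run one after another it measures latency rather than throughput.

```python
import timeit


def per_call_latency_ms(fn, number=1000, repeat=5):
    """Return the mean per-call latency of fn(), in milliseconds.

    timeit.repeat() calls fn `number` times per repetition; taking the fastest
    repetition reduces noise from other processes, and dividing by `number`
    gives the average time of one sequential call.
    """
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number * 1000.0


if __name__ == "__main__":
    # Stand-in workload so the sketch runs on its own.
    print("%.6f ms per call" % per_call_latency_ms(lambda: sum(range(100))))

    # Hypothetical usage against the databases discussed in this thread
    # (connection details are assumptions, not taken from the thread):
    #   import psycopg2
    #   cur = psycopg2.connect(host="127.0.0.1", dbname="test").cursor()
    #   per_call_latency_ms(lambda: cur.execute("SELECT 1"))
    #
    #   from cassandra.cluster import Cluster
    #   session = Cluster(["127.0.0.1"]).connect("foo")
    #   per_call_latency_ms(lambda: session.execute("SELECT j FROM foo WHERE i = 1"))
```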

In reply to Ben Bromhead (03/27/2015 07:58 PM).


Re: High latencies for simple queries

Tyler Hobbs-2
Since you're executing queries sequentially, you may want to look into using callback chaining to avoid the cross-thread signalling that causes the 1ms latencies. Basically, use session.execute_async() and attach a callback to the returned future that executes your next query. The callback is executed on the event loop thread. The main downsides are that you need to be careful not to block the event loop thread (which includes calling session.execute() or session.prepare()), and you need to ensure that all exceptions raised in the callback are handled by your application code.
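A minimal sketch of callback chaining, assuming the python-driver 2.x API where session.execute_async(query, parameters) returns a ResponseFuture whose add_callbacks(callback, errback, callback_args=...) passes the rows as the callback's first argument. run_chain is a hypothetical helper, and FakeSession/FakeFuture are hypothetical stand-ins that complete immediately so the sketch runs without a cluster; they are not part of the driver.

```python
import threading


def run_chain(session, statements, on_done, on_error):
    """Execute (query, params) pairs strictly in order via callback chaining.

    Each query is issued from the callback of the previous one, so after the
    first execute_async() no application thread blocks waiting per query.
    Calls on_done(list_of_results) after the last query, or on_error(exc).
    """
    results = []

    def start(index):
        if index == len(statements):
            on_done(results)
            return
        query, params = statements[index]
        future = session.execute_async(query, params)
        future.add_callbacks(callback=handle_rows, callback_args=(index,),
                             errback=on_error)

    def handle_rows(rows, index):
        # With the real driver this runs on the event loop thread: do not call
        # blocking session.execute()/session.prepare() here, and do not let an
        # exception escape unhandled.
        try:
            results.append(rows)
            start(index + 1)
        except Exception as exc:
            on_error(exc)

    try:
        start(0)
    except Exception as exc:
        on_error(exc)


class FakeFuture(object):
    """Hypothetical stand-in for the driver's ResponseFuture, already completed."""

    def __init__(self, rows=None, error=None):
        self.rows, self.error = rows, error

    def add_callbacks(self, callback, errback, callback_args=(),
                      callback_kwargs=None, errback_args=(), errback_kwargs=None):
        if self.error is not None:
            errback(self.error, *errback_args, **(errback_kwargs or {}))
        else:
            callback(self.rows, *callback_args, **(callback_kwargs or {}))


class FakeSession(object):
    """Hypothetical session: echoes each query as its only row, fails on fail_on."""

    def __init__(self, fail_on=None):
        self.fail_on = fail_on
        self.executed = []

    def execute_async(self, query, parameters=None):
        self.executed.append((query, parameters))
        if query == self.fail_on:
            return FakeFuture(error=RuntimeError("query failed: " + query))
        return FakeFuture(rows=[(query, parameters)])


if __name__ == "__main__":
    done = threading.Event()
    outcome = {}

    def finished(rows):
        outcome["rows"] = rows
        done.set()

    def failed(exc):
        outcome["error"] = exc
        done.set()

    # With python-driver 2.x against the local cluster from this thread
    # (assumed, not run here):
    #   from cassandra.cluster import Cluster
    #   session = Cluster(["127.0.0.1"]).connect("foo")
    #   run_chain(session, [("INSERT INTO foo (i, j) VALUES (%s, %s)", (1, "yay")),
    #                       ("SELECT j FROM foo WHERE i = %s", (1,))],
    #             finished, failed)
    run_chain(FakeSession(), [("INSERT ...", (1, "yay")), ("SELECT ...", (1,))],
              finished, failed)
    done.wait(5)  # the application thread waits once per chain, not per query
    print(outcome)
```

The application thread still blocks once, on the Event at the end, so the cross-thread wait discussed earlier is paid once per chain instead of once per query.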

In reply to Artur Siekielski (Fri, Mar 27, 2015 at 3:11 PM).

Re: High latencies for simple queries

Laing, Michael
I use callback chaining with the python driver and can confirm that it is very fast.

You can "chain the chains" together to perform sequential processing. For example, I do this when retrieving "metadata" and then the referenced "payload", when the metadata has been inverted and the payload is larger than we want to invert. You can also run multiple "chains of chains" asynchronously, cascading state by employing the userdata of the future.
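A sketch of several two-step chains (metadata, then the referenced payload) running concurrently. One reading of "cascade state by employing the userdata of the future" is passing a per-chain state object through callback_args/errback_args; that interpretation is an assumption, as are the python-driver 2.x add_callbacks signature, the table and column names, and the helper fetch_all. FakeSession/FakeFuture are hypothetical stand-ins that answer from dicts so the sketch runs without a cluster.

```python
import threading


def fetch_all(session, item_ids, on_all_done):
    """Start one two-step chain per id (metadata row, then its payload) at once.

    Per-chain state travels through callback_args/errback_args, and a shared
    countdown calls on_all_done({id: payload_or_exception}) when every chain ends.
    Table and column names are hypothetical.
    """
    results = {}
    pending = [len(item_ids)]
    lock = threading.Lock()  # callbacks may run on the event loop thread

    def finish(state, value):
        with lock:
            results[state["id"]] = value
            pending[0] -= 1
            all_done = pending[0] == 0
        if all_done:
            on_all_done(results)

    def on_error(exc, state):
        finish(state, exc)

    def on_metadata(rows, state):
        try:
            state["payload_key"] = rows[0][0]  # cascade state to the next step
            future = session.execute_async(
                "SELECT body FROM payloads WHERE key = %s", (state["payload_key"],))
        except Exception as exc:
            finish(state, exc)
            return
        future.add_callbacks(callback=on_payload, callback_args=(state,),
                             errback=on_error, errback_args=(state,))

    def on_payload(rows, state):
        finish(state, rows[0][0] if rows else None)

    if not item_ids:
        on_all_done(results)
    for item_id in item_ids:
        state = {"id": item_id}  # per-chain state object
        future = session.execute_async(
            "SELECT payload_key FROM metadata WHERE id = %s", (item_id,))
        future.add_callbacks(callback=on_metadata, callback_args=(state,),
                             errback=on_error, errback_args=(state,))


class FakeFuture(object):
    """Hypothetical, already-completed stand-in for the driver's ResponseFuture."""

    def __init__(self, rows):
        self.rows = rows

    def add_callbacks(self, callback, errback, callback_args=(),
                      callback_kwargs=None, errback_args=(), errback_kwargs=None):
        callback(self.rows, *callback_args, **(callback_kwargs or {}))


class FakeSession(object):
    """Hypothetical session answering both queries from in-memory dicts."""

    def __init__(self, metadata, payloads):
        self.metadata, self.payloads = metadata, payloads

    def execute_async(self, query, parameters=None):
        key = parameters[0]
        table = self.metadata if "FROM metadata" in query else self.payloads
        return FakeFuture([(table[key],)])


if __name__ == "__main__":
    session = FakeSession({1: "a", 2: "b"}, {"a": "payload-a", "b": "payload-b"})
    fetch_all(session, [1, 2], print)
```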

We also multiprocess, for more parallelism, and we distribute work to multiple multiprocessing instances using a message broker for yet more parallel activity, as well as reliability.

ml

In reply to Tyler Hobbs (Fri, Mar 27, 2015 at 4:28 PM).


Re: High latencies for simple queries

Laing, Michael
Actually I am in the middle of setting up the same sort of thing for PostgreSQL using psycopg2 and pyev.

I'll be using Cassandra and PostgreSQL in an IoT experiment as the backend for swarms of MQTT brokers, somewhere in the 10-100M client range.

ml

In reply to Laing, Michael (Fri, Mar 27, 2015 at 4:59 PM).



Re: High latencies for simple queries

Ben Bromhead
One other thing to keep in mind (and check) is that when doing these tests locally, the Cassandra driver connects through the network stack, whereas Postgres supports local connections over a Unix domain socket (which is also enabled by default).

Unix domain sockets are significantly faster than TCP, as you don't have a network stack to traverse. I think any driver using libpq will use the domain socket when connecting locally without an explicit host.
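To check the size of this effect on a given machine, here is a rough sketch that times a 1-byte ping-pong over a TCP loopback connection and over a Unix domain socket pair. It exercises only the kernel socket path from a single thread, with no PostgreSQL or Cassandra involved; the helper names are hypothetical, and socket.socketpair(AF_UNIX) requires a Unix-like OS.

```python
import socket
import time


def tcp_loopback_pair():
    """Return a connected (client, server) pair of TCP sockets on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    for s in (client, server):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no Nagle delay
    return client, server


def unix_pair():
    """Return a connected pair of Unix domain stream sockets."""
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)


def mean_round_trip_us(pair, rounds=2000):
    """Average time of a 1-byte ping/pong between the two sockets, in microseconds.

    Both ends are driven from one thread, so this measures the socket path only,
    not thread wake-ups or any database work. Closes both sockets afterwards.
    """
    a, b = pair
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            a.sendall(b"x")
            b.recv(1)
            b.sendall(b"y")
            a.recv(1)
        return (time.perf_counter() - start) / rounds * 1e6
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("TCP loopback: %.1f us per round trip" % mean_round_trip_us(tcp_loopback_pair()))
    print("Unix socket:  %.1f us per round trip" % mean_round_trip_us(unix_pair()))
```

The per-round-trip gap can then be compared with the per-query latencies reported in this thread (about 0.04-0.12 ms for PostgreSQL from psycopg2 and about 1 ms for Cassandra from the Python driver).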

But I'd hazard a guess that something else is going on with the Cassandra connection, as I'm able to get 0.5ms queries locally, and that's even with tracing turned on.

Ben

In reply to Laing, Michael (27 March 2015 at 14:10).


Re: High latencies for simple queries

Artur Siekielski
On 03/28/2015 12:13 AM, Ben Bromhead wrote:
> One other thing to keep in mind / check is that doing these tests
> locally the cassandra driver will connect using the network stack,
> whereas postgres supports local connections over a unix domain socket
> (this is also enabled by default).
>
> Unix domain sockets are significantly faster than tcp as you don't have
> a network stack to traverse. I think any driver using libpq will attempt
> to use the domain socket when connecting locally.

Good catch. I made sure that psycopg2 connects through a TCP socket, and
the numbers increased by about 20%, but it's still an order of
magnitude faster than Cassandra.

>
> But I'm going to hazard a guess something else is going on with the
> Cassandra connection as I'm able to get 0.5ms queries locally and that's
> even with trace turned on.

Using python-driver?

Re: High latencies for simple queries

Ben Bromhead
cqlsh runs on the internal cassandra python drivers: cassandra-pylib and cqlshlib.

I would not recommend using them at all (nothing wrong with them, they are just not built with external users in mind).

I have never used python-driver in anger so I can't comment on whether it is genuinely slower than the internal C* python driver, but this might be a question for python-driver folk.

In reply to Artur Siekielski (28 March 2015 at 00:34).


Re: High latencies for simple queries

Tyler Hobbs-2
The python driver that we bundle with Cassandra for cqlsh is the normal python driver (https://github.com/datastax/python-driver), although sometimes it's patched for bugfixes or is not an official release.

In reply to Ben Bromhead (Sat, Mar 28, 2015 at 5:36 PM).

Re: High latencies for simple queries

Tyler Hobbs-2
To clarify, that's in Cassandra 2.1+.  In 2.0 and earlier, we used http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/ for cqlsh.

In reply to Tyler Hobbs (Tue, Mar 31, 2015 at 10:40 AM).

Re: High latencies for simple queries

Anishek Agarwal
Hey all, 

I was wondering whether the Java driver has something like the 1 ms sleep mentioned above for the Python driver. My writes are as expected, at around 600 microseconds per write, but reads are slow at 1300 microseconds.

Read latency seems high; is this expected?

Thanks
Anishek

In reply to Tyler Hobbs (Tue, Mar 31, 2015 at 9:11 PM).
