hector or astyanax

hector or astyanax

李 晗
Hello,
I want to know which Cassandra client is better.
What are their advantages and disadvantages?

Thanks

Re: hector or astyanax

Shamim
Hi,
  Astyanax is essentially a refactoring of Hector that also implements a few common Cassandra use cases. Its API is very easy to use, and you will find all of Hector's functions in Astyanax. For better performance you can also check out the DataStax Java driver: https://github.com/datastax/java-driver

There is also another lightweight client, from Twitter: https://github.com/twitter/cassie

--
Best regards
  Shamim A.


Re: hector or astyanax

Renato Marroquín Mogrovejo
Hey Shamim,

Why do you say that the Java driver has better performance than Hector or
Astyanax? Are there any reasons for this?
Thanks.


Renato M.


Re: hector or astyanax

Derek Williams
The binary protocol is able to multiplex multiple requests over a single connection, which can lead to much better performance (similar to SPDY vs. HTTP). That is without even comparing the raw performance of Thrift vs. the binary protocol, where I assume the binary protocol would be faster since it is specialized for Cassandra requests.



--
Derek Williams

Re: hector or astyanax

Aaron Turner


Curious why you think multiplexing multiple requests over a single connection (serial) is faster than multiple requests over multiple connections (parallel)?

And isn't Thrift a binary protocol?


--
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety. 
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"

Re: hector or astyanax

Edward Capriolo
In reply to this post by Derek Williams
I am aware of no benchmark that shows the binary driver to be faster than Thrift. Yes, theoretically a driver with multiplexing *should be* faster in *some* cases. However, I have never seen any evidence to back up this theory, anecdotal or otherwise.

In fact....
https://github.com/pchalamet/cassandra-sharp/pull/24



Re: hector or astyanax

Derek Williams

I haven't done any performance testing with Cassandra 1.2; I was only giving possible reasons why the DataStax binary driver might be faster. I have seen much better performance and reliability under heavy load in our internal REST services when switching from HTTP to SPDY. Possible reasons are fewer socket reads with more data per read, which increases throughput, and that it's easier to configure optimally.

And it could very well be that the DataStax binary driver is slower than a Thrift client; I haven't benchmarked it. Even so, that doesn't mean a driver that uses Cassandra's binary protocol couldn't be made to run faster. I look forward to giving it a try once we move to 1.2.



--
Derek Williams

Re: hector or astyanax

Hiller, Dean
In reply to this post by Aaron Turner
I was under the impression that it is multiple requests over a single connection in PARALLEL, not serial: requests carry request ids and so do the responses, so you can send a request while a previous request has not received its response yet.

I think you do get a big speed advantage from the asynchronous nature, as you do not need to hold up so many threads in your web server while you have outstanding requests being processed. The Thrift async client was not exactly async in the way I suspect the new Java driver is, but I have not verified that (I hope it is).

Dean
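
To illustrate the request-id idea described above, here is a minimal sketch in Python. It is a toy simulation, not the actual Cassandra binary protocol or any driver's API: `MultiplexedConnection`, `_fake_server`, and the delays are all hypothetical names and values chosen for illustration. It only shows that when each response carries the id of its request, a second request can be sent before the first has been answered, and responses may complete out of order.

```python
import asyncio


class MultiplexedConnection:
    """Toy stand-in for one connection carrying several in-flight requests."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}  # request id -> Future waiting for that response

    async def _fake_server(self, request_id, query, delay):
        # Hypothetical server: answers after `delay` seconds and echoes the id.
        await asyncio.sleep(delay)
        self._on_response(request_id, f"result of {query}")

    def _on_response(self, request_id, payload):
        # Responses are matched to callers by id, so arrival order is irrelevant.
        self._pending.pop(request_id).set_result(payload)

    async def send(self, query, delay):
        request_id = self._next_id
        self._next_id += 1
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        # Keep a reference to the task so it is not garbage collected early.
        task = asyncio.create_task(self._fake_server(request_id, query, delay))
        try:
            return await future
        finally:
            await task


async def run_demo():
    conn = MultiplexedConnection()
    completion_order = []

    async def call(query, delay):
        result = await conn.send(query, delay)
        completion_order.append(query)
        return result

    # "slow" is sent first, but "fast" is sent before "slow" has been answered.
    results = await asyncio.gather(call("slow", 0.3), call("fast", 0.01))
    return results, completion_order


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

Note that this only models request/response bookkeeping. It says nothing about whether a single TCP stream limits throughput for large results, which is a separate question.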


Re: hector or astyanax

Aaron Turner
Just because you can batch queries or have the server process them out of order doesn't make it fully "parallel".  You're still using a single TCP connection, which is by definition a serial data stream.  Basically, if you send a bunch of queries which each return a large amount of data, you've effectively limited your query throughput to that of a single TCP connection.  Using Thrift, each query result is returned in its own TCP stream, in *parallel*.

I'm not saying the new API isn't great or doesn't have its place, and it may well have better performance in certain situations, but generally speaking I would refrain from making general claims without actual benchmarks to back them up.  I do completely agree that async interfaces have their place and have certain advantages over multi-threading models, but they're just another tool to be used when appropriate.

Just my .02. :)



--
Aaron Turner

Re: hector or astyanax

Hiller, Dean
You have me thinking more.  I wonder whether in practice 3 sockets are any faster than 1 socket when doing NIO.  If your buffer sizes were small, maybe that would be the case.  Usually the NIC buffers are big, so when the selector fires it is reading from 3 buffers for 3 sockets or from 1 buffer for one socket.  In both cases, all 3 requests are there in the buffers.  At any rate, my belief is that it is probably still basically parallel performance on one socket, though I have not tested my theory.  My theory is that the real performance bottleneck is the work Cassandra has to do on the reads and such.

What about 20 sockets, then (as when someone has a pool)?  Will it be any faster?  I'm not really sure, as in the end you are still held up by the real bottleneck of reading from disk on the Cassandra side.  In one case we went to 20 threads using 20 sockets with Astyanax and saw no performance improvement (the calls were synchronous, but more sockets did not improve our performance).  I.e., it may be that 90% of the time one socket is just as fast as 10 or 20.  I would love to know the true answer to that, though.

Later,
Dean



Re: hector or astyanax

Aaron Turner
From my experience, your NIC buffers generally aren't the problem (or at least it's easy to tune them to fix it).  It's TCP.  Simply put, your raw NIC throughput > single TCP socket throughput on most modern hardware/OS combinations.  This is especially true as latency increases between the two hosts.  This is why BitTorrent or "download accelerators" are often faster than just downloading a large file via your browser or FTP client: they're running multiple TCP connections in parallel compared to only one.

TCP is great for reliable, bi-directional, stream-based communication.  It's not the best solution for high throughput, though.  UDP is much better for that, but then you lose all the features that TCP gives you, and so people end up re-inventing the wheel (poorly, I might add).

So yeah, I think the answer to the question of "which is faster" is "it depends on your queries".
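
The latency point can be made concrete with a back-of-the-envelope calculation. The sketch below is in Python and rests on stated assumptions: a fixed 64 KiB window per connection (no window scaling, which real stacks often enable), no packet loss, and no slow start. The function name and the sample round-trip times are made up for illustration. A connection can have at most one window of unacknowledged data in flight per round trip, so its ceiling is roughly window / RTT.

```python
def max_throughput_mbps(window_bytes, rtt_seconds, connections=1, link_mbps=None):
    """Rough ceiling on TCP throughput in Mbit/s.

    Each connection moves at most one window of data per round trip.
    Loss and slow start are ignored (assumption), so real numbers are lower.
    """
    per_connection = window_bytes * 8 / rtt_seconds / 1_000_000
    total = connections * per_connection
    # Parallel connections still cannot exceed the physical link.
    return total if link_mbps is None else min(total, link_mbps)


WINDOW = 64 * 1024  # assumed 64 KiB window per connection

# Same 1 Gbit/s link in all three cases; only RTT and connection count change.
lan_single = max_throughput_mbps(WINDOW, rtt_seconds=0.0005, link_mbps=1000)
wan_single = max_throughput_mbps(WINDOW, rtt_seconds=0.05, link_mbps=1000)
wan_pooled = max_throughput_mbps(WINDOW, rtt_seconds=0.05, connections=8, link_mbps=1000)

print(f"0.5 ms RTT, 1 connection:  {lan_single:.1f} Mbit/s")
print(f"50 ms RTT,  1 connection:  {wan_single:.1f} Mbit/s")
print(f"50 ms RTT,  8 connections: {wan_pooled:.1f} Mbit/s")
```

Under these assumptions a single connection can fill the link at LAN latencies, while at 50 ms it is capped at about 10 Mbit/s and extra connections scale the ceiling linearly. Whether that matters for a Cassandra client depends on result sizes and on where the real bottleneck is, which is exactly what a benchmark would have to show.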



--
Aaron Turner

Re: hector or astyanax

Derek Williams
Also keep in mind that it should be rare to use only a single socket, since you are usually making at least 1 connection per node in the cluster (or local datacenter). There is also nothing preventing a single client from opening more than 1 connection to a node. In the end it should come down to which protocol implementation is faster.


--
Derek Williams

Re: hector or astyanax

aaron morton
> i want to know which cassandra client is better?
Go with Astyanax or the native binary driver; they are both under active development and supported by a vendor / large implementor.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton

On 7/05/2013, at 7:03 AM, Derek Williams <[hidden email]> wrote:

Also have to keep in mind that it should be rare to only use a single socket since you are usually making at least 1 connection per node in the cluster (or local datacenter). There is also nothing enforcing that a single client cannot open more than 1 connection to a node. In the end it should come down to which protocol implementation is faster.


On Mon, May 6, 2013 at 11:58 AM, Aaron Turner <[hidden email]> wrote:
From my experience, your NIC buffers generally aren't the problem (or at least it's easy to tune them to fix).  It's TCP.  Simply put, your raw NIC throughput > single TCP socket throughput on most modern hardware/OS combinations.  This is especially true as latency increases between the two hosts.  This is why BitTorrent or "download accelerators" are often faster than just downloading a large file via your browser or ftp client: they're running multiple TCP connections in parallel compared to only one.

TCP is great for reliable, bi-directional, stream-based communication.  Not the best solution for high throughput though.  UDP is much better for that, but then you lose all the features that TCP gives you, and so people end up re-inventing the wheel (poorly, I might add).

So yeah, I think the answer to the question of "which is faster" is "it depends on your queries".
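
A rough way to put numbers on the single-socket limit described above: one TCP connection can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window / RTT. The sketch below is only back-of-the-envelope arithmetic. The 64 KiB window, the 1 Gbit/s link, and the function names are assumptions chosen for illustration, not measurements from this thread, and systems with TCP window scaling can use much larger windows.

```python
def single_connection_limit_bps(window_bytes, rtt_ms):
    """Upper bound for one TCP connection: at most one window per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return window_bytes * 8 * 1000 / rtt_ms


def aggregate_limit_bps(connections, window_bytes, rtt_ms, link_bps):
    """Parallel connections add up until the link itself becomes the limit."""
    per_connection = single_connection_limit_bps(window_bytes, rtt_ms)
    return min(connections * per_connection, link_bps)


# Assumed example values (hypothetical, not measured).
WINDOW_BYTES = 64 * 1024
LINK_BPS = 1_000_000_000

for rtt_ms in (1, 10, 50):
    one = single_connection_limit_bps(WINDOW_BYTES, rtt_ms)
    four = aggregate_limit_bps(4, WINDOW_BYTES, rtt_ms, LINK_BPS)
    print(f"RTT {rtt_ms:>2} ms: 1 connection <= {one / 1e6:.1f} Mbit/s, "
          f"4 connections <= {four / 1e6:.1f} Mbit/s")
```

Under these assumptions the per-connection bound shrinks as latency grows, which is the effect parallel connections work around. Whether it matters for a Cassandra client inside one datacenter, where round trips are short, is a separate question that only a benchmark can settle.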



On Mon, May 6, 2013 at 10:24 AM, Hiller, Dean <[hidden email]> wrote:
You have me thinking more.  I wonder in practice if 3 sockets is any faster than 1 socket when doing NIO.  If your buffer sizes were small, maybe that would be the case.  Usually the NIC buffers are big, so when the selector fires it is reading from 3 buffers for 3 sockets or 1 buffer for one socket.  In both cases, all 3 requests are there in the buffers.  At any rate, my belief is that it is probably still basically parallel performance on one socket, though I have not tested my theory.  My theory is that the real bottleneck on performance is the work Cassandra has to do on the reads and such.

What about 20 sockets then (like someone has a pool)?  Will it be any faster?  Not really sure, as in the end you are still held up by the real bottleneck of reading from disk on the Cassandra side.  We went to 20 threads in one case, using 20 sockets with Astyanax, and received no performance improvement (synchronous, but more sockets did not improve our performance).  I.e., it may be the case that 90% of the time one socket is just as fast as 10 or 20.  I would love to know the truth/answer to that though.

Later,
Dean


From: Aaron Turner <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Monday, May 6, 2013 10:57 AM
To: cassandra users <[hidden email]>
Subject: Re: hector or astyanax

Just because you can batch queries or have the server process them out of order doesn't make it fully "parallel".  You're still using a single TCP connection, which is by definition a serial data stream.  Basically, if you send a bunch of queries which each return a large amount of data, you've effectively limited your query throughput to a single TCP connection.  Using Thrift, each query result is returned in its own TCP stream in *parallel*.

Not saying the new API isn't great, doesn't have its place, or may not have better performance in certain situations, but generally speaking I would refrain from making general claims without actual benchmarks to back them up.  I do completely agree that async interfaces have their place and have certain advantages over multi-threading models, but it's just another tool to be used when appropriate.

Just my .02. :)



On Mon, May 6, 2013 at 5:08 AM, Hiller, Dean <[hidden email]> wrote:
I was under the impression that it is multiple requests using a single connection in PARALLEL, not serial, as the requests have ids and the responses do as well, so you can send a request while a previous request has no response just yet.

I think you do get a big speed advantage from the asynchronous nature, as you do not need to hold up so many threads in your webserver while you have outstanding requests being processed.  The Thrift async was not exactly async in the way I suspect the new Java driver is, but I have not verified that (I hope it is).
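
The request-id idea above can be sketched as a toy model: every request sent on the one connection gets an id, and a response is matched to its request by that id, so responses may come back in any order while other requests are still outstanding. This is only an illustration under stated assumptions. The class and method names are hypothetical, no real socket is involved, and it does not reproduce the actual wire format of Cassandra's binary protocol or the API of any driver.

```python
class MultiplexedConnection:
    """Toy model of one connection carrying several in-flight requests."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}    # request id -> query text still awaiting a response
        self.completed = {}   # query text -> result

    def send(self, query):
        """Tag the query with a fresh id and return without waiting for a reply."""
        request_id = self._next_id
        self._next_id += 1
        self._pending[request_id] = query
        return request_id

    def on_response(self, request_id, result):
        """Match a response to its request by id, whatever order it arrives in."""
        query = self._pending.pop(request_id)
        self.completed[query] = result
        return query

    def in_flight(self):
        return len(self._pending)


conn = MultiplexedConnection()
ids = [conn.send(q) for q in ("query A", "query B", "query C")]
in_flight_before = conn.in_flight()  # all three are outstanding at once

# Simulated responses arriving out of order on the same connection.
for request_id, result in ((ids[2], "rows C"), (ids[0], "rows A"), (ids[1], "rows B")):
    conn.on_response(request_id, result)
```

The model shows why no thread has to block per outstanding request. It says nothing about throughput: the bytes of all responses still share one TCP stream, which is the point raised elsewhere in this thread.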

Dean

From: Aaron Turner <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Sunday, May 5, 2013 5:27 PM
To: cassandra users <[hidden email]>
Subject: Re: hector or astyanax



On Sun, May 5, 2013 at 1:09 PM, Derek Williams <[hidden email]> wrote:
The binary protocol is able to multiplex multiple requests using a single connection, which can lead to much better performance (similar to HTTP vs SPDY). This is without comparing the performance of Thrift vs the binary protocol; I assume the binary protocol would be faster since it is specialized for Cassandra requests.


Curious why you think multiplexing multiple requests over a single connection (serial) is faster than multiple requests over multiple connections (parallel)?

And isn't Thrift a binary protocol?


--
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"



--
Derek Williams


Re: hector or astyanax

Blair Zajac
On 05/07/2013 01:37 AM, aaron morton wrote:
>> i want to know which cassandra client is better?
> Go with Astyanax or Native Binary, they are both under active development
> and supported by a vendor / large implementor.

Native Binary being which one specifically?  Do you mean the new
DataStax java-driver? [1]

Regards,
Blair

[1] https://github.com/datastax/java-driver

Re: hector or astyanax

aaron morton
Yup, that's the one.

A

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
