Cross-datacenter requests taking a very long time.

Andrew Vant
I have a Cassandra 2.0.13 cluster with three datacenters, three nodes per datacenter. If I open cqlsh and do a select with any consistency level that crosses datacenters (e.g. QUORUM or ALL), it works, but takes 2+ minutes to return. The same statement with consistency ONE or LOCAL_QUORUM is as fast as it should be. It does not appear to be latency between centers; I can point cqlsh at a server in a different DC and it's not noticeably slow.

I tried turning tracing on to get a better idea of what was happening, but it complains `Session <long hex string> wasn't found`.

I'm not entirely sure what direction to look in to find the problem.

--

Andrew

Re: Cross-datacenter requests taking a very long time.

daemeon reiydelle
What is your replication factor?

Any idea how much data has to be processed under the query?

With so few nodes (3) in each DC, even with replication=1, you are probably not seeing much inter-node data transfer for a local quorum, until of course you cross datacenters and at least one full copy of the data has to come across the wire.

While running the query against both DC's, you can take a look at netstats to get a really quick-and-dirty idea of network traffic.


.......
“Life should not be a journey to the grave with the intention of arriving safely in a
pretty and well preserved body, but rather to skid in broadside in a cloud of smoke,
thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a Ride!”

- Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872



Re: Cross-datacenter requests taking a very long time.

Robert Coli-3
In reply to this post by Andrew Vant
On Tue, Mar 31, 2015 at 1:54 PM, Andrew Vant <[hidden email]> wrote:
> I have a Cassandra 2.0.13 cluster with three datacenters, three nodes per datacenter. If I open cqlsh and do a select with any consistency level that crosses datacenters (e.g. QUORUM or ALL), it works, but takes 2+ minutes to return. The same statement with consistency ONE or LOCAL_QUORUM is as fast as it should be. It does not appear to be latency between centers; I can point cqlsh at a server in a different DC and it's not noticeably slow.

Have you changed your default timeouts? 2 minutes is about... 2 minutes longer than the default timeouts... ?
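
One way to check this is to pull every `*_timeout_in_ms` setting out of a node's cassandra.yaml and compare it with the observed delay. The following is only a rough sketch: the sample text stands in for a real config file, and its values are what I believe a stock 2.0 file contains, so treat them as assumptions and substitute your own file. The function names are made up for illustration.

```python
import re

# Stand-in for the contents of a real cassandra.yaml. The values are assumed,
# not taken from the cluster discussed in this thread.
SAMPLE_YAML = """\
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
request_timeout_in_ms: 10000
"""

# Matches lines such as "read_request_timeout_in_ms: 5000  # comment".
TIMEOUT_LINE = re.compile(r"^\s*(\w+_timeout_in_ms)\s*:\s*(\d+)\s*(?:#.*)?$")


def read_timeouts(yaml_text):
    """Collect every *_timeout_in_ms setting as {name: milliseconds}."""
    timeouts = {}
    for line in yaml_text.splitlines():
        match = TIMEOUT_LINE.match(line)
        if match:
            timeouts[match.group(1)] = int(match.group(2))
    return timeouts


def exceeded_by(observed_seconds, timeouts):
    """Return {name: excess in ms} for each timeout the observed delay exceeds."""
    observed_ms = observed_seconds * 1000
    return {
        name: observed_ms - limit
        for name, limit in timeouts.items()
        if observed_ms > limit
    }


timeouts = read_timeouts(SAMPLE_YAML)
for name, excess in exceeded_by(120, timeouts).items():
    print(f"{name}: a 120 s query runs {excess} ms past this limit")
```

If a query really does take two minutes without a timeout error, either these settings have been raised or the delay is happening somewhere they do not cover.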

=Rob
 

Re: Cross-datacenter requests taking a very long time.

Bharatendra Boddu
In reply to this post by daemeon reiydelle
Which snitch is set as endpoint_snitch in cassandra.yaml? PropertyFileSnitch can improve performance.

- bharat


Re: COMMERCIAL:Re: Cross-datacenter requests taking a very long time.

Andrew Vant
In reply to this post by daemeon reiydelle
On Mar 31, 2015, at 4:59 PM, daemeon reiydelle <[hidden email]> wrote:
> What is your replication factor?

NetworkTopologyStrategy with replfactor: 2 in each DC.
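
For reference, the replica counts implied by that replication setting can be worked out with a short sketch. It assumes the usual quorum rule (half the relevant replication factor, rounded down, plus one); the datacenter names are placeholders, not the real ones.

```python
def replicas_required(rf_per_dc, local_dc):
    """Return how many replica responses each consistency level waits for."""
    total_rf = sum(rf_per_dc.values())
    local_rf = rf_per_dc[local_dc]
    return {
        "ONE": 1,
        "LOCAL_QUORUM": local_rf // 2 + 1,  # counted within the local DC only
        "QUORUM": total_rf // 2 + 1,        # counted across all DCs
        "ALL": total_rf,
    }


# Placeholder DC names; replication factor 2 in each of three DCs.
rf = {"dc1": 2, "dc2": 2, "dc3": 2}
needed = replicas_required(rf, local_dc="dc1")

for level, count in needed.items():
    # Anything beyond the local replication factor must come from another DC.
    remote = max(0, count - rf["dc1"])
    print(f"{level}: {count} replicas, at least {remote} from other DCs")
```

With these numbers QUORUM needs 4 of 6 replicas, so at least 2 responses must come from other datacenters, while LOCAL_QUORUM needs both local replicas and nothing remote. That matches the pattern where only the cross-datacenter levels are slow.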

Someone else asked about the endpoint snitch I'm using; it's set to GossipingPropertyFileSnitch.

> Any idea how much data has to be processed under the query?

It does not matter what query I use, or what size; the problem occurs even just selecting a single user from the users table.

> While running the query against both DC's, you can take a look at netstats
> to get a really quick-and-dirty idea of network traffic.

I'll try that. I should add that one of the other teams here has a similar setup (3 nodes in 3 DCs) that is working correctly. We're going to go through the config files and see if we can figure out what's different.
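
A config comparison like that could be sketched roughly as follows. This is a simplified illustration rather than a real YAML parser: it looks only at top-level `key: value` lines, skips nested blocks, and would cut off a value that contains a `#`. The two sample configs are invented for the example and do not describe either cluster.

```python
import re

# Top-level "key: value" line, with an optional trailing comment.
KEY_VALUE = re.compile(r"^([A-Za-z_]\w*)\s*:\s*(.*?)\s*(?:#.*)?$")


def top_level_settings(yaml_text):
    """Collect top-level scalar settings; nested and commented lines are skipped."""
    settings = {}
    for line in yaml_text.splitlines():
        if not line or line[0] in " \t#-":
            continue
        match = KEY_VALUE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Invented sample configs standing in for one file from each cluster.
WORKING = """\
cluster_name: 'Test Cluster'
endpoint_snitch: GossipingPropertyFileSnitch
read_request_timeout_in_ms: 5000
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
"""
SLOW = """\
cluster_name: 'Test Cluster'
endpoint_snitch: PropertyFileSnitch
read_request_timeout_in_ms: 120000  # hypothetical override
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
"""

differences = diff_settings(top_level_settings(WORKING), top_level_settings(SLOW))
for key, (working_value, slow_value) in differences.items():
    print(f"{key}: working={working_value!r} slow={slow_value!r}")
```

Reading the real files with `open(path).read()` in place of the sample strings would give a quick list of settings to investigate first.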

--

Andrew

Re: COMMERCIAL:Re: Cross-datacenter requests taking a very long time.

daemeon reiydelle
You might want to see what quorum is configured? I meant to ask that.




On Thu, Apr 2, 2015 at 12:39 PM, Andrew Vant <[hidden email]> wrote:
On Mar 31, 2015, at 4:59 PM, daemeon reiydelle <[hidden email]> wrote:
> What is your replication factor?

NetworkTopologyStrategy with replfactor: 2 in each DC.
