|
I will be load balancing between nodes using HAProxy. Is this recommended?
Also, is there some sort of ping/health check URI available? Thanks |
|
no and no.
On Sat, Aug 28, 2010 at 10:28 AM, Mark <[hidden email]> wrote:
> I will be loadbalancing between nodes using HAProxy. Is this recommended?
>
> Also is there a some sort of ping/health check uri available?
>
> Thanks
|
|
On 8/28/10 11:20 AM, Benjamin Black wrote:
> no and no.

Any reason why load balancing client connections using HAProxy isn't recommended? |
|
In reply to this post by Benjamin Black
On 8/28/10 11:20 AM, Benjamin Black wrote:
> no and no.

Also, what would be a good way of monitoring the health of the cluster? |
|
In reply to this post by Mark-50
Because you create a bottleneck at the HAProxy, and because the presence of the proxy precludes clients from properly backing off from nodes returning errors. The proper approach is to have clients maintain connection pools with connections to multiple nodes in the cluster, and then to spread requests across those connections. Should a node begin returning errors (for example, because it is overloaded), clients can remove it from rotation.

On Sat, Aug 28, 2010 at 11:27 AM, Mark <[hidden email]> wrote:
> any reason on why loadbalancing client connections using HAProxy isnt
> recommended? |
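To make the client-side approach described above concrete, here is a minimal sketch in Python. It is not taken from any real Cassandra client: `NodePool`, `max_errors`, and the `request` callable are hypothetical names used only for illustration, and a real client would hold open connections rather than plain node names.

```python
class NodePool:
    """Round-robin over cluster nodes, dropping nodes that keep returning errors.

    Hypothetical illustration only; not the API of any real Cassandra client.
    """

    def __init__(self, nodes, max_errors=3):
        self.nodes = list(nodes)
        self.max_errors = max_errors
        self.errors = {n: 0 for n in self.nodes}  # consecutive errors per node
        self._next = 0

    def in_rotation(self):
        # A node stays in rotation until it reaches max_errors consecutive errors.
        return [n for n in self.nodes if self.errors[n] < self.max_errors]

    def pick(self):
        live = self.in_rotation()
        if not live:
            raise RuntimeError("no nodes left in rotation")
        node = live[self._next % len(live)]
        self._next += 1
        return node

    def execute(self, request):
        """Run request(node) on the next node; request stands in for a client call."""
        node = self.pick()
        try:
            result = request(node)
        except Exception:
            self.errors[node] += 1  # count the failure, then let the caller see it
            raise
        self.errors[node] = 0  # a success resets the error count
        return result
```

In this sketch a removed node never comes back; a real client would presumably retry it after a cool-down period.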
|
In reply to this post by Mark-50
I think maybe he thought you meant putting a layer between Cassandra's internal communication. There's no problem balancing client connections with HAProxy; we've been pushing several billion requests per month through HAProxy to Cassandra. We use

  mode tcp
  balance leastconn
  server local 127.0.0.1:12350 check

so basically just a connect-based check, and it works fine.

-Anthony

On Sat, Aug 28, 2010 at 11:27:26AM -0700, Mark wrote:
> any reason on why loadbalancing client connections using HAProxy isnt
> recommended?

--
Anthony Molinaro <[hidden email]> |
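For readers unfamiliar with what a connect-based check verifies, the sketch below does roughly the same thing in Python: it succeeds if a TCP connection to the port can be opened, and nothing more. The function name `tcp_connect_check` and the timeout value are assumptions for illustration; this is not HAProxy's implementation.

```python
import socket


def tcp_connect_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    This only proves that something is accepting connections on the port.
    It sends no request, so it cannot tell whether the service behind the
    port is answering correctly or quickly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_connect_check("127.0.0.1", 12350)` would probe the same address as the `server` line above.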
|
In reply to this post by Mark-50
munin is the simplest thing. There are numerous JMX stats of interest.
Cassandra is a symmetric distributed system, so you should not expect to monitor it like you would a web server. Intelligent clients use connection pools and react to current node behavior in making choices of where to send requests, including using describe_ring to discover nodes and open new connections as needed.

On Sat, Aug 28, 2010 at 11:29 AM, Mark <[hidden email]> wrote:
> Also, what would be a good way of monitoring the health of the cluster? |
|
In reply to this post by Anthony Molinaro-5
On Sat, Aug 28, 2010 at 2:34 PM, Anthony Molinaro <[hidden email]> wrote:
> I think maybe he thought you meant put a layer between cassandra internal
> communication.

No, I took the question to be about client connections.

> There's no problem balancing client connections with
> haproxy, we've been pushing several billion requests per month through
> haproxy to cassandra.

Can it be done: yes. Is it best practice: no. Even 10 billion requests/month is an average of less than 4000 reqs/sec. Just not that many for a distributed database like Cassandra.

> we use
>
> mode tcp
> balance leastconn
> server local 127.0.0.1:12350 check
>
> so basically just a connect based check, and it works fine

Cassandra can, and does, fail in ways that do not stop it from answering TCP connection requests. Are you saying it works fine because you have seen numerous types of node failures and this was sufficient? I would be quite surprised if that were so.

Using an LB for service discovery is a fine thing (connect to a VIP, call describe_ring, open direct connections to cluster nodes). Relying on an LB to do the right thing when it is totally ignorant of what is going across those client connections (as is implied by simply checking for connectivity) is asking for trouble. Doubly so when you use a leastconn policy (a failing node can spit out an error and close a connection with impressive speed, sucking all the traffic to itself; common problem with HTTP servers giving back errors).

b |
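The leastconn concern above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of HAProxy: one request arrives per tick, healthy nodes hold a connection for `service_ticks` ticks, the failing node returns an error and closes immediately, and ties go to the first node listed. All names and numbers are hypothetical.

```python
def simulate_leastconn(nodes, failing, service_ticks=10, ticks=1000):
    """Count how many requests each node receives under a least-connections policy."""
    active = {n: [] for n in nodes}  # finish times of open connections per node
    served = {n: 0 for n in nodes}
    for t in range(ticks):
        # Close connections whose requests have finished by tick t.
        for n in nodes:
            active[n] = [end for end in active[n] if end > t]
        # leastconn: pick the node with the fewest open connections
        # (min() returns the first such node on ties).
        target = min(nodes, key=lambda n: len(active[n]))
        served[target] += 1
        if target != failing:
            active[target].append(t + service_ticks)
        # The failing node errors out and closes at once, so it never
        # holds a connection and always looks like the least loaded node.
    return served


served = simulate_leastconn(["a", "b", "c", "bad"], failing="bad")
```

Under these assumptions the failing node ends up receiving more requests than the three healthy nodes combined, because its connection count is always zero when the balancer chooses.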
|
In reply to this post by Mark-50
On Aug 28, 2010, at 12:29 PM, Mark wrote:
> Also, what would be a good way of monitoring the health of the cluster?

We use Ganglia. I believe failover is usually built into clients. Not sure why using HAProxy or LVS wouldn't be a good option though. I used to use it with MySQL slaves with much success.

--Joe |
|
In reply to this post by Benjamin Black
On 8/28/10 2:44 PM, Benjamin Black wrote:
> Using an LB for service discovery is a fine thing (connect to a VIP, call
> describe_ring, open direct connections to cluster nodes). Relying on
> an LB to do the right thing when it is totally ignorant of what is
> going across those client connections (as is implied by simply
> checking for connectivity) is asking for trouble.
>
> b

sending requests to individual nodes it would send it to haproxy. FYI we are using ruby and our client is the Cassandra gem which I think you may know about :) |
|
In reply to this post by Benjamin Black
On Sat, Aug 28, 2010 at 02:44:41PM -0700, Benjamin Black wrote:
> On Sat, Aug 28, 2010 at 2:34 PM, Anthony Molinaro
> <[hidden email]> wrote:
> > I think maybe he thought you meant put a layer between cassandra internal
> > communication.
>
> No, I took the question to be about client connections.

Sorry, didn't mean to put words into your mouth.

> > There's no problem balancing client connections with
> > haproxy, we've been pushing several billion requests per month through
> > haproxy to cassandra.
>
> Can it be done: yes. Is it best practice: no. Even 10 billion
> requests/month is an average of less than 4000 reqs/sec. Just not
> that many for a distributed database like Cassandra.

I don't know, it seems to tax our setup of 39 extra large EC2 nodes. It's also closer to 24000 reqs/sec at peak, since there are different tables (2 tables for each read and 2 for each write).

> Cassandra can, and does, fail in ways that do not stop it from
> answering TCP connection requests. Are you saying it works fine
> because you have seen numerous types of node failures and this was
> sufficient? I would be quite surprised if that were so. Using an LB
> for service discovery is a fine thing (connect to a VIP, call
> describe_ring, open direct connections to cluster nodes). Relying on
> an LB to do the right thing when it is totally ignorant of what is
> going across those client connections (as is implied by simply
> checking for connectivity) is asking for trouble. Doubly so when you
> use a leastconn policy (a failing node can spit out an error and close
> a connection with impressive speed, sucking all the traffic to itself;
> common problem with HTTP servers giving back errors).

The haproxy does seem sufficient for us. We've been running with Cassandra in production since 0.3.0 and have seen just about every possible failure. For the most part it has worked. I'm not saying it's the most efficient, just that it will work for most people's usage.

All the writes to this cluster are via PHP, which creates a connection for each request, so a connection check works fine in this case. We attempt to pool connections via Java for reads, but they reconnect whenever they receive an error. If one machine is misbehaving it tends to fail pretty quickly, at which point all the haproxies drop it (we have an haproxy on every client node, so it acts like a connection pooling mechanism for the client).

describe_ring is a newish call; it didn't exist when we wrote our systems and we have not had a chance to revisit.

So while yes, there are problems with using an haproxy, they are not insurmountable, and it would probably work for many use cases. But like everything, YMMV.

-Anthony

--
Anthony Molinaro <[hidden email]> |
|
On Sun, Aug 29, 2010 at 11:04 AM, Anthony Molinaro <[hidden email]> wrote:
> If one machine is misbehaving it tends to fail pretty quickly, at which
> point all the haproxies drop it (we have an haproxy on every client node,
> so it acts like a connection pooling mechanism for the client).

Cool. Except this is not at all how most people use HAProxy (and I'd be very surprised if the OP had this configuration in mind). As you say, you are using it per client as a connection pool (which I do advocate, along with using languages that don't require this sort of hack), rather than as a service proxy on the Cassandra side (which I don't advocate).

b |
|
In reply to this post by Anthony Molinaro-5
On Sun, Aug 29, 2010 at 11:04 AM, Anthony Molinaro <[hidden email]> wrote:
> I don't know it seems to tax our setup of 39 extra large ec2 nodes, its
> also closer to 24000 reqs/sec at peak since there are different tables
> (2 tables for each read and 2 for each write)

Could you clarify what you mean here? On the face of it, this performance seems really poor given the number and size of nodes.

b |
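The request-rate figures traded in this exchange are easy to check. The short calculation below assumes a 30-day month (an assumption, since the thread does not say) and uses only numbers quoted in the thread: 10 billion requests per month, a peak of 24000 reqs/sec, and 39 nodes.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds, assuming a 30-day month

# Average rate implied by 10 billion requests per month.
avg_reqs_per_sec = 10_000_000_000 / SECONDS_PER_MONTH

# Peak load per node if 24000 reqs/sec were spread evenly over 39 nodes.
peak_per_node = 24000 / 39

print(f"average: {avg_reqs_per_sec:.0f} reqs/sec")
print(f"peak per node: {peak_per_node:.0f} reqs/sec")
```

The average works out to roughly 3,900 reqs/sec, consistent with "less than 4000", and the quoted peak comes to a little over 600 reqs/sec per node, assuming the load is spread evenly.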
|
In reply to this post by Benjamin Black
On Sun, Aug 29, 2010 at 12:20:10PM -0700, Benjamin Black wrote:
> Could you clarify what you mean here? On the face of it, this
> performance seems really poor given the number and size of nodes.

As you say, I would expect to achieve much better performance given the node size, but if you go back and look through some of the issues we've seen over time, you'll find we've been hit with nodes being too small, having too few nodes to deal with request volume, having OOMs, having bad sstables, having the ring appear different to different nodes, and several other problems.

Many of the I/O problems presented themselves as MessageDeserializer pool backups (although we stopped having these since Jonathan was by and suggested a row cache of about 1 GB, thanks Riptano!). We currently have mystery OOMs which are probably caused by GC storms during compactions (although usually the nodes restart and compact fine, so who knows). I also regularly watch nodes go away for 30 seconds or so (logs show node goes dead, then comes back to life a few seconds later).

I've sort of given up worrying about these, as we are in the process of moving this cluster to our own machines in a colo, so I figure I should wait until they are moved, and see how the new machines do before I worry more about performance.

-Anthony

--
Anthony Molinaro <[hidden email]> |
|
FWIW - we've been using HAProxy in front of a Cassandra cluster in production and haven't run into any problems yet. It sounds like our cluster is tiny in comparison to Anthony M's cluster. But I just wanted to mention that others out there are doing the same.
One thing in this thread that I thought was interesting is Ben's initial comment "the presence of the proxy precludes clients properly backing off from nodes returning errors." I think it would be very cool if someone implemented a mechanism for haproxy to detect the error nodes and then enable it to drop those nodes from the rotation. I'd be happy to help with this, as I know how it works with haproxy and standard web servers or other tcp servers. But, I'm not sure how to make it work with Cassandra, since, as Ben points out, it can return valid tcp responses (that say "error-condition") on the standard port.
Dave Viner
|
|
On Mon, Aug 30, 2010 at 12:40 PM, Dave Viner <[hidden email]> wrote:
> I think it would be very cool if someone
> implemented a mechanism for haproxy to detect the error nodes and then
> enable it to drop those nodes from the rotation. I'd be happy to help with
> this, as I know how it works with haproxy and standard web servers or other
> tcp servers. But, I'm not sure how to make it work with Cassandra, since,
> as Ben points out, it can return valid tcp responses (that say
> "error-condition") on the standard port.
> Dave Viner
>
> On Sun, Aug 29, 2010 at 4:48 PM, Anthony Molinaro
> <[hidden email]> wrote:
>> We currently have mystery OOMs
>> which are probably caused by GC storms during compactions (although
>> usually the nodes restart and compact fine, so who knows).

Any proxy with a TCP health check should be able to determine if the Cassandra service is down hard. The problem for tools that are not aware of the Cassandra protocol is detecting slowness or other anomalies like TimedOut exceptions.

If you are seeing GC storms during compactions, you might have rows that are too big. When the compaction hits these, memory spikes. I lowered the compaction priority (and added more nodes), which has helped compaction back off, leaving some IO for requests. |
|
Hi Edward,
By "down hard", I assume you mean that the machine is no longer responding on the cassandra thrift port. That makes sense (and in fact is what I'm doing currently). But, it seems like the real improvement is something that would allow for a simple monitor that goes beyond the simple "machine not reachable" issue and covers more common scenarios that temporarily impact service time, but aren't so drastic as to cause machine outage.
Dave Viner
|
|
On Mon, Aug 30, 2010 at 1:02 PM, Dave Viner <[hidden email]> wrote:
> Hi Edward,
> By "down hard", I assume you mean that the machine is no longer responding
> on the cassandra thrift port. That makes sense (and in fact is what I'm
> doing currently). But, it seems like the real improvement is something that
> would allow for a simple monitor that goes beyond the simple "machine not
> reachable" issue and covers more common scenarios that temporarily impact
> service time, but aren't so drastic as to cause machine outage.
> Dave Viner

Correct. I see two basic approaches for this. One is that your proxy has to know how to communicate cassandra+thrift and have some intelligence, such as "I got an exception" or "Request took too long", and mark the node as failed.

The other is to have something external marking nodes as dead. This is something that eddie http://eddie.sourceforge.net/lbdns.html does: if (Bad node) { remove from dns }.

In our deployment, we have added some extra intelligence to hector to dodge compacting nodes, etc. |
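Both approaches above come down to the same test: send a real request and treat an exception or a slow answer as failure. The Python sketch below shows only that classification step. The names `probe`, `healthy_nodes`, `request`, and the `max_seconds` threshold are hypothetical; `request` stands in for whatever Thrift call a real checker would make, and nothing here is taken from eddie or hector.

```python
import time


def probe(node, request, max_seconds=0.5):
    """Classify one request against node as "ok", "error", or "slow".

    request is a callable taking the node address; it is a hypothetical
    stand-in for a real protocol-level call.
    """
    start = time.monotonic()
    try:
        request(node)
    except Exception:
        return "error"  # "I got an exception"
    if time.monotonic() - start > max_seconds:
        return "slow"  # "Request took too long"
    return "ok"


def healthy_nodes(nodes, request, max_seconds=0.5):
    """Return the nodes whose probe came back "ok"."""
    return [n for n in nodes if probe(n, request, max_seconds) == "ok"]
```

The resulting list could feed either approach: a protocol-aware proxy would use it to take nodes out of rotation, and an external monitor would use it to decide which addresses to publish.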
