Hi all,

just wanted to share a simple way we monitor Cassandra internals with Zabbix. We use a minimal HTTP server which reads JMX values and returns them in a properties format. Zabbix reads that every 30 seconds.

The server is started together with Cassandra: https://gist.github.com/744761

Output looks something like:

dd@caladan[~]$ curl http://b22:9090/jmxexport
OperationMode=Normal
Load=151.379
ReadOperations=506334
WriteOperations=865867
TotalReadLatencyMicros=6663882635
TotalWriteLatencyMicros=352292885
BytesCompacted=0
BytesTotalInProgress=0
PendingTasks=0
HeapUsed=1153810280

How / what are you monitoring? Any best practices?

Cheers,
Daniel Doubleday, smeet.com, Berlin
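
For anyone consuming this kind of output from a script rather than from Zabbix, here is a minimal Python sketch of parsing the key=value lines and deriving an average latency. The function names are made up for illustration, and it assumes that TotalReadLatencyMicros and TotalWriteLatencyMicros are cumulative sums since node start, so dividing by the operation count gives a lifetime average rather than a recent one.

```python
# Sketch only: helper names are hypothetical, and the sample text is the
# output shown in the post above.

def parse_properties(text):
    """Parse 'key=value' lines into a dict of strings, skipping other lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        metrics[key.strip()] = value.strip()
    return metrics


def average_latency_micros(metrics, kind):
    """Lifetime average latency for kind 'Read' or 'Write'.

    Assumes Total<kind>LatencyMicros is a cumulative sum. Returns None
    when no operations have been counted yet.
    """
    ops = int(metrics[kind + "Operations"])
    total = int(metrics["Total" + kind + "LatencyMicros"])
    return total / ops if ops else None


SAMPLE = """\
OperationMode=Normal
Load=151.379
ReadOperations=506334
WriteOperations=865867
TotalReadLatencyMicros=6663882635
TotalWriteLatencyMicros=352292885
BytesCompacted=0
BytesTotalInProgress=0
PendingTasks=0
HeapUsed=1153810280
"""

if __name__ == "__main__":
    parsed = parse_properties(SAMPLE)
    print("avg read latency (us):", average_latency_micros(parsed, "Read"))
    print("avg write latency (us):", average_latency_micros(parsed, "Write"))
```

Values are kept as strings in the dict because the output mixes text (OperationMode), floats (Load) and integers; each consumer converts what it needs.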

On Fri, Dec 17, 2010 at 5:48 AM, Daniel Doubleday <[hidden email]> wrote:
> How / what are you monitoring? Best practices someone?

Using Cacti and http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp

Many people are using Munin; there is good support there.

Best practices:

Monitor SSTable sizes and growth.
Monitor reads/writes per second.
Monitor cache hit rate.
Monitor compactions (what % of the day an average node is compacting).
Monitor SSTable count (make sure you do not have too many).
Monitor IO wait (make sure you are not disk bound).
Monitor JVM memory (make sure you have some headroom for bursts of traffic).
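
The reads/writes per second item in the list above usually has to be derived, because counters such as ReadOperations and WriteOperations only ever grow. A rough Python sketch of that derivation from two successive samples follows. The function names are hypothetical, and it assumes the counters are cumulative and reset to zero when a node restarts.

```python
# Sketch only: turns two samples of a cumulative counter into a rate.

def per_second_rate(prev_count, prev_time, curr_count, curr_time):
    """Rate of a monotonically increasing counter between two samples.

    Times are in seconds. Returns None when the interval is not positive
    or the counter went backwards (for example after a restart).
    """
    elapsed = curr_time - prev_time
    delta = curr_count - prev_count
    if elapsed <= 0 or delta < 0:
        return None
    return delta / elapsed


def interval_latency_micros(prev_total, prev_ops, curr_total, curr_ops):
    """Average latency of only the operations seen between two samples."""
    ops = curr_ops - prev_ops
    if ops <= 0:
        return None
    return (curr_total - prev_total) / ops


if __name__ == "__main__":
    # Two made-up samples taken 30 seconds apart.
    print(per_second_rate(506334, 0.0, 506934, 30.0))
    print(interval_latency_micros(1000, 10, 4000, 20))
```

Returning None on a counter reset avoids plotting a large negative spike after a restart; the caller can simply skip that data point.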

Is anyone using Cassandra with monit? All I have is this embarrassing bit of monit config:

check process cassandra with pidfile /var/run/cassandra.pid
  start program = "/etc/init.d/cassandra start" with timeout 60 seconds
  stop program = "/etc/init.d/cassandra stop"
  if failed port 9160 type tcp with timeout 15 seconds then restart
  if 3 restarts within 5 cycles then timeout
  group server

I'm sure there are some good numbers available via JMX to alert on as well, but I'm not sure of the best way to poll them from monit. Comments/contributions appreciated.

dan

On Fri, Dec 17, 2010 at 11:03 AM, Edward Capriolo <[hidden email]> wrote:

In reply to this post by doubleday
> How / what are you monitoring? Best practices someone?

I recently set up monitoring using the cassandra-munin-plugins (https://github.com/jamesgolick/cassandra-munin-plugins). However, due to various little details, it wasn't much fun to integrate properly with munin-node-configure and automated configuration management. Another problem is that a JVM is started for each use of jmxquery, which can become a problem with many column families.

I like your web server idea. Something persistent that can sit there and do the JMX acrobatics, and expose something more easily consumed by tools like munin/zabbix/etc. It would be pretty nice to have that out of the box with Cassandra, though I expect that would be considered bloat. :)

--
/ Peter Schuller

mx4j? https://issues.apache.org/jira/browse/CASSANDRA-1068

On Sun, Dec 19, 2010 at 8:36 AM, Peter Schuller <[hidden email]> wrote:

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

FYI, I just added an mx4j section to the bottom of this page: http://wiki.apache.org/cassandra/Operations

On Sun, Dec 19, 2010 at 4:30 PM, Jonathan Ellis <[hidden email]> wrote:
> mx4j? https://issues.apache.org/jira/browse/CASSANDRA-1068

--
/Ran

Another option is Evident ClearStone (http://www.evidentsoftware.com/products/clearstone-for-cassandra/).

It collects the Cassandra metrics via JMX as well. As long as one node in the cluster is configured, it will find the rest of them. The UI is written in Adobe Flex. The Cassandra management pack comes with some pre-built visualizations for Cassandra, but you can easily create ad hoc visualizations to monitor any other metric. Users can set thresholds and alerts on JVM heap, GC, LiveSSTables, disk usage, cache hit rate, compactions, CPU utilization, and other metrics. In addition, some of the nodetool features have been incorporated into the UI for quick and simple access.

In our upcoming Q1 release, we're adding support for system-level monitoring (via SAR) so that application performance can be correlated with system performance. Check it out on our website.

On Sun, Dec 19, 2010 at 10:01 AM, Ran Tavory <[hidden email]> wrote:
--
CTO/Co-Founder
Office: 973-622-5656 ext. 288

In reply to this post by Ran Tavory
How does mx4j compare with the earlier JMX-to-REST bridge listed in the operations page?

"JMX-to-REST bridge available at http://code.google.com/p/polarrose-jmx-rest-bridge"

Thanks,
Dave Viner

On Sun, Dec 19, 2010 at 7:01 AM, Ran Tavory <[hidden email]> wrote:

In reply to this post by Peter Schuller
I'm currently working to configure AppDynamics to monitor Cassandra. It does byte-code instrumentation, so there is an agent added to the Cassandra JVM, which gives the ability to capture latency for requests and see where the bottleneck is coming from. We have been using it on our other Java apps. They have a free version to try it out. It doesn't track Thrift calls out of the box, but I'm encouraging AD to figure out a way to do that, and I'm working on a config for capturing the entry points in the meantime.

The way the page cache works is that pages stay in memory linked to a specific file. If you delete that file, the pages are all considered invalid at that point, so they get zeroed out and go to the start of the free list. So compaction creates a new file first (which competes with existing read traffic to try to keep its pages in memory), then removes the old files that were being merged. At that point there is a supply of blank pages, but disk reads will be needed to warm up the cache again.

The use case that I'm working with is more like a persistent memcached replacement, so we are trying to have more RAM than data on m2.4xl EC2 instances (~70GB) and keep all reads in memory all the time.

Adrian

On 12/19/10 5:36 AM, "Peter Schuller" <[hidden email]> wrote:
>> How / what are you monitoring? Best practices someone?

In reply to this post by Dave Viner-2
mx4j runs in-process, in the same JVM; you just need to drop mx4j-tools.jar into the lib directory before you start Cassandra. jmx-to-rest runs in a separate JVM. mx4j also has a nice, useful HTML interface that lets you look into any running host.

On Sunday, December 19, 2010, Dave Viner <[hidden email]> wrote:
> How does mx4j compare with the earlier jmx-to-rest bridge listed in the operations page:
> "JMX-to-REST bridge available at http://code.google.com/p/polarrose-jmx-rest-bridge"

--
/Ran

On Sun, Dec 19, 2010 at 2:01 PM, Ran Tavory <[hidden email]> wrote:
> Mx4j is in process, same jvm, you just need to throw mx4j-tools.jar in
> the lib before you start Cassandra jmx-to-rest runs in a separate jvm.
> It also has a nice useful HTML interface that you can look into any
> running host.

Kicking up so many JMX connections puts a lot of overhead on your monitoring station. There can also be NAT/hostname problems with remote JMX.

My solution is to execute the JMX checks through the Nagios Remote Plugin Executor (NRPE):

command[run_column_family_stores]=/usr/lib64/nagios/plugins/run_column_family_stores.sh $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$

Maybe not as fancy as a REST-JMX bridge, but it solves most of the RMI issues involved in pulling stats over JMX.

Can you share the code for run_column_family_stores.sh?
On Sun, Dec 19, 2010 at 6:14 PM, Edward Capriolo <[hidden email]> wrote:

On Sun, Dec 19, 2010 at 10:37 PM, Dave Viner <[hidden email]> wrote:
> Can you share the code for run_column_family_stores.sh ?

That script is just a wrapper.

For example, we can have our NMS directly call the JMX fetch code like this:

java -cp /usr/lib64/nagios/plugins/cassandra-cacti-m6.jar \
  com.jointhegrid.m6.cassandra.CFStores \
  service:jmx:rmi:///jndi/rmi://<host>:<port>/jmxrmi <user> <pass> \
  org.apache.cassandra.db:columnfamily=<columnfamily>,keyspace=<keyspace>,type=ColumnFamilyStores

But as mentioned, this puts a lot of pressure on the monitoring node to open up all these JMX connections. With NRPE I can "farm" the requests out. Nodes end up executing their checks locally.

# cat /usr/lib64/nagios/plugins/run_column_family_stores.sh
java -cp /usr/lib64/nagios/plugins/cassandra-cacti-m6.jar \
  com.jointhegrid.m6.cassandra.CFStores \
  service:jmx:rmi:///jndi/rmi://${1}:${2}/jmxrmi ${3} ${4} \
  org.apache.cassandra.db:columnfamily=${5},keyspace=${6},type=ColumnFamilyStores

All the code is up here:
http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp
http://www.jointhegrid.com/svn/cassandra-cacti-m6/trunk/src/com/jointhegrid/m6/cassandra/CFStores.java

My main goal was to point out that you do not need REST bridges and embedded web servers to run JMX checks remotely.
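
As an illustration only, the same wrapper could be sketched in Python. The jar path, class name, JMX URL pattern and object name are taken from the script above; the function names, the six-argument order check and the sample values in the demo are assumptions made for this sketch, not part of the cassandra-cacti-m6 project.

```python
# Sketch of a Python equivalent of run_column_family_stores.sh.
import subprocess
import sys

JAR = "/usr/lib64/nagios/plugins/cassandra-cacti-m6.jar"
MAIN_CLASS = "com.jointhegrid.m6.cassandra.CFStores"


def build_command(host, port, user, password, column_family, keyspace):
    """Build the java argument list for one column family check."""
    jmx_url = "service:jmx:rmi:///jndi/rmi://{}:{}/jmxrmi".format(host, port)
    object_name = (
        "org.apache.cassandra.db:columnfamily={},keyspace={},"
        "type=ColumnFamilyStores".format(column_family, keyspace)
    )
    return ["java", "-cp", JAR, MAIN_CLASS, jmx_url, user, password, object_name]


def run_check(args):
    """Run the check with the six NRPE arguments and return its exit code."""
    if len(args) != 6:
        print("expected: host port user pass columnfamily keyspace", file=sys.stderr)
        return 3  # Nagios plugin convention: 3 means UNKNOWN
    # An argument list (no shell) keeps values with spaces or shell
    # metacharacters from being split or interpreted.
    return subprocess.call(build_command(*args))


if __name__ == "__main__":
    # Demo with placeholder values: print the command instead of running it,
    # so the sketch can be tried on a machine without Java or the jar.
    print(" ".join(build_command("localhost", "8080", "user", "pass",
                                 "Standard1", "Keyspace1")))
```

To use it as an actual plugin, the last block would be replaced by `sys.exit(run_check(sys.argv[1:]))`, so that the exit code of the Java check is passed back to NRPE.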

Does anyone have Cassandra Cacti templates for server 0.7.4+?

On 20 December 2010 17:40, Edward Capriolo <[hidden email]> wrote:
> All the code is up here:
> http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp

--
Albert Vila Puig <[hidden email]>
iMente.com <http://www.imente.com>

On Thu, Jul 14, 2011 at 8:58 AM, Albert Vila <[hidden email]> wrote:
> Anyone has Cassandra's cacti templates for server 0.7.4+?

http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp

There is some preliminary support for 0.7.X, but I have not ported over all the graphs yet. Check back over the next couple of days.

Edward
|
We use graphite for monitoring cassandra via JMX (we use JMXTrans for collecting the metrics: https://github.com/jmxtrans/jmxtrans)
Here's how we do it: http://techo-ecco.com/blog/monitoring-apache-hadoop-cassandra-and-zookeeper-using-graphite-and-jmxtrans/
