Monitoring Drupal sites with Munin
One of the applications I've been working with recently is the Munin monitoring tool. Its homepage describes it simply:
Munin is a networked resource monitoring tool that can help analyze resource trends and "what just happened to kill our performance?" problems. It is designed to be very plug and play. A default installation provides a lot of graphs with almost no work.
Getting Munin set up on an Ubuntu server is very easy. (One caveat: a lot of new plugins require the latest version of Munin, which is only available in Ubuntu 10.) Munin works on a "master" and "node" structure, the basic idea being:
- On a cron, the master asks all its nodes for all their stats (usually via port 4949, so configure your firewall accordingly).
- Each node server asks all its plugins for their stats.
- Each plugin dumps out brief key:value pairs.
- The master collects all the data and compiles graphs as images on static HTML pages.
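The exchange between master and node in the steps above can be sketched in Python. This is only an illustration, assuming the node speaks Munin's plain-text protocol on port 4949: a banner line on connect, then `fetch <plugin>` answered by `field.value N` lines and terminated by a lone `.`. The function names `parse_fetch` and `fetch_plugin` are made up for this example and are not part of Munin.

```python
import socket


def parse_fetch(lines):
    """Turn the reply to a 'fetch' command into a {field: value} dict."""
    values = {}
    for line in lines:
        line = line.strip()
        if line == ".":  # a lone dot ends the reply
            break
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        if key.endswith(".value"):
            # Values stay as strings: a node may report "U" for unknown.
            values[key[: -len(".value")]] = value
    return values


def fetch_plugin(host, plugin, port=4949, timeout=5.0):
    """Ask one node for one plugin's current values (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8") as reader:
            reader.readline()  # banner, e.g. "# munin node at <hostname>"
            sock.sendall(("fetch %s\n" % plugin).encode("utf-8"))
            lines = []
            for line in reader:
                lines.append(line)
                if line.strip() == ".":
                    break
            sock.sendall(b"quit\n")
    return parse_fetch(lines)
```

Calling something like `fetch_plugin("localhost", "load")` from the master is also a quick way to check that the firewall really lets port 4949 through.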
Its simplicity is admirable: each plugin is its own script, written in any executable language. There are common environment variables and a common output syntax, but otherwise writing or modifying a plugin is very easy. The plugin directory is called Munin Exchange. (The latest version of each plugin isn't necessarily on there, though: in some cases searching for the plugin name brought up newer versions on GitHub.)
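To give a feel for how small a plugin can be, here is a minimal sketch in Python that graphs the one-minute load average. It assumes the usual plugin convention: called with the `config` argument the script describes its graph, and called with no argument it prints one `field.value N` line per series. The graph title and the field name `load` are just illustrative choices.

```python
#!/usr/bin/env python3
"""Minimal Munin plugin sketch: graphs the 1-minute load average."""
import os
import sys


def config_lines():
    # Printed when the plugin is called with "config":
    # describes the graph and its single data series.
    return [
        "graph_title Load average (example)",
        "graph_vlabel load",
        "graph_category system",
        "load.label load",
    ]


def value_lines(load):
    # Printed on a normal run: one "<field>.value <number>" line per series.
    return ["load.value %.2f" % load]


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        lines = config_lines()
    else:
        # os.getloadavg() is Unix-only, which matches where Munin nodes run.
        lines = value_lines(os.getloadavg()[0])
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

Made executable and placed in the node's plugins directory, a script like this would be picked up the next time the master polls, assuming a standard munin-node setup.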
I set up Munin for two reasons: to get notifications of problems, and to see historical graphs that reveal trends and bottlenecks. I have Munin running on a dedicated monitoring server (also running Jenkins), since notifications coming from the web server wouldn't be much use if the web server went down. It's currently monitoring three nodes (including itself), giving me stats on memory (total and for specific processes), CPU, network traffic, Apache, MySQL, S3 buckets, Memcached, Varnish, and MongoDB. Within a few days of running it, a memory leak on one server became apparent, and the "MySQL slow query" spikes that coincide with cron (doing a bunch of stats/aggregation) are illuminating.
None of this is Drupal-specific, but graphing patterns in Drupal simply requires a plugin, and McGo has fortunately given us a Munin module that provides just that. (The package includes two modules: Munin API to define stats and queries, and Munin Defaults with some basic node and user queries.) I asked for maintainer access and modified it a little - the 6.x-2.x branch now uses Drush for database queries rather than storing the credentials in the scripts, for example. The module generates the script code, which you copy to files in your plugins directory.
Conclusions so far: getting Munin to show you graphs on all the major stats of a server takes a few hours (coming at it as a total beginner). Setting up useful notifications is more complicated, though, and will probably have to evolve over time through trial and error. For simple notifications on servers going down, for example, it's easier to set up a simple cron script (on another server) with curl and mail, or use the free version of CloudKick. Munin's notifications are more suited to spotting spikes and edge cases.
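For the simple "is the server up?" case, the cron script could look something like the sketch below. It is written in Python rather than with curl and mail, to match the other examples here, and it assumes a mail server is listening on localhost on the monitoring box. The function names, subject line, and command-line arguments are all made up for illustration.

```python
#!/usr/bin/env python3
"""Uptime check sketch: mail an alert when a URL stops answering."""
import smtplib
import sys
import urllib.error
import urllib.request
from email.message import EmailMessage


def site_is_up(url, timeout=10):
    """Return (ok, detail). Any HTTP error or network failure counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400, "HTTP %d" % response.status
    except (urllib.error.URLError, OSError) as exc:
        return False, str(exc)


def build_alert(url, detail, sender, recipient):
    """Compose the alert mail (hypothetical subject and body format)."""
    msg = EmailMessage()
    msg["Subject"] = "DOWN: %s" % url
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Check of %s failed: %s\n" % (url, detail))
    return msg


def main(url, recipient):
    ok, detail = site_is_up(url)
    if not ok:
        # Assumption: a local MTA accepts mail on localhost port 25.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(build_alert(url, detail, recipient, recipient))


# Runs only when given exactly two arguments: the URL and the recipient.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run every few minutes from cron on the monitoring server, this covers the basic outage case and leaves Munin's own notifications for thresholds and trends.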