Achieving Drupal Log Bliss with Splunk
From the smallest personal blogs to the largest enterprise websites, the ability to understand the current, recent, and historical state of a site's health and its users' activities and experiences is paramount to ensuring stability and end-user satisfaction. No matter how complete the test cases or how thorough the QA, bugs will crop up, and performance and features will suffer. Site administrators and developers must have a window into the production website in order to diagnose and debug issues as they arise.
To this end, Drupal core comes packaged with two useful modules that log along two dimensions: enabled by default is Database logging (also known as dblog or watchdog), which records system events to the database via the watchdog API; and Statistics, which, in addition to tracking content popularity, can record every request to Drupal in its "access log."
Both modules provide a basic level of reporting: site administrators get simple reports that answer questions like which "page not found" errors occur most often and which pages and users are most active, plus lists of recent events like PHP errors, user registrations, and new comments.
For extremely small websites, these modules provide adequate information for basic decisions, e.g. "I'm getting a lot of spam on my blog; what's the commenter's IP address so I can block it?" However, fundamental limitations prevent more in-depth analysis, discovery, and informed decision-making.
This article will walk through the problems presented by logging in Drupal and highlight a Splunk-based path to enlightenment.
The Weaknesses in Drupal Statistics and DB Logging
1. Drupal log data discovery is painful.
Suppose you get an e-mail from a customer or end user reporting an error with little context or detail. There's a decent chance this error was also logged somewhere in Drupal, but finding the more detailed error message (perhaps a file or line number in code, or even just a module name) can be extremely difficult. There are no facilities for searching the logs, only page-delimited lists that shift constantly as new data streams in.
2. Drupal log reporting is handicapped.
Watchdog records contain a wealth of information beyond just type and severity, but these are the only dimensions by which you can filter them. The reports provided by the access log aren't filterable at all. Building custom reports requires a separate module in the case of watchdog, and a decent understanding of Views for both watchdog and the access log. Aggregate reporting requires a much deeper understanding of Views; understanding either log visually by building chart-based dashboards may as well be impossible.
3. Drupal log data retention is non-existent.
Because logs are written to the database, there's no concept of log rotation or archiving. Both the access log and watchdog offer only two options: retain all data forever (which will quickly cause performance and scalability issues in the database), or automatically delete the oldest entries once a configurable number of records is reached. Depending on traffic to the site, this could mean holding only a day or two's worth of data, with no historical record at all.
4. Drupal logging does not scale.
Drupal can scale horizontally across multiple PHP servers relatively easily to handle increased throughput, but as it does, the database can quickly become a bottleneck. For heavily trafficked Drupal sites, it's not practical, or sometimes even possible, to handle one or more database inserts for every request. For smaller sites using shared database servers, even though scalability is not an immediate concern, latency between the web server and the database server can be a performance killer.
Log Performance & Scalability via Syslog
Enter Syslog, a standard for computer data logging that separates log storage and management from the application logging the message and the application reporting on the logs. It's been around for nearly three decades and is supported by virtually any OS you'd use to run a web server. Modern systems are likely to come pre-installed with a superset of syslog, like rsyslog or syslog-ng. Your server is almost certainly logging valuable data via syslog already.
It also happens to be very easy to configure Drupal to log to syslog. Drupal core comes packaged with the Syslog module, which routes watchdog system event messages to syslog. In the contributed module space, you'll also find the Syslog Access module, which routes access statistics to syslog.
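If you happen to manage your site with Drush, enabling both modules is a one-liner (syslog_access is assumed here as the contributed module's machine name; verify it against the project page):

drush en -y syslog syslog_access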
Once enabled, both modules are easily configured on the config pages for the modules they replace (admin/config/development/logging for watchdog and admin/config/system/statistics for the access log). Both provide essentially the same settings:
- Facility lets syslog know what type of program is logging the message; based on the facility, syslog can route and store log data differently.
- Identity is a string that is prepended to each log message; it can be used to differentiate between an access log record and a watchdog record, but can also be useful to differentiate between different Drupal installations running on a single server.
- Format is the format the log message will take. It consists of a number of tokens that are replaced by values available when the log message is generated.
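As a point of reference, the default watchdog format is a pipe-delimited series of these tokens, along the lines of the following (check the config page for the exact string in your version):

!base_url|!timestamp|!type|!ip|!request_uri|!referer|!uid|!link|!message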
After enabling these modules, even without making any configuration changes, log data should be flowing successfully to syslog. If you have shell access to your web server (and proper permissions), you'll likely be able to see the data coming in live using the following command:
tail -f /var/log/messages
(Here, we're using the defaults for the location/name of the log file and rotation/backlog rate, but altering them is relatively easy, if slightly beyond the scope of this article.)
Once you see the data coming through, you can disable the dblog module and uncheck the "enable database logging" checkbox on the statistics configuration page to shut off all DB-based logging.
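Again, if Drush is your tool of choice, disabling database logging is a one-liner (drush dis is the disable command in Drush versions of this era):

drush dis -y dblog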
If you're following along at home, you'll note that this takes care of items 3 and 4 from the gripe list above; but if anything, log data discovery and reporting are even less accessible in this format. In many situations, you may not have the proper access to or permissions on the production server for active debugging; even if you do, you'd have to be extremely familiar and comfortable with Unix tools like sed, awk, and grep to find the data you're looking for.
Log Data Discovery and Reporting via Splunk
Enter Splunk: software that can index, monitor, search, and analyze machine-generated data, including syslog messages. It's currently available in two flavors: a standalone application to be installed on your own server, licensed based on daily throughput (free up to 500MB a day), or the SaaS version known as Splunk Storm, which is licensed based on storage (free up to 1GB). For those working in larger enterprises, it's possible Splunk is already in use; for small-to-medium-sized Drupal installations, it's very unlikely that you'll exceed the free-tier storage limitations of the SaaS offering.
Configuring your server to send data to Splunk is relatively straightforward. At the end of the configuration file for your server's syslog implementation (potentially at /etc/syslog.conf or /etc/rsyslog.conf, depending on your system), add something like the following:
local0.* @@url.to.splunk:12345
Where local0 is the facility you configured in Drupal for the Syslog and Syslog Access modules (or * to send all syslog data), url.to.splunk is the host or IP of your Splunk or Splunk Storm instance, and 12345 is the port number on which Splunk is listening for data. After making the change, you'll need to restart the syslog service with something like service syslog restart, depending on your system. You'll also need to whitelist your web server within Splunk (or your project on Splunk Storm) so that it accepts the data being passed to it.
Once the plumbing is hooked up, you should see log messages being piped into Splunk. Below are a few examples of searches to get things flowing.
Splunk has a deceptively simple-looking search query language that allows us to filter down to just the data we need over just the time period we need (including real-time). For instance, we can perform a basic search using the default identity provided by the Syslog Access module to filter down to only access log data:
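A minimal sketch, assuming the Syslog Access identity was set to drupal_access (substitute whatever identity you configured): the identity string alone matches those events, and an inline time modifier narrows the window.

drupal_access
drupal_access earliest=-60m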
Not only is Splunk very good at finding patterns in your log data and breaking them out into fields that you can use to filter your data (you can even teach it new extractions with regex), but you can also force Splunk to use your own field names by altering the "format" setting used by the Syslog and Syslog Access modules to include key=value assignments. For example:
base_url=!base_url timestamp=!timestamp type=!type hostname=!ip request_uri=!request_uri referer=!referer uid=!uid link=!link message=!message
With this we can, for instance, filter down to only watchdog data for a specific URL:
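A sketch of that search, assuming the Syslog module's default identity of drupal and a hypothetical path; since !request_uri expands to a full URL, a leading wildcard helps:

drupal request_uri="*/node/42"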
Using Splunk's query language, we can even generate charts and graphs. Here's a look at minimum, average, and maximum memory usage over time. You might use this to adjust PHP's memory limit, or assess increases or decreases in memory usage as you deploy or disable modules.
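No memory token appears in the format shown above, so a chart like this assumes you've added one; with a hypothetical memory field extracted from the access log data, the underlying search would look something like:

drupal_access | timechart span=1h min(memory) AS min, avg(memory) AS avg, max(memory) AS max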
Here's a look at watchdog events by type over time, which you might use to ensure that a recent deployment didn't cause any major issues, or to diagnose ongoing issues related to a specific module.
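With the type field extracted via the format shown earlier, this one is a single timechart:

drupal | timechart count by type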
In the works for Splunk Storm (and already available on Splunk) is an alert feature, which can perform custom alert tasks (like send e-mails or run scripts) when, for example, a certain search count exceeds a specified threshold over a given period of time.
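The saved search behind such an alert can stay simple; for example, this sketch matches PHP fatal errors recorded by watchdog (which logs PHP messages under the type php):

drupal type=php "Fatal error"

Saved as an alert, Splunk would then notify you whenever the match count crosses your threshold within the chosen window.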
In addition to time charts, Splunk supports a number of other chart types like bar charts and pies, as well as simple tables. You can even combine these searches and visualizations into exportable, shareable dashboards. Here's one to replace Drupal Core's Access Statistics reporting [view the XML definition to import into your own Splunk or Splunk Storm instance]:
The Bliss is in the Logs
To recap, we've shown that scalability and data retention issues associated with stock Drupal logging can be solved by re-routing log messages to syslog and disabling database logging. We've additionally demonstrated Splunk as a more than capable replacement for the reporting and data discovery provided by Drupal Core.
And we've only just scratched the surface of what's possible. Imagine setting up custom alerts on fatal PHP errors that open tickets in your task management system, a dashboard of popular search phrases, or a real-time look at application performance. The data's there, you just need the tools to explore it.