Managing load-balanced production environments
In December we launched a site onto a multi-server, load-balanced production
environment. It's certainly the first time I've had to deal with a site that
has two web servers, and as such the multi-server production environment
presented a couple of challenges, which I cover in this blog.
Challenge #1: ip_address() was returning the internal address of the
server, not the actual user's IP address
This caused us real problems with the user login process, since all failed
login attempts were being stored with the same IP address, which in turn
quickly locked everyone out.
Solution #1:
The fix was simple enough on our environment: add a code block to
your settings.php which adds the load balancers' internal addresses to
Drupal's reverse_proxy_addresses list, so that ip_address() picks the real
client address out of the X-Forwarded-For header:
// Code provided by Acquia.
// Non-balanced servers (dev and stage) don't have this problem.
if (!empty($conf['reverse_proxy_addresses'])) {
  $ips = explode(',', $_SERVER['HTTP_X_FORWARDED_FOR']);
  $ips = array_map('trim', $ips);
  // Add REMOTE_ADDR to the X-Forwarded-For list (the ip_address function will
  // also do this) in case it's a 10. internal AWS address; if it is we should
  // add it to the list of reverse proxy addresses so that ip_address will
  // ignore it.
  $ips[] = $_SERVER['REMOTE_ADDR'];
  // Work backwards through the list of IPs, adding 10. addresses to the proxy
  // list but stop at the first non-10. address we find.
  $ips = array_reverse($ips);
  foreach ($ips as $ip) {
    if (strpos($ip, '10.') === 0) {
      if (!in_array($ip, $conf['reverse_proxy_addresses'])) {
        $conf['reverse_proxy_addresses'][] = $ip;
      }
    }
    else {
      // We hit the first non-10. address, so stop.
      break;
    }
  }
}
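For context, the snippet assumes the usual Drupal 7 reverse proxy settings are already present in settings.php (on Acquia Cloud these are typically set up for you). A minimal sketch of those settings, with example addresses only:

// Tell Drupal it sits behind a load balancer so that ip_address() looks at
// the X-Forwarded-For header rather than just REMOTE_ADDR.
$conf['reverse_proxy'] = TRUE;
// Example addresses only -- in practice this is the list of your load
// balancers' internal addresses.
$conf['reverse_proxy_addresses'] = array('10.0.0.1', '10.0.0.2');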
Challenge #2: The user login form was considered cacheable by Drupal (and
therefore Varnish)
Coupled with the above issue, we were getting Varnish cache hits when
filling out the login form. This meant that all users were sharing a
form_build_id (and therefore the same form cache was being shared by everyone).
The upshot was that as soon as anybody entered a valid username it would
be stored for everybody else attempting to log in. That in turn meant that flood
attempts were all registered against a single account, which would quickly get
locked out.
Solution #2:
We don't actually have an answer as to the cause of this yet. It could be
something specific to our site or it could be related to other problems with
the environment, but the login page is obviously important, so we've added a
manual call to drupal_page_is_cacheable(FALSE) which fixes it.
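For illustration, a minimal sketch of how such a call can be wired up in a custom Drupal 7 module (the module name mymodule is hypothetical, and this isn't necessarily the exact code we used):

/**
 * Implements hook_init().
 *
 * Mark the login-related pages as uncacheable so Drupal (and therefore
 * Varnish) never serves a shared copy of the login form.
 */
function mymodule_init() {
  // Cover /user, /user/login and /user/password, which all render the login
  // form for anonymous visitors.
  if (arg(0) == 'user' && (arg(1) === NULL || in_array(arg(1), array('login', 'password')))) {
    drupal_page_is_cacheable(FALSE);
  }
}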
Challenge #3: Views data export, batch processes and temp files
The batch API splits a large job up into smaller jobs, each of which is
processed in a separate HTTP request, held together by an AJAX-based page. We
are using the views_data_export
module to build a relatively large CSV and so enabled the batch mode. What we
hadn't considered is that each server has its
own temporary directory, so because the requests are load balanced between the two
servers, the CSV was being split roughly into two halves.
Solution #3:
If you're lucky enough to be using the Acquia platform, there is a module for this:
https://drupal.org/project/acquia_cloud_sticky_sessions.
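If you're not on Acquia, one commonly used alternative (a sketch only; the mount path below is purely an example) is to point Drupal's temporary directory at storage shared between the web servers, so that every batch request appends to the same file:

// settings.php -- /mnt/shared/tmp is an example path; it must exist, be
// writable on every web server, and be backed by shared storage (NFS,
// Gluster, etc.) so both servers see the same partial export file.
$conf['file_temporary_path'] = '/mnt/shared/tmp';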
Conclusion
Each of these challenges only presented itself once we had made it to the
production environment, and once we had a realistic amount of user traffic. This made them
all the harder to detect and solve. We're lucky enough to have the support of Acquia in
solving the issues quickly, but if we were building out this kind of environment ourselves
things could have been very different.
By Dan James | 15th January 2014