Varnish'ing over Drupal
I've recently been spending more time getting Drupal sites served in a way that survives the storm. We've had one site running the multiple Apache mirrors trick using Boost, and now have a few sites using Varnish as a specialised reverse proxy.
There's some great work going on for Drupal 7 to make it more cache friendly: not actually starting sessions until they are needed, and adding appropriate headers, for example. There are several issues to take into account here. Drupal is going to serve some content in different languages on the same URL, even if in the future that may only be the front page. It will also serve content to authorised, logged-in users as well as to anonymous users on the same URL. Even if we cache cleverly, we need to make sure that any other caches on the way do so too, or don't cache at all.
Varnish is quite nice for this, as you can write pretty complicated caching rules very simply in its VCL syntax. What I have here is just a start on what could be done, especially once Drupal is more cache aware. For now I'm just correcting some of the things that core will do in the future.
'Static' files
Drupal sends lots of files that can be cached, and they could have headers so that they aren't requested again, or so that if they are they receive a 304 Not Modified response rather than the whole file. Which files these are will depend on your site setup, for instance whether content changes for logged-in users, or whether you display messages to users who aren't logged in. Anything in your /sites directory (CSS, JavaScript) and, depending on where you put them, uploaded files are really promising candidates for general caching. In Varnish this can be as simple as:
sub vcl_recv {
    if (req.request == "GET" && req.url ~ "^/sites/") {
        /* We only ever want to deal with GET requests. We are working
         * on the assumption that everything in sites is served the same
         * to all users, so we don't want the cookie. */
        unset req.http.cookie;
        lookup;
    }
}
sub vcl_fetch {
    if (req.request == "GET" && req.url ~ "^/sites/") {
        /* We can unset the cookie Drupal adds, set a lifetime for the
         * object and make it cacheable. */
        unset obj.http.Set-Cookie;
        set obj.cacheable = true;
        # we can set how long Varnish will keep the object here, or later
        # set obj.ttl = 30m;
        # debug: add this and you'll see it in the headers if we came here
        # set obj.http.X-Drupal-Varnish-Debug = "1";
    }
    if (obj.cacheable) {
        /* Things common to all cacheable objects: remove the Expires
         * header, which is often in the past, set Cache-Control and how
         * long Varnish will keep the object, then mark it for delivery
         * (and storing). */
        unset obj.http.expires;
        set obj.http.cache-control = "max-age=900";
        set obj.ttl = 1w;
        set obj.http.magicmarker = "1";
        deliver;
    }
}
sub vcl_deliver {
    if (resp.http.magicmarker) {
        /* Unset the marker and serve the object as new (Age: 0) to
         * whatever cache or client is next in line. */
        unset resp.http.magicmarker;
        set resp.http.age = "0";
    }
}
As I put my files in /files, and as links to files that want storing often have a language code before them, I use another if block, elsif (req.request == "GET" && req.url ~ "^(/[a-zA-Z]{2})?/files/"), and can use this to set alternative cache times on them. This could also help with the load created by the private file method: using the trick below for knowing which files can be seen by anonymous users, those files can be given modified, cacheable headers, reducing the number of times PHP has to serve files that anonymous users are allowed to see.
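As a rough sketch of what such a /files rule might look like, written in the same Varnish 2.0-style VCL as the block above: the 1d lifetime is a made-up value for illustration, and the sketch assumes everything under /files is public. In my own config these are elsif branches of the existing blocks rather than separate subroutines.

```vcl
sub vcl_recv {
    /* Optional two-letter language prefix, then /files/ */
    if (req.request == "GET" && req.url ~ "^(/[a-zA-Z]{2})?/files/") {
        /* Assumption: files here look the same to every user,
         * so the cookie can go before the cache lookup. */
        unset req.http.cookie;
        lookup;
    }
}

sub vcl_fetch {
    if (req.request == "GET" && req.url ~ "^(/[a-zA-Z]{2})?/files/") {
        unset obj.http.Set-Cookie;
        set obj.cacheable = true;
        /* Hypothetical lifetime, shorter than the 1w used for /sites. */
        set obj.ttl = 1d;
        deliver;
    }
}
```

Delivering straight from this branch skips the common cacheable block, which is what lets the lifetime differ from the one set there.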
Caching anonymous page views
In addition to the language and automatic session cookie issues mentioned above, our cache doesn't know whether users are logged in. We want to cache the anonymous page views (at least those that won't have drupal_set_messages, or other individually changing content, on them). Boost does this neatly by setting its own cookie when users log in and unsetting it when they log out. I've used this alongside Boost, checking for req.http.Cookie !~ "DRUPAL_UID", and also pinched the code and made a very simple Varnish helper module.
So, adding some caching for these pages:
sub vcl_recv {
    ...
    elsif (req.request == "GET" && req.http.Cookie !~ "DRUPAL_VARNISH") {
        /* This site has drupal_set_messages and, importantly, changing
         * content for anon users only on the /user page. */
        /* It was tempting to unset req.http.cookie; here, but it's needed
         * to stop users who log out getting the last page they saw while
         * logged in. */
        lookup;
    }
}
sub vcl_fetch {
    ...
    elsif (req.request == "GET" && req.http.cookie !~ "DRUPAL_VARNISH") {
        if (req.url !~ "(/[a-zA-Z]{2})?/user" && req.url !~ "(/[a-zA-Z]{2})?/admin") {
            /* We don't want the ttl so long on these pages, so we must set
             * it in the different if blocks rather than in the cacheable
             * block. */
            set obj.ttl = 30m;
            unset obj.http.Set-Cookie;
            if (req.url !~ "^/[a-zA-Z]{2}/") {
                /* Make sure that language is taken into account when
                 * caching pages without a language code in the URL, and
                 * make sure that caches know not to use the cached page
                 * if a cookie comes with the request. */
                set obj.http.Vary = "Accept-Language, Cookie";
            }
            else {
                set obj.http.Vary = "Cookie";
            }
        }
    }
}
Older versions of Varnish
Varnish comes out of EPEL for RHEL/CentOS, and is pretty up to date. I've also done this on a Debian stable box, and as the version of Varnish there is older, the supported VCL syntax is a bit more limited. You can't change the obj.cacheable boolean, for example, so I used an obj.http value that I then unset. The comparison (a !~ b) was causing errors, when (! a ~ b) didn't. The command deliver is called insert, and unset is remove.
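Putting those substitutions together, the /sites rule might look something like this on the older version. This is an untested sketch built only from the differences listed above: the X-Cache-Me header name is hypothetical, and the exact syntax your older Varnish accepts may differ.

```vcl
sub vcl_recv {
    if (req.request == "GET" && req.url ~ "^/sites/") {
        /* "remove" instead of "unset" */
        remove req.http.cookie;
        lookup;
    }
}

sub vcl_fetch {
    if (req.request == "GET" && req.url ~ "^/sites/") {
        remove obj.http.Set-Cookie;
        /* obj.cacheable can't be set here, so flag the object
         * with a header instead (hypothetical name). */
        set obj.http.X-Cache-Me = "1";
    }
    if (obj.http.X-Cache-Me ~ "1") {
        /* Drop the flag again so it never reaches the client. */
        remove obj.http.X-Cache-Me;
        remove obj.http.expires;
        set obj.http.cache-control = "max-age=900";
        set obj.ttl = 900s;
        /* "insert" instead of "deliver" */
        insert;
    }
}
```

The anonymous-user test changes in the same way, from (req.http.Cookie !~ "DRUPAL_VARNISH") to (! req.http.Cookie ~ "DRUPAL_VARNISH").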