Purge: Clearing Drupal pages from Proxy Caches
For some time I've been working on a little module I named Purge. Here's some background on that effort and on what it's been like to publish and improve it in public on drupal.org.
It's my first published Drupal code contribution, and my first serious programming project in the many years since I decided to drop out of a programming education and career in the second half of the '90s. Since then I've picked up many skills in system administration and network architecture. Through a failed web 2.0 startup project around 2007 I ran into Drupal, and I've been using and evangelizing it ever since. In my current job as a system operator for the web hosting department of a big IT firm I've been working with Varnish, an open source and extremely fast reverse proxy cache. In an in-company training by Linpro (who have since spun off their Varnish support business as Varnish Software) I learned about "purging": a command that clears an object from the cache so it gets refreshed on the next request. Varnish allows purges to be issued via the administration interface and through HTTP requests.
This all led me to test the Varnish contrib module for Drupal. It integrates with Varnish over the administration socket and allows for purging. By default it purges the complete cache for a domain on any content update, but it also allows more precise purges of individual URLs when configured to work with the Expire module. I liked the functionality of the module but foresaw some major issues when I imagined the Drupal sites we host at my job purging from our shared Varnish platform. Using the administration interface would mean long and difficult discussions with our security department about opening up network ports for it. On top of that, the interface has very weak security and no privilege separation between our clients, so this would probably be a no-go.
So I started to hack the option for HTTP purging into the Varnish module. I managed to get it working, but during development I found out that this purge method can be implemented in Squid and Nginx as well. This, and the lack of interest from the Varnish module maintainers, made me decide to branch off into a separate module. I dropped the functionality to purge a complete domain and went for interfacing exclusively with the Expire module, which provides nice hooks to plug into. I also decided to do all HTTP requests using curl and curl_multi objects, firing requests simultaneously whenever possible to minimize the performance impact.
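To give an idea of what HTTP purging with simultaneous requests looks like, here is a minimal sketch. It is written in Python for brevity and is not the module's actual code, which is PHP and uses curl_multi; a thread pool stands in for that here. The function names, the proxy host and port parameters, and the assumption that the proxy accepts a `PURGE` request method and selects the site by its `Host` header are all illustrative, and the proxy has to be configured to allow such requests.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def purge_url(proxy_host, proxy_port, url, timeout=5.0):
    """Send one HTTP PURGE request for `url` to the proxy and return the status code.

    Hypothetical helper: assumes the proxy is configured to accept the
    PURGE method and picks the cached site from the Host header.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=timeout)
    try:
        # Connect to the proxy, but name the site being purged in the Host header.
        conn.request("PURGE", path, headers={"Host": parts.netloc})
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status
    finally:
        conn.close()


def purge_urls(proxy_host, proxy_port, urls, max_workers=8):
    """Purge several URLs at once and return a {url: status code} mapping."""
    urls = list(urls)
    if not urls:
        return {}
    workers = min(max_workers, len(urls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input URLs.
        statuses = pool.map(lambda u: purge_url(proxy_host, proxy_port, u), urls)
        return dict(zip(urls, statuses))
```

Calling `purge_urls("127.0.0.1", 80, ["http://example.com/node/1"])` would send `PURGE /node/1` with `Host: example.com` to a proxy on the local machine. Which status code signals a successful purge depends on how the proxy is configured.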
I first had to apply for a CVS account. In mid-2010 Drupal was still using this legacy version control system, and access to publish a module was only granted after a thorough vetting process that took me a few months and a few iterations of code improvements to pass. The process required a lot of patience and some lobbying inside my Drupal network, but in the end it helped me improve my code style and the module itself.
When I finally published the code on drupal.org, the benefits of open source continued. A few people started using it and some reported bugs along with fixes. I also received feature requests (again with patches) that helped me broaden the potential use of the module. It now supports Varnish, Squid and Nginx, and can add the headers needed for the Acquia Cloud hosting services.
While we were discussing adding Drush and Rules integration, Mike Carper, the author of Expire, handed me commit rights on the Expire project to avoid code duplication, so I've written the Drush integration for Expire. I've also taken it upon myself to restart a Drupal handbook section on reverse proxy caches, and Varnish in particular, so I can refer to it in the module's documentation.
The unwritten road map in my head for Purge includes a port to 7.x, removing the hard dependency on curl and adding some user friendliness to the configuration form. Next to that I'm researching a new web standard draft called LCI (Linked Cache Invalidation) that might be a long-term solution to this problem. That will probably end up as a separate project.