Drupal Configuration Gotcha: Dynamic Base URLs
Recently at the ol' day job, we got word through our website bugs alias that there was a duplicate version of the website located at a different, garbage URL. Fairly straightforward: someone was mirroring our website via something like a CNAME record and we needed to redirect, at the server level, all traffic from that URL to the correct domain name. More interesting, though, was that at the same time, all of the links embedded within our RSS feed were pointed to the same junk URL.
After some quick research, we pieced together roughly what had happened. First, some relevant background: This RSS feed was handled by a View set to stay cached for 30 minutes. The rendered feed also lives in the page cache, which we have set to expire after a number of hours. We also did not have $base_url defined in our production settings.php file.
Step-by-Step Account
- At some point, the RSS feed was accessed from the mirror (for illustrative purposes, we'll say http://mirror.example.com/rss.xml). Drupal tried to find that page in the page cache and failed because Drupal's page cache uses the base URL (http://mirror.example.com) as part of the unique cache ID. So, rather than pulling the cached version of the feed, it went on to generate the feed as it normally would.
- The next step in the process was to run the View. Before running the View's query, Views attempted to pull the results from its own cache. This time the lookup succeeded, but the entry went unused because it was older than the 30-minute expiration time we have set. (Note that unlike the page cache, Views' caching mechanism does not take the base URL into consideration.)
- Because the Views cache entry had expired, Views ran the query afresh and rendered the XML output, pointing all links to mirror.example.com and caching its output in the process. After the full XML was rendered, Drupal stashed the output in its own page cache.
- Note that at this point, the real RSS feed (at, for example, http://realsite.example.com/rss.xml) is still fine and pointing to the correct domain because a cached version of it still exists in Drupal's page cache.
- At some point within thirty minutes of the mirror's request, the page cache entry for the real RSS feed expired. Because of that, on the next request to it, Drupal attempted to generate the page dynamically. When it reached the Views layer, Views came back with the output it had cached during the mirror's request, because that entry was still valid (and again, Views cache IDs don't consider the base URL).
- Now, our real RSS feed is pointing to a different domain that we have no control over, and the XML is cached and won't expire for several hours.
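The sequence above can be reproduced with a toy model. This is a rough sketch, not Drupal code: the class, its method, and the cache IDs are all hypothetical, and the six-hour page cache lifetime is an assumed stand-in for "a number of hours". The only behavior it borrows from the account above is that the page cache ID includes the base URL while the Views cache ID does not.

```php
<?php
// Toy model of the incident. NOT Drupal code: all names are hypothetical.
final class ToyFeedSite {
  const PAGE_TTL = 21600;  // page cache lifetime in seconds (6 hours, assumed)
  const VIEWS_TTL = 1800;  // Views cache lifetime in seconds (30 minutes)

  private $pageCache = [];
  private $viewsCache = [];

  public function getFeed($baseUrl, $now) {
    // Page cache: the cache ID includes the base URL.
    $pageCid = $baseUrl . '/rss.xml';
    if (isset($this->pageCache[$pageCid]) && $this->pageCache[$pageCid]['expires'] > $now) {
      return $this->pageCache[$pageCid]['data'];
    }

    // Views cache: the cache ID ignores the base URL.
    $viewsCid = 'rss_feed';
    if (isset($this->viewsCache[$viewsCid]) && $this->viewsCache[$viewsCid]['expires'] > $now) {
      $output = $this->viewsCache[$viewsCid]['data'];
    }
    else {
      // Fresh render: links use whatever base URL this request arrived on.
      $output = '<link>' . $baseUrl . '/node/1</link>';
      $this->viewsCache[$viewsCid] = ['data' => $output, 'expires' => $now + self::VIEWS_TTL];
    }

    // Store the result in the page cache under the URL-specific ID.
    $this->pageCache[$pageCid] = ['data' => $output, 'expires' => $now + self::PAGE_TTL];
    return $output;
  }
}

$site = new ToyFeedSite();
$real = 'http://realsite.example.com';
$mirror = 'http://mirror.example.com';

$site->getFeed($real, 0);                 // real feed enters both caches
$site->getFeed($mirror, 21000);           // page cache miss, stale Views entry: rendered with mirror links
echo $site->getFeed($real, 21000), "\n";  // real page entry still valid: <link>http://realsite.example.com/node/1</link>
echo $site->getFeed($real, 21660), "\n";  // real page entry expired, Views entry valid: <link>http://mirror.example.com/node/1</link>
```

The last call is the interesting one: the poisoned output is now stored under the real site's page cache ID and stays there for the full page cache lifetime.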
Mitigating Factors, Cautions
Luckily, because the website was just a mirror, the only ill effects were that a few users ended up on the wrong URL (though they saw the same content), and we were able to resolve the issue quickly. However, that's not to say that someone using this technique for nefarious purposes couldn't point those URLs somewhere else.
Configuration Recommendations
Afterward, I sent in a note to Drupal's security team. Greg Knaddison replied shortly thereafter, determining that it was effectively a configuration issue, not necessarily an actionable security issue. He wrote a fantastic summary about Drupal's dynamic base URL system and how it can cause strange issues like the above, highlighting a number of other possible side effects (including notification and other e-mails being sent from the wrong domain). Ultimately, our solution was simply to set $base_url in our production settings.php file, though Greg offers several other suggestions when that solution isn't desirable.
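For reference, the fix amounts to a single line. This is a minimal sketch, assuming a Drupal version whose settings.php supports $base_url (Drupal 7 or earlier); the domain is the placeholder used in the account above, not our actual one.

```php
// settings.php (production copy only).
// Hard-code the canonical URL so Drupal no longer derives the base URL
// from the host name of the incoming request. No trailing slash.
$base_url = 'http://realsite.example.com';
```

Since each environment has its own URL, this line belongs in the environment-specific settings.php rather than in a file shared between development, staging, and production.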