Performant bulk-redirection with Apache RewriteMap
When continuing development of a web site, big changes occur every so often. One such change that may occur, frequently as a result of another change, is a bulk update of URLs. When this is necessary, you can greatly improve the response time experienced by your users—as they are redirected from the old path to the new path—by using a handy directive offered in Apache's mod_rewrite called RewriteMap.
At Agaric we regularly turn to Drupal for it's power and flexibility, so one might question why we didn't leverage Drupal's support for handling redirects. When we see an opportunity for our software/system to respond "as early as it can", it is worth investigating how that is handled. Apache handles redirects itself, making it entirely unnecessary to hand-off to PHP, never mind Drupal bootstrapping and retrieving up a redirect record in a database, just to tell a browser (or a search engine) to look somewhere else.
There were two conditions that existed making use of RewriteMap a great candidate. For one, there will be no changes to the list of redirects once they are set: these are for historical purposes only (the old URLs are no longer exposed anywhere else on the site). Also, because we could make the full set of hundreds of redirects via a single RewriteRule—thanks to the substitution capability afforded by RewriteMap—this solution offered a fitting and concise solution.
So, what did we do, and how did we do it?
We started with an existing set of URLs that followed the pattern: http://example.com/user-info/ID[/tab-name]
. Subsequently we implemented a module on the site that produced aliases for our user page URLs. The new patten to the page was then (given exceptions for multiple J Smiths, etc via the suffix): http://example.com/user-info/firstname-lastname[-suffix#][/tab-name]
. The mapping of ID to firstname-lastname[-suffix#] was readily available within Drupal, so we used an update_hook to write out the existing mappings to a file (in the Drupal public files folder, since we know that's writable by Drupal) . This file (which I called 'staffmapping.txt') is what we used for a simple text-based rewrite map. Sample output of the update hook looked like this:
# User ID to Name mapping:
1 admin-admin
2 john-smith
3 john-smith-2
4 jane-smith
The format of this file is pretty straight-forward: comments can be started on any line with a #, and the mapping lines themselves are composed of {lookupValue}{whitespace}{replacementValue}
.
To actually consume this mapping somewhere in our rules, we must let Apache know about the mapping file itself. This is done with a RewriteMap directive, which can be placed in the Server config or else inside a VirtualHost directive. The format of the RewriteMap looks like this: RewriteMap MapName MapType:MapSource. In our case, the file is a simple text file mapping, so the MapType is 'txt'. The resulting string added to our VirtualHost section is then: RewriteMap staffremap txt:/path/to/staffmapping.txt
This directive makes this rewrite mapping file available under the name "staffremap" in our RewriteRules. There are other MapTypes, including ones that uses random selection for the replacement values from a text file, using a hash map rather than a text file, using an internal function, or even using an external program or script to generate replacement values.
Now it's time to actually change incoming URLs using this mapping file, providing the 301 redirect we need. The rewrite rule we used, looks like this:
RewriteRule ^user-detail/([0-9]+)(.*) /user-detail/${staffremap:$1}$2 [R=301,L]
The initial argument to the rewrite rule identifies what incoming URLs this rule applies to. This is the string: "^user-detail/([0-9]+)(.*)". This particular rule looks for URLs starting with (signified by the special character ^) the string "user-detail/", then followed by one or more numbers: ([0-9]+), and finally, anything else that might appear at the end of the string: "(.*)". There's a particular feature of regex being used here as well: each of search terms in parenthesis are captured (or tagged) by the regex processor which then provides some references that can be used in the replacement string portion. These are available with $<captured position>
—so, the first value captured by parenthesis is available in "$1"—this would be the user ID, and the second in "$2"—which for this expression would be anything else appearing after the user ID.
Following the whitespace is our new target URL expression: "/user-detail/${staffremap:$1}$2". We're keeping the beginning of the URL the same, and then following the expression syntax "${rewritemap:lookupvalue}", which in our case is: "${staffremap:$1}" we find the new user-name URL. This section could be read as: take the value from the rewrite map called "staffremap", where the lookup value is $1 (the first tagged expression in the search: the numeric value) and return the substitution value from that map in place of this expression. So, if we were attempting to visit the old URL /user-detail/1/about, the staffremap provides the value "admin-admin" from our table. The final portion of the replacement URL (which is just $2) copies everything else that was passed on the URL through to the redirected URL. So, for example, /user-detail/1/about includes the /about portion of the URL in the ultimate redirect URL: /user-detail/admin-admin/about
The final section of the sample RewriteRule is for applying additional flags. In this case, we are specifying the response status of 301, and the L indicates to mod_rewrite that this is the last rule it should process.
That's basically it! We've gone from an old URL pattern, to a new one with a redirect mapping file, and only two directives. For an added performance perk, especially if your list of lookup and replacement values is rather lengthy, you can easily change your text table file (type txt) with a HashMap (type dbm) that Apache's mod_rewrite also understands using a quick command and directive adjustment. Following our example, we'll first run:
$> httxt2dbm -i staffrepam.txt -o staffremap.map
Now that we have a hashmap file, we can adjust our RewriteMap directive accordingly, changing the type to map, and of course updating the file name, which becomes:
RewriteMap staffremap dbm:/path/to/staffremap.map
RewriteMap substitutions provide a straight-forward, and high-performance method for pretty extensive enhancement of RewriteRules. If you are not familiar with RewriteRules generally, at some point you should consider reviewing the Apache documentation on mod_rewrite—it's worthwhile knowledge to have.