Tokens, Performance, and Caching
Earlier this year, I was brought in to help our client Foreign Affairs, whose site was experiencing some pre-launch performance issues. Authenticated pages were taking far too long to render, and they needed it fixed. We identified and addressed a couple of issues, but I want to unpack one in particular because it's an issue that many developers may not think about.
Well there's yer problem
Foreign Affairs had New Relic installed on their production server, which was able to turn up several leads. In particular, the function _entity_token_get_token() was being called over 13 million times in the course of a day. The next most frequently called function clocked in at only a few hundred thousand calls. A clue, Sherlock!
Some spelunking through the site's codebase and profiling with Blackfire.io eventually turned up the issue: The site was using both the Doubleclick for Publishers (DFP) module and the SiteCatalyst module, both of which integrate ad-tracking services that need to place numerous markers on each rendered page. So far so good. Those markers need to vary by page, of course. So far so good. And both modules were using the token replacement system in Drupal core to generate those markers. Not so good.
Drupal's token replacement system is a powerful string mangling engine. Modules can expose []-denoted tokens, such as [node:title], [user:name], or [timestamp:Y-m-d], that a user can enter into a textfield; at runtime, each token gets replaced with an appropriate, contextually relevant value, such as the title of a node, the login name of the current user, or the timestamp when the code is run, respectively.
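A minimal illustration of the Drupal 7 API (the pattern string and variables here are made up for the example):

```php
// A user-configured pattern containing tokens.
$pattern = 'Section: [node:title] | Viewer: [current-user:name]';

// At runtime, the token system swaps each token for a contextually
// appropriate value drawn from the objects passed in.
$output = token_replace($pattern, array('node' => $node), array('clear' => TRUE));
```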
The problem is that the token system is not architected to be very performant. The root issue is token generation, which is done for every token replace call. That is, every time token_replace() is called, all token hooks are called for every token in the string, and those in turn usually generate far more token replacement values than are actually going to be used. Entity API's token integration is even less efficient, and each token replacement call generates hundreds of internal function calls. Most of the time that's OK; if it takes a few dozen extra milliseconds to generate a path alias with Pathauto when a node is saved, no one is really going to notice or care. When the string needs to be re-processed on every page request, people notice and care. When it's not one but over a dozen strings to be processed on every page (because of how the modules were configured), people really notice and really care.
Take a memo
Since rewriting the token system in core to be smarter was clearly off the table, that left invoking the token system less often. Between two different page loads of the same node, the token replacement values wouldn't change. You know what that means? Caching!
The solution was to modify both DFP and SiteCatalyst in the same way, adding a caching wrapper to all token_replace() calls. The fancy name for this concept is "memoization": if you know that calling a function with the same parameters is always guaranteed to return the same result, then you can cache ("memoize", as in, record a memo) that result. The next time the function is called, it can simply return the already-computed value. It's one of the easiest performance boosts you can get, as long as your functions have very clear, explicit inputs. (For more on this technique and others like it, see my DrupalCon Austin presentation on Functional PHP, or catch it again this fall at Connect.JS in Atlanta.)
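In PHP, the simplest form of memoization is a per-request static cache inside the function itself. A generic sketch (the function names here are hypothetical, not part of the DFP patch):

```php
function example_expensive_lookup($input) {
  // Memoization: results are keyed by the input, so repeated calls with
  // the same argument return the stored value instead of recomputing it.
  $results = &drupal_static(__FUNCTION__, array());

  if (!isset($results[$input])) {
    $results[$input] = example_do_expensive_work($input);
  }
  return $results[$input];
}
```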
Our inputs were a bit more complicated in this case, as a token_replace() call can take any number of complex Drupal objects, such as nodes and users. Fortunately, the DFP module provided only a single node, user, and taxonomy term as possible sources for token data, which simplified the problem space. It meant we could create a cache key based on the input string (the one containing the tokens to be replaced) and the IDs of the available objects. To allow for arbitrarily long input strings, we can simply hash the string to get a unique key for lookup purposes. Those together form the total inputs, so the output should always be the same.
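As a rough sketch of that idea (the function name is hypothetical; the real key builder in the patch is discussed below), the key could be assembled like this:

```php
function example_token_replace_key($text, $node = NULL, $user = NULL, $term = NULL) {
  // Hash the token string so arbitrarily long input still yields a short,
  // fixed-length lookup key, then append the IDs of the objects that the
  // tokens can draw data from.
  return md5($text)
    . ':' . ($node ? $node->nid : 0)
    . ':' . ($user ? $user->uid : 0)
    . ':' . ($term ? $term->tid : 0);
}
```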
Great! Now we can cache calls to token_replace() based on the input values, a given page's ad markers only need to be recalculated after a cache clear, and everything is fine!
Ch-ch-ch-ch-changes!
Except for when someone edits a node. The node ID clearly doesn't ever change when updating a node, so the cached value would still get used rather than regenerated. The simple solution is to include the node's last-updated timestamp in the key. That works great for nodes, but users and taxonomy terms do not have a last-updated field.
Fortunately that problem has already been solved (yay community!) in the form of the Entity modified module. That module allows for tracking the last-modified time of any entity, creating its own tables to track that information if the entity doesn't offer it already.
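Extending the key sketch from above, the timestamps just become additional key components: the node's changed property ships with core, while the user and term values would come from whatever lookup entity_modified provides (the helper name below is a stand-in for it):

```php
function example_token_replace_key($text, $node = NULL, $user = NULL, $term = NULL) {
  // Stand-in for the entity_modified lookup of a last-modified timestamp.
  $user_changed = $user ? example_entity_modified('user', $user) : 0;
  $term_changed = $term ? example_entity_modified('taxonomy_term', $term) : 0;

  // The same hashed-string key as before, now salted with last-modified
  // timestamps so editing any source object yields a brand new key.
  return md5($text)
    . ':' . ($node ? $node->nid . ':' . $node->changed : 0)
    . ':' . ($user ? $user->uid . ':' . $user_changed : 0)
    . ':' . ($term ? $term->tid . ':' . $term_changed : 0);
}
```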
Great! We introduce a dependency on the entity_modified module, then use the last-modified value that module provides as part of the cache key. Now an edit to a node, user, or term will result in a new last-modified timestamp and thus a different cache key, so we'll regenerate the token value as soon as it's edited, and everything is fine!
Cache the stampede!
Except that we're still hitting the cache, and therefore the database, once for each string we need to process. In our case that was over a dozen cache lookups on every page, and thus a dozen extra needless hits against the database. Not good at all.
The solution here depends on the particulars of DFP's use case. It's calling token_replace() on every page, but what it does on each page is constant: it takes a fixed set of user-configured strings, processes them, then sticks the result into the page for ad-tracking purposes. That means, realistically, we don't need a dozen cache items; we only need one cache item per page (which is reasonable), holding all of the token replacements for that page. That cache item is just an array mapping the cache keys we defined before to the results we computed before. Once the cache is warm there's only a single cache lookup per page, and every string in that cache item gets used, so there's no waste.
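Conceptually, the memoizing wrapper then looks something like this simplified sketch (not the committed patch; the function names and page identifier are assumptions, and it reuses the hypothetical key builder from above):

```php
function example_dfp_token_replace($text, $node = NULL, $user = NULL, $term = NULL) {
  // One cache item per page, holding every replacement used on that page,
  // keyed by the per-string cache keys built earlier.
  $cid = 'example_dfp_tokens:' . current_path();
  $replacements = &drupal_static(__FUNCTION__);

  if (!isset($replacements)) {
    $cache = cache_get($cid);
    $replacements = $cache ? $cache->data : array();
  }

  $key = example_token_replace_key($text, $node, $user, $term);
  if (!isset($replacements[$key])) {
    // Cache miss: do the expensive token work once and store the result.
    $replacements[$key] = token_replace($text, array_filter(array(
      'node' => $node,
      'user' => $user,
      'term' => $term,
    )));
    cache_set($cid, $replacements);
  }
  return $replacements[$key];
}
```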
Great! Now we have a very efficient caching strategy with no wasted data, we skip calling token_replace() in the majority case, and everything is fine!
Keep it clean
Except we would get new items added to that array over time as a node gets edited, because we're not cleaning out the old values from the array. Additionally, some global tokens are sensitive to more than just the passed-in objects. Although we didn't need them, there are tokens available for the current date and time, for instance. There's no way we can catch them all safely.
The solution here is deceptively simple: put an expiry time on our cache items so that they'll get invalidated eventually no matter what happens, defaulting to being cleaned up whenever Drupal periodically clears its caches, courtesy of the CACHE_TEMPORARY flag. Now we still get valid, non-stale data for our token replacements, and any leftovers in the cache (either pages that are removed or nodes that are edited) get cleaned up by Drupal in due course. That way the waste never gets too big, and everything is fine!
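In Drupal 7 terms, that boils down to the fourth argument to cache_set(); in the wrapper sketch above, the cache_set() call would become:

```php
// CACHE_TEMPORARY marks the item as expirable, so Drupal's routine cache
// maintenance will eventually discard it even if we never touch it again.
cache_set($cid, $replacements, 'cache', CACHE_TEMPORARY);
```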
Mind what you have learned
The final patch for DFP has already been committed, and it is available online so you can see exactly how it all works. The main work is in the two new functions: _dfp_token_replace_make_key(), which produces the unique key per replacement, and dfp_token_replace(), the memoizing wrapper that coordinates the action. Both are surprisingly simple given what they allow us to do.
In the end, for Foreign Affairs, the patch to DFP (combined with a nearly identical patch to SiteCatalyst) resulted in a savings of over 400 ms for a warm cache on every authenticated page load. That's huge! So what have we learned along the way?
- We've learned that monitoring tools that can provide deep profiling are invaluable. Online tools like New Relic and developer tools like Blackfire.io are both useful in their own ways.
- We've learned how to memoize our code for better performance. In particular, we need to know all of the inputs to our function in order to do so, which is not always entirely obvious. Once we actually know all of the true inputs to a function, though, memoizing is a very efficient and powerful way to improve its performance. Writing code with explicit, fine-grained inputs also makes it easier to memoize (and easier to test, too).
- We've discovered the useful entity_modified module.
- We've learned that Drupal's token system, while powerful, is not very performant. The Entity token integration in particular is very bad. That doesn't mean you shouldn't use it, but it does mean being very careful about how and when it is used.
- We've learned to be mindful of the performance implications of our code. Using tokens to generate ad markers is a completely logical decision, but do we know what the performance implications are going to be if it gets used more than we expect? Or if the site has more authenticated users than we were expecting, so page caching isn't useful? These are hard questions, but important ones to consider… and to adjust once we realize we got it wrong the first time.
D8FTW?
Has anything changed in Drupal 8 to help with this sort of case? Quite a bit, actually. For one, the move to more cleanly injected service objects means that far more of the system is memoizable, in those cases where it's useful.
More importantly, though, the cache context and render caching systems now allow most output-generating code (controllers, blocks, formatters, etc.) to do what we've described above automatically. A full discussion of how that works is worth a whole blog post of its own (or several), but suffice it to say that enabling all of core to smartly cache output like we're doing here, in a much more robust and automated way, has been a major push in Drupal 8's development.
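As a small taste, though, a Drupal 8 render array can simply declare its cacheability metadata and let the render cache do the bookkeeping (a hedged illustration, unrelated to the DFP patch itself):

```php
$build = array(
  '#markup' => $ad_marker_output,
  '#cache' => array(
    // Vary the cached copy by the current route and user.
    'contexts' => array('route', 'user'),
    // Invalidate automatically whenever this node is saved.
    'tags' => array('node:' . $node->id()),
  ),
);
```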