High concurrency Composer
On behalf of Acquia I’m currently working on Drupal’s next big leap: Automatic Updates & Project Browser — both are “strategic initiatives”.
In November, I started helping out the team led by Ted Bowman that’s been working on it non-stop for well over 1.5 years (!): see d.o/project/automatic_updates. It’s an enormous undertaking, with many entirely new challenges — as this post will show.
For a sense of scale: more people from Acquia’s Drupal Acceleration Team (“DAT”) have been working on this project than were on the entire original DAT/OCTO team back in 2012!
The foundation for both will be the (API-only, no UI!) package_manager module, which builds on top of the php-tuf/composer-stager library. We’re currently working hard to get that module committed to Drupal core before 10.1.0-alpha1.
Over the last few weeks, we managed to solve almost all of the remaining alpha blockers (which block the core issue that will add package_manager to Drupal core as an alpha-experimental module). One of those was a random test failure on DrupalCI, whose failure frequency was increasing over time!
A rare random failure may be acceptable, but at this point, ~90% of test runs were failing on one or more of the dozens of Kernel tests … but always a different combination. Repeated investigations over the course of a month had not led us to the root cause. But now that the failure rate had reached new heights, we had to solve this. It brought the team’s productivity to a halt. Imagine what damage this would have done to Drupal core’s progress!
Prior research, combined with the fact that the failure rate had suddenly gone up, meant that there really could only be one explanation: this had to be a bug/race condition in Composer itself, because we were now invoking many more composer commands during test execution.
Once we changed focus to composer itself, the root cause became obvious: Composer checks that the temporary directory is writable, and tries to avoid file name conflicts during that check by using microtime(). Called without arguments, that function confusingly returns a string of the form "msec sec": despite the name, the first part is the fraction of the current second, with microsecond resolution at best.
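To make the different return shapes concrete, here is a minimal sketch that prints what each clock function returns. The variable names are mine, and the exact digits will differ on every run and every machine.

```php
<?php
// Default form: a string "<fraction> <seconds>", both parts expressed in seconds.
$asString = microtime();

// Float form: seconds since the Unix epoch as a single float.
$asFloat = microtime(true);

// Monotonic high-resolution clock in nanoseconds (available since PHP 7.3).
$nanos = hrtime(true);

[$fraction, $seconds] = explode(' ', $asString);

printf("microtime():      %s\n", $asString);
printf("  fraction part:  %s\n", $fraction);
printf("  seconds part:   %s\n", $seconds);
// Casting the float to a string may round away digits, depending on the
// "precision" ini setting.
printf("microtime(true):  %s\n", $asFloat);
printf("hrtime(true):     %s\n", $nanos);
```

How fine-grained the values really are depends on the clock of the underlying system, so treat the printed digits as an upper bound on resolution rather than a guarantee.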
With sufficiently high concurrency (up to 32 concurrent invocations on DrupalCI!), two composer commands could end up with the exact same microtime() value:
// Check system temp folder for usability as it can cause weird runtime issues otherwise
Silencer::call(static function () use ($io): void {
    $tempfile = sys_get_temp_dir() . '/temp-' . md5(microtime());
    if (!(file_put_contents($tempfile, __FILE__) && (file_get_contents($tempfile) === __FILE__) && unlink($tempfile) && !file_exists($tempfile))) {
        $io->writeError(sprintf('<error>PHP temp directory (%s) does not exist or is not writable to Composer. Set sys_temp_dir in your php.ini</error>', sys_get_temp_dir()));
    }
});
Source: src/Composer/Console/Application.php in Composer 2.5.4
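The race can be reproduced in a single script by interleaving the steps by hand. This is only a sketch of one possible interleaving: probePath() is a hypothetical helper that mirrors how the excerpt names its file, and the shared timestamp stands in for two processes that happened to read the same microtime() value.

```php
<?php
// Hypothetical helper mirroring the file name built in the quoted check.
function probePath(string $timestamp): string
{
    return sys_get_temp_dir() . '/temp-' . md5($timestamp);
}

// Assumption: both "processes" received the same value from microtime().
$sharedTimestamp = microtime();
$pathA = probePath($sharedTimestamp);
$pathB = probePath($sharedTimestamp);

// Interleave the steps of the check the way two real processes might.
file_put_contents($pathA, 'probe');  // process A writes its probe file
file_put_contents($pathB, 'probe');  // process B writes to the very same file
$aUnlinked = unlink($pathA);         // process A cleans up: succeeds
$bUnlinked = @unlink($pathB);        // process B: the file is already gone

var_dump($pathA === $pathB); // bool(true)
var_dump($aUnlinked);        // bool(true)
var_dump($bUnlinked);        // bool(false)
```

In the real check, a failed unlink() makes the whole condition fail, so the second process reports that the temp directory is not writable even though nothing is wrong with it.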
We could switch to a higher-resolution clock such as hrtime(), which reports nanoseconds, to make collisions much less likely. But more effective would be to avoid collisions altogether. And that’s possible: composer always runs in its own process.
Simply changing
sys_get_temp_dir() . '/temp-' . md5(microtime());
to
sys_get_temp_dir() . '/temp-' . getmypid() . '-' . md5(microtime());
is sufficient to safeguard against collisions when using Composer in high concurrency contexts.
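A small sketch of why the process ID helps: with an identical timestamp, the old naming scheme yields the same path twice, while the new one differs as soon as the PIDs differ. The two helper functions and the PID values are made up for illustration; only the two path expressions come from the change above.

```php
<?php
// Hypothetical helper: the file name as built before the change.
function oldProbePath(string $timestamp): string
{
    return sys_get_temp_dir() . '/temp-' . md5($timestamp);
}

// Hypothetical helper: the file name with the process ID included.
function newProbePath(int $pid, string $timestamp): string
{
    return sys_get_temp_dir() . '/temp-' . $pid . '-' . md5($timestamp);
}

// Assumption: two concurrent processes read the same microtime() value.
$timestamp = microtime();

// Made-up PIDs standing in for two concurrent composer processes.
$pidA = 4001;
$pidB = 4002;

var_dump(oldProbePath($timestamp) === oldProbePath($timestamp));            // bool(true)
var_dump(newProbePath($pidA, $timestamp) === newProbePath($pidB, $timestamp)); // bool(false)

// Inside a real process the ID would come from getmypid().
echo newProbePath((int) getmypid(), $timestamp), "\n";
```

The timestamp stays in the name, so repeated checks within one process still get different files over time.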
So that single line change is what I proposed in a Composer PR a few days ago. Earlier today it was merged into the 2.5 branch — meaning it should ship in the next version!
Eventually we’ll be able to remove our work-around. But for now, this was one of the most interesting challenges along the way :)