Aegir 3 and Drupal 10: eeek!
Aegir is a hosting system built in Drupal, for Drupal.
It lets you easily create new Drupal sites and create databases, filesystems, virtual hosts etc. for the sites. You can manage hundreds or thousands of sites using a simple Drupal based UI. You can manage 100 Drupal websites as simply as you would manage a list of 100 blog posts.
Currently the latest released version of Aegir is: Aegir 3.
Aegir 3 relies on Drush 8, which means it can work with Drupal 7, Drupal 8 and Drupal 9. But not Drupal 10. Oh.
I need it to work with Drupal 10. Specifically, we have a large collection of Drupal sites that are currently on Drupal 9, but ready and waiting to jump to Drupal 10. Yes, we've left it too late, we should have done this a long time ago, but we are where we are.
I've spent the day looking into various options. They are roughly:
- Move to BOA
- Get Aegir 3 to work with Drupal 10
- Move to Aegir 4
- Move to Aegir 5
- Move to something else
My terms of reference
I'm looking for a solution that changes as little as possible in the current stack. We've got a big, beefy server, and lots and lots of Drupal sites on it. The Drupal sites themselves run fine, and it's really the management of them that we're after. We don't particularly need to scale out, and if we do, we've got options to scale vertically on the hardware side. We also don't need some fancy multi-datacenter approach, and ideally we'd keep our Nginx, Varnish and Apache sandwich.
I do want to manage things in the old school way, but I need a few, select, non-technical users to be able to manage the sites in a friendly UI like Aegir's hosting system.
Lots of the new hotness out there are essentially orchestrating Kubernetes, and that's great, but I don't actually want 100 containers all running PHP which is how I understand all of these systems would essentially function.
I used to be an Aegir maintainer, and I have a deep knowledge of how Aegir works and even helped write some of the inter-process communications stuff in Drush 8.
Move to BOA
The first option is one that has been shown to work, and requires a custom fork of Drush 8, some core patches and using BOA. BOA is quite an aggressive fork of Aegir, and I'm not sure I want all the changes and whatnot that come with it. I think I have to discount it, because the last time I looked there would simply be too much change on the nginx side of things at least, and it was doing all kinds of things I simply don't understand and thus won't be able to reason about.
The main inspiration I take from BOA is that they've got it working! Well, they've got something working. It looks like they've essentially forked Drush 8 and applied some Drupal core and Aegir patches. This makes it so that Aegir can still use its aliases and code, and talk to Drupal 10 sites. How they've been able to do this is very impressive indeed. I feel like it's just a matter of time before it breaks though: surely Drupal 10 is going to change something in its lifecycle and cause issues. I also don't quite see how this can work with Drush commands in contrib modules. For example, during deployments on our sites we revert features, but that's a Drush command provided by the Features module, which won't be callable from Drush 8 in Drupal 10.
Get Aegir 3 to work with Drupal 10
This, I feel, is the least effort option. Find some way to get standard Aegir 3 to communicate with Drupal 10. Now, Drush 8 can't directly bootstrap a Drupal 10 site. Aegir 3 uses Drush 8 both as the backend storage for all the data about the sites it hosts, and to bootstrap the sites it manages.
This worked really well in the Drupal 5/6/7 days, and just about coped with Drupal 8/9, but doesn't for Drupal 10. Think of it this way: what if instead of hosting a Drupal site, it was a WordPress site? Aegir simply wouldn't be able to neatly bootstrap into that site and do its stuff. It's the same with Drupal 10.
How much stuff does it really do though? I'm not 100% sure. I know that it gets the currently installed packages, for example, but what else does it do? It looks to me like it often hands off to a Drush subprocess to do the heavy lifting. Maybe we can hand off to another subprocess, one that runs a site-local Drush 12+ instance, passing in all the options it needs to bootstrap and run the actual commands on the site.
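To make the hand-off idea a bit more concrete, here is a minimal sketch in Python (not how Aegir or provision actually does anything). It assumes a Composer-managed site with Drush installed at vendor/bin/drush, and it passes Drush's --root and --uri options explicitly rather than relying on an alias. The function names and the paths are hypothetical, made up purely for illustration.

```python
import subprocess
from pathlib import PurePosixPath


def build_drush_command(project_root, drupal_root, uri, command, args=()):
    """Build an argv list that runs a site's own Drush rather than a global one."""
    # Assumption: the site is Composer-managed, so Drush lives in vendor/bin.
    drush = PurePosixPath(project_root) / "vendor" / "bin" / "drush"
    # --root and --uri tell Drush which codebase and which site to bootstrap,
    # so no Drush 8 style alias is needed on this side of the hand-off.
    return [str(drush), f"--root={drupal_root}", f"--uri={uri}", command, *args]


def run_drush(argv, timeout=300):
    """Run the command and return (exit code, stdout, stderr)."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout, result.stderr


# Hypothetical values, standing in for what Aegir holds about a site.
argv = build_drush_command(
    project_root="/var/aegir/platforms/example",
    drupal_root="/var/aegir/platforms/example/web",
    uri="example.com",
    command="user:login",
)
print(argv)
```

Calling run_drush(argv) would then execute the site-local Drush and hand back its output for the caller to parse. Everything the subprocess needs travels on the command line, so the calling side never has to bootstrap Drupal itself.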
Please, if this is a completely crazy way to go, someone please comment and let me know!
Move to Aegir 4
So there is a 4.x branch of Aegir, and apparently it can work with Drupal 10 sites. It does this by replicating the aliases to YAML files and then running a global Drush 11, which can then bootstrap the Drupal sites. Apparently it also has replaced some of the Drush invocations with standard process invocations. I guess it had to do this because the Drush to Drush process communication stuff was in Drush 8 and got removed.
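I haven't checked how the 4.x branch actually writes its YAML, so treat the following as a rough Python sketch of the general idea only. It takes a hypothetical alias record and renders the two keys that a newer Drush site alias file uses to find a site, root and uri. The function name and the sample data are assumptions, not anything taken from Aegir.

```python
def alias_to_site_yaml(environment, alias):
    """Render one alias record as a Drush 9+ style site alias YAML document."""
    # Only copy the keys that newer Drush needs to locate the site;
    # anything Aegir-specific in the old alias is deliberately left behind.
    lines = [f"{environment}:"]
    for key in ("root", "uri"):
        if key in alias:
            # Quote values so characters such as ':' in a URL stay valid YAML.
            value = str(alias[key]).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'  {key}: "{value}"')
    return "\n".join(lines) + "\n"


# Hypothetical alias data; real Aegir aliases carry many more keys.
old_alias = {
    "root": "/var/aegir/platforms/example/web",
    "uri": "https://example.com",
    "db_server": "@server_localhost",
}
print(alias_to_site_yaml("live", old_alias))
```

Saved as something like example.site.yml in a directory Drush searches for site aliases, a record like this should be addressable as @example.live, which is roughly what a global Drush 11 would need in order to find and bootstrap the site.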
It seems like Aegir 4 also brought in a lot of other changes too, and that's not particularly something I want or need. Also, it's never been officially released/tested by the community etc.
Move to Aegir 5
This got announced in a few blog posts a while back, and I've not heard anything since. It's possible that it's there and ready to go, but as far as I understand it, it's a whole change to how the sites would be managed and hosted, and while an import from Aegir 3 is on the roadmap, it seems like it would import the sites into something completely different, not some simple vhosts and a DB server.
Also, even if it does exist, it'll have had minimal testing and eyes on it.
Move to something else
I think this is my preferred long-term option. These sites don't really need Drupal. They could actually be static sites plugged into a central content repository. That would drastically simplify lots and lots of things. Or we keep the Drupal sites, but move them to Kubernetes and use something like Amazee's Lagoon to host them etc. This would be very cool, but probably quite expensive in terms of having enough resources to host all the containers.
One for the long term future I think.
Get Aegir 3 to work with Drupal 10
I might give this a go. I think what I can do is this:
- Get Aegir running normally.
- Install a Drupal 9 site.
- Get a single Aegir task running that doesn't bootstrap the Drupal 9 site with Drush 8. I'm thinking something like getting the one-time login link. This has data flowing in both directions and, I think, currently has Drush 8 trying to bootstrap Drupal to run the provision command. Getting this working would involve calling a site-local Drush based on the data from the alias, but not actually using the alias.
- If that works, put something into settings.php that throws an exception if Drush 8 is detected, and then try to get everything else working for my use-cases.
- Then try and set up a Drupal 10 site.
I'd love to know your thoughts on this. Do you have an old Aegir running sites, and you don't know what to do with it? Or do you think I'm crazy and want to offer me a better way to go? Use the comments below, please!
Updates/notes
So looking at the provision commands, the actual provision command to, say, get the one-time login link bootstraps the Drupal site. That command would need to change so that it no longer bootstraps Drupal at all and becomes a pure Drush command instead. Then it could load the data from the alias etc.
I'm tempted to go ahead and get a dev instance of Aegir running and then add a new Drush command that doesn't bootstrap Drupal, but does attempt to grab data from an alias, and see what I can do. That's possibly an even simpler proof of concept, as it's an entirely new Drush command, but one I then know could be called by provision's task runner instead of the current one.