Drupal Staging Problem (revisited)
As a freelancer and Drupal specialist, I've worked with a number of website-building teams. It seems like each group reinvents solutions to the problem of sharing settings between multiple copies of Drupal. Frankly, I'm not sure "solution" is the best word, because with each team there have been headaches in setting up a new copy of the site for development, and further headaches in getting changes to other developers' copies and to the live server.
This problem has been called "the Drupal Staging Problem" (in an excellent summary by Dominique De Cooman). If you ask around, you'll hear about a number of solutions, including hook_update_N, backup_migrate.module, features.module, deploy.module, and possibly others I don't know about. In this post I describe a new(-ish) approach that I call site_update.
So many solutions! Does that mean the problem is solved? Or could it mean the problem is widespread, but not solved at all?
Why so hard?
This isn't a problem unique to Drupal, but not all software development has this problem. Say you're building a typical piece of software, maybe a spreadsheet. Your team writes source code, and shares the code via a source code control system. The result is compiled into an executable (and maybe default configuration files) and distributed to end users. End users produce "content" (their files), but developers never need to see those files. They can produce the next version of the software without the end user data.
Drupal is different because the end product, the website, consists of
a) The source code, mostly PHP and templates.
b) The database, including content produced by users.
c) Files and media uploaded by the user, if any.
The source code isn't the hard part; that's basically solved by source code control software. The uploaded files can be shared pretty easily by rsync or a similar tool. It's the database that causes headaches.
That database isn't simply content; it's both content and settings. Settings are essentially part of the software. If the settings are not correct, the website won't work properly. So each developer needs correct settings. That's why, in a typical Drupal dev shop, each developer has an entire copy of the live database, or something close to that.
Why can't the settings part of the database be separated from the content part?
You might think it would be easy. Just say some tables are settings (like the {system} table, which says which modules and themes are enabled), while other tables are content (like, say, the {comments} table, which most likely has only end user posts). But as you try to label each table as either settings or content, it gets tricky. Say you've enabled a taxonomy access control module and a taxonomy with terms like 'public' or 'private' that you consider part of the settings, while you also allow end users to tag their posts. Now you have a taxonomy term table which has both settings and content in the same table. From site to site, depending on the features enabled, the same table might hold settings, content, or both.
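To make the labeling problem concrete, here is a minimal Python sketch. It assumes you could audit a site row by row and tag each row as settings or content; the sample rows, the table name term_data, and the function classify_tables are illustrative assumptions, not output from a real Drupal database.

```python
from collections import defaultdict

# Hypothetical audit: each entry is (table, kind of one row found in it).
# These rows are made up for illustration.
SAMPLE_ROWS = [
    ("system", "settings"),     # enabled modules and themes
    ("comments", "content"),    # end user posts
    ("term_data", "settings"),  # 'public' / 'private' access control terms
    ("term_data", "content"),   # tags added by end users
]


def classify_tables(rows):
    """Label each table 'settings', 'content', or 'mixed'."""
    kinds = defaultdict(set)
    for table, kind in rows:
        kinds[table].add(kind)
    # A table whose rows fall into more than one kind cannot get a single label.
    return {
        table: "mixed" if len(found) > 1 else next(iter(found))
        for table, found in kinds.items()
    }


if __name__ == "__main__":
    for table, label in classify_tables(SAMPLE_ROWS).items():
        print(table, label)
```

The taxonomy term table comes out as "mixed", which is exactly the case a per-table split cannot handle.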
What's so hard about sharing the live database?
You may think: just have every developer work with a copy of the live site. This usually turns out to be problematic for one or more of the following reasons.
a) The live database is constantly changing, so developers need to update their copy frequently.
b) Replacing a developer database with the live copy blows away the developer's working data.
c) The live database might be huge. Having each developer work with a copy complicates matters.
d) Settings might have external references that should not be shared, like a Facebook application ID or Google API key.
e) The live database might have sensitive/private data that most developers should not be privy to.
Even if it can be done, it usually turns out to be a big headache.
So, what's a Drupal developer to do?
I'm glad you asked!
I'm sharing a technique I use (when I have a say in the matter). I call it the site_update module. In the past I've been hesitant to recommend it to a wide audience. However, as it has grown more stable, and as I have had more conversations with folks struggling with this problem, I've decided to share it. For example, at this weekend's BADCamp, I'm going to give a live demo.
The site_update module divides the database settings from content. I know, I just described that as a hard task, so how does site_update do it? Rather than labeling whole tables, it divides each table. Take the taxonomy term example I mentioned: that one table contains both settings and data, so the split has to happen inside the table.
The things that set site_update apart are...
a) There's a special copy of Drupal where the settings can be changed. It's called the "base".
b) Each table might contain settings and data. The module uses a database trick to reserve a range of IDs.
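The post doesn't spell out the database trick, so the following Python sketch shows only one way a reserved ID range could work, not how site_update actually does it. It assumes IDs 1 through 999 are set aside for settings, and it uses SQLite's sqlite_sequence table to start user content at 1000. The boundary RESERVED_MAX, the table term, and the helper names are all hypothetical.

```python
import sqlite3

RESERVED_MAX = 999  # assumed boundary: IDs 1..999 are reserved for settings


def make_db():
    """Create a term table whose auto-assigned IDs start above the reserved range."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE term (tid INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
    )
    # SQLite keeps AUTOINCREMENT counters in sqlite_sequence; seeding the
    # counter makes the next automatically assigned tid RESERVED_MAX + 1.
    conn.execute("DELETE FROM sqlite_sequence WHERE name = 'term'")
    conn.execute(
        "INSERT INTO sqlite_sequence (name, seq) VALUES ('term', ?)", (RESERVED_MAX,)
    )
    return conn


def add_setting(conn, tid, name):
    """Insert a settings row with an explicit ID inside the reserved range."""
    if not 1 <= tid <= RESERVED_MAX:
        raise ValueError("settings IDs must fall in the reserved range")
    conn.execute("INSERT INTO term (tid, name) VALUES (?, ?)", (tid, name))


def add_content(conn, name):
    """Insert a content row and return the ID the database assigned to it."""
    cur = conn.execute("INSERT INTO term (name) VALUES (?)", (name,))
    return cur.lastrowid


if __name__ == "__main__":
    conn = make_db()
    add_setting(conn, 1, "public")
    add_setting(conn, 2, "private")
    print(add_content(conn, "kittens"))  # an ID above the reserved range
```

With a split like this, "is this row a setting?" becomes a simple comparison on the ID, even when settings and content share a table. On MySQL, the equivalent step would presumably be setting the table's AUTO_INCREMENT value.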
The site_update.module does most of the work for you. You just have to remember to make settings changes on the special "base" copy of your site, and to use a special script to "dump" those settings to a file. The file can then be treated like source code, so it can be checked into a version control system and shared among developers that way.
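As a rough illustration of the "dump" step, here is a Python sketch that writes the settings rows of one table as sorted JSON text, so the result diffs cleanly in version control. This is not the actual site_update script. It assumes settings rows are the ones whose ID is at or below a reserved boundary, and the names dump_settings and reserved_max, the JSON format, and the sample table are all hypothetical.

```python
import json
import sqlite3


def dump_settings(conn, table, id_column, reserved_max=999):
    """Return the settings rows of one table as stable, diff-friendly JSON text.

    Table and column names cannot be bound as SQL parameters, so this sketch
    assumes they come from trusted code, never from user input.
    """
    cur = conn.execute(
        f'SELECT * FROM "{table}" WHERE "{id_column}" <= ? ORDER BY "{id_column}"',
        (reserved_max,),
    )
    columns = [description[0] for description in cur.description]
    rows = [dict(zip(columns, row)) for row in cur]
    # Sorted keys and a fixed row order keep the file identical between dumps
    # unless a setting really changed.
    return json.dumps({table: rows}, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE term (tid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO term (tid, name) VALUES (1, 'public')")    # setting
    conn.execute("INSERT INTO term (tid, name) VALUES (1000, 'kittens')")  # content
    print(dump_settings(conn, "term", "tid"))
```

Only the reserved-range rows end up in the text, so user content never enters the repository.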
It is a little tricky to start using site_update on a site that already has a large database. It works best when you use it from the very beginning of every project. I highly recommend you give it a try on your next Drupal site.