Exporting and importing big Drupal databases
Once your site's database dump file gets to be 1GB or more, phrases like "oh, just download and import a DB dump" can't really be taken for granted anymore. So here are some tips for dealing with large databases, especially those of the Drupal variety.
Exporting
Before we can import, we must export. With a big DB, you don't want to just do a regular old mysqldump > outfile.sql and call it a day. Here are some tips.
Find the size before exporting
It can sometimes be useful to see how big the export is going to be before you actually export anything. That way, you can know ahead of time if you need to be doing this or that to reduce the size, or if it won't matter since the whole thing won't be that big anyway.
Here's a query you can run to see the size per DB table:
<span class="token keyword">SELECT</span> TABLE_SCHEMA<span class="token punctuation">,</span> TABLE_NAME<span class="token punctuation">,</span> DATA_LENGTH <span class="token operator">/</span> POWER<span class="token punctuation">(</span><span class="token number">1024</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span> Data_KB<span class="token punctuation">,</span> DATA_LENGTH <span class="token operator">/</span> POWER<span class="token punctuation">(</span><span class="token number">1024</span><span class="token punctuation">,</span><span class="token number">2</span><span class="token punctuation">)</span> Data_MB<span class="token punctuation">,</span> DATA_LENGTH <span class="token operator">/</span> POWER<span class="token punctuation">(</span><span class="token number">1024</span><span class="token punctuation">,</span><span class="token number">3</span><span class="token punctuation">)</span> Data_GB<span class="token keyword">FROM</span> information_schema<span class="token punctuation">.</span><span class="token keyword">tables</span> <span class="token keyword">WHERE</span> table_schema <span class="token operator">NOT</span> <span class="token operator">IN</span> <span class="token punctuation">(</span><span class="token string">'information_schema'</span><span class="token punctuation">,</span><span class="token string">'performance_schema'</span><span class="token punctuation">,</span><span class="token string">'mysql'</span><span class="token punctuation">)</span> <span class="token keyword">ORDER</span> <span class="token keyword">BY</span> DATA_LENGTH<span class="token punctuation">;</span>
And here's another query you can run to see what the total size for the entire DB is:
<span class="token keyword">SELECT</span> Data_BB <span class="token operator">/</span> POWER<span class="token punctuation">(</span><span class="token number">1024</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span> Data_KB<span class="token punctuation">,</span> Data_BB <span class="token operator">/</span> POWER<span class="token punctuation">(</span><span class="token number">1024</span><span class="token punctuation">,</span><span class="token number">2</span><span class="token punctuation">)</span> Data_MB<span class="token punctuation">,</span> Data_BB <span class="token operator">/</span> POWER<span class="token punctuation">(</span><span class="token number">1024</span><span class="token punctuation">,</span><span class="token number">3</span><span class="token punctuation">)</span> Data_GB<span class="token keyword">FROM</span> <span class="token punctuation">(</span><span class="token keyword">SELECT</span> <span class="token function">SUM</span><span class="token punctuation">(</span>data_length<span class="token punctuation">)</span> Data_BB <span class="token keyword">FROM</span> information_schema<span class="token punctuation">.</span><span class="token keyword">tables</span><span class="token keyword">WHERE</span> table_schema <span class="token operator">NOT</span> <span class="token operator">IN</span> <span class="token punctuation">(</span><span class="token string">'information_schema'</span><span class="token punctuation">,</span><span class="token string">'performance_schema'</span><span class="token punctuation">,</span><span class="token string">'mysql'</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
Dump without unnecessary data
For those cases where you need the database structure for all of the tables, but you don't need the data for all of them, here's a technique you can use. This will grab the entire DB structure, but lets you exclude data for any tables that you want. For example, search_index, cache_*, or sessions tables will be good places to cut out some fat.
<span class="token comment"># First we export the table structure.</span>mysqldump --no-data database_name <span class="token operator">></span> /export.sql
Just replace "table_name1" and "table_name2" with the tables that you want to skip, and you're golden. Also note that you can use the % character as a wildcard, so for example, you could ignore "cache%" for all cache tables.
After you do that, you'll have a single export.sql file that contains the DB structure for all tables and the DB data for all tables except the ones you excluded. Then, you'll probably want to compress it...
Compress all the things
This one may go without saying, but if you're not compressing your database dumps then either they're really tiny, or you're dumber than a dummy.
drush sql-dump --gzip --result-file=db.sql
Compare that with the regular old:
drush sql-dump --result-file=db.sql
...and you're going to see a huge difference.
Or if you already have the SQL dump that you need to compress, you can compress the file directly using:
<span class="token function">gzip</span> -v db.sql
That will output a db.sql.gz file for you.
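And if you're curious how big the savings were, gzip can report the compressed vs. uncompressed sizes and the compression ratio:

gzip -l db.sql.gz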
Importing
Now you have a nice clean compressed DB dump with everything you need and nothing you don't, and you're ready to import. Here are a few ways to ease the pain.
Import a compressed dump directly
Instead of having to decompress the dump before importing, you can do it inline:
drush sqlq --file=db.sql.gz
Note that --file supports both compressed and uncompressed files.
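If Drush isn't available on the box you're importing to, the same inline idea works with plain mysql (assuming your connection credentials are handled elsewhere, e.g. in ~/.my.cnf):

gunzip -c db.sql.gz | mysql database_name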
Exclude data when importing
If you receive a DB dump that has a lot of data you don't need (caches, sessions, search index, etc.), then you can just ignore that stuff when importing it as well. Here's a little one-liner for this:
gunzip -c db.sql.gz | grep -Ev "^INSERT INTO \`(cache_|search_index|sessions)" | drush sqlc
What this is doing is using "grep" as a middleman and saying "skip any lines that are insertion lines for these specific tables we don't care about". You can edit what's in the parentheses to add or remove tables as needed.
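And if the dump you were handed isn't compressed, the same filter works without the gunzip step:

grep -Ev "^INSERT INTO \`(cache_|search_index|sessions)" db.sql | drush sqlc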
Monitor import progress
There's nothing worse than just sitting and waiting with no idea how far along the import is. Monitoring progress makes a long import feel faster, because there's no wondering.
If you have the ability to install it (from Homebrew or apt-get or whatever), the "pv" (Pipe Viewer) command is great here:
<span class="token function">pv</span> db.sql <span class="token operator">|</span> drush sqlc
Or if your database is compressed:
<span class="token function">pv</span> db.sql.gz <span class="token operator">|</span> gunzip <span class="token operator">|</span> drush sqlc
Using "pv" will show you a progress bar and a completion percentage. It's pretty awesome.
If you don't have "pv" then you can settle for the poor man's version:
<span class="token function">watch</span> <span class="token string">"mysql database_name -Be 'SHOW TABLES' | tail -n2"</span>
That slick little guy will show you the table that is currently being imported, and it auto-updates as it runs, so you can at least see how far through the table list the import has gone.
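A variation on the same idea, if you'd rather watch a number climb toward the total table count, is to count how many tables have been created so far (the -N flag just suppresses the column header so the count is accurate):

watch "mysql -N database_name -e 'SHOW TABLES' | wc -l"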
Tools and Resources
In this post I tried to focus on commands that everyone already has. If this just isn't cutting it for you, then look into these tools which could help even more:
- SyncDB - a couple Drush commands that split DB dumps into separate files and import them in parallel, drastically speeding things up
- Drush SQL Sync Pipe - an alternative to "drush sql-sync" that uses pipes where possible to speed things up