Drupal Migration Tips
Nov 6th 2015
As part of our recent work on road.cc, we performed a large data migration and transformation of hundreds of thousands of rows of data into their new Drupal 7 site, including users, taxonomy terms, nodes and comments. We did this using a combination of the migrate, migrate_d2d and migrate_extras modules, as well as a custom module to house all of our own migration code. During this process, I’ve collated some tips and tricks that I found useful.
Use Drush
I’d suggest that you use Drush to run the migrate commands rather than the Migrate UI. I’ve found it to be more robust: I’ve had migrations fail when run via the Migrate UI, only to run successfully when executed via Drush.
These are the main Drush migrate commands that you can run (the alias for each is shown in brackets):

    $ drush migrate-import (mi)
    $ drush migrate-stop (mst)
    $ drush migrate-reset-status (mrs)
    $ drush migrate-rollback (mr)

To see a full list of the available Drush Migrate commands, run:

    $ drush --filter=migrate
Use prepareRow() and drush_print_r()
If you’re used to using functions like dpm(), dsm() or kpr() in your module code to find out the value of a variable, or what properties an array or object has, there's a similar function in Drush - drush_print_r(). This outputs data to the screen in the same way that PHP’s print_r() function does.
I tend to use it within the prepareRow() method to see what data is available within the $row object.

    public function prepareRow($row) {
      // Print the raw source row to the terminal.
      drush_print_r($row);
    }
If you are using migrate_d2d or extending another class, remember to call parent::prepareRow() so that you add to the preparations in the parent class rather than overriding them, and so that the row is still skipped if it was skipped in the parent class.

    class RoadccPageNodeMigration extends RoadccNodeMigration {
      ...

      public function prepareRow($row) {
        // Run the parent preparations, and skip the row if the parent did.
        if (parent::prepareRow($row) === FALSE) {
          return FALSE;
        }

        // Your own preparations go here.
      }

    }
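To see why that check matters, here is a minimal, self-contained sketch that runs without Drupal. ExampleParentMigration and ExampleChildMigration are hypothetical stand-ins, not part of the Migrate API; they only mimic the convention that prepareRow() returns FALSE to skip a row.

```php
<?php

// Hypothetical stand-in for a parent migration class (not the real Migrate API).
class ExampleParentMigration {

  public function prepareRow($row) {
    // The parent skips unpublished rows by returning FALSE.
    if (empty($row->status)) {
      return FALSE;
    }
    // Otherwise it makes its own preparations.
    $row->title = trim($row->title);
    return TRUE;
  }

}

// Hypothetical child class that adds to the parent's preparations.
class ExampleChildMigration extends ExampleParentMigration {

  public function prepareRow($row) {
    // Respect the parent's decision to skip, and keep its preparations.
    if (parent::prepareRow($row) === FALSE) {
      return FALSE;
    }
    $row->title = strtoupper($row->title);
    return TRUE;
  }

}

$migration = new ExampleChildMigration();

$published = (object) array('status' => 1, 'title' => ' About us ');
$unpublished = (object) array('status' => 0, 'title' => ' Draft ');

var_dump($migration->prepareRow($published));   // bool(true)
var_dump($published->title);                    // string(8) "ABOUT US"
var_dump($migration->prepareRow($unpublished)); // bool(false)
```

Without the parent call, the child would process the unpublished row as well, and the title would never be trimmed.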
Limit the Number of Items that You Are Importing
Rather than waiting for an entire migration to run to confirm whether your latest amendment or addition worked, you can run a migration on a reduced number of items by using the --limit option.

    drush mi RoadccPageNode --limit="10 items"
You can limit by the number of items, such as "10 items", or by the amount of time, such as "60 seconds".
Update, not Rollback
You can also save time by using the --update option to update any already-imported rows, rather than rolling back and removing them, then re-importing.
drush mi RoadccPageNode --limit="10 items" --update
Use addSimpleMappings()
As part of each migration, you need to map the source values to the appropriate destination using the addFieldMapping() method. For example:
$this->addFieldMapping('destination', 'source');
If, however, the source and destination names are the same, you can use the addSimpleMappings() method. This just takes a list of property names in an array and automatically uses each one as both the source and the destination.

    $this->addSimpleMappings(
      array(
        'uid',
        'created',
        'changed',
        'field_foo',
        ...
      )
    );
If you are using migrate_d2d then some of the common properties - e.g. uid, created, changed - will already be mapped in this way in the parent class.
Use addUnmigratedSources() and addUnmigratedDestinations()
If you use the Migrate UI, then you may see messages warning you about unmapped properties. In one example, there were 108 unmapped destination properties, although the same can happen for sources (properties attached to the data being imported). These may have been left unmapped intentionally, or a newer source database may have added more sources following a schema update, or a newly installed module may have added more destinations.
If you do intend to leave a source or a destination unmapped, then use the addUnmigratedSources() and/or the addUnmigratedDestinations() method within your constructor after declaring your field mappings.

Both methods take an array of property names to declare as unmigrated. The properties are then treated as accounted for, which removes the message.

    public function __construct(array $arguments = array()) {
      ...

      // These fields are not being migrated, so mark them as such.
      $this->addUnmigratedDestinations(
        array(
          'field_one',
          'field_two',
          'field_three',
        )
      );
    }
When you or a colleague revisit this migration at a later date, this makes it much clearer that these properties were intentionally left unmapped, rather than forgotten about or not present when the migration class was written.
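Conceptually, the properties that get reported are the ones that are available but neither mapped nor declared as unmigrated. The sketch below illustrates that idea in plain PHP; example_undeclared_fields() is a hypothetical helper written for this post and is not how the Migrate module itself is implemented.

```php
<?php

/**
 * Hypothetical helper: returns the fields that are neither mapped nor
 * declared as unmigrated.
 */
function example_undeclared_fields(array $available, array $mapped, array $unmigrated) {
  // array_diff() keeps the values of the first array that are missing from
  // all of the others; array_values() re-indexes the result from zero.
  return array_values(array_diff($available, $mapped, $unmigrated));
}

$available = array('title', 'body', 'field_one', 'field_two', 'field_three');
$mapped = array('title', 'body');
$unmigrated = array('field_one', 'field_two');

// Only field_three is left undeclared, so it is the one worth investigating.
print_r(example_undeclared_fields($available, $mapped, $unmigrated));
```

Once field_three is either mapped or added to the unmigrated list, the helper returns an empty array, which corresponds to the message disappearing.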
Write your own base Migration Classes
Because migrate is based on object-oriented classes, these can be extended and customised as needed, making them extremely flexible. I’ve found this to be very useful when I needed to do something that applies to all migrations, such as getting the database connection, or something that affects all migrated nodes, such as replacing full URLs with relative ones so that they work on different environments, or mapping the source node ID values to the destination ones.
This is done by writing our own abstract classes that extend the default ones such as Migration or DrupalNode6Migration. Because we’re using the abstract keyword before the class name, we ensure that these classes cannot be instantiated directly, and must be extended by another class.
Extending a Normal Migration Class
    abstract class RoadccMigration extends DrupalMigration {

      protected function getConnection($connection = 'migrate') {
        // Use the 'default' target of the given connection key.
        return Database::getConnection('default', $connection);
      }

    }
In this example, we can use $query = $this->getConnection() as the starting point in any class that extends RoadccMigration, and then continue building the query by calling ->select() on it, which uses the same syntax as db_select(). This means that there is less duplication within our custom classes, and the connection is easy to update if needed as it’s only declared once.
Extending a migrate_d2d Class
    abstract class RoadccNodeMigration extends DrupalNode6Migration {

      public function __construct(array $arguments = array()) {
        parent::__construct($arguments);
      }

      public function prepareRow($row) {
        // Run the parent preparations, and skip the row if the parent did.
        if (parent::prepareRow($row) === FALSE) {
          return FALSE;
        }

        // Update any absolute URLs. str_replace() leaves the text unchanged
        // when there is no match, so no strpos() check is needed. A loose
        // strpos() check would also miss a URL at the very start of the
        // text, because position 0 evaluates to FALSE.
        foreach (array('body', 'teaser') as $property) {
          if (isset($row->{$property})) {
            $row->{$property} = str_replace(array('http://www.road.cc', 'http://road.cc'), '', $row->{$property});
          }
        }
      }

    }
In this example, we’ve extended the DrupalNode6Migration class from the migrate_d2d module, and are performing some transformations on the body and teaser values - removing the full URL so that users aren’t redirected back to the original production site rather than to their intended destination.
As all of our node migrations extend RoadccNodeMigration, this automatically applies to all nodes imported via the migration.
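The URL replacement itself can be tried outside of Drupal. The following is a minimal sketch: example_strip_base_urls() is a hypothetical helper name and the sample text is made up for illustration; only the two base URLs come from the migration above.

```php
<?php

/**
 * Hypothetical helper: strips the old site's base URLs from a string,
 * leaving relative paths behind.
 */
function example_strip_base_urls($text, array $base_urls = array('http://www.road.cc', 'http://road.cc')) {
  // str_replace() accepts an array of search strings and returns the text
  // unchanged when none of them match.
  return str_replace($base_urls, '', $text);
}

// The first URL starts at position 0, where strpos() returns 0 rather than
// FALSE, so an unguarded replacement is the safer choice here.
$body = 'http://www.road.cc/content/news is linked from <a href="http://road.cc/forum">the forum</a>.';

print example_strip_base_urls($body) . PHP_EOL;
// Prints: /content/news is linked from <a href="/forum">the forum</a>.
```

If a guard is ever needed, compare the result of strpos() strictly with !== FALSE rather than relying on its truthiness.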
Limit your Result Set
If you need to test something, such as whether all of your field mappings are working, I found it beneficial to find a small collection of examples that cover all use cases, and then limit the query so that the migration only affects those nodes, rather than regularly searching for the right examples to test against.
If you’re writing normal migrations you can do this in your __construct() method as part of your query. If you’re extending a migrate_d2d class, then you’ll need to add your own query() method and add the additional conditions to the query from the parent class.

For example:

    class RoadccPageNodeMigration extends RoadccNodeMigration {
      ...

      protected function query() {
        // Get the query from the parent class.
        $query = parent::query();

        // Add any new conditions. In this case, just filter on this single node.
        $query->condition('n.nid', 123456);

        // Return the new, full query.
        return $query;
      }

    }
This means that you can quickly re-run that migration and see how your changes affected the result, if at all, rather than waiting for the entire migration to be re-run on hundreds or thousands of nodes.
Just remember to remove the test conditions when they are no longer needed. If you use Git for version control, I’d suggest using git add -p to interactively add chunks of code to the staging area, allowing you to review each one and keep your code repository clean of any test conditions.
Written by: Oliver Davies, Senior Drupal Developer
Microserve is a Drupal Agency based in Bristol, UK. We specialise in Drupal Development, Drupal Site Audits and Health Checks, and Drupal Support and Maintenance. Contact us for further information.