Drupal 8 content migrations from CSV or spreadsheet
Joel Steidl
Mon, 08/17/2020 - 06:32
Content migration into a Drupal website using spreadsheet or CSV data can be surprisingly effective — especially (counterintuitively?) with large and complex datasets. I’ve written about data migration in Drupal 8 a couple of times, but new projects keep highlighting the diversity of data sources and content configurations, reminding me over and over again that data migration doesn’t really have a one-size-fits-all solution. In this post we’ll get you from zero (fresh Drupal 8 install) to a basic CSV data migration — including entity references and multiple field values — in about 15 or 20 minutes. You can download the accompanying Aten CSV Migrate example module here, and skip straight to the instructions if you'd like.
Drupal 8 offers a handful of powerful and extensible migration modules in core. These modules lay the foundation for a wide variety of migration methods, including several flavors of CSV importers like the point-and-click Entity Importer module for Drupal 8 developed by my colleague Travis Tomka. GUI-powered migrations are perfect for minimally to moderately complex content, but they can struggle to cross the finish line on their own with complicated datasets.
When GUIs overwhelm: CSV or Spreadsheet content migrations
Earlier this year I began working with the National Science Foundation’s National Ecological Observatory Network (NEON) to migrate robust ecological data collected from more than eighty terrestrial and aquatic field sites across the United States into a Drupal 8 website. The bulk of the data was destined for a content type with more than 75 unique fields, and the sheer volume and complexity of the data was steering me away from point-and-click interfaces.
Luckily for me, the NEON team are old hands when it comes to spreadsheet manipulation. They applied that expertise and flexibility to formatting a spreadsheet export of their data specifically for import into Drupal. Their spreadsheets were then exported to CSV, a format that a Drupal website can consume easily with the right modules and configuration. With just a little back and forth, we were well on our way to a successful, complex data migration with minimal custom code.
The basics: Getting spreadsheet or CSV data in Drupal
Before tackling a complex import, a quick introduction to configuring a CSV import with the Migrate Source CSV module for Drupal 8 is in order. Once the basics are clear, a more complex import won’t be so overwhelming. If your data is in spreadsheet format, you’ll want to export it to CSV. The example below assumes a comma delimiter, no enclosure characters, and no escape characters. For more complex content you’ll define these attributes in the source portion of your migration YAML file. The following steps culminate in a working example CSV data migration module you can download and tinker with. Note that this first walkthrough doesn’t cover the more complex sample discussed later in this post, which is also included in the download.
- Ensure that the core Migrate module is enabled
- Install and enable Migrate Source CSV, Migrate Tools, and Migrate Plus
- Create a new custom module (or just download my working example) which will include an info file, a migrate file, a source file, and an install file following the outline below
modules/custom/aten_csv_migrate/aten_csv_migrate.info.yml
type: module
name: Aten CSV Migration
description: 'Aten CSV Migration example. Read more from this data migration from CSV or spreadsheet tutorial.'
package: Migration
core_version_requirement: ^8.7.7 || ^9
dependencies:
  - drupal:migrate
  - migrate_source_csv
  - migrate_plus
Here we simply define the module and its dependencies.
modules/custom/aten_csv_migrate/aten_csv_migrate.install
<?php

/**
 * Implements hook_uninstall().
 */
function aten_csv_migrate_uninstall() {
  // Remove both example migrations from active configuration.
  \Drupal::configFactory()->getEditable('migrate_plus.migration.aten_csv_migrate_node')->delete();
  \Drupal::configFactory()->getEditable('migrate_plus.migration.aten_complex_csv_example')->delete();
}
This implementation of hook_uninstall() will clean up active configuration for the example CSV data migration when the module is uninstalled.
modules/custom/aten_csv_migrate/config/install/migrate_plus.migration.aten_csv_migrate_node.yml
id: aten_csv_migrate_node
label: Aten CSV Migrate example node
migration_tags:
  - Aten Migrate CSV
source:
  plugin: csv
  path: modules/custom/aten_csv_migrate/sources/aten_csv_migrate_items.csv
  ids: [ID]
process:
  title: Title
  body: Description
  field_color: Color
  field_weight: Weight
  type:
    plugin: default_value
    default_value: item
destination:
  plugin: entity:node
This is the cornerstone of the migration. Naming the file with the migrate_plus.migration.*.yml pattern registers it as Migrate Plus configuration, which lets us hook into the power of the Migrate Plus module. You can see we point to our CSV file in path, name the column that uniquely identifies each row in ids, and then map our fields to CSV columns under process. The type mapping assigns every row to the node type item, which we’ll need to create.
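The example file relies on the CSV plugin's defaults. For an export that uses a different delimiter or wraps values in quotes, the source section might look something like the sketch below. It assumes the 3.x branch of Migrate Source CSV (the branch that uses ids, as above). The semicolon delimiter is purely illustrative and is not part of the downloadable module.

```yaml
source:
  plugin: csv
  path: modules/custom/aten_csv_migrate/sources/aten_csv_migrate_items.csv
  ids: [ID]
  # Assumed for illustration: a semicolon-delimited export.
  delimiter: ';'
  # Character that wraps values containing the delimiter.
  enclosure: '"'
  # Character that escapes the enclosure inside a value.
  escape: '\'
  # Zero-based index of the row holding the column names.
  header_offset: 0
```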
modules/custom/aten_csv_migrate/sources/aten_csv_migrate_items.csv
ID,Title,Description,Color,Weight
100,Ball,A top quality exercise ball.,Red,4
101,Flag,The perfect flag for your collection.,Blue,1
102,Paperweight,A weight for your important papers,Black,2
103,Thermos,Liquids stay warm with this amazing product.,White,2
Some items that we’ll stuff into nodes!
Now that your code is in place, make sure that you have a content type configured to receive the content import. For this example I’m sticking with core field types and a custom node type item that matches my CSV data. Note that the field machine names match those in the process definitions of my migrate_plus.migration.aten_csv_migrate_node.yml file.
The content type "Item" contains simple core fields and will receive the CSV data.
And here are the field configurations.
The "Item" content type contains the core body field, a field_color (short text) field, and a field_weight (integer) field.
Now you’re set to run your import using Drush. First clear the Drupal and Drush caches with drush cr and drush cc drush. After that, drush list should show that the migrate commands are available.
Now we run the migration using drush migrate:import aten_csv_migrate_node, and we should see a success message.
With the module properly installed, drush cr, drush cc drush, and drush migrate:import aten_csv_migrate_node should complete the example migration.
That’s it for the basic CSV data migration! You’ll now see the items defined in your CSV have populated as nodes, with all the field data mapped to the appropriate fields.
The CSV content has been imported into "Item" nodes.
And if we check inside, we'll see our individual fields migrated as we'd hoped.
Our title, description, color, and weight columns have imported. Note that the ID column is simply a unique identifier, and is not imported into a node attribute.
Now that we’ve got the basics down, we can move on to exploring some real-world complexities. Here’s where manipulating the format of the data in the CSV file becomes particularly handy.
A deeper dive: Getting spreadsheet or CSV data in Drupal
Migrating CSV data into a Drupal website is perhaps most cost-effective when A) the source data is inherently complex, and B) it can be easily manipulated, reformatted and re-exported to facilitate an easier migration. In the case of NEON’s content, the volume and uniqueness of data fields, their relational aspects (think tags or entity references), and their inherent complexity (like multiple values for the same field) made it a good candidate on the first count. The client team’s capacity to fiddle with formats and re-export their data to spreadsheets made it a good candidate on the second.
For more complex field types we rely on Migrate Plus plugins to populate the data. For the complex example included in the download to run successfully, you’ll need to configure a new content type complex_item with the appropriate fields. My example uses the Geofield and Address modules, which provide the field types behind field_coordinates and field_address respectively. The rest of the fields are core.
The complex_item content type configuration implements core, Address and Geofield fields: field_address (address), body, field_coordinates (geofield), field_link (link), field_multiple_taxonomy_term (entity reference: term), field_single_entity_reference (entity reference: node), field_single_taxonomy_term (entity reference: term).
It’s fields like field_multiple_taxonomy_term that really shine with CSV or spreadsheet imports, in that having the client separate multiple values with a “|” in their export and then implementing the explode plugin quickly solves what could otherwise be a complex problem.
Code highlight is from migrate_plus.migration.aten_complex_csv_example.yml. All mentioned code is included in the Aten CSV Migrate download included with this post.
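The file's contents are not reproduced here. As a rough sketch of the approach, and not the download's actual configuration, a pipe-separated column could be mapped as follows. The column name Tags and the vocabulary tags are hypothetical. The explode plugin ships with the core Migrate module, and entity_generate comes from Migrate Plus.

```yaml
process:
  field_multiple_taxonomy_term:
    # Step 1: split a cell such as "Red|Green|Blue" into separate values.
    - plugin: explode
      source: Tags              # hypothetical CSV column
      delimiter: '|'
    # Step 2: runs once per value; finds the term by name or creates it.
    - plugin: entity_generate
      entity_type: taxonomy_term
      bundle_key: vid
      bundle: tags              # hypothetical vocabulary
      value_key: name
```

Chaining the two plugins keeps the spreadsheet simple: the client only has to agree on a separator, and the migration handles the rest.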
Another thing to notice in migrate_plus.migration.aten_complex_csv_example.yml is the difference between the entity_lookup plugin and the entity_generate plugin. Both search for an existing entity whose configured value (the title, for a node) matches the source data. When no match exists, entity_generate creates the entity, while entity_lookup returns nothing and leaves the field empty.
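To make the contrast concrete, here is a minimal sketch of the two plugins side by side. It is an illustration under stated assumptions and not the contents of the example file: the column names Related Item and Category and the vocabulary categories are hypothetical.

```yaml
process:
  # entity_lookup: reference an existing node by title. If no node matches,
  # the value resolves to NULL and the field stays empty.
  field_single_entity_reference:
    plugin: entity_lookup
    source: Related Item        # hypothetical CSV column
    entity_type: node
    bundle_key: type
    bundle: item
    value_key: title
  # entity_generate: same lookup, but a missing term is created on the fly.
  field_single_taxonomy_term:
    plugin: entity_generate
    source: Category            # hypothetical CSV column
    entity_type: taxonomy_term
    bundle_key: vid
    bundle: categories          # hypothetical vocabulary
    value_key: name
```

As a rule of thumb, entity_generate suits free-form values like tags, while entity_lookup is the safer choice when a typo in the spreadsheet should not silently create new content.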
The Migrate Plus module ships with an impressive collection of plugins and example code. Once you’ve gotten your feet wet, it’s likely you can find everything you need for complex data migrations from CSV within the various Migrate Plus example submodules.
There are as many ways to migrate data into Drupal as there are unique combinations of source data, client teams, and project requirements. Knowing more about the methods available to you is a great way to broaden your options and choose the right tools for each unique job. What are your go-to migration methods or tricks of the trade? I’d love to hear about them in the comments below.