Importing Data from a REST API with Entity Import and Migrate in Drupal 8
We recently needed to import hundreds of projects, and hundreds of thousands of time cards, from Harvest into Drupal. I liked the idea of having a user interface for running the migrations, and was curious to see if the Entity Import module could do the trick.
TLDR: it worked great. Here’s the code. To learn how to write your own source plugin for Entity Import, read on.
Why Entity Import?
What I love about Entity Import is that it provides a UI for running migrations. Further, migrate source plugins for Entity Import have their own custom configuration forms, allowing site admins to configure options for each importer. I envisioned a setup that would let us easily add new Harvest importers through the admin UI, configuring basic parameters like date range and endpoint for each.
The Recipe
To import data from Harvest using Migrate and Entity Import, we needed two main ingredients:
- A simple REST API client to make authenticated requests to Harvest.
- An entity import / migrate source plugin to pull the data into Drupal.
REST API Client
I won’t go in-depth on the API client in this post, but here’s a great article from my coworker, Joel, that outlines an approach similar to the one I took: Building Flexible REST API Clients in Drupal 8. You can download the code for the simple Harvest API client on GitHub. The end result was a Drupal 8 service I can inject into my migrate plugin, making API requests with a single line of code.
Here are a couple of examples:
// Get all active projects.
$data = $this->harvestApiService->get('projects', ['is_active' => TRUE]);
// Get all time cards entered since November 1.
$data = $this->harvestApiService->get('time_entries', ['from' => '2019-11-01']);
Migrate Source Plugin for Entity Import
There are tons of great resources about migrate source plugins on Drupal.org covering everything from using core source plugins (like CSV), to leveraging contributed plugins (see Migrate Plus), to instructions on how to create your own. Writing a source plugin specifically for Entity Import is almost the same process, plus a few key additions.
Choosing the Right Base Plugin
Entity Import has two different options for base plugins. You can extend either one to create your own new source plugin.
Use EntityImportSourceBase for Configurable Importers
The EntityImportSourceBase class extends the core SourcePluginBase. You get all the methods and properties available in core, plus a couple of new features specific to Entity Import.
First, you can add your own configuration form with the buildConfigurationForm() method. For my Harvest example, this is the form admins can use to add Harvest’s authentication keys, specify which endpoint the importer will use, and add basic parameters to the request.
Second, you can add your own importer form with the buildImportForm() method. This is the form admins will see when they are actually running the importer. For Travis’s original use case of importing CSVs, this form is where users actually upload the data. For my Harvest example, I used this form simply to show users some high-level data about what they are about to import.
And of course, you can load up the data from your source. This comes directly from the core SourcePluginBase and isn’t specific to the Entity Import classes. But it’s important enough that I had to mention it, since without source data we wouldn’t be importing anything. In my Harvest example, I overrode the initializeIterator() method to load the data from Harvest’s API.
Alright, we’re ready to import data. Only problem is, there’s a lot of it. When I configured a Harvest importer for time entries, there were hundreds of thousands of items to import. That’s way too much to try to do in a single HTTP request through the browser. Fortunately, Entity Import has us covered.
Use EntityImportSourceLimitIteratorBase for Batch Processing
The EntityImportSourceLimitIteratorBase class extends EntityImportSourceBase and gives you everything we covered above, plus a very important capability: this class was designed to work with Drupal’s batch API. As the name implies, it uses a “Limit Iterator” to process one segment of the data at a time. Extend this class and voila, your importer will run in batches. In my Harvest example, I successfully imported hundreds of thousands of records into Drupal through the browser. The batch system gives feedback about how things are going and makes sure we’re not doing too much in a single HTTP request.
Using Paginated APIs
We’ve covered the basics of which class to extend to get the results you want. In my case, I extended the EntityImportSourceLimitIteratorBase class so my importer would run in batches. There’s a small problem with that approach, though. The Harvest API, like most other APIs, is paginated. Working with pages isn’t the same as working with limited iterators.
The limited iterator approach works perfectly for CSVs. You can grab the entire list of source row ids at once and then limit processing to one segment at a time, advancing through the entire dataset in batches.
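To make the limited iterator idea concrete, here is a minimal, standalone sketch using PHP’s built-in LimitIterator and ArrayIterator. It is not Entity Import’s code: the $rows array stands in for rows parsed from a CSV, and batchIds() and the batch size of 4 are hypothetical names and values chosen for illustration.

```php
<?php
// Hypothetical stand-in for source rows parsed from a CSV file.
$rows = [];
for ($i = 1; $i <= 10; $i++) {
  $rows[] = ['id' => $i];
}

/**
 * Returns the row IDs found in one segment of the full row list.
 *
 * Hypothetical helper: the offset and count play the same role as the
 * limit offset and limit count described in this post.
 */
function batchIds(array $rows, int $offset, int $count): array {
  $segment = new \LimitIterator(new \ArrayIterator($rows), $offset, $count);
  $ids = [];
  foreach ($segment as $row) {
    $ids[] = $row['id'];
  }
  return $ids;
}

// Advance through the whole dataset one segment at a time.
$batchSize = 4;
$batches = [];
for ($offset = 0; $offset < count($rows); $offset += $batchSize) {
  $batches[] = batchIds($rows, $offset, $batchSize);
}
// $batches now holds three segments: IDs 1-4, 5-8, and 9-10.
```

The full list is available up front, so each batch only needs a different offset into the same data.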
With paginated APIs, you can’t grab the entire list of IDs at once. You are limited to requesting one page at a time. I needed the batch process to advance one page at a time through the entire set of data offered by the API.
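The difference is easier to see with a small simulation. The sketch below does not call Harvest; fetchPage() is a hypothetical stand-in that slices an in-memory array, and the 'items' and 'next_page' keys are assumptions about the response shape made for this example only.

```php
<?php
/**
 * Hypothetical stand-in for a paginated API endpoint.
 *
 * Returns one page of items plus the number of the next page, or NULL
 * when the requested page is the last one.
 */
function fetchPage(array $allItems, int $page, int $perPage): array {
  $items = array_slice($allItems, ($page - 1) * $perPage, $perPage);
  $hasMore = $page * $perPage < count($allItems);
  return ['items' => $items, 'next_page' => $hasMore ? $page + 1 : NULL];
}

$allItems = range(1, 7);
$perPage = 3;
$page = 1;
$requests = 0;
$imported = [];

// Each pass makes exactly one request; only the response tells us
// whether another page exists.
while ($page !== NULL) {
  $response = fetchPage($allItems, $page, $perPage);
  $requests++;
  foreach ($response['items'] as $item) {
    $imported[] = $item;
  }
  $page = $response['next_page'];
}
// Seven items at three per page take three requests.
```

In a batched import, each pass of that loop would correspond to one batch operation rather than one iteration of a single long-running request.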
While processing paginated APIs works differently than processing limited iterators, the internal methods and properties are similar enough that the EntityImportSourceLimitIteratorBase class still worked great for solving the problem. We just needed to treat the iterator a little differently than other non-paginated sources.
Here’s the initializeIterator() method from the base class. It returns one segment of a complete array, using PHP’s built-in LimitIterator:
/**
 * {@inheritdoc}
 */
public function initializeIterator() {
  return new \LimitIterator(
    $this->limitedIterator(), $this->getLimitOffset(), $this->getLimitCount()
  );
}
And here is the Harvest API version that adds pagination, relying on the limit offset and count to request the right page:
/**
 * Initialize the migrate source iterator with results from an API request.
 *
 * The request is paginated. We determine the current page based on the
 * limit count and offset. The limit count is stored in configuration.
 * The limit offset is set by the batch processor and increments for
 * each batch (see \Drupal\entity_import\Form\EntityImporterBatchProcess).
 */
public function initializeIterator() {
  $this->currentPage = ($this->getLimitCount() + $this->getLimitOffset())
    / $this->getLimitCount();
  $results = $this->apiRequest($this->currentPage);
  $this->createIterator($results);
  return $this->iterator;
}
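The page arithmetic is worth checking by hand. Here is a small standalone sketch of the same calculation, assuming the batch processor advances the offset by exactly the limit count on each batch and that the API page size matches the limit count. The function name pageForOffset() is hypothetical, and it uses intdiv() so the result is always an integer.

```php
<?php
/**
 * Converts a batch offset and batch size into a 1-based page number.
 *
 * Hypothetical helper mirroring the (count + offset) / count formula.
 * Assumes $offset is a multiple of $count.
 */
function pageForOffset(int $offset, int $count): int {
  return intdiv($offset + $count, $count);
}

// With 100 items per batch, the first three batches map to pages 1, 2, 3.
$pages = [];
foreach ([0, 100, 200] as $offset) {
  $pages[] = pageForOffset($offset, 100);
}
```

Adding the count before dividing is what shifts the zero-based offset onto one-based page numbers: an offset of 0 becomes page 1 rather than page 0.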
Wrapping it Up
To learn more about Entity Import and how to both configure and run Drupal migrations through the browser, check out this earlier post from Travis. Grab the latest, complete code for the Harvest API example on GitHub. We’d love to hear if (and how) you’re using Entity Import with custom source plugins. Leave us a comment or drop us a line on our contact form!