Running Large Drupal 8 CSV Migrations in Batches
A versatile solution for importing large CSV files into Drupal.
Migrating content from an existing site or an external data source can help reduce the effort required by content editors to get a new site ready for launch. As a result, constructing and executing content migrations is a common task we undertake as part of the site build process. While these migrations vary in type, the source data is typically a spreadsheet exported to comma-separated values (CSV), because the format is simple.
While Drupal has robust support for migrating content from a CSV file, the import process can struggle when presented with large files. In particular, it can run out of memory partway through the migration. We encountered this problem while migrating tens of thousands of locations for a client. Increasing the PHP memory limit for the migration was an initial step, but proved not to be enough:
Memory usage is 1.21 GB (80% of limit 1.51 GB), reclaiming memory. [warning]
Memory usage is now 1.21 GB (80% of limit 1.51 GB), not enough reclaimed, starting new batch [warning]
Even though the migration module attempts to reclaim memory and start a new batch, the process does not always complete.
Some approaches to get around this issue include scripting your migration and utilizing the limit option when running a migration. However, we wanted a solution that could be more versatile and wouldn’t require custom scripting for each new migration we would write.
As a result, we wrote a custom Drush command that acts as a wrapper around the default Migrate import command. Our custom command splits a large CSV file into smaller files that can be imported in batches.
As an example, the following command may be run:
drush migrate:import:batch sample_migration --batch-size=100
When the migration is run, the CSV source file for the sample_migration is split into smaller CSV files with 100 lines each. The migration runs for each of these files. These files are temporarily stored in the private files directory and are cleaned up after the migration is finished.
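The split-and-import flow described above can be sketched in a few lines. This is only an illustration in Python, not the module's actual code: the function names (`split_csv`, `import_in_batches`) and the `run_import` callback are hypothetical, a temporary directory stands in for Drupal's private files directory, and it assumes the header row is repeated in every batch file so that each one is a valid source on its own.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, List


def _write_batch(out_dir: Path, index: int, header: List[str],
                 rows: List[List[str]]) -> Path:
    """Write one batch file: the header row followed by the batch's rows."""
    path = out_dir / f"batch_{index:05d}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def split_csv(source: Path, out_dir: Path, batch_size: int) -> List[Path]:
    """Split `source` into files holding at most `batch_size` data rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    paths: List[Path] = []
    with source.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:  # empty file: nothing to import
            return paths
        batch: List[List[str]] = []
        for row in reader:  # rows are streamed, never loaded all at once
            batch.append(row)
            if len(batch) == batch_size:
                paths.append(_write_batch(out_dir, len(paths), header, batch))
                batch = []
        if batch:  # final, possibly shorter, batch
            paths.append(_write_batch(out_dir, len(paths), header, batch))
    return paths


def import_in_batches(source: Path, batch_size: int,
                      run_import: Callable[[Path], None]) -> int:
    """Split the source, run the import once per batch file, then clean up.

    `run_import` is a hypothetical stand-in for the real import step.
    Returns the number of batches processed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        batches = split_csv(source, Path(tmp), batch_size)
        for batch_path in batches:
            run_import(batch_path)
        return len(batches)  # the temporary files are removed on exit
```

With a batch size of 100, a source file of 250 data rows would yield three batch files of 100, 100, and 50 rows. Because each import only ever sees one small file, peak memory depends on the batch size rather than on the size of the full CSV.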
Other migration operations run as normal, and all of the default options may be passed in. Migration mapping hashes are maintained, so the migration may be rolled back as normal, too.
The module’s code currently lives in a GitHub repository, which also contains more information on the module’s usage, but we plan on releasing it as a contributed module on Drupal.org in the future. Feel free to give it a try on your project and let us know how it works for you!
Fall migration copy by ashokboghani licensed under CC BY-NC 2.0.