Migrating XML in Drupal 8
brandt
Thu, 10/20/2016 - 17:14
Kelsey Bentham
Oct 21, 2016
Migrate in Drupal 8 is a flexible and powerful tool - you just need to know where to look.
In this post we will cover:
- Some findings from our first D8 projects
- How to use the Migrate Plus XML data parser plugin
- A note on prefixed namespaces
Drupal 8 is here, which means I have had the privilege of working on my first D8 projects and the migrations that accompany them. I wanted to share some of the key findings I've taken away from the experience.
Migrate in Drupal 8 is great once you know what you are looking at. It is flexible, powerful and relatively easy to read. But, as with most things, a lot of its power is tucked away where it is hard to find if you don't know where to look. That is definitely the case with the Migrate Plus XML data parser plugin, which at the time of writing is available only in the dev version of Migrate Plus. It is a solid tool for migrating from a variety of XML-based sources, and today we are going to look at how to use it.
The first thing we have to consider is where our data is coming from. Migrate Plus expects this information in the form of a URL, which gives us two options:
- our source lives outside the website, like an RSS feed; or
- it is stored locally.
If you have an external URL, all you need to do is plug it into the urls parameter. If your source is stored locally, you will need to either construct a URL for it or store it in the private file directory and reference it with the private:// stream wrapper. I would go for the latter, as it involves less overhead. At this point your migration source should look something like this:
```yaml
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: xml
  urls: private://migration.xml
```
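For the external case, only the urls value changes. A sketch assuming a hypothetical feed at https://example.com/feed.xml (substitute your real feed). As far as I can tell, the url source plugin also accepts a list of URLs here, which is why the key is plural:

```yaml
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: xml
  # Hypothetical external feed URL; replace it with your own.
  # Written as a list here; a single URL string, as above, also works.
  urls:
    - https://example.com/feed.xml
```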
This brings us to parsing the XML. All of the selectors we will be talking about use XPath. The first thing you need to do is define the item selector, so Migrate can identify the individual items to migrate into your chosen destination. For example, if we were migrating posts from a WordPress export, it might look something like this:
```yaml
item_selector: /rss/channel/item[wp:post_type="post"]
```
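The predicate in square brackets is ordinary XPath, so the item selector is easy to adapt. Two variations, sketched as assumptions about your data: a plain RSS feed where every item should be migrated, and a WordPress export filtered to pages instead of posts (WordPress records the content type in wp:post_type). Only one item_selector can be active at a time, so the second is commented out:

```yaml
# Plain RSS feed: every <item> in the channel becomes one migration item.
item_selector: /rss/channel/item

# WordPress export, migrating pages instead of posts:
# item_selector: /rss/channel/item[wp:post_type="page"]
```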
Next up, we need to map all of our fields to nice, readable machine names that we can use in the process part of the migration. Each field has a name that identifies it in other parts of the migration, a label describing what sort of data we will find in that XML element, and a selector so the migration can pull that data out of the XML file:
```yaml
fields:
  -
    name: title
    label: Content title
    selector: title
  -
    name: post_id
    label: Unique content ID
    selector: wp:post_id
  -
    name: content
    label: Body of the content
    selector: content:encoded
  -
    name: post_tag
    label: Tags assigned to the content item
    selector: 'category[@domain="post_tag"]/@nicename'
```
If a selector is anything more complicated than a plain XML node name, wrap it in quotes so that YAML reads it as a string. The selectors are passed to XPath by the data parser, so you can be quite precise when selecting XML nodes.
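For example, a WordPress export uses its category elements for more than tags, and post metadata lives in wp:postmeta key/value pairs. The two fields below are hypothetical additions (the names category and thumbnail_id are illustrative, not part of the migration in this post) that show how far a quoted XPath selector can reach:

```yaml
fields:
  -
    # Hypothetical field: category slugs, filtered by the domain attribute.
    name: category
    label: Categories assigned to the content item
    selector: 'category[@domain="category"]/@nicename'
  -
    # Hypothetical field: the featured image's attachment ID, read from the
    # <wp:postmeta> entry whose key is _thumbnail_id.
    name: thumbnail_id
    label: Featured image attachment ID
    selector: 'wp:postmeta[wp:meta_key="_thumbnail_id"]/wp:meta_value'
```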
All that is left to do is define the source IDs, the field (or fields) that uniquely identify each item, and your source is ready to go:
```yaml
ids:
  post_id:
    type: integer
```
Put it all together and you should have something like this:
```yaml
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: xml
  urls: private://migration.xml
  item_selector: /rss/channel/item[wp:post_type="post"]
  fields:
    -
      name: title
      label: Content title
      selector: title
    -
      name: post_id
      label: Unique content ID
      selector: wp:post_id
    -
      name: content
      label: Body of the content
      selector: content:encoded
    -
      name: post_tag
      label: Tags assigned to the content item
      selector: 'category[@domain="post_tag"]/@nicename'
  ids:
    post_id:
      type: integer
```
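The names defined under fields are what the process section refers to. A minimal sketch of the process and destination sections that would sit alongside source, assuming an article content type and a full_html text format exist on the site (both are assumptions; adjust them to your own configuration):

```yaml
process:
  # Copy the XML title straight into the node title.
  title: title
  # Fill the body field's value and text format separately.
  body/value: content
  body/format:
    plugin: default_value
    default_value: full_html   # assumed text format machine name
  # Every migrated item becomes an article node (assumed content type).
  type:
    plugin: default_value
    default_value: article
destination:
  plugin: entity:node
```

The post_tag field is left out of this sketch.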
A note on prefixed namespaces: as you can see, we mixed XML nodes that have prefixes with ones that don't. Sometimes Migrate handles this with no problem at all; sometimes it refuses to fetch data from XML nodes that don't have prefixes. As far as I can tell, this happens when one of the nodes in the item_selector path has a prefix (although the filters in the item_selector don't seem to trigger it). If your data source has a prefixed parent node, you can still get its non-prefixed children with the following syntax:
```yaml
name: description
label: Content description
selector: '*[local-name()="description"]'
```
This selects XML nodes with a given local name regardless of prefix, which is very handy when the node you want has no prefix at all.
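In context, the description entry goes into the fields list next to the others. A sketch, where the comment stands in for the fields defined earlier in this post. The quotes are required here, since a YAML value cannot start with a bare * (YAML reads that as an alias):

```yaml
fields:
  # ... title, post_id, content and post_tag as defined earlier ...
  -
    name: description
    label: Content description
    # Matches a child element whose local name is "description",
    # whether or not it carries a namespace prefix.
    selector: '*[local-name()="description"]'
```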