Migrating XML in Drupal 8

brandt
Thu, 10/20/2016 - 17:14
Kelsey Bentham
Oct 21, 2016

Migrate in Drupal 8 is a flexible and powerful tool - you just need to know where to look.

In this post we will cover...

Some findings from our first D8 projects
How to use the Migrate Plus XML data parser plugin
A note on prefixed namespaces

Drupal 8 is here, which means I have had the privilege of working on my first D8 projects and the migrations that accompany them. I wanted to share some of the key findings I’ve taken away from the experience.

Migrate in Drupal 8 is awesome as long as you know what you are looking at. It is flexible, powerful, and relatively easy to read. But as is the case with most things, a lot of its power is tucked away where it is hard to find. This is definitely the case with the Migrate Plus XML data parser plugin, which is presently available only in the dev version of Migrate Plus. It is a pretty solid tool for migrating from a variety of XML-based sources, and today we are going to talk about how to use it.

The first thing we have to consider is where our data is coming from. Migrate Plus expects this information in the form of a URL, which gives us two options:

our source is from outside the website, like an RSS feed; or
it is stored locally.

If you have an external URL, all you need to do is plug it into the urls parameter. If your source is stored locally, you will either need to construct a URL for the source or store it in the private file directory and reference it with the private:// stream wrapper. I would go for the latter as it involves less overhead. At this point your migration source should look something like this:

    source:
      plugin: url
      data_fetcher_plugin: http
      data_parser_plugin: xml
      urls: private://migration.xml
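For comparison, here is a sketch of the external case. The feed address is a hypothetical placeholder, not a real source; everything else mirrors the local example.

```yaml
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: xml
  # Hypothetical external feed; replace with the real address of your source.
  urls: https://example.com/feed.xml
```

Only the urls value changes between the two cases; the rest of the source definition stays the same.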

This brings us to parsing out the XML. All of the selectors we will be talking about use XPath. The first thing you need to do is define the item selector so Migrate can identify the individual items to migrate into your chosen destination. For example, if we were migrating posts from a WordPress export it might look something like this:

item_selector: /rss/channel/item[wp:post_type="post"]

Next up we need to map all of our fields to nice, readable machine names that we can use in the process part of the migration. Each field has a name that identifies it in other parts of the migration, a label describing what sort of data we will find in that XML element, and a selector so the migration can map that data from the XML file:

    fields:
      -
        name: title
        label: Content title
        selector: title
      -
        name: post_id
        label: Unique content ID
        selector: wp:post_id
      -
        name: content
        label: Body of the content
        selector: content:encoded
      -
        name: post_tag
        label: Tags assigned to the content item
        selector: 'category[@domain="post_tag"]/@nicename'

If you are using anything more complicated than the XML node names, you will need to wrap the selector in quotes so that YAML treats it as a string. The selectors are passed to XPath in the data parser, so you can get pretty precise in selecting XML nodes.
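To make the quoting rule concrete, here is a small sketch of two more field definitions. The field names creator and post_category are hypothetical, and the selectors assume the usual WordPress export layout (a dc:creator element, and category elements with domain and nicename attributes).

```yaml
fields:
  -
    name: creator
    label: Author of the content
    # A plain (prefixed) node name can be left unquoted.
    selector: dc:creator
  -
    name: post_category
    label: Categories assigned to the content item
    # A predicate plus an attribute step: quote it so YAML reads it as one string.
    selector: 'category[@domain="category"]/@nicename'
```

Quoting matters most when a selector starts with a character that YAML treats specially, such as * or @, so quoting every non-trivial selector is a safe habit.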

All that is left to do is define the ids, which tell Migrate which field uniquely identifies each item, and you have your source all ready to go:

    ids:
      post_id:
        type: integer

Put it all together and you should have something that looks like this:

    source:
      plugin: url
      data_fetcher_plugin: http
      data_parser_plugin: xml
      urls: private://migration.xml
      item_selector: /rss/channel/item[wp:post_type="post"]
      fields:
        -
          name: title
          label: Content title
          selector: title
        -
          name: post_id
          label: Unique content ID
          selector: wp:post_id
        -
          name: content
          label: Body of the content
          selector: content:encoded
        -
          name: post_tag
          label: Tags assigned to the content item
          selector: 'category[@domain="post_tag"]/@nicename'
      ids:
        post_id:
          type: integer
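The field names defined in the source are what the rest of the migration refers to. As a rough sketch of how they might be consumed, here is a minimal process and destination section. It assumes a hypothetical article node type with the standard body field and a full_html text format; none of these values come from the original project.

```yaml
process:
  # Left side: destination property. Right side: source field name defined above.
  title: title
  body/value: content
  body/format:
    plugin: default_value
    default_value: full_html   # assumed text format
  type:
    plugin: default_value
    default_value: article     # assumed node type
destination:
  plugin: entity:node
```

The post_tag field is left out here because mapping it to taxonomy terms needs extra steps that depend on how the vocabulary is set up.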

A note on prefixed namespaces: you can see we mixed XML nodes that have prefixes with those that don’t. Sometimes Migrate handles this with no problem at all; sometimes it refuses to fetch data from XML nodes that don’t have prefixes. As far as I can tell, it does this when one of the nodes in the item_selector path has a prefix (although it doesn’t seem to have this problem with prefixes inside the filters of the item_selector). If you have a data source with a prefixed parent node, you can still get non-prefixed children by using the following syntax:

    name: description
    label: Content description
    selector: '*[local-name()="description"]'

This allows you to select XML nodes with a given local name regardless of the prefix, which is very handy when the node has no prefix at all.
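Because local-name() ignores the prefix entirely, the same pattern should also match prefixed nodes. As an illustration only, the content field from earlier could be written this way, since the local name of content:encoded is encoded:

```yaml
fields:
  -
    name: content
    label: Body of the content
    # Matches any child element whose local name is "encoded", whatever its prefix.
    selector: '*[local-name()="encoded"]'
```

The trade-off is that this matches any element with that local name, so it is less precise when two namespaces share a node name.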
