API-First Drupal: file uploads!
Drupal 8’s REST API has been maturing steadily since Drupal 8.0.0 was released in November 2015. One of the big missing features has been file upload support. As of April 3, 2018, Drupal 8.6 will support it when it ships in September 2018! See the change record for the practical consequences: https://www.drupal.org/node/2941420.
It doesn’t make sense for me to repeat what is already written in that change record: it already has both a tl;dr and a practical example.
What I’m going to do instead is give you a high-level overview of what it took to get to this point: why it took so long, which considerations went into it, and why this particular approach was chosen. You could read the entire issue (#1927648), but … it’s one of the longest issues in Drupal history, at 572 comments [1]. You would probably need at least an entire workday to read it all! It’s also one of the longest commit messages ever, thanks to the many, many people who shaped it over the years:
Issue #1927648 by damiankloip, Wim Leers, marthinal, tedbow, Arla, alexpott, juampynr, garphy, bc, ibustos, eiriksm, larowlan, dawehner, gcardinal, vivekvpandya, kylebrowning, Sam152, neclimdul, pnagornyak, drnikki, gaurav.goyal, queenvictoria, kim.pepper, Berdir, clemens.tolboom, blainelang, moshe weitzman, linclark, webchick, Dave Reid, dabito, skyredwang, klausi, dagmar, gabesullice, pwolanin, amateescu, slashrsm, andypost, catch, aheimlich: Allow creation of file entities from binary data via REST requests
Thanks to all of you in that commit message!
I hope it can serve as a reference not just for people interested in Drupal, but also for people outside the Drupal community: there is no One Best Practice Way to handle file uploads for RESTful APIs. There is a surprising spectrum of approaches [2]. Some even avoid this problem space entirely, by only allowing clients to “upload” files by sending a publicly accessible URL to the file. Read on if you’re interested. Otherwise, go and give it a try!
Design rationale
General:
- Request with Content-Type: application/octet-stream aka “raw binary” as its body, because base64-encoded means 33% more bytes, implying both slower uploads and more memory consumption. Uploading videos (often hundreds of megabytes or even gigabytes) is not really feasible with base64 encoding.
- Request header Content-Disposition: file; filename="cat.jpg" to name the uploaded file. See the Mozilla docs. This also implies you can only upload one file per request. But of course, a client can issue multiple file upload requests in parallel, to achieve concurrent/batch uploading.
- The two points above mean we reuse as much as possible from existing HTTP infrastructure.
- Of course it does not make sense to have a Content-Type: application/octet-stream as the response. Usually, the response is of the same MIME type as the request. File uploads are the sensible exception.
- This is meant for the raw file upload only; any metadata (for example: source or licensing) cannot be associated in this request: all you can provide is the name and the data for the file. To associate metadata, a second request to “upgrade” the raw file into something richer would be necessary. The performance benefit mentioned above more than makes up for the RTT of a second request in almost all cases.
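To make the “33% more bytes” figure concrete, here is a small Python sketch of the arithmetic. It is only an illustration: the 3 MB payload of zero bytes is a hypothetical stand-in for an uploaded file, and none of this is Drupal code.

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Return the relative size increase caused by base64-encoding `raw`."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw) - 1.0


# Hypothetical payload: 3 MiB of bytes standing in for an uploaded file.
payload = bytes(3 * 1024 * 1024)
encoded_size = len(base64.b64encode(payload))
overhead = base64_overhead(payload)

print(f"raw: {len(payload)} bytes, base64: {encoded_size} bytes, overhead: {overhead:.1%}")
```

Base64 turns every 3 input bytes into 4 output characters, which is where the roughly one-third overhead comes from, before the encoded body is even wrapped in JSON or similar.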
PHP-specific:
- Read the upload from php://input, because otherwise the upload size would be limited by the PHP memory limit.
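The idea behind reading from a stream rather than from a fully loaded request body can be sketched in a few lines. This is a Python analogy, not Drupal’s PHP implementation: the function name, the chunk size and the in-memory streams are all assumptions made for illustration.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size; any modest value works


def copy_stream(source, destination, chunk_size=CHUNK_SIZE):
    """Copy `source` to `destination` chunk by chunk; return the bytes copied.

    Only one chunk is held in memory at a time, so memory use stays bounded
    no matter how large the stream is.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for the request body and the target file.
body = io.BytesIO(b"x" * 200_000)
target = io.BytesIO()
copied = copy_stream(body, target)
print(copied)
```

With a real request body and a real file handle in place of the two BytesIO objects, the peak memory use is roughly one chunk, regardless of the upload size.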
Drupal-specific:
- In the case of Drupal, we know that it always represents files as File entities. They don’t contain metadata (fields), at least not with just Drupal core; it’s the file fields (@FieldType=file or @FieldType=image) that contain the metadata (because the same image may need different captions depending on its use, for example).
- When a file is uploaded for a field on a bundle on an entity type, a File entity is created with status=false. The response contains the serialized File entity.
- You then need a second request to make the referencing entity “use” the File entity, which will cause the File entity to get status=true.
- Validation: Drupal core only has the infrastructure in place to use files in the context of an entity type/bundle’s file field (or derivatives thereof, such as image fields). This is why files can only be uploaded by specifying an entity type ID, bundle ID and field name: that’s the level where we have settings and validation logic in place. While not ideal, it’s pragmatic: first allowing generic file uploads would be a big undertaking and somewhat of a security nightmare.
- Access control is similar: you need create access for the referencing entity type and field edit access for the file field.
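To illustrate the second request, here is a hedged Python sketch that pulls the file ID out of an upload response and builds a body for the referencing entity. Everything specific here is an assumption for illustration: the sample response, the json format’s field-item shape, the article bundle, the title and the field_image field name. Check the change record for the authoritative example.

```python
import json


def file_id_from_response(serialized_file: str) -> int:
    """Extract the file ID from a serialized File entity.

    Assumes the json format, where each field is a list of items.
    """
    data = json.loads(serialized_file)
    return data["fid"][0]["value"]


def referencing_entity_body(bundle: str, title: str, field_name: str, file_id: int) -> dict:
    """Build the body of the second request, pointing a file field at the file."""
    return {
        "type": [{"target_id": bundle}],
        "title": [{"value": title}],
        field_name: [{"target_id": file_id}],
    }


# Hypothetical, heavily trimmed response of the upload request.
sample_response = '{"fid": [{"value": 42}], "status": [{"value": false}]}'

file_id = file_id_from_response(sample_response)
body = referencing_entity_body("article", "My cat", "field_image", file_id)
print(json.dumps(body, indent=2))
```

Sending this body when creating the referencing entity is what would make it “use” the file, at which point the File entity gets status=true.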
Result
If we combine all these choices, then we end up with a new file_upload @RestResource plugin, which enables clients to upload a file:
- by POSTing the file’s contents
- to the path /file/upload/{entity_type_id}/{bundle}/{field_name}, which means that we’re uploading a file to be used by the file field of the specified entity type+bundle, and the settings/constraints of that field will be respected.
- … don’t forget to include a ?_format URL query argument; this determines what format the response will be in
- sending file data as an application/octet-stream binary data stream, that is, with a Content-Type: application/octet-stream request header. (This allows uploads of an arbitrary size, including uploads larger than the PHP memory limit.)
- and finally, naming the file using the Content-Disposition: file; filename="filename.jpg" header
- the five preceding steps result in a successfully uploaded file with status=false. All that remains is to perform a second request to actually start using the file in the referencing entity!
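The steps above can be sketched as a client-side request. This is a minimal Python sketch using only the standard library; the base URL, entity type, bundle, field name and file contents are hypothetical, authentication is left out, and the request is only built, not sent.

```python
import urllib.request


def build_upload_request(base_url, entity_type_id, bundle, field_name,
                         filename, data, response_format="json"):
    """Build (but do not send) a file upload request.

    Assumes `filename` contains no double quotes.
    """
    url = (f"{base_url}/file/upload/{entity_type_id}/{bundle}/{field_name}"
           f"?_format={response_format}")
    headers = {
        # Raw binary body, no base64.
        "Content-Type": "application/octet-stream",
        # Names the uploaded file; one file per request.
        "Content-Disposition": f'file; filename="{filename}"',
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Hypothetical site, field and file contents.
request = build_upload_request(
    "https://example.com", "node", "article", "field_image",
    "cat.jpg", b"raw image bytes go here",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen() would send it. A real client would also add whatever authentication the site requires and could pass an open file object as data instead of bytes to avoid loading the file into memory.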
Five years in the making: summarizing 572 comments
From February 2013 until the end of March 2017, issue #1927648 mostly … lingered. On April 3, 2017, damiankloip posted an initial patch for an approach he’d been working on for a while, thanks to Acquia (my employer) sponsoring his time. Exactly one year later, his work was committed to Drupal core. Shaped by the input of dozens of people! Just look at that commit message!
Want to actually read a summary of those 572 comments? I got you covered!
[1] It currently is the fifth longest Drupal core issue of all time! The first page, with ~300 comments, is >1 MB of HTML.
[2] Examples: Contentful, Twitter, Dropbox and others.