Status of RDF in Drupal (November 2009) and wrap-up of ISWC2009
I had the pleasure of presenting the paper "Produce and Consume Linked Data with Drupal!" at ISWC2009 last week, and I was very honored that we won the Best Semantic Web in Use Paper award! The 30 minutes of presentation plus Q&A passed very quickly, and after describing the inner workings of the modules we developed I didn't have much time to expand on the status of RDF in Drupal 7 versus Drupal 6. I'm sure this will also interest some people beyond the attendees.

First of all, the current stable version of Drupal is Drupal 6 (the latest release at the time of this writing being Drupal 6.14). This is the version on which we started to implement the contributed modules presented at ISWC2009, namely RDF CCK, the RDF external vocabulary importer (Evoc), SPARQL Endpoint, and RDF SPARQL Proxy. "Contributed" means these modules are not included in the core Drupal package, but anyone can download them from drupal.org for free and drop them on their server to extend Drupal core. These four modules work pretty well on Drupal 6: you can get RDF exports in RDF/XML, N-Triples, Turtle, and JSON. Generating RDFa, however, is not so easy, as it requires patching CCK, which we rely on to generate the content pages and store the various field data.

We made sure this would not be a problem in the next version of Drupal (Drupal 7), which is still under development and due to be released sometime next year. While we were at it, we also worked on porting one of the functionalities of the RDF CCK and Evoc modules to Drupal 7 core: the ability to map the data structure to RDF and expose it as RDFa. This means that, by default and without requiring any RDF knowledge from their administrators, Drupal 7 sites will expose the following elements as RDFa: title, date, author, content, comments, terms, users, etc. Of course, only publicly available data will be exposed as RDFa; whatever is private (like user email addresses) will remain private. This will be part of Drupal 7 core.

Needless to say, the rest of the functionality offered by the existing RDF contributed modules for Drupal 6 will also be available for Drupal 7 once these modules have been ported. We're starting to port them to Drupal 7 next Sunday, as part of the #D7CX contrib upgrade code sprint in Boston. If you plan to use RDF in your next site and can wait until Drupal 7 is released, I'd strongly encourage you to start looking at the new Drupal APIs and functionality. Some RDF features which were not addressed in Drupal 6 will be much easier to achieve in Drupal 7. Try the latest development snapshot of Drupal 7 and report any bug you encounter. And if you want a taste of the consumer side today, see the sketch below.
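To give an idea of what consuming this data can look like, here is a minimal sketch in Python using rdflib. The node URL and export path are hypothetical and depend on how the RDF modules are configured on your site; the query just illustrates the kind of lookup you can run locally over the fetched triples:

```python
# A minimal sketch of consuming a Drupal node's RDF export with rdflib.
# The URL below is hypothetical; the actual export path depends on how
# the RDF modules are configured on your Drupal 6 site.
from rdflib import Graph

g = Graph()
g.parse("http://example.org/node/1.rdf", format="xml")  # RDF/XML export

# Query the fetched triples locally with SPARQL, e.g. for Dublin Core titles.
results = g.query("""
    PREFIX dc: <http://purl.org/dc/terms/>
    SELECT ?node ?title
    WHERE { ?node dc:title ?title }
""")
for node, title in results:
    print(node, title)
```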
Now, here are some of the highlights of ISWC2009. I found Thursday to be the most interesting day, maybe because I was done with my presentation... It started with Nova Spivack's keynote, a retrospective on the beginnings of Radar Networks (the company behind Twine) and a look at the future of Semantic Web search at Web scale. With Twine 2.0, Nova really wants to be part of this revolution. I found the Twine 2.0 demo to be very similar to VisiNav, with a more polished UI. Worth noting is the site mapping tool, which lets you tell Twine what each piece of data on your site means.
Later, Andreas Langegger talked about XLWrap, a tool for transforming spreadsheets into RDF. It supports many spreadsheet formats and cross-table references. You need to be somewhat familiar with the TriG format, or you can follow the examples on the project site. I like these practical tools for converting existing data into RDF (for a rough flavor of the idea, see the sketch after this paragraph)! It's already being used by Richard Cyganiak for maintaining the Semantic Web Dogfood website. Next, Bernhard Schandl went beyond XLWrap and explained how to do things (functions) with this RDF data: data analysis operations, for example returning the names of all the people that the resource in a given cell knows. Tripcel is a UI for manipulating and editing RDFunctions. It looks in fact like an RDF data browser presented in a tabular fashion, where several resources can be grouped into a cell. Bernhard admits it still requires some RDF skills to be usable, but future work includes a better UI to tackle this. I can see the potential of this project in efforts like the Semantic Desktop initiative, for example.
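XLWrap's actual mappings are written in TriG, which I won't reproduce from memory here; but as a rough illustration of the general spreadsheet-to-RDF idea (not XLWrap's mapping language), a naive conversion in Python with the csv module and rdflib might look like this. The file name, column layout, and base URI are all made up for the example:

```python
# Illustrative only: a hard-coded CSV-to-RDF conversion in the spirit of
# spreadsheet wrappers like XLWrap (which instead uses a declarative
# TriG-based mapping language).
import csv
from rdflib import Graph, Literal, Namespace, RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
EX = Namespace("http://example.org/people/")  # hypothetical base URI

g = Graph()
g.bind("foaf", FOAF)

# Assumed layout: people.csv with columns "id,name,knows_id".
with open("people.csv", newline="") as f:
    for row in csv.DictReader(f):
        person = EX[row["id"]]
        g.add((person, RDF.type, FOAF.Person))
        g.add((person, FOAF.name, Literal(row["name"])))
        if row["knows_id"]:
            g.add((person, FOAF.knows, EX[row["knows_id"]]))

print(g.serialize(format="turtle"))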
The first Pedantic Web meetup was held during lunch. A dozen people gathered around a table and talked about best practices for publishing RDF data, and about the workflow we should adopt when notifying people of their mistakes. We want a gentle approach, as opposed to threatening publishers or boycotting broken data: we don't want people to stop using RDF; instead, we want to guide them. Visit http://pedantic-web.org/ to read more about the group and join the pedantic-web mailing list!
Evan Sandhaus from the New York Times presented the work they have done bringing their news data online as Linked Data. Evan showed the TimesTags API and how you can query their huge archive for articles about specific topics. In the middle of the session, Evan announced the release of the first 5,000 New York Times tags as Linked Data. Each tag gets a unique URI, and the variants of its human-readable subject heading redirect to it. Needless to say, with so many RDF geeks in the room, it only took a few minutes for the Linked Data folks to identify some minor issues with the content negotiation and licensing (see the sketch below for what such a lookup involves), but no doubt these will be fixed by the time this post goes live!
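Content negotiation here simply means asking the same URI for different representations. A minimal sketch with Python's requests library, using a placeholder identifier since I won't reproduce an actual New York Times URI from memory:

```python
# A minimal content-negotiation sketch: request RDF or HTML from the same
# Linked Data URI by varying the Accept header. The identifier below is a
# placeholder, not an actual New York Times tag URI.
import requests

uri = "http://data.nytimes.com/EXAMPLE_TAG_ID"

# Ask for RDF/XML; a well-behaved Linked Data server redirects to the
# RDF document describing the resource.
rdf_resp = requests.get(uri, headers={"Accept": "application/rdf+xml"})
print(rdf_resp.url, rdf_resp.headers.get("Content-Type"))

# Ask for HTML; the server should redirect to a human-readable page.
html_resp = requests.get(uri, headers={"Accept": "text/html"})
print(html_resp.url, html_resp.headers.get("Content-Type"))
```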
ISWC was a great place for socializing and meeting new people like George Thomas, Fabien Gandon, Ivan Herman, Juan Sequeda, Jamie Taylor, and also catching up with my former colleagues: Axel Polleres (ex-supervisor), Richard Cyganiak, Andreas Harth, Aidan Hogan, Sheila Kinsella, John Breslin and more.
Photo credit: Kasei