France 24 migrates to Drupal 6, codebase to be open-sourced
France 24 is a public 24/7 international news channel broadcasting in three languages: French, English and Arabic. Its mission is to cover international current events from a French perspective and to convey French values throughout the world. The channel provides keys to understanding complex events through in-depth analysis. France 24 also puts culture at the forefront of its programming. France 24 is part of the AEF (the "Audiovisuel Extérieur de la France", or French foreign media), along with RFI (a radio station) and TV5 (a TV station).
Launched in December 2006, the website was originally based on Magnolia, a Java CMS. Due to stability problems, we switched to Drupal 5 in mid-2008. We have just migrated to Drupal 6 and a brand new codebase. This case study covers this migration, focusing on the technical side, and describes some of the homegrown modules we plan to open-source.
The monthly traffic of the France 24 websites is around 5 million unique visitors. A geekier metric: the site runs 300-400 concurrent active Apache threads at all times.
The migration scope
This was not a simple migration. Since the first migration to Drupal 5, we had learned some lessons, and a few of our technical choices had turned out to have scalability issues. So we decided to rewrite the code from scratch.
We also wanted to make frontpages and stories much more flexible: their structure was quite rigid, especially on the frontpages.
On top of that, we were not doing one migration, but two at the same time: France 24 and RFI. The RFI website is based on ASP.NET with a homegrown CMS. And all of this had to be done in 6 months.
It was an interesting exercise in applying object-style specialization to a full-scale website development in Drupal: could we work faster by sharing some of the code?
Developing two websites at once
It works quite well. We finished the websites on time (RFI goes live in a couple of weeks), and working on both websites at the same time saved us time every day.
Basically, we have three sets of modules: "AEF", which is shared by both sites, RFI-specific modules, and France 24-specific modules. In the AEF set, we define Views, content types, basic templates, taxonomies, and so on. We also build the big common features there: menus, tabs, Easy Views, Externodes, etc. Then we specialize them where necessary in the RFI and France 24 modules. For example, we can add a field to a content type, add a filter to a view, or add an RFI- or France 24-specific process, such as fetching videos for the France 24 video-on-demand service or handling the RFI radio editions.
As a result, more than half of the code is common, with roughly a quarter specific to each of RFI and France 24.
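The shared-base-plus-specialization layering can be pictured with a small Python sketch. The real modules are PHP and rely on Drupal hooks; here a hypothetical shared "AEF" definition is deep-copied and extended per site. The dictionary layout and the field and filter names (`vod_video`, `radio_edition`) are illustrative assumptions, not the actual AEF definitions.

```python
from copy import deepcopy

# Hypothetical shared "AEF" definition, common to both sites.
AEF_STORY = {
    "type": "story",
    "fields": ["title", "body", "tags", "main_image"],
    "views": {"latest_stories": {"filters": ["published"]}},
}


def specialize(base, extra_fields=(), extra_filters=None):
    """Return a site-specific copy of `base`; the shared definition is never mutated."""
    site = deepcopy(base)
    site["fields"].extend(extra_fields)
    for view_name, filters in (extra_filters or {}).items():
        site["views"][view_name]["filters"].extend(filters)
    return site


# Each site only states what differs from the shared base.
FRANCE24_STORY = specialize(AEF_STORY, extra_fields=["vod_video"])
RFI_STORY = specialize(AEF_STORY, extra_filters={"latest_stories": ["radio_edition"]})
```

Because each site declares only its delta, a fix made to the shared definition reaches both sites at once.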
The project development
For this project, we were a team of 9 developers/project leaders and 2 sysadmins.
We used the Scrum methodology, with short 3-week sprints: 2.5 weeks of development, followed by a demo to the journalists. That way, they could easily monitor the progress of the project and give feedback *early*.
Main concepts
Multimedia Element: One thing we learned from the first version of the site is that new features are often requested that already exist, hardcoded, somewhere else on the website (e.g., a carousel of images is requested for the frontpage, but that element is only available hardcoded in the story content type).
So we created the concept of the "Multimedia Element", which can be any of videos, sounds, diaporamas (slideshows), carousels, links to stories, Twitter, external links, text, quotes, etc. Everything on the website is either a story or a multimedia element that can be placed in a box anywhere: in a story, on the frontpage, in a special report, etc.
CCK Formatters: The usual way to theme nodes in Drupal is to theme the node page template and the Views item template. Unfortunately, with this approach the themes are not easy to reuse elsewhere (e.g. in a multimedia element). So we make heavy use of the standard CCK formatter concept: we create node themes as CCK formatters for each content type, which lets us reuse them easily.
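A minimal way to model this idea, assuming a simple Python data model rather than the actual Drupal content types: every placeable item is either a story or a multimedia element, and boxes hold references to items, so one element can be built once and appear in several places. Class and field names are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class Kind(Enum):
    """Kinds of multimedia element listed in the text."""
    VIDEO = "video"
    SOUND = "sound"
    DIAPORAMA = "diaporama"
    CAROUSEL = "carousel"
    STORY_LINKS = "story_links"
    TWITTER = "twitter"
    EXTERNAL_LINK = "external_link"
    TEXT = "text"
    QUOTE = "quote"


@dataclass
class Story:
    nid: int
    title: str


@dataclass
class MultimediaElement:
    nid: int
    kind: Kind
    payload: dict = field(default_factory=dict)  # e.g. image nids for a carousel


Item = Union[Story, MultimediaElement]


@dataclass
class Box:
    """A slot in a story, on the frontpage, in a special report, etc."""
    location: str
    items: List[Item] = field(default_factory=list)

    def place(self, item: Item) -> None:
        self.items.append(item)


# The same carousel is built once and placed in two different contexts.
carousel = MultimediaElement(nid=7, kind=Kind.CAROUSEL, payload={"images": [101, 102]})
frontpage_box = Box("frontpage")
story_box = Box("story/42")
frontpage_box.place(carousel)
story_box.place(carousel)
```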
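The benefit of formatters over page templates is that rendering is looked up by name, so any context can reuse it. This Python sketch, loosely modeled on that idea and not on the CCK API itself, keeps a registry keyed by content type and formatter name; the "teaser" and "full" formatters are hypothetical.

```python
import html
from typing import Callable, Dict, Tuple

# Registry of renderers keyed by (content type, formatter name).
FORMATTERS: Dict[Tuple[str, str], Callable[[dict], str]] = {}


def formatter(content_type: str, name: str):
    """Decorator registering a render function for one content type."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        FORMATTERS[(content_type, name)] = fn
        return fn
    return register


@formatter("story", "teaser")
def story_teaser(node: dict) -> str:
    return f'<a href="/node/{node["nid"]}">{html.escape(node["title"])}</a>'


@formatter("story", "full")
def story_full(node: dict) -> str:
    # The body is assumed to be already-filtered HTML.
    return f'<h1>{html.escape(node["title"])}</h1>\n{node["body"]}'


def render(node: dict, name: str) -> str:
    """Render any node with a named formatter, wherever it appears."""
    return FORMATTERS[(node["type"], name)](node)
```

The node page, a Views row and a multimedia element can then all call `render(node, "teaser")` instead of each carrying its own copy of the template.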
Contrib modules and homegrown modules
We use quite a lot of contrib modules: 35 in total. Among the classics, we use the lightweight Composite module instead of Panels for the frontpage.
We also developed quite a lot of homegrown modules, which you can preview in our DrupalCon Paris presentation. These include:
- AEF Easy View: Views is a very powerful tool, but too complicated for journalists: can you picture them creating their own view and somehow including the result in their story? This module is a CCK field that lets you easily configure an existing view and put it in your story. You can even choose the theme of the results, and whether they are displayed as a carousel. For example, say you want to show the latest stories tagged France: select the tag, select the number of stories to show, pick the theme, preview it, and you're done!
Also, you can reorder the results of the automatic list, turning it into a manual list, or keep it automatic even after reordering its results.
- AEF Multimedia Element: This is the multimedia element content type described earlier, plus an FCKEditor plugin to insert multimedia elements in the body of a story. A very powerful module.
- AEF Externodes: A module allowing one Drupal installation to access another Drupal installation's nodes remotely, using a nid/fid address-space abstraction, and to execute Views remotely. For example, if you're looking at node 100000001 on Drupal 1, you are in fact looking at node 1 on Drupal 2. This is truly powerful: you can save SQL CPU time by querying nodes on a remote Drupal installation, and share a set of nodes between installations. A good example of what we're going to use it for is image nodes. A collection of 20,000 image nodes will be put on a separate Drupal installation, which we will be able to plug into other Drupal installations. The heavy SQL full-text searches that journalists run on these images will then be done on a database other than the main one. I strongly recommend watching the videos to see it live.
- AEF Image: A very powerful image CCK field. When you upload an image, you see it in all the different imagecache presets used on the website. You can scale/crop each preset differently using JCrop, overriding the automatic imagecache preset. Useful when the automatic crop cuts off someone's head :) You can even upload and scale/crop a different image for a given preset. Finally, this field supports both the direct image-upload approach and the image-as-a-node approach. See it live in the video!
- AEF Editor Toolbox: A small fixed frame where you can search content, manage bookmarks and history, and search your image collection.
- AEF Embedded Edit: Do everything in a single window! Creating an image, searching for it, inserting it in a multimedia element, saving that, then going to your article, searching for your multimedia element and inserting it... can be quite a lengthy process. With this module, you create your image directly on the same page in an iframe, and when you save, the nodereference of the multimedia element is automatically filled with the result. You can also view or edit every node referenced from a nodereference field.
- AEF Formatter Selector: Have you ever been frustrated that there is no way in Drupal to select a theme in the node edit form, and no contrib module for it? Well, this module does it! It lets you pick a theme from a list, right under each node you have referenced.
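The nid/fid address-space idea behind AEF Externodes can be illustrated with plain integer arithmetic. This sketch assumes each remote installation is given a fixed block of 100,000,000 ids, which matches the node 100000001 example but is not necessarily how the module allocates ranges; the installation name `drupal2` is hypothetical, and remote Views execution is out of scope.

```python
from typing import Dict, Optional, Tuple

# Assumed size of each installation's id block, inferred from the example
# (node 100000001 on Drupal 1 is node 1 on Drupal 2).
BLOCK = 100_000_000

# Hypothetical mapping: block index -> remote installation.
REMOTES: Dict[int, str] = {1: "drupal2"}


def to_remote(nid: int) -> Tuple[Optional[str], int]:
    """Map a local nid (or fid) to (remote installation, remote id); None means local."""
    block, remote_id = divmod(nid, BLOCK)
    if block == 0:
        return None, nid
    if remote_id == 0 or block not in REMOTES:
        raise ValueError(f"{nid} is not a valid external id")
    return REMOTES[block], remote_id


def to_local(block: int, remote_id: int) -> int:
    """Inverse mapping: expose a remote id in the local address space."""
    if not 0 < remote_id < BLOCK:
        raise ValueError(f"remote id {remote_id} does not fit in one block")
    return block * BLOCK + remote_id
```

Keeping the mapping purely arithmetic means no lookup table is needed to know which database a given nid lives in.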
39 more generic "AEF" modules were also developed.
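The per-preset override in AEF Image comes down to one rule: use the editor's JCrop rectangle for a preset if there is one, otherwise compute the automatic crop. A Python sketch, assuming a centered scale-and-crop as the automatic behavior (imagecache presets can be configured in other ways) and hypothetical preset names and sizes:

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Rect:
    """Crop rectangle in source-image pixels, as JCrop would report it."""
    x: int
    y: int
    w: int
    h: int


# Hypothetical presets: name -> (width, height) of the derivative.
PRESETS: Dict[str, Tuple[int, int]] = {"thumb": (100, 100), "wide": (640, 360)}


def auto_crop(src_w: int, src_h: int, dst_w: int, dst_h: int) -> Rect:
    """Centered scale-and-crop: largest source region with the target aspect ratio."""
    scale = max(dst_w / src_w, dst_h / src_h)
    crop_w = min(src_w, round(dst_w / scale))
    crop_h = min(src_h, round(dst_h / scale))
    return Rect((src_w - crop_w) // 2, (src_h - crop_h) // 2, crop_w, crop_h)


def crop_for_preset(preset: str, src_w: int, src_h: int,
                    overrides: Dict[str, Rect]) -> Rect:
    """An editor's manual crop wins; otherwise fall back to the automatic one."""
    if preset in overrides:
        return overrides[preset]
    dst_w, dst_h = PRESETS[preset]
    return auto_crop(src_w, src_h, dst_w, dst_h)
```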
The server architecture
Our server infrastructure is basically laid out as follows:
First, the Akamai CDN, which acts as a giant reverse proxy and spares our servers 90% of the hits.
Then, 4 load-balanced Apache servers, all sharing the same webroot through an NFS mount.
Finally, a replicated MySQL database, connected to the Apache servers over a 1 Gbit/s link.
Problems we encountered, lessons we learned
Problem encountered: Before the migration, on the first version of the site, a cron job we wrote triggered slow MySQL queries that sometimes crashed the database.
Lesson learned: Choose your data model very carefully, and be very careful with the database. It is the only part that does not scale: you can add as many Apache servers as you want, but you only have one database.
Problem encountered: While working on reducing the traffic between the database and Apache, we found that 80% of it was due to Lightbox2, which was needlessly generating thousands of CCK formatters. These formatter definitions were stored in cache tables and transferred to Apache on every page load. Had we not caught this, the server infrastructure would probably have collapsed under 1.2 Gbit/s of traffic on the 1 Gbit/s wire between Apache and MySQL.
Lesson learned: Take your average number of active Apache threads, multiply it by the SQL *data size* transferred for a page, and check it against your wire capacity.
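To compare with a link rate expressed per second, the threads-times-data product also needs the page generation time. A sketch of the check, assuming each active thread generates one page at a time; the figures in the example are illustrative, not measurements from our servers.

```python
def db_link_utilization(active_threads: int, sql_bytes_per_page: float,
                        page_time_s: float, link_bits_per_s: float) -> float:
    """Fraction of the Apache-MySQL link consumed by SQL result traffic.

    Assumes each active thread generates one page every `page_time_s` seconds.
    """
    pages_per_s = active_threads / page_time_s
    bits_per_s = pages_per_s * sql_bytes_per_page * 8
    return bits_per_s / link_bits_per_s


# Illustrative figures: 350 threads, 200 KB of SQL data per page,
# 0.5 s per page, 1 Gbit/s link.
utilization = db_link_utilization(350, 200_000, 0.5, 1e9)
print(f"{utilization:.0%} of the link")  # prints "112% of the link"
```

Anything close to or above 100% means the wire, not the CPUs, will be the bottleneck.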
Problem encountered: When we hit the "migration" button, all the Apache servers went crazy, with a load average of around 200. After some investigation, we found that they were simply swapping like hell.
Lesson learned: Take your average number of active Apache threads, multiply it by the average memory usage of a page (30 MB in our case), and check that your Apache servers have enough RAM.
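The same rule as a per-server calculation, assuming threads are spread evenly across the load-balanced servers. The 30 MB figure comes from the text; the thread count and the `reserve_mb` allowance for the OS and other processes are hypothetical.

```python
import math


def apache_ram_needed_mb(active_threads: int, servers: int, mb_per_page: float,
                         reserve_mb: float = 0.0) -> float:
    """RAM each Apache server needs so that its share of threads never swaps.

    `reserve_mb` is a hypothetical allowance for the OS and other processes.
    """
    threads_per_server = math.ceil(active_threads / servers)
    return threads_per_server * mb_per_page + reserve_mb


# 400 active threads over 4 servers, 30 MB per page.
print(apache_ram_needed_mb(400, 4, 30))  # 3000.0 (MB per server, before any reserve)
```

Sizing against the configured maximum number of Apache processes (`MaxClients` in Apache 2.2) rather than the average gives a safer upper bound, since traffic peaks are exactly when swapping hurts most.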
Problem encountered: Before the migration, on the first version of the site, loading a France 24 page in a browser was quite slow. I am talking here about the user experience in the browser: the total loading time you can see in Firebug's Net panel, once all JS, CSS and images are loaded. We were at about 6-8 s, and the user experience was not that good, which may sound surprising since Akamai was caching our files.
In fact, part of this was due to a rather large number of JavaScript files, including one tracking script that loaded 9 more JavaScript files. Now, on the new version, the loading time is *much* better: 2-4 s.
Lesson learned: Aggregate your JavaScript files!! Browsers can load images and CSS concurrently, but many load JS files *sequentially*! And don't forget to aggregate your CSS files too.
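Drupal 6's own performance settings can aggregate CSS and JavaScript; what aggregation does is roughly the following. A Python sketch, assuming a content-hashed file name (the `js_<md5>.js` scheme here is an assumption, not Drupal's exact naming):

```python
import hashlib
from typing import List, Tuple


def aggregate_js(files: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Concatenate (name, source) pairs in their original order.

    Returns (bundle_filename, bundle). Order matters because later scripts
    may depend on earlier ones (e.g. plugins on jquery.js).
    """
    parts = []
    for name, source in files:
        # The trailing ';' guards against a file whose last statement lacks one.
        parts.append(f"/* {name} */\n{source}\n;")
    bundle = "\n".join(parts)
    # Content-hashed name: a changed bundle gets a new URL, so a CDN can
    # cache each version indefinitely without serving stale code.
    digest = hashlib.md5(bundle.encode("utf-8")).hexdigest()
    return f"js_{digest}.js", bundle
```

One request for the bundle replaces one request per file, which is what matters when the browser fetches scripts one after another.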
Problem encountered: As we were developing the website, page generation time in Apache kept growing. Often, we were able to reduce the loading time dramatically by commenting out a line or two.
Lesson learned: There is always room to reduce your loading time, and the culprit is usually a simple mistake. Put timers in your code, display the time needed to generate each part of the page, and narrow down the part eating most of the CPU time.
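Drupal 6 ships `timer_start()` and `timer_read()` helpers for this kind of measurement. The same pattern, sketched in Python as a context manager that accumulates time per named section and reports the most expensive first; the section names are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


class SectionTimer:
    """Accumulates wall-clock time per named section of page generation."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    @contextmanager
    def section(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def report(self) -> List[Tuple[str, float]]:
        """Sections sorted from most to least expensive."""
        return sorted(self.totals.items(), key=lambda item: item[1], reverse=True)


timer = SectionTimer()
with timer.section("menu"):
    sum(range(10_000))
with timer.section("related_stories"):
    time.sleep(0.05)  # stands in for a slow query
for name, seconds in timer.report():
    print(f"{name}: {seconds * 1000:.1f} ms")
```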
Open-sourcing
We've announced that we are going to open-source this code: all 45 AEF modules. First, we need to release the RFI site and package the code, removing any non-generic bits that may remain in the AEF modules.
This should be done by the end of the year; meanwhile, you can see the modules live in our DrupalCon Paris presentation.
Conclusion
Every development team has a given level of expertise, and building a project on an excellent base such as Drupal allows us to raise its final quality.
By contributing these new Drupal modules (soon), we hope to help strengthen the base of modules for news sites, and we would like to thank the Drupal community for this wonderful product.
Drupal version: Drupal 6.x