Information.dk - Another Drupal Newspaper Site
Information is an independent daily distributed all over Denmark, Europe. It is issued six days a week with a circulation of 22,000 copies and has about 100,000 readers. Until recently the paper's website, information.dk, was based on a home-grown CMS solution. As of the 28th of August 2007 the website is running on Drupal. The design and CSS were done by Jens Christoffersen and the Drupal hacking was done by Johs. Wehner. The project was led by Nikolai Thyssen, chief of new media.
Test case: luftskibet.information.dk
The website on information.dk is the result of approximately half a year of full-time development, but before we began this work, we launched a blog site, luftskibet.information.dk ("luftskibet" is Danish for "the airship"), developed in Drupal.
We did this to get our feet wet and gain some experience with Drupal development before diving into the development of the main site. "Luftskibet" was developed in Drupal 4.7, was launched on the 4th of October 2006 and is the home of our journalists' blogs. One of the things we learned from this test case was the whole Drupal terminology. Some of the things we did on "Luftskibet" would be done differently today, but it was a really good way to get started. Our experience with "Luftskibet", and a visit - generously arranged by Ken Rickard and Steve Yelvington - to The Savannah Morning News, convinced us that Drupal was the way to go with the main site as well.
Originally "Luftskibet" was a WordPress installation, but it was fairly easy to port to Drupal thanks to the WordPress migration module. This also meant that the port gave us no idea of whether or not it would be difficult to port the main site.
The main site
The development of the main site began in February 2007.
Data Migration
The migration of our archive was initially one of our biggest concerns. Our archive dates back to 1997 and consists of more than 180,000 articles, but as it turned out, the migration was far easier than we expected. The articles in our old system were stored in MySQL as well, so we made the MySQL database on our then production server accessible from our development box and traversed the article database using PHP, building a node object for each article. We then changed the database connection and used Drupal's node_save to insert all the articles into the Drupal database. Once we understood how the node object was structured, it was not nearly as painful as we had feared.
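The loop described above (read a legacy row, build a node-shaped object, hand it to a save function) can be sketched roughly as follows. This is an illustration in Python rather than the PHP we actually used, and the legacy column names (headline, text, published) and the save callback are hypothetical stand-ins for the old schema and for Drupal's node_save.

```python
from datetime import datetime, timezone


def legacy_row_to_node(row):
    """Map one legacy article row (hypothetical columns) to a node-like dict."""
    published = datetime.strptime(row["published"], "%Y-%m-%d")
    created = int(published.replace(tzinfo=timezone.utc).timestamp())
    return {
        "type": "article",               # assumed name of the content type
        "title": row["headline"].strip(),
        "body": row["text"],
        "created": created,              # Unix timestamp, as Drupal stores it
        "status": 1,                     # published
    }


def migrate(rows, save):
    """Convert every legacy row and pass it to `save` (stand-in for node_save)."""
    count = 0
    for row in rows:
        save(legacy_row_to_node(row))
        count += 1
    return count
```

In a real run, rows would come from a cursor on the old database and save would write to the new one; keeping the mapping in its own function makes it easy to check a handful of articles by hand before importing the whole archive.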
The Daily Import
The paper edition of Information is produced in SaxoPress. When articles are ready for the press they're exported to our webserver as XML files (almost NITF format) with images as JPGs. Initially our hope was to develop an NITF module, but because we have our own custom content type and because the XML produced by our editorial system is not NITF standard compliant (requiring all sorts of ugly hacks), it's impossible to abstract the code enough for it to become an actual distributed module. If you're interested in the code, you can contact Johs. Wehner (http://drupal.org/user/58666)
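To give an idea of what the import step involves, here is a minimal sketch that pulls a headline and body paragraphs out of an NITF-like file. It is written in Python for illustration, not taken from our import module; the sample document is invented, uses standard NITF element names (hl1, body.content, p) and does not reflect the non-standard XML our editorial system actually produces.

```python
import xml.etree.ElementTree as ET

# Invented sample; real exports from the editorial system differ.
SAMPLE = """<nitf>
  <body>
    <body.head><hedline><hl1> Example headline </hl1></hedline></body.head>
    <body.content>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </body.content>
  </body>
</nitf>"""


def parse_article(xml_text):
    """Extract title and body text from one NITF-like article."""
    root = ET.fromstring(xml_text)
    headline = next(root.iter("hl1"), None)
    title = (headline.text or "").strip() if headline is not None else ""
    paragraphs = [(p.text or "").strip() for p in root.iter("p")]
    return {"title": title, "body": "\n\n".join(paragraphs)}
```

The resulting dict can then be mapped to the fields of the article content type in the same way as during the archive migration.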
CCK
Our articles have their own content type. We didn't design a module, we simply used CCK. The content type initially reflected the printed article precisely. But in order to make subheaders and other text formats more suitable for web use, we duplicated some of the fields, so they could be manipulated for use on the front page and in lists, while the page view maintained the original content. Besides the duplicate text fields, we also included image fields for alternative images and images especially for the front page.
Taxonomy
The article content type initially only had one vocabulary, reflecting the paper's editorial desks. But when we began constructing the different sections of the site, we found out that we needed more dimensions to the taxonomy in order to distribute and display content the way we wanted. So we made a "genre" vocabulary (e.g. "note", "review", "editorial", "chronicle" etc.) and a "subsections" vocabulary containing the paper's most common topics, and we made a free-tagging vocabulary. Just before launch we found out that the subheaders of the articles could be used as another free-tagging category, since columns often use the same subheader over time. So we changed it from a regular text field to a taxonomy.
Front Page
Our front page is made using Panels and Views and not least David Strauss's magnificent Pressflow Preempt Panels. Panels enables us to make a more complex layout on our front page. Before we found Preempt Panels our relatively complex front page took forever to load. Thanks to David Strauss for his super panels caching module. Details about the module can be found at http://drupal.org/project/pressflow_preempt_panels
The publication of top stories is done with the handy Node Queue. It’s a very simple yet very effective way of handling and prioritizing news.
Views
Our sections (e.g. the "Culture" section - http://information.dk/kultur) were originally complex views showing articles from one or more categories, excluding other categories and sorted by a date field. This led to poor performance on our sections. We didn't know what to do, so we contacted David Strauss, because we were very impressed with his work on the Preempt Panels module. Following his advice we created three new modules, one for each section, using the nodeapi hook to insert articles into a separate table if they met the conditions we'd established for the original view. He called it materialized views - inspired by Oracle. This works really well. Again a super effort by David Strauss.
This doesn't mean that we don't like Views. We use them in a lot of places - almost everywhere but the three above-mentioned sections and the front page. Views saved us a lot of development time, because non-programmers could fairly easily build pages that otherwise would have required a lot of coding. So a great thanks to Earl Miles and others working on Views. Views is a very important contribution.
Images
Because we only have the digital rights for the pictures we use in the paper for two weeks, we had to make the pictures expire. We did this in the theming. If the article is older than two weeks, the picture and caption are not displayed.
We use the fantastic Imagecache module on almost all of our images, except profile pictures and pictures made especially for the front page. But because we found out that the huge print-sized pictures that come from our editorial system were too much for our hard-working webserver, we prescale them using AppleScript and "Image Events".
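The expiry rule itself is a one-line date comparison. A minimal sketch in Python (our real check lives in the PHP theme layer, and the function and constant names here are made up for illustration):

```python
from datetime import datetime, timedelta

# Digital rights to print pictures last two weeks from publication.
IMAGE_RIGHTS_PERIOD = timedelta(weeks=2)


def show_image(published, now):
    """Return True while the article is at most two weeks old."""
    return now - published <= IMAGE_RIGHTS_PERIOD
```

Doing the check at display time means nothing has to be deleted or unpublished when the rights run out; the picture simply stops being rendered.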
Premium
For access control we use a home-hacked version of the Premium module. The module works great, but because we have several types of subscriptions, we needed to hack it to give full-week subscribers other privileges than those who only subscribe to the weekend edition.
Sphinx
One of our biggest challenges was search. Using Drupal's built-in search to index 180,000 articles turned out to be impossible. With our amounts of data, the core search module just doesn't cut it. So after some weeks of despair, another Danish newspaper doing a Drupal project pointed us towards Sphinx (http://www.sphinxsearch.com/). Sphinx is a standalone full-text search engine and a MySQL storage engine (SphinxSE). In order to make Sphinx work as a storage engine it must be compiled into MySQL. We tried this, but did not succeed. Luckily our hosting company could do it. Where it took nearly a month to index our database using core search, Sphinx does it in a couple of minutes. It's very fast, both the indexing and the search.
A great thanks goes to Andrew Aksyonoff, father of the Sphinx project. It's really a fantastic project!
Backend
We use the Garland theme for our backend. Besides the built-in administration pages we've made a view with some actions from the Actions module. We use this to display all of one day's articles on one page. We've also made a small block view displaying all unpublished content. This gives the people tagging and "enhancing" our articles an overview.
Other Modules
Inspired by The New York Observer, we use the Related Links module. The Blockcache module helps us performance-wise on almost all pages except for the front page.
To make search-engine-friendly and short copy-paste-able URLs we use the fabulous Pathauto.
We also use:
Actions
Auto assign role
Captcha
Find URL Alias
Flag content
Global redirect
Printer friendly pages
Simplenews
Suggested terms
Tagadelic
XML Sitemap
We created approximately 15 custom modules to handle import, users (we use email addresses as user names, so we also needed a screen name), user pages (both subscribers and reporters are regular Drupal users, so we needed some tweaking), subscriber privileges, web tracker, search, integration with the blogging site etc.
Hardware (Geek alert!!)
In case any of you should care, our site is running on a CentOS-powered HP DL140 G3 with a 1.6 GHz Intel quad-core CPU, 5 GB of RAM and two 73 GB SAS 15K disks (RAID 1).
Now, switching to Jens, our design and CSS guy:
Designing information.dk
Quite a lot of attention was given to the design of the new site. We quickly realized that we had to go beyond modification of existing templates to get where we wanted, which was to give the site a genuine newspaper look and feel. Especially on the front page it was important to break away from the basic, blog-like format of listing stories newest on top (the "river of news" format as Dave Winer would call it).
Frontpage layout
We had several ideas early on about how to organize stories on the front page, and made several prototypes with more or less elaborate layouts. A lot of these ideas are used in the present design: dividing the front page into three main areas on top of each other, each area with a different layout principle; how to organize top stories (reflecting the importance of the story), etc.
One very important aspect, however, was the decision to use a grid-based layout. A standard reference for grid-based layout is NYTimes.com, but the relaunch of Timesonline.co.uk and recently Guardian.co.uk are other notable examples.
The base layout used for information.dk is a rather simple six-column grid. Each column is 140px wide (+ 20px spacing between columns), allowing all kinds of combinations. At the moment we're using three different widths for content (including ads) on the front page: 140px (one unit), 300px (two units + spacing) and 460px (three units + spacing). The main column for the section and article template is 620px (four units + spacing).
The front page consists, as mentioned above, of three main areas. On top is where today's top stories go (surprise!) and this is also the all-important "free area" of the site where we display some of the most dynamic parts of the site (blog updates, recent comments etc.). The middle section is basically two lists of stories for subscribers only. The idea here is to provide an easy overview of today's content. The bottom section is more of a "slow pace" section, with a "best of" collection of articles. Different organizing principles kept together on the same grid.
The decision to use a tight grid-based layout came rather late in the design process. Luckily, most of the work we'd done could be adjusted without much hassle. One of the main advantages of using a grid is the modularity it offers. Content can easily be moved from one place to another, and we've found that this flexibility works extremely well with the logic of views, blocks and node queues in Drupal. When it comes to the daily maintenance (editorial, not technical) and the continued development of the site, design and content-wise, a unit-based layout makes things easier to work with.
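The widths above all follow from one formula: n columns plus the n - 1 gutters between them. A small Python sketch of the arithmetic (the function name is ours, the 140px and 20px values are from the grid described above):

```python
COLUMN_WIDTH = 140  # px, one grid unit
GUTTER = 20         # px, spacing between two adjacent columns


def span_width(units):
    """Width in px of a block spanning `units` columns, gutters included."""
    if units < 1:
        raise ValueError("a block must span at least one column")
    return units * COLUMN_WIDTH + (units - 1) * GUTTER


# One, two, three and four units give 140, 300, 460 and 620 px.
widths = [span_width(n) for n in (1, 2, 3, 4)]
```

By the same formula, the full six-column grid comes to 940px.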
Layout continued: Sections and articles
The complexity of the front page layout is contrasted by the relative simplicity of the section pages and article pages. The article template was designed to enhance readability. This was important to us, since the newspaper is known for its lengthy in-depth articles. We spent a lot of time getting the right balance of fonts, sizes, line-heights, as well as the placement of images, external links etc.
Designing a one-size-fits-all template for articles is always tricky, and compromises have been made. We will continue to refine the template, and I hope we will soon be able to add more flexibility, e.g. with regards to image presentation.
Sections were pretty straightforward (lists with paging on the bottom of the page), but required some attention to how the article is presented as a list element. We use auto-cropped quad-thumbs (when an image is available), which give surprisingly good results. The quad format is easy to work with, and adds a consistent look to the page, something that would otherwise be difficult should we integrate landscape and portrait thumbs in the design. It was originally the successful use of quad-thumbs by Flickr (and other photo-sharing sites) that pointed us in this direction, and I would estimate that more than 80% of these auto-croppings turn out completely usable. As for the remaining less-than-elegantly cropped thumbs, well, it would be nice with a simple built-in editing function.
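For readers wondering what the auto-cropping amounts to: a common approach, and the one assumed in this sketch, is to cut the largest centered square out of the image and then scale it down. On our site the cropping is handled by image tooling rather than custom code, so the Python function below only illustrates the geometry.

```python
def center_square_crop(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)
```

A centered crop works when the subject sits near the middle of the frame, which is one plausible reason why most, but not all, of the automatic thumbs turn out usable.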
Finally: A comment on CSS development in Drupal
Several designers and CSS coders have been complaining about the way Drupal generates code. Do a simple thing such as adding a list as a block, and Drupal spits out enough DIV classes to confuse anyone. How to get rid of this?
From my point of view there are two problems with the Drupal style of outputted code. One is elegance: a lot of frontend CSS guys like to keep things fairly simple and clean, and Drupal's the-more-the-merrier approach can cause a lot of frustration. Even more annoying (oh yes, I'm still bitter :-) is that Drupal in some ways forces you to give up a lot of control. You cannot freely decide names of IDs and classes, you cannot always control precisely how you want your classes and sub-classes to be organized and, to some extent, this affects the hierarchical order you might have planned. All in all, this relative loss of control over the code takes some time to get used to, especially when you go from prototyping to production. On a more practical level, because the number of outputted DIV ids and classes is often staggering, it becomes harder to use hierarchies properly when you apply styles to elements. This can lead to loss of overview, which of course is very frustrating, increases errors etc. It also makes things such as cross-browser compliance more time-consuming.
At first we tried to address this problem with programming. If the Drupal code was either "too rich" or "too ugly" to work with, we would simplify the code. This, of course, did not work very well for very long. Not only did it take too much time to work this way, we would also end up with a system that was difficult to upgrade.
The only real solution to the problem was simply to learn and to live with the Drupal style. It may not always be pretty, but if you spend enough time theming, you'll eventually understand how to work with the code.
Switching back to Johs:
Current Challenges
After we've handled the initial small bugs and stupidities, we're just about ready to take on new challenges.
One of the first things we need to refine is our search. We need a more advanced search function. Right now search doesn't care about parentheses. That would be nice to have, and maybe even functionality à la faceted search.
Another thing would be providing some sort of day view, so you could see all the articles from a particular day on one list. The articles are marked with a publishing date (using the date module), but today we don't really have "editions". This we need to get into.
Finally, our blog site, "luftskibet", is in a separate installation. We would like to integrate this installation with our main site - both graphically and not least user-wise - using multisite functionality.
Drupal version: Drupal 5.x