Genomeweb - a science publisher
Genomeweb, a print and online publisher for the molecular biology research community, has successfully migrated its web site to Drupal. Cyrve built the Drupal site and migrated data into it from a legacy Microsoft SQL Server application. Drupal craftsmen Moshe Weitzman and Mike Ryan developed the site, and Maureen Lyons authored the theme.
- Data Migration
- Premium Content
- Email Domain Authorization
- Email Newsletter Integration
- Member Messages
- Challenges
- GIVEBACK
Data Migration
The Genomeweb data migration was the coming of age of the migrate and table wizard (TW) modules. Cyrve has contributed these modules on drupal.org for your migration pleasure. The methodology goes like this:
- Get your legacy data into comma-separated values (CSV) files or MySQL tables.
- Run table wizard analysis on these tables. TW creates a default view for each table, including every column. That's right - instant Views integration.
- Review each column with the client and annotate using the textboxes in TW admin. At this point, there is common understanding about what each legacy column does.
- Each set of data to be migrated needs to be represented by a view. The default view TW created for one of your tables can be used, but you can also create custom views to filter the data, join multiple source tables, etc.
- In the migrate module, create content sets based on these views for each distinct destination step. For example, different content types will usually be migrated using different content sets. Similarly, taxonomy term migrations, comment migrations, and user migrations will be distinct content sets from nodes.
- Use the migrate web UI to map legacy columns to Drupal properties, such as old.date => node.created. Columns that don't map cleanly are handled in a migrate module prepare hook.
- Run and re-run migrations for each content set. Migrate can run a full migration or subsets like “Next 20 story nodes from 2008” or “Records 20, 22, 29 from User content set”. Migrate provides a quick and accurate way to roll back migrations so you can tweak the code and re-run. Migrations may be run via drush or via the web.
- Migrate provides a dashboard in drush and on the web so you can monitor progress.
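The mapping and subset steps above can be sketched in a few lines. This is only an illustration of the idea in Python, not the migrate module's actual code or API: the legacy column names (headline, story_text, date), the assumed date format, and the functions prepare, migrate_row, and migrate are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical direct mappings: legacy column name -> Drupal node property.
FIELD_MAP = {
    "headline": "title",
    "story_text": "body",
}


def prepare(node, row):
    """Stand-in for a prepare hook: handle columns that do not map cleanly."""
    # Assumption: the legacy date is a 'YYYY-MM-DD HH:MM:SS' string in UTC,
    # while node.created is stored as a Unix timestamp.
    parsed = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S")
    node["created"] = int(parsed.replace(tzinfo=timezone.utc).timestamp())
    return node


def migrate_row(row):
    """Build one destination node from one legacy row."""
    node = {dest: row[src] for src, dest in FIELD_MAP.items()}
    return prepare(node, row)


def migrate(rows, limit=None, ids=None):
    """Migrate everything, the next `limit` rows, or only specific record ids."""
    selected = [r for r in rows if ids is None or r["id"] in ids]
    if limit is not None:
        selected = selected[:limit]
    return [migrate_row(r) for r in selected]
```

Calling migrate(rows, ids={20, 22}) corresponds to a "records 20 and 22" run, and migrate(rows, limit=20) to a "next 20" run.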
Planning the Genomeweb migration was complex as content, users, terms, comments, and Ubercart transaction records all had to be created in Drupal. In the end, the migration went quite smoothly on launch day.
Premium Content
All stories with a lock icon beside the title are protected by a premium membership requirement. When an unauthorized user follows such a link, she sees a teaser and then an offer for a trial membership.
Cyrve chose to implement this requirement without using the node access control API. That API is perfect for cases when unauthorized users should never see premium content titles or teasers. In our case, everyone may see titles and teasers, so we borrowed ideas from the premium module, which uses the nodeapi('view') hook to replace the full body of the premium node with a custom “upgrade” message.
Customers and companies may actually buy access to each newsletter on the site, so the access control calculation considers both the newsletter for a given piece of content and the user's roles. Users buy these memberships using an Ubercart-powered store and are assigned role(s) at the end of the transaction. These roles expire after 3 months or 12 months, depending on the offer that was purchased.
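The access calculation described here might look roughly like the following. It is a Python sketch of the logic only; the real site does this in PHP inside a nodeapi view hook. The role naming scheme ("<newsletter> subscriber"), the dictionary shapes, and the names has_access, view_node, and UPGRADE_MESSAGE are assumptions made for illustration.

```python
from datetime import date

# Hypothetical upgrade text shown in place of the full body.
UPGRADE_MESSAGE = "This story is for premium members. Start a trial membership."


def has_access(user_roles, newsletter, today):
    """user_roles maps a role name to its expiry date (assumed structure)."""
    expiry = user_roles.get(f"{newsletter} subscriber")
    return expiry is not None and today <= expiry


def view_node(node, user_roles, today):
    """Return the full body for authorized users, teaser plus offer otherwise."""
    if not node["premium"] or has_access(user_roles, node["newsletter"], today):
        return node["body"]
    # Unauthorized users still see the teaser, followed by the offer.
    return node["teaser"] + "\n\n" + UPGRADE_MESSAGE
```

Because the title and teaser stay visible to everyone, the check only has to run at view time, which is why a view hook is enough and node access grants are not needed.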
Email Domain Authorization
Genomeweb supports an alternate way to gain access to premium content. Genomeweb wants to give away free access to all members of academic institutions. So, users whose email addresses end in .edu are granted premium access. In addition, companies may purchase a company-wide license for all their staff. Thus, domains like foo.com or bar.net can be manually added to a ‘premium domains’ table once their payment arrives.
To support this requirement, Cyrve uses the Email registration and Email change confirmation modules. Email registration lets users log in using an email address instead of a username, and Email change confirmation requires that users click on a link in a verification email when they choose to change their email address.
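The domain rule itself is small. A minimal Python sketch follows, assuming the ‘premium domains’ table is available as a set of lowercase domain names and that only exact domain matches count (whether subdomains of a licensed domain also qualify is not stated here). The function names are hypothetical.

```python
def email_domain(email):
    """Return the lowercase domain part of an address, or '' if there is none."""
    if "@" not in email:
        return ""
    return email.rsplit("@", 1)[1].strip().lower()


def domain_grants_premium(email, premium_domains):
    """True for .edu addresses and for domains listed as premium."""
    domain = email_domain(email)
    if not domain:
        return False
    if domain.endswith(".edu"):
        return True
    # Assumption: exact match against the manually maintained domain list.
    return domain in premium_domains
```

A check like this is only trustworthy when the address is verified, which is what the confirmation step on email changes provides.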
Email Newsletter Integration
Genomeweb maintains 14 very popular email newsletters. Genomeweb Daily News boasts 30,000 subscribers and publishes twice a day. Mass email delivery requires specialized expertise and is technically ill-suited to PHP and Drupal. After a vendor search, Genomeweb chose Lyris as its email partner. Cyrve then built full integration between www.genomeweb.com and the Lyris API. Editors author their content in Drupal and never copy/paste content into Lyris. Instead, they use a custom Drupal form to schedule newsletters and customize their contents.
Similarly, all email subscribe/unsubscribe activity is handled in Drupal. This way, Drupal has complete knowledge of the payment status for any given user or domain. Drupal periodically POSTs a full, up-to-date email subscriber list to Lyris.
Member Messages
In the upper right-hand side of each page, Genomeweb can broadcast small messages to its users. Cyrve built an audience targeting application to meet this requirement. Examples of such messages are ‘Your BioInform subscription is about to expire’ or ‘Get ProteoMonitor headlines delivered to your Inbox’. These messages are content specific (e.g. a ProteoMonitor message appears on a Proteo story) and user specific (e.g. the expiration date is less than 30 days away). Further, messages have an interval during which they may not be repeated. This prevents barraging the user with excessive messages. Any given message can be permanently dismissed; that's useful if, for example, a user never intends to subscribe via email. Messages may also be targeted at particular email domains and may have custom expiration dates.
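The targeting rules combine into a single eligibility check. The sketch below is an assumption-laden illustration in Python, not the application Cyrve built: the dictionary keys (dismissed, last_shown, domains, interval, expires, condition) and the function name eligible are hypothetical.

```python
from datetime import datetime, timedelta


def eligible(message, user, page_newsletter, now):
    """Decide whether `message` may be shown to `user` on the current page."""
    if message["id"] in user["dismissed"]:
        return False  # permanently dismissed by this user
    if message.get("expires") and now >= message["expires"]:
        return False  # custom expiration date has passed
    if message.get("newsletter") and message["newsletter"] != page_newsletter:
        return False  # content specific: wrong newsletter for this page
    if message.get("domains") and user["domain"] not in message["domains"]:
        return False  # targeted at other email domains
    condition = message.get("condition")
    if condition is not None and not condition(user, now):
        return False  # user specific, e.g. subscription expires within 30 days
    last = user["last_shown"].get(message["id"])
    if last is not None and now - last < message.get("interval", timedelta(0)):
        return False  # still inside the no-repeat interval
    return True
```

An "about to expire" message could pass a condition such as lambda user, now: user["expires"] - now < timedelta(days=30).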
Challenges
Genomeweb’s site features lots of dense information blocks. The Drupal 6 block caching feature is instrumental to serving up fast content to authenticated users. The default caching strategy of BLOCK_CACHE_PER_ROLE is perfect for this site, as lock icons are role specific, depending on the user's access to premium content. Given this heavy reliance on block cache, the site does wobble a bit when all caches are cleared. Custom and preemptive cache clearing is a real need here and Cyrve intends to work on this during the Drupal 8 development cycle.
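To see why per-role caching suits this site and why a full cache clear hurts, consider this toy Python model. It illustrates the general idea only and is not Drupal's implementation; the class RoleBlockCache and its methods are hypothetical.

```python
class RoleBlockCache:
    """Toy per-role block cache: one rendered copy per (block, role set)."""

    def __init__(self):
        self._store = {}
        self.renders = 0  # counts expensive renders, for demonstration

    def get(self, block_id, roles, render):
        # Users with the same set of roles share one cached copy.
        key = (block_id, tuple(sorted(roles)))
        if key not in self._store:
            self._store[key] = render(roles)
            self.renders += 1
        return self._store[key]

    def clear(self):
        """Clearing everything forces every block to be rebuilt on next view."""
        self._store.clear()
```

Since lock icons depend only on roles, two subscribers with identical roles can share a cached block. After clear(), every role combination must be rendered again, which is the wobble described above.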
menu_rebuild() is another performance problem for most Drupal sites, including genomeweb.com. Let's all help get this locking issue committed to Drupal 6.
GIVEBACK
Genomeweb and Cyrve are committed to GIVEBACK to the Drupal community. We have received so much, and want to keep the flow of contributions growing. As such, we contributed the following during the project:
- Table Wizard and Migrate modules. The Cyrve data migration methodology is now open for all to use. Let's see more great sites moving to Drupal.
- Auditfiles patch. Audit your files table and references to those files in your content.
- Minor feature patch for Reroute email module.
- drush sql load command, which is deprecated by the upcoming sql sync command. Easily copy databases from one Drupal environment to another.
- Block page visibility module. Port to Drupal 6.
- Email change confirmation module. Use user_save()
- Primary term module. Port to Drupal 6.
- This drupal.org post which informs site builders and prospective Drupal users about real world Drupal successes.