Better control of multilingual URLs in Drupal
Not long after launching this site, I noticed that Google was indexing German versions of pages that:
a) had no translations
b) I never intended to translate
So far, only the Home, About and Contact pages on this site are meant to be translatable; the blog and link sections are not. The solution turned out to be pretty easy.
Stop Google indexing non-English language sections
I generate simple path aliases for link and taxonomy term pages, which makes those URLs easier to control. So the first step was to stop search engines from indexing those pages by adding these directives to robots.txt:
Disallow: /node/
Disallow: /de/blog/
Disallow: /de/node/
Disallow: /de/tagadelic/chunk/
Disallow: /de/tags/
I also disallowed /node/, since every page I want Google to index has a clean URL alias. I don't care about the other URLs.
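Crawlers that follow the robots exclusion protocol treat each Disallow value as a prefix of the URL path. To check which URLs the directives above actually cover, here is a simplified sketch in PHP. is_disallowed() is a hypothetical helper, and it deliberately ignores Allow lines, '*' and '$' wildcards and User-agent groups, all of which real crawlers such as Googlebot take into account.

```php
<?php
// Hypothetical helper: a simplified robots.txt check in which a URL path is
// blocked when it starts with any Disallow value. Allow lines, '*' and '$'
// wildcards and User-agent groups are deliberately ignored.
function is_disallowed(string $path, array $disallow): bool {
  foreach ($disallow as $prefix) {
    // Plain prefix comparison; strncmp() works on PHP 7 as well as PHP 8.
    if (strncmp($path, $prefix, strlen($prefix)) === 0) {
      return TRUE;
    }
  }
  return FALSE;
}

// The Disallow values from the robots.txt additions above.
$rules = array('/node/', '/de/blog/', '/de/node/', '/de/tagadelic/chunk/', '/de/tags/');

// Example paths (hypothetical, for illustration only).
$paths = array('/blog/my-post', '/de/blog', '/de/blog/my-post', '/node/42', '/de/about', '/de/tags/drupal');
foreach ($paths as $path) {
  printf("%-18s %s\n", $path, is_disallowed($path, $rules) ? 'blocked' : 'allowed');
}
```

Because every prefix ends in a slash, this model reports the German blog index /de/blog itself as allowed; only the pages beneath it are blocked. That page is handled by the inbound redirect in the module code instead.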
Stop language-specific links appearing
This didn't fix internal linking, though: Drupal was still prefixing the links on German pages with de/, so Google would still find those pages. I wanted a Drupal solution rather than an Apache one, so I added two simple functions, implementations of Drupal 7's hook_url_inbound_alter() and hook_url_outbound_alter(), to 'klaus', the custom module for this site:
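/**
 * Implements hook_url_inbound_alter().
 *
 * Redirects incoming requests for untranslated sections (blog, tags) to
 * their English versions.
 *
 * Note: normally you would do this with an Apache redirect.
 *
 * @param $path
 *   The path to be altered (the system path, after alias lookup).
 * @param $original_path
 *   The path as originally requested.
 * @param $path_language
 *   The language of the path.
 */
function klaus_url_inbound_alter(&$path, $original_path, $path_language) {
  global $language;
  global $base_url;
  $language_default = language_default();
  $location = '';

  // No need to check further if the user is using the default site language.
  if ($language->language == $language_default->language) {
    return;
  }

  // Anchor the patterns to the start of the path so that unrelated paths,
  // such as admin/structure/taxonomy, are not redirected.
  if (preg_match('#^blog(/|$)#', $path)) {
    $location = $base_url . '/blog';
  }
  elseif (preg_match('#^(tagadelic|taxonomy|tags)(/|$)#', $path)) {
    $location = $base_url . '/tagadelic';
  }

  // Permanently redirect the user to the default-language page. An absolute
  // URL built from $base_url carries no language prefix.
  if ($location) {
    drupal_goto($location, array(), 301);
  }
}

/**
 * Implements hook_url_outbound_alter().
 *
 * Rewrites links to the blog and tags sections, which aren't translated,
 * so that they point to the English versions. English is the site default.
 *
 * @param $path
 *   The outbound path (usually the system path, before aliasing).
 * @param $options
 *   The url() options; 'language' selects the path prefix and alias language.
 * @param $original_path
 *   The original path, before being altered by any modules.
 */
function klaus_url_outbound_alter(&$path, &$options, $original_path) {
  global $language;
  // url() runs for every link on a page, so look up the default language once.
  static $language_default = NULL;
  if (is_null($language_default)) {
    $language_default = language_default();
  }

  // No need to do anything if the user is using the default site language.
  if ($language->language == $language_default->language) {
    return;
  }

  // Links usually arrive as system paths (taxonomy/term/N rather than the
  // tags/... alias), so match both forms.
  if (preg_match('#^(tagadelic|taxonomy|tags|blog)(/|$)#', $path)) {
    // Force the default language so that no de/ prefix is added.
    $options['language'] = $language_default;
  }
}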
You can see the code working on this site: switch to a German page, then try to reach the blog or taxonomy pages. You won't get German URLs for them, and if you enter one manually, you'll be redirected to the English page.
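Both hooks make the same decision: is this path part of a section that isn't translated? Pulling that decision into plain PHP makes it easy to test outside Drupal. The sketch below is hypothetical: untranslated_section_target(), split_language_prefix() and the DEFAULT_LANGCODE constant are illustrative names, not part of Drupal or the module. It anchors each pattern to the start of the path so that admin paths such as admin/structure/taxonomy don't match, and it ignores alias lookup, treating the request path (minus its language prefix) as the path the hooks would receive.

```php
<?php
// Assumed site default language (English, as on this site).
const DEFAULT_LANGCODE = 'en';

// Hypothetical helper mirroring the section checks in the two hooks, with the
// patterns anchored to the start of the path. Takes a Drupal-style path (no
// leading slash, language prefix already removed) and returns the
// default-language landing page for an untranslated section, or NULL.
function untranslated_section_target(string $path): ?string {
  if (preg_match('#^blog(/|$)#', $path)) {
    return 'blog';
  }
  if (preg_match('#^(tagadelic|taxonomy|tags)(/|$)#', $path)) {
    return 'tagadelic';
  }
  return NULL;
}

// Rough model of path-prefix language negotiation: "de/blog/foo" is read as
// language "de" plus path "blog/foo"; anything else is the default language.
function split_language_prefix(string $request, array $prefixes): array {
  $parts = explode('/', $request, 2);
  if (in_array($parts[0], $prefixes, TRUE)) {
    return array($parts[0], isset($parts[1]) ? $parts[1] : '');
  }
  return array(DEFAULT_LANGCODE, $request);
}

$requests = array('blog/my-post', 'de/blog', 'de/blog/my-post', 'de/taxonomy/term/5', 'de/about', 'de/admin/structure/taxonomy');
foreach ($requests as $request) {
  list($langcode, $path) = split_language_prefix($request, array('de'));
  // Like both hooks, do nothing for the default language.
  $target = ($langcode === DEFAULT_LANGCODE) ? NULL : untranslated_section_target($path);
  printf("%-28s %s\n", $request, $target === NULL ? 'keep' : 'redirect to /' . $target);
}
```

Running it reports "keep" for blog/my-post (English is the default), de/about and de/admin/structure/taxonomy, and a redirect for the German blog and taxonomy paths.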
Existing Drupal modules that do this
Check out the Translation redirect module, part of the i18n package, which redirects to the default-language version of a piece of content when a language-specific version is missing.