Block Google from Drupal 7 node types on the cheap
In these post-Panda/Penguin days it is important to get your website's crawl profile right and make the best use of your crawl budget. It probably doesn't matter on small sites, certainly not this one, but on large ones with millions of pages it does. If Google is crawling useless pages, it could be missing important ones, and that can weaken your site's overall ranking and visibility in Google.
On this site, I have a 'link' node type like this one. How do I stop Google crawling those types of pages without installing a module? It's easy.
1. I use Pathauto already and just changed the path alias for my link node type to add a directory name like this:
2. Through the interface, I deleted my aliases and then regenerated them. Use caution here: I don't know how well Drupal handles very large numbers of aliases, so doing it directly in the database might be safer.
3. Add an entry in your robots.txt to block that directory:
Disallow: /link/
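This stops Google crawling those nodes. Bear in mind that robots.txt blocks crawling rather than indexing, so already-indexed URLs may take a while to drop out and can linger as bare URLs if other pages link to them.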
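Before relying on the new rule, it is worth checking that it matches the paths you expect. Here is a minimal sketch using Python's standard `urllib.robotparser`. The `User-agent: *` line, the sample paths and the `is_crawlable` helper are my assumptions for illustration, and Python's parser is only an approximation of how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the Disallow rule from step 3,
# placed under a catch-all user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /link/
"""


def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths: one link node, one ordinary page, and the bare prefix.
    for path in ("/link/some-link-node", "/blog/some-post", "/link"):
        status = "crawlable" if is_crawlable(path) else "blocked"
        print(path, status)
```

The trailing slash in the rule matters: `/link/` blocks everything under that directory but leaves a page at `/link` itself crawlable.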
Edit: I have /node/ blocked in my robots.txt and canonicals set up to point to the URL alias, but rooby's suggestion of redirecting node/* to the URL alias with the Redirect module or something similar is ideal.
My words of warning, then: update URL aliases with great caution, especially on commercial or heavily indexed sites, unless you know exactly what you are doing. If you're setting up a new site, this is a harmless strategy. If you are changing URLs and care about search engines, have a redirection strategy in place.
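One way to start on a redirection strategy is to write down the old-to-new mapping before you regenerate anything. The sketch below is only an illustration: `build_redirect_map`, the sample aliases and the `link` prefix are hypothetical, and it assumes the only change is the new directory name in front of the old alias. How you load the result into Drupal or your web server is up to you.

```python
def build_redirect_map(old_aliases, prefix="link"):
    """Map each old alias to the same alias under the new directory prefix."""
    redirects = {}
    for alias in old_aliases:
        clean = alias.strip("/")
        # Skip empty aliases and any that already live under the prefix.
        if not clean or clean.startswith(prefix + "/"):
            continue
        redirects[clean] = f"{prefix}/{clean}"
    return redirects


if __name__ == "__main__":
    # Hypothetical aliases exported before the Pathauto pattern was changed.
    old = ["my-first-link", "/another-link/", "link/already-moved"]
    for source, target in build_redirect_map(old).items():
        print(f"/{source} -> /{target}")
```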