Re-Indexing your content to Solr, the fast way ...
There are rare occasions when you want to re-index all your site's content in Solr. Such occasions include:
- Major Drupal version upgrade (e.g. from Drupal 6.x to Drupal 7.x).
- Changing your Solr schema to include more search criteria.
- Upgrading your Solr server to a new major version.
- Moving your Solr server from an old server to a new one.
The usual way of doing this re-indexing is to make cron run more frequently. However, if you do that, there is a risk of cron being blocked by other long-running cron tasks. Moreover, you are usually limited to a few hundred items per cron run, and then you have to wait for the next cron run.
The indispensable Swiss Army knife of Drupal, Drush, has hooks for Solr. Therefore, for someone like me who does almost everything from the command line, using Drush was the natural fit for this task.
To do this, in one terminal, I enter this command:
while true
do
  drush @live --verbose solr-index
  sleep 5
done
This command runs the indexing in a loop, and is not dependent on cron. As soon as a batch of 100 items (or whatever limit you have in your settings) is done, the loop pauses for 5 seconds and then sends another batch.
In another terminal, you would monitor the progress as follows:
while true
do
  drush @live solr-get-last-indexed
  sleep 30
done
Once the number of items shown in the second terminal stops increasing, check the first terminal for any errors. Usually, it means that indexing is complete. However, if there are errors, they may be due to bad content in nodes (e.g. bad fields), which needs to be fixed or unpublished, as the case may be.
Doing this re-indexing on a Drupal 7.x site sending content to a Solr 4.x server took from 11 pm to 1 pm the next day (14 hours) for 211,900 nodes. There was an overnight network disconnect for the terminals, and the loop was restarted in the morning, so the actual indexing time was less than that.
In any case, this is much faster than indexing a few hundred items every 5 minutes via cron. That would have taken several days to complete.
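If you would rather not watch the second terminal yourself, the "has it stopped changing?" check can be scripted. What follows is only a sketch: the function name wait_until_stalled and its arguments are my own invention, not part of Drush, and it simply compares the text printed by whatever status command you hand it, without assuming anything about that command's output format.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Drush feature): poll a status command and
# return once its output has stayed identical for <stable_polls>
# consecutive polls.
# Usage: wait_until_stalled <interval_seconds> <stable_polls> <command...>
wait_until_stalled() {
  local interval=$1 needed=$2
  shift 2
  local previous current stable=0

  # Take a first reading so there is something to compare against.
  previous=$("$@" 2>&1)

  while true; do
    sleep "$interval"
    current=$("$@" 2>&1)
    if [ "$current" = "$previous" ]; then
      stable=$((stable + 1))   # same output as the last poll
    else
      stable=0                 # still changing: reset the counter
      previous=$current
    fi
    if [ "$stable" -ge "$needed" ]; then
      printf '%s\n' "$current" # print the final, unchanged reading
      return 0
    fi
  done
}

# Example, using the same alias and command as the monitoring loop above:
# wait_until_stalled 30 3 drush @live solr-get-last-indexed
```

A stalled reading only tells you that nothing is moving any more. You still need to look at the first terminal to see whether that means "finished" or "stuck on an error".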
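As a rough sanity check on that "several days" figure, here is the arithmetic in shell. The 200 and 100 items per cron run are assumed values for illustration ("a few hundred" is all I stated above), and cron_hours is just a throwaway helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate how many whole hours cron-based indexing
# would take.
# Usage: cron_hours <total_items> <items_per_run> <minutes_between_runs>
cron_hours() {
  local total=$1 per_run=$2 interval=$3
  # Number of cron runs needed, rounded up.
  local runs=$(( (total + per_run - 1) / per_run ))
  # Total minutes converted to whole hours (rounded down).
  echo $(( runs * interval / 60 ))
}

cron_hours 211900 200 5   # assumed 200 items per run: prints 88
cron_hours 211900 100 5   # assumed 100 items per run: prints 176
```

At 200 items per run that is about 88 hours, or a bit under 4 days, and at 100 items per run it is over a week. Both are far more than the 14 hours (or less) the Drush loop needed.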