Get search results for compound words not in content with Drupal, Search API and Solr
It is possible to expand compound search terms into multi-term synonyms. That is, if your Drupal site's content contains text such as "dark room" or "key note", and you don't want users who search for "darkroom" or "keynote" (respectively) to land on a "No results" page, you'll need to do a bit of extra work.
Let's assume we've got a Drupal 7 site backed by Solr for advanced search, with the Search API and Search API Solr Search modules integrating the two systems. At the time of writing, this is a widely used best-practice setup. However, it doesn't support the above use case out of the box.
Spellchecking and fuzzy searching are possible alternatives. But Solr itself already supports synonyms, even though Search API does not, so let's tweak Search API's setup to work with them.
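Solr synonym files support explicit mappings written as `darkroom => dark room`: the term on the left is replaced by the terms on the right. (Per the Solr documentation, such `=>` rules ignore the filter's `expand` attribute, which only governs comma-separated equivalence lines.) The Python sketch that follows is a deliberately simplified model of that rewrite, not Solr's implementation. The function names are hypothetical; it lowercases to mimic `ignoreCase="true"`, splits on whitespace in place of a real tokenizer, and handles only single-word left-hand sides.

```python
"""Simplified model of a Solr explicit synonym mapping (not Solr's own code)."""


def parse_explicit_mappings(lines):
    """Parse 'left => right' lines into {source_term: [replacement terms]}.

    Blank lines, '#' comments and lines without '=>' are skipped.
    Only single-word left-hand sides are supported in this sketch.
    """
    mappings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=>" not in line:
            continue
        left, right = (part.strip() for part in line.split("=>", 1))
        # Lowercase both sides to mimic ignoreCase="true".
        mappings[left.lower()] = right.lower().split()
    return mappings


def apply_synonyms(query, mappings):
    """Lowercase and whitespace-split the query, replacing mapped terms."""
    tokens = []
    for token in query.lower().split():
        # Unmapped terms pass through unchanged.
        tokens.extend(mappings.get(token, [token]))
    return tokens


rules = parse_explicit_mappings([
    "# compound word => multi-term synonym",
    "darkroom => dark room",
    "keynote => key note",
])
print(apply_synonyms("Darkroom keynote", rules))
```

Running it prints `['dark', 'room', 'key', 'note']`: the compound words are gone, and the resulting terms are the ones that can match content containing "dark room" and "key note".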
There are several steps required to make this happen.
- If you've got the tokenizer enabled on your search index, disable it by unchecking the box at Administration » Configuration » Search and metadata » Search API » Your index name » Filters » Processors » Tokenizer, and then save the configuration. If the Tokenizer option is enabled, it will prevent the synonym functionality from kicking in.
- Modify the Solr configuration in your search collection over at /path/to/solr/collection-name/conf/schema.xml around line 162.
- Before:
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
- After:
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
- Define the multi-term synonyms in the synonyms.txt file, which sits in the same folder as the above schema.xml file. Use this format:
- darkroom => dark room
- keynote => key note
- Restart the search engine. This is system-dependent, but if you're running Solr inside the GlassFish application server, for example, you may be able to restart it with a command like sudo service GlassFish_solr restart.
- Clear the search index and rebuild it.
- Surf to Administration » Configuration » Search and metadata » Search API » Your index name.
- Hit the "Queue all items for reindexing" button.
- Hit the "Index now" button.
That should do it. You're all set!
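If searches for "darkroom" still come back empty, one thing worth confirming is that the schema.xml edit actually took effect. Here is a minimal Python sketch, using only the standard library's ElementTree, that lists every uncommented `solr.SynonymFilterFactory` filter together with its field type, analyzer type, synonyms file and `expand` value. The script name `check_schema.py` and the function name are hypothetical, and it assumes schema.xml is plain, namespace-free XML.

```python
import sys
import xml.etree.ElementTree as ET


def active_synonym_filters(root):
    """List (fieldType name, analyzer type, synonyms file, expand) tuples.

    ElementTree's default parser drops XML comments, so a filter that is
    still commented out, as in the "Before" snippet, is not reported.
    """
    found = []
    for field_type in root.iter():
        # Match the tag case-insensitively (<fieldType> or <fieldtype>).
        if not isinstance(field_type.tag, str) or field_type.tag.lower() != "fieldtype":
            continue
        for analyzer in field_type.iter("analyzer"):
            for flt in analyzer.iter("filter"):
                if flt.get("class") == "solr.SynonymFilterFactory":
                    found.append((
                        field_type.get("name"),
                        # An analyzer without a type applies at index and query time.
                        analyzer.get("type", "both"),
                        flt.get("synonyms"),
                        flt.get("expand"),
                    ))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_schema.py /path/to/solr/collection-name/conf/schema.xml
    schema_root = ET.parse(sys.argv[1]).getroot()
    for entry in active_synonym_filters(schema_root):
        print(entry)
```

An empty result suggests the filter is still inside an XML comment; after the edit described above, you would expect to see an entry naming `synonyms.txt` with `expand` set to `true`.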
Background reading
For more information on how all of this really works, here are some useful articles on the subject.
- Why is Multi-term synonym mapping so hard in Solr?
- Solution for multi-term synonyms in Lucene/Solr using the Auto Phrasing TokenFilter
- Better synonym handling in Solr
This article, Get search results for compound words not in content with Drupal, Search API and Solr, appeared first on the Colan Schwartz Consulting Services blog.