Solr for Drupal Developers, Part 1: Intro to Apache Solr
It's common knowledge in the Drupal community that Apache Solr (and other text-optimized search engines like Elasticsearch) blow database-backed search out of the water in terms of speed, relevance, and functionality. But most developers don't really know why, or just how much an engine like Solr can help them.
I'm going to be writing a series of blog posts on Apache Solr and Drupal, and while some parts of the series will be very Drupal-centric, I hope I'll be able to illuminate why Solr itself (and other search engines like it) are so effective, and why you should be using them instead of simple database-backed search (like Drupal core's Search module uses by default), even for small sites where search isn't a primary feature.
As an aside, I am writing this series of blog posts from the perspective of a Drupal developer who has worked with large-scale, highly customized Solr search for Mercy (example), and with a variety of small-to-medium sites who are using Hosted Apache Solr, a service I've been running as part of Midwestern Mac since early 2011.
Apache Solr's wiki leads off it's Why Use Solr page with the following:
If your use case requires a person to type words into a search box, you want a text search engine like Solr.
At a basic level, databases are optimized for storing and retrieiving bits of data, usually either a record at a time, or in batches. And relational databases like MySQL, MariaDB, PostgreSQL, and SQLite are set up in such a way that data is stored in various tables and fields, rather than in one large bucket per record.
In Drupal, a typical node entity will have a title in the node
table, a body in the field_data_body
table, maybe an image with a description in another table, an author whose name is in the users
table, etc. Usually, you want to allow users of your site to enter a keyword in a search box and search through all the data stored across all those fields.
Drupal's Search module avoids making ugly and slow search queries by building an index of all the search terms on the site, and storing that index inside a separate database table, which is then used to map keywords to entities that match those keywords. Drupal's venerable Views module will even enable you to bypass the search indexing and search directly in multiple tables for a certain keyword. So what's the downside?
Mainly, performance. Databases are built to be efficient query engines—provide a specific set of parameters, and the database returns a specific set of data. Most databases are not optimized for arbitrary string-based search. Queries where you use LIKE '%keyword%'
are not that well optimized, and will be slow—especially if the query is being used across multiple JOIN
ed tables! And even if you use the Search module or some other method of pre-indexing all the keyword data, relational databases will still be less efficient (and require much more work on a developer's part) for arbitrary text searches.
If you're simply building lists of data based on very specific parameters (especially where the conditions for your query all utilize speedy indexes in the database), a relational database like MySQL will be highly effective. But usually, for search, you don't just have a couple options and maybe a custom sort—you have a keyword field (primarily), and end users have high expectations that they'll find what they're looking for by simply entering a few keywords and clicking 'Search'.