Debug Solr queries
Debug Solr queries
Vasi Chindris
Tue, 06/30/2015 - 13:00
Solr is great! When you have a site even with not so much content and you want to have a full text search, then using Solr as a search engine will improve a lot the speed of the search itself and the accuracy of the results. But, as most of the times happen, all the good things also come with a drawback too. In this case, we talk about a new system which our web application will communicate to. This means that, even if the system is pretty good by default, you have to be able in some cases to understand more deeply how the system works.This means that, besides being able to configure the system, you have to know how you can debug it. We'll see in the following how we can debug the Solr queries which our applications use for searching, but first let’s think of a concrete example when we need to debug a query.
An example use case
Let’s suppose we have 2 items which both contain in the title a specific word (let’s say ‘building’). And we have a list where we show search results ordered by their score first, and when they have equal scores by the creation date, desceding. At a first sight, you would say that, because both of them have the word in the title, they have the same score, so you should see the newest item first. Well, it could be that this is not true, and even if they have the word in the title, the scores are not the same.
Preliminaries
Let’s suppose we have a system which uses Solr as a search server. In order to be able to debug a query, we first have to be able to run it directly on Solr. The easiest is when Solr is accessible via http from your browser. If not, the Solr must be reached from the same server where your application sits, so you call it from there. I will not insist on this thing, if you managed to get the Solr running for you application you should be able to call it.
Getting your results
The next thing you do is to try to make a query with the exact same parameters as your application is doing. To have a concrete example, we will consider here that we have a Drupal site which uses the Search API module with the Apache Solr as the search server. One of the possibilities to get the exact query which is made is to check the SearchApiSolrConnection::makeHttpRequest() method which makes a call to drupal_http_request() using an URL. You could also use the Solr logs to check the query if it is easier. Let's say we search for the word “building”. An example query should look like this:
http://localhost:8983/solr/select?fl=item_id%2Cscore&qf=tm_body%24value%...
If you take that one and run it in the browser, you should see a JSON output with the results, something like:
To make it look nicer, you can just remove the “wt=json” (and optionally “json.nl=map”) from your URL, so it becomes something like:
http://localhost:8983/solr/select?fl=item_id%2Cscore&qf=tm_body%24value^5.0&qf=tm_title^13.0&fq=index_id%3A"articles"&fq=hash%3Ao47rod&start=0&rows=10&sort=score desc%2C ds_created desc&q="building"
which should result in a much nicer, xml output:
List some additional fields
So now we have the results from Solr, but all they are containing are the internal item id and the score. Let's add some fields which will help us to see exactly what texts do the items contain. The fields you are probably more interested in are the ones which are in the “qf” variable, in your URL. In this case we have:
qf=tm_body%24value^5.0&qf=tm_title^13.0
which means we are probably interested in the “tm_body%24value” and the “ tm_title” fields. To make them appear in the results, we add them to the “fl” variable, so the URL becomes something like:
http://localhost:8983/solr/select?fl=item_id%2Cscore%2Ctm_body%24value%2...^5.0&qf=tm_title^13.0&fq=index_id%3A%22articles%22&fq=hash%3Ao47rod&start=0&rows=10&sort=score%20desc%2C%20ds_created%20desc&q=%22building%22
And the result should look something like:
Debug the query
Now everything is ready for the final step in getting the debug information: adding the debug flag. It is very easy to do that, all you have to do is to add the “debugQuery=true” to your URL, which means it will look like this:
http://localhost:8983/solr/select?fl=item_id%2Cscore%2Ctm_body%24value%2...^5.0&qf=tm_title^13.0&fq=index_id%3A%22articles%22&fq=hash%3Ao47rod&start=0&rows=10&sort=score%20desc%2C%20ds_created%20desc&q=%22building%22&debugQuery=true
You should see now more debug information, like how the query is parsed, how much time does it take to run, and probably the most important one, how the score of each result is computed. If your browser does not display the formula in an easy-readable way, you can copy and paste it into a text editor, it should look something like:
As you can see, computing the score of an item is done using a pretty complex formula, with many variables as inputs. A few more details about these variables you can find here: Solr Search Relevancy
Further reading and useful links
- To visualize the explaination of your Solr queries, you could use this: http://explain.solr.pl/
- If you want to dig deeper into how the score is computed: http://lucene.apache.org/core/3_5_0/api/core/org/apache/lucene/search/Similarity.html