What the fq? A short summary of Solr query fields.
And how to use them in Drupal
A popular search engine for Drupal is Apache Solr. Although installation and configuration of Solr can be done almost completely via the Drupal Admin UI, in some cases it can be very instructive to see and understand what data is sent to and from Solr when a search is done from Drupal.
The first place to look when running Solr inside Tomcat is the Tomcat log file, usually /var/log/tomcat6/catalina.out. If this file is not present in that directory on your system, use
locate catalina.out
or a similar command to find it.
If Solr runs in its own Jetty container as a separate service (which is the only option for Solr 5.x), log4j is used for logging and is configured in the log4j.properties file.
By default the logs are written to 'logs/solr' under the Solr root. This can be changed in log4j.properties with the 'solr.log' option, for example:
solr.log=/var/solr/logs
For more information about log4j, see Apache Log4j 2.
In the log, each line like the following represents one search query:
INFO: [solrdev] webapp=/solr path=/select params={spellcheck=true&
spellcheck.q=diam&
hl=true&
facet=true&
f.bundle.facet.mincount=1&
f.im_field_tag.facet.limit=50&
f.im_field_tag.facet.mincount=1&
fl=id,entity_id,
entity_type,
bundle,
bundle_name,
label,
ss_language,
is_comment_count,
ds_created,
ds_changed,
score,
path,
url,
is_uid,
tos_name&
f.bundle.facet.limit=50&
f.content.hl.alternateField=teaser&
hl.mergeContigious=true&
facet.field=im_field_tag&
facet.field=bundle&
fq=(access_node_tfslk0_all:0+OR+access__all:0)&
mm=1&
facet.mincount=1&
qf=content^40&
qf=label^5.0&
qf=tags_h2_h3^3.0&
qf=taxonomy_names^2.0&
qf=tos_content_extra^0.1&
qf=tos_name^3.0&
hl.fl=content&
f.content.hl.maxAlternateFieldLength=256&
json.nl=map&
wt=json&
rows=10&
pf=content^2.0&
hl.snippets=3&
start=0&
facet.sort=count&
q=diam&
ps=15} hits=10 status=0 QTime=12
NB: one way to get a clearer look at such a log line is to copy it into a text editor and replace '&' with '&\n' and ',' with ',\n' to get a more readable text.
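The same clean-up can also be scripted. The following is a minimal sketch, not part of Solr or Drupal: the function name parse_solr_log_line is made up for this example, and the sample is a shortened version of the log line above, assumed to sit on a single line. Note that parse_qsl decodes '+' as a space, which is what we want for values like the fq filter.

```python
import re
from urllib.parse import parse_qsl


def parse_solr_log_line(line):
    """Return (core, path, params) for one Solr request log line.

    Hypothetical helper for illustration only.
    """
    core = re.search(r"\[([^\]]*)\]", line).group(1)
    path = re.search(r"path=(\S+)", line).group(1)
    raw_params = re.search(r"params=\{(.*)\}", line, re.DOTALL).group(1)
    # parse_qsl keeps repeated keys (qf, facet.field) as separate pairs.
    params = parse_qsl(raw_params, keep_blank_values=True)
    return core, path, params


# Shortened sample based on the log line shown earlier.
sample = (
    "INFO: [solrdev] webapp=/solr path=/select "
    "params={hl=true&qf=content^40&qf=label^5.0&rows=10&q=diam} "
    "hits=10 status=0 QTime=12"
)

core, path, params = parse_solr_log_line(sample)
print(core, path)
for name, value in params:
    print(f"{name} = {value}")
```

Each parameter ends up on its own line, and repeated parameters such as qf stay visible as separate entries.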
Here '[solrdev]' indicates the core the query was submitted to and 'path=/select' the path.
Everything between the {}-brackets is what is added to the query as parameters. If your Solr host is localhost, Solr is running on port 8080 and the name of your core is solrdev, then you can make this same query in any browser by starting with:
http://localhost:8080/solr/solrdev/select?
followed by all the text between the {}-brackets.
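If you would rather build such a URL in a script than paste it into a browser, something like the sketch below should do. It assumes the host, port and core name used in this article and only uses a handful of the parameters from the log line. urlencode percent-encodes characters such as '(' and ':' and turns spaces into '+'.

```python
from urllib.parse import urlencode

# Host, port and core as used in this article; adjust to your own setup.
BASE_URL = "http://localhost:8080/solr/solrdev/select"

# A list of pairs (rather than a dict) keeps the order and allows repeats.
params = [
    ("q", "diam"),
    ("fl", "id,entity_id,label,score"),
    ("fq", "(access_node_tfslk0_all:0 OR access__all:0)"),
    ("rows", "10"),
    ("wt", "json"),
]

url = BASE_URL + "?" + urlencode(params)
print(url)
```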
This is clearly not a simple query, and in fact a lot is going on here: not only is the Solr/Lucene index searched for a specific term, we also tell Solr which fields to return, to give us spellcheck suggestions, to highlight the search term in the returned snippet, to return facets, and so on.
For a better understanding of the Solr query we will break it down and discuss each (well, maybe not all) of the query parameters from the above log line.
Query breakdown
q: Search term
The most basic Solr query would only contain a q-field, e.g.
http://localhost:8080/solr/solrdev/select?q=diam
This would return all fields present in Solr for all matching documents. These are either fields defined directly in schema.xml (in these examples we use schemas based on the one included in the Search API Solr module) like bundle_name:
<field name="bundle_name" type="string" indexed="true" stored="true"/>
or dynamic fields, which are created according to the field definition in schema.xml, e.g.:
<dynamicField name="ss_*" type="string" indexed="true" stored="true" multiValued="false"/>
The above query run on my local development environment would return a number of documents like this one:
<doc>
  <bool name="bs_promote">true</bool>
  <bool name="bs_status">true</bool>
  <bool name="bs_sticky">false</bool>
  <bool name="bs_translate">false</bool>
  <str name="bundle">point_of_interest</str>
  <str name="bundle_name">Point of interest</str>
  <str name="content">Decet Secundum Wisi Cogo commodo elit eros meus nisl turpis.(...) </str>
  <date name="ds_changed">2015-03-04T08:45:18Z</date>
  <date name="ds_created">2015-02-19T18:55:44Z</date>
  <date name="ds_last_comment_or_change">2015-03-04T08:45:18Z</date>
  <long name="entity_id">10</long>
  <str name="entity_type">node</str>
  <str name="hash">tfslk0</str>
  <str name="id">tfslk0/node/10</str>
  <long name="is_tnid">0</long>
  <long name="is_uid">1</long>
  <str name="label">Decet Secundum Wisi</str>
  <str name="path">node/10</str>
  <str name="path_alias">content/decet-secundum-wisi</str>
  <str name="site">http://louis.atlantik.dev/nl</str>
  <arr name="sm_field_subtitle">
    <str>Subtitle Decet Secundum Wisi</str>
  </arr>
  <arr name="spell">
    <str>Decet Secundum Wisi</str>
    <str>Decet Secundum Wisi Cogo commodo elit eros meus nisl turpis. (...) </str>
  </arr>
  <str name="ss_language">nl</str>
  <str name="ss_name">admin</str>
  <str name="ss_name_formatted">admin</str>
  <str name="teaser"> Decet Secundum Wisi Cogo commodo elit eros meus nisl turpis. Abluo appellatio exerci exputo feugiat jumentum luptatum paulatim quibus quidem. Decet nutus pecus roto valde. Adipiscing camur erat haero immitto nimis obruo pneum valetudo volutpat. Accumsan brevitas consectetuer fere illum interdico </str>
  <date name="timestamp">2015-03-05T08:30:11.7Z</date>
  <str name="tos_name">admin</str>
  <str name="tos_name_formatted">admin</str>
  <str name="url"> http://louis.atlantik.dev/nl/content/decet-secundum-wisi </str>
</doc>
(In this document I have left out most of the content of the fields content and spell).
Obviously, when searching we are not interested in all fields. For example, the aforementioned field content contains a complete view (with HTML tags stripped) of the node in Drupal, which most of the time is not relevant when showing search results.
fl: Fields
So the first thing we want to do is limit the number of fields Solr is returning by adding the fl parameter, in which we name the fields we want returned from Solr:
http://localhost:8080/solr/solrdev/select?q=diam&fl=id,entity_id,entity_type,bundle,bundle_name,label,ss_language,score,path,url
This would return documents like:
<doc>
  <float name="score">0.1895202</float>
  <str name="bundle">point_of_interest</str>
  <str name="bundle_name">Point of interest</str>
  <long name="entity_id">10</long>
  <str name="entity_type">node</str>
  <str name="label">Decet Secundum Wisi</str>
  <str name="path">node/10</str>
  <str name="ss_language">nl</str>
  <str name="url"> http://localhost/nl/content/decet-secundum-wisi </str>
</doc>
Here we not only use fields which are directly present in the index (like bundle) but also a generated field score, which indicates the relevance of the found item. This field can be used to sort the results by relevance.
By the way, from Solr 4.0 on, the fl parameter can be added to the query multiple times, with one field in each parameter. However, the "old" way of a comma-separated field list is still supported (also in Solr 5).
So in Solr 4 the query could (or should I say: should?) be written as:
http://localhost:8080/solr/solrdev/select?q=diam&fl=id&fl=entity_id&fl=entity_type&fl=bundle&fl=bundle_name&fl=label&fl=ss_language&fl=score&fl=path&fl=url
NB: in Solr 3 this would give a wrong result, with only the first fl field returned.
Please note that you cannot query all dynamic fields at once with an fl parameter like 'fl=ss_*': you must specify the actual fields that are created while indexing: fl=ss_language,ss_name,ss_name_formatted... etc.
fq: Filter queries
One thing we do not want is users finding unpublished content which they are not allowed to see. When using the Drupal schema, this can be accomplished by filtering on the dynamic fields created from
<dynamicField name="access_*" type="integer" indexed="true" stored="false" multiValued="true"/>
To filter, we add a fq-field like this:
http://localhost:8080/solr/solrdev/select?q=diam&fl=id,entity_id,entity_type,bundle,bundle_name,label,ss_language,score,path,url&fq=(access_node_tfslk0_all:0+OR+access__all:0)
The queries in fq are cached independently of the other Solr queries and so can speed up complex queries. The filter can also contain range queries. For example, to limit the returned documents to those whose popularity field (present in the Drupal node and, of course, indexed) lies between 50 and 100, one could use
fq=+popularity:[50 TO 100]
For more info see the Solr wiki, CommonQueryParameters.
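Several fq parameters can be sent in one request. As a small sketch (the helper name range_filter is made up for this example, and popularity is just the example field from above), the access filter and the range filter could be combined like this:

```python
from urllib.parse import urlencode


def range_filter(field, low, high):
    """Build an inclusive Solr range filter such as popularity:[50 TO 100].

    Hypothetical helper for illustration only.
    """
    return f"{field}:[{low} TO {high}]"


params = [
    ("q", "diam"),
    # Each fq is sent as its own parameter.
    ("fq", "(access_node_tfslk0_all:0 OR access__all:0)"),
    ("fq", range_filter("popularity", 50, 100)),
]

query_string = urlencode(params)
print(query_string)
```

Keeping the filters in separate fq parameters, instead of gluing them into one long expression, lets Solr cache each of them on its own.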
To add filter queries programmatically in Drupal when using the Apache Solr Search-module, implement
hook_apachesolr_query_alter()
and use
$query->addFilter
to add filters to the query.
For example, to filter the query on the current language, use:
function mymodule_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
  // Only return documents in the language of the current page.
  global $language;
  $query->addFilter("ss_language", $language->language);
}
If using the Solr environment with Search API Solr Search, implement
hook_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query)
In this hook you can alter or add items to the
$call_args['params']['fq']
to filter the query.
In both cases the name of the field to use can be found via the Solr admin. In this case it is a string field created from the dynamic ss_*-field in 'schema.xml'.
rows and start: The rows returned
The 'rows'-parameter determines how many documents Solr will return, while 'start' determines how many documents to skip. Together they basically implement pager functionality.
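To make the pager arithmetic concrete, here is a minimal sketch. The function name paging_params and the zero-based page numbering are assumptions made for this example.

```python
def paging_params(page, rows=10):
    """Translate a zero-based page number into Solr 'start' and 'rows'.

    Hypothetical helper for illustration only.
    """
    if page < 0 or rows < 1:
        raise ValueError("page must be >= 0 and rows must be >= 1")
    # Skip all documents that belong to the previous pages.
    return {"start": page * rows, "rows": rows}


first_page = paging_params(0)
fourth_page = paging_params(3)
print(first_page, fourth_page)
```

With 10 rows per page, the fourth page (page number 3) therefore starts at document 30.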
wt: Type of the returned data
The 'wt'-parameter determines how the data from Solr is returned. Most used are:
- xml (default)
- json
See https://cwiki.apache.org/confluence/display/solr/Response+Writers for a complete list of available formats and their options.
qf: Boosting
The DisMax and EdisMax plugins have the ability to boost the score of documents. In the default Drupal requestHandler ("pinkPony") the EdisMax parser is set as query plugin:
<requestHandler name="pinkPony" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
Boosting can be done by adding one or more qf-parameters which define the fields ("query field") and their boost value in the following syntax:
[fieldName]^[boostValue]
E.g:
qf=content^40&qf=label^5.0&qf=tags_h2_h3^3.0&qf=taxonomy_names^2.0&
In Drupal this is done by implementing the relevant (Search API or Apache Solr) hook and adding the boost.
For example, let's say you want to boost the result with a boost value of "$boost" if a given taxonomy term with id "$tid" is present in the field "$solr_field".
For Apache Solr module we should use:
function mymodule_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
  // $solr_field, $tid and $boost must be set before this point.
  $boost_q = array();
  $boost_q[] = $solr_field . ':' . $tid . '^' . $boost;
  $query->addParam('bq', $boost_q);
}
For Search API Apache Solr:
function mymodule_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // $solr_field, $tid and $boost must be set before this point.
  $call_args['params']['bq'][] = $solr_field . ':' . $tid . '^' . $boost;
}
NB: in both examples we ignore any boost values set by other modules for the same field. In real life you should merge the existing boost array with the new one.
Negative boosting
Negative boosting is not possible in Solr. It can, however, be simulated by boosting all documents that do not have a specific value. In the above example of boosting on a specific taxonomy term, we can use:
$boost = abs($boost);
$boost_q[] = '-' . $solr_field . ':' . $tid . '^' . $boost;
or, for Search API
$boost = abs($boost);
$call_args['params']['bq'][] = '-' . $solr_field . ':' . $tid . '^' . $boost;
where the '-' before the field name is used to indicate that we want to boost the items that do not have this specific value.
mm: minimum should match
The EdisMax parser also supports querying phrases (as opposed to single words). The 'mm' parameter gives the minimum number of words that must be present in a Solr document. For example: if the query is "little red corvette" and 'mm' is set to '2', only documents which contain at least:
- "little" and "red"
- "little" and "corvette"
- "red" and "corvette"
are returned, and documents which contain only one of the words are not.
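The rule can be mimicked with a toy function. This is only an illustration of the counting idea, not how Solr implements it: it assumes plain whitespace splitting, ignores Solr's analysis (stemming, stop words and so on), and only handles a plain integer 'mm'. The function name satisfies_mm is made up for this example.

```python
def satisfies_mm(query, document, mm):
    """Toy check: does the document contain at least `mm` of the query words?

    Hypothetical helper; real matching happens inside Solr.
    """
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    # Count how many distinct query words occur in the document.
    return len(query_words & document_words) >= mm


query = "little red corvette"
two_words_match = satisfies_mm(query, "a little red wagon", 2)
one_word_matches = satisfies_mm(query, "the red balloon", 2)
print(two_words_match, one_word_matches)
```

The first document contains "little" and "red" and passes; the second contains only "red" and is rejected.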
q.alt: for empty queries
If specified, this query is used when the main query string is missing or blank. Like 'mm', this parameter is specific to the DisMax and EdisMax query parsers.
And even more
Of course, the parameters mentioned above are not the only ones used in a Solr query. Much more can be done with them, and there are many other parameters that influence the output of Solr.
To mention just a few:
Facets
Facets are one of the strong assets of Solr. A complete description of all possibilities and settings for facets would go too far for the scope of this article, but a number of useful parameters are discussed here.
The Drupal Facet API module and its range of plugins have a plethora of settings and possibilities, like the Facet Pretty Paths module or, as another example of the numerous possibilities of the Facet API, the Facet API Taxonomy Sort module.
The most important Solr parameters in relation to facets are:
facet
Turn on the facets by using "facet=true" in the query.
facet.field
The field to return facets for. This parameter can be added more than once.
facet.mincount
This indicates the minimum number of results a facet item must have in order to be shown.
If this is set to '1', only facet items that would return documents are returned. If set to '0', a facet item is returned for each value in the field, even if clicking on the item would return zero documents.
facet.sort
How to sort the facet items. Possible values are:
facet.sort=count
Sort by number of returned documents
facet.sort=index
Amounts to sorting alphabetically, or, in the Solr wiki's words: return the constraints sorted in their index order (lexicographic by indexed term). For terms in the ASCII range, this will be alphabetically sorted.
facet.limit
The maximum number of facet items returned, defaults to 100.
facet.offset
The number of facet items skipped. Defaults to '0'.
'facet.limit' and 'facet.offset' can be combined to implement paging of facet items.
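Put together, the facet parameters could be assembled as in the sketch below. The function name facet_params and its default values are assumptions for this example; the field names im_field_tag and bundle are taken from the log line at the top of this article.

```python
from urllib.parse import urlencode


def facet_params(fields, mincount=1, limit=50, offset=0, sort="count"):
    """Return the facet-related parameters as a list of (name, value) pairs.

    Hypothetical helper for illustration only.
    """
    params = [("facet", "true")]
    # facet.field is repeated once for every field we want facets for.
    params += [("facet.field", field) for field in fields]
    params += [
        ("facet.mincount", mincount),
        ("facet.limit", limit),
        ("facet.offset", offset),
        ("facet.sort", sort),
    ]
    return params


query_string = urlencode(
    [("q", "diam")] + facet_params(["im_field_tag", "bundle"])
)
print(query_string)
```

To page through facet items, call facet_params again with a larger offset, for example offset=50 for the second batch of 50 items.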
Highlighting
hl
Highlighting is turned on with 'hl=true' and highlights the keywords in the search snippet Solr returns. Like faceting and spellchecking, highlighting is an extensive subject; see http://wiki.apache.org/solr/HighlightingParameters for more info.
hl.formatter
According to the Solr wiki: currently the only legal value is "simple". So just use 'simple'.
hl.simple.pre/hl.simple.post
Set the tags used to surround the highlight; they default to <em> and </em>. To wrap the highlight in, for example, bold tags, use:
hl.simple.pre=<b>&hl.simple.post=</b>
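When such a request is built in a script rather than typed into a browser, the angle brackets and the slash should be URL-encoded. A minimal sketch, using the hl.fl value from the log line at the top of this article:

```python
from urllib.parse import urlencode

params = [
    ("q", "diam"),
    ("hl", "true"),
    ("hl.fl", "content"),
    # The opening and closing tag wrapped around each highlighted term.
    ("hl.simple.pre", "<b>"),
    ("hl.simple.post", "</b>"),
]

# urlencode turns '<', '>' and '/' into %3C, %3E and %2F.
query_string = urlencode(params)
print(query_string)
```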
Spellchecking
spellcheck
Spellchecking is enabled with 'spellcheck=true'.
Because spellchecking is a complicated and language-dependent process, it is not discussed here in full; see http://wiki.apache.org/solr/SpellCheckComponent for more information about the spellcheck component.
If the query for the spellchecker is given in a separate 'spellcheck.q' parameter like this:
spellcheck.q=<word>
this word is used for spell checking. If the 'spellcheck.q' parameter is not set, the default 'q' parameter is used as input for the spellchecker. Of course, the word in 'spellcheck.q' should bear a strong relation to the word in the 'q' parameter, otherwise incomprehensible spelling suggestions would be given.
One can also make separate requests to the spellchecker:
http://localhost:8080/solr/solrdev/spell?q=<word>&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
Where <word> in the 'q' parameter is the word to use as input for the spellchecker.
One important subject related to spellchecking is the way content is analyzed before it is written into the index or read from it. See SpellCheckingAnalysis for the default settings for the spellcheck field.
In Drupal there is a spellcheck module for Search API, Search API Spellcheck, which can be used with multiple search backends.
Conclusion
Although most of the parameters mentioned above are more or less hidden by the Drupal admin interface, a basic understanding of them can help you understand why your Solr search does (or, more usually, does not) return the results you expected.
As said in the introduction: looking at the queries in the Tomcat log can help a lot when debugging Solr.