Creating a Crazy Apache Solr Search in Drupal 6
A video communication area, arranged in several views:
A students’ portfolio accordion:
And a very complicated courses search mechanism - the subject of this post:
Background / What we’re working with here
Since John Bryce is all about training, the heart of their site’s content is obviously *courses*:
Each course can have optional information attached that relates to a specific instance of that course. That way, if a course has several opening times, we store most of its info only once and keep the per-instance info, which we get from a web service, separately.
The Drupal structure for our courses is this:
The courses themselves, implemented as 2 slightly-different content types, hold information such as the name of the course, an introduction, body text (in tabs), and various informative fields and taxonomies.
The optional info, which we call “course instances”, contains mainly an opening date, the location where the course is given, how long the course is, and whether it’s a day or an evening course.
The really tricky bit, which we’ll discuss here, was creating the search mechanism, which needs to search and filter in both the courses and their instances. Keep in mind, though, that we might want to find a course containing a certain text, but which also starts next month - and none of our content types hold both of these bits of information.
So the challenge is: searching content for information it doesn’t actually have.
First step / Indexing
In order to find what you’re searching for - it’s got to be there!
That’s why we hook into Apache Solr’s indexing process using hook_apachesolr_update_index().
When indexing, we check each node’s type. If it’s of type *course* or *course instance*, we generate some fields for it.
When dealing with course instances, there’s almost nothing there, as far as content goes. So we load the course that this course instance relates to, and copy the info off it.
We append the fields that should be searchable through the free-text search (like the course’s body and introduction) to the main body field of Solr’s index, like so:
$document->body .= $field_content_from_the_course;
Taxonomy field values get added as separate index fields (which we make up), like so:
$document->addField($field_key, $field_value);
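<?php
/**
 * Implementation of hook_apachesolr_update_index().
 */
function jb_solr_apachesolr_update_index(&$document, $node) {
  if ($node->type == 'course_instance') {
    // Load the course this instance refers to. The nodereference field name
    // "field_course" is inferred from the Views alias used further down.
    $course_node = node_load($node->field_course[0]['nid']);
    if ($course_node) {
      // Adding short_introduction field.
      $intro = $course_node->field_short_introduction[0]['value'];
      $document->body .= $intro;
    }
  }
}
?>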
Now the course instances have all of the courses’ content.
But we still need to handle the courses themselves:
Apache solr indexes taxonomy term names by default. We wanted term ids, which is also what we indexed for the course instances.
So we repeat some of the above process for nodes of type course - the parts that relate to taxonomy.
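The taxonomy part of that indexing step might look roughly like the sketch below. It is not the site’s actual code: the helper name jb_solr_example_term_fields(), the vocabulary ids, and the vocabulary-to-field map are all hypothetical, and only ss_category and ss_tra_section are field names that appear elsewhere in this post. It assumes Drupal 6’s $node->taxonomy layout, an array of term objects that each carry a tid and a vid.

```php
<?php
/**
 * Hypothetical helper: maps a node's taxonomy terms to custom index fields.
 *
 * @param $node
 *   A node object whose ->taxonomy is an array of term objects (tid, vid).
 * @param $vocab_fields
 *   Assumed map of vocabulary id => Solr index field name.
 *
 * @return
 *   Array of index field name => term id.
 */
function jb_solr_example_term_fields($node, array $vocab_fields) {
  $fields = array();
  if (empty($node->taxonomy)) {
    return $fields;
  }
  foreach ($node->taxonomy as $term) {
    if (isset($vocab_fields[$term->vid])) {
      // Index the term id, not the term name. One value is kept per field,
      // matching the post's note that a course belongs to one category.
      $fields[$vocab_fields[$term->vid]] = (string) $term->tid;
    }
  }
  return $fields;
}

// Sample data standing in for a course node (vocabulary ids are made up).
$node = new stdClass();
$node->taxonomy = array(
  12 => (object) array('tid' => 12, 'vid' => 3),
  40 => (object) array('tid' => 40, 'vid' => 5),
  // This term's vocabulary is not in the map, so it is skipped.
  77 => (object) array('tid' => 77, 'vid' => 9),
);
$vocab_fields = array(3 => 'ss_category', 5 => 'ss_tra_section');
$fields = jb_solr_example_term_fields($node, $vocab_fields);

// Inside hook_apachesolr_update_index(), each pair would then be added with
// $document->addField($field_key, $field_value);
print_r($fields);
```

Running the same helper for courses and for the course loaded from a course instance would keep both kinds of node filterable on the same term ids.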
And, voila! Indexing is done.
Second Step / Modifying the Query
To actually use our search mechanism, our next logical step is creating a custom advanced search form, which we meddle with on submit, to get everything nicely to the URL’s querystring:
For example, we want to allow users to search for courses in several categories at once. The default is to use the “AND” operator, but we want “OR”, because each course only belongs to one category. So in the form’s submit function, we have:
implode(" OR ", array_filter($form_state['values']['sections']))
which gives us a nice OR filter in the querystring.
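To make that one-liner concrete, here is a small standalone sketch of what it produces. The function name and the sample section ids are made up for illustration; the only assumption about Drupal is that a checkboxes element reports unchecked boxes as 0, which is why array_filter() is enough to drop them.

```php
<?php
/**
 * Hypothetical helper: builds an OR expression from checkbox values.
 */
function jb_solr_example_or_filter(array $checkbox_values) {
  // Unchecked checkboxes come through as 0, so array_filter() removes them.
  $selected = array_filter($checkbox_values);
  return implode(' OR ', $selected);
}

// Sections 12 and 15 are ticked, 14 is left unticked.
$sections = array(12 => 12, 14 => 0, 15 => 15);
$tra_sec = jb_solr_example_or_filter($sections);

// The value that would travel in the querystring.
echo $tra_sec, "\n";
// Wrapped in parentheses, as the query hook does before filtering.
echo '(' . $tra_sec . ')', "\n";
```

With the sample values above, the querystring value is "12 OR 15" and the final filter is "(12 OR 15)".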
To help apache solr use all those carefully crafted filters from the URL, we use hook_apachesolr_modify_query(), and add the relevant filters to solr’s query, like so:
$query->add_filter("field_name", $value_from_get, FALSE);
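<?php
/**
 * Implementation of hook_apachesolr_modify_query().
 */
function jb_solr_apachesolr_modify_query(&$query, &$params, $caller) {
  if ($caller == 'apachesolr_views_query') {
    // Only add the filter when the section parameter is actually present.
    if (!empty($_GET['tra_sec'])) {
      // Wrap the "a OR b" expression in parentheses so it acts as one clause.
      $filter = "(" . $_GET['tra_sec'] . ")";
      $query->add_filter("ss_tra_section", $filter, FALSE);
    }
  }
}
?>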
Now apache solr should know how to deal with all the info our user wanted to send it.
Third Step / Reducing Duplicates
So we run cron, and everything is indexed, and using apachesolr_views, we even get results, BUT - now we get multiple results for many of the courses, because we get both the course and its instances.
What to do?
Using hook_views_post_render(), we grab the results of the search from the view, loop through them, and insert the course nid behind each result into an array. Since we have so-called duplicates, this array will have duplicate values. Using array_unique(), we end up with a unique-valued array, which we then join with plus (+) signs using implode() and store in a variable.
To display these results, we call a different view we’ve prepared beforehand, and give it the variable, using views_embed_view().
<?php
/**
 * Implementation of hook_views_post_render().
 */
function jb_solr_views_post_render(&$view, &$output, &$cache) {
  if ($view->name == 'jb_search' && $view->current_display == 'page_1') {
    if (empty($view->result)) {
      variable_set('jb_uniques_search_results', NULL);
      $output .= $view->empty['option']['content'];
    }
    else {
      // Collect the course nid behind every result row.
      $uniques = array();
      foreach ($view->result as $result) {
        if ($result->type == 'course_instance') {
          $uniques[] = $result->node_data_field_course_field_course_nid;
        }
        elseif ($result->type == 'professional_course' || $result->type == 'training_course') {
          $uniques[] = $result->nid;
        }
      }
      // Get a unique element array from the nodereference course field.
      $uniques = array_unique($uniques);
      $uniques = implode("+", $uniques);
      // Set a string of the unique elements, glued with "+", as the argument
      // for the "course_by_arg" view. See jb_solr_search_courses_block().
      variable_set('jb_uniques_search_results', $uniques);
      // Hide the original "jb_search" result output.
      $output = '';
    }
  }
}
?>
With the last line, $output = '', we hide the original view's output, which we don't want to display because we'll be showing the new view instead.
No more duplicates!
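As a worked example of the de-duplication, the standalone sketch below runs the same logic over a few made-up result rows. The function name and the nids are hypothetical; the row properties mirror the ones used in the hook above. The "+" glue is meant for a Views argument that accepts multiple values, where 10+20 is read as "10 or 20".

```php
<?php
/**
 * Hypothetical helper: reduces search rows to a "+"-joined list of course nids.
 */
function jb_solr_example_unique_arg(array $results) {
  $nids = array();
  foreach ($results as $result) {
    if ($result->type == 'course_instance') {
      // An instance contributes the nid of the course it references.
      $nids[] = $result->node_data_field_course_field_course_nid;
    }
    elseif ($result->type == 'professional_course' || $result->type == 'training_course') {
      // A course contributes its own nid.
      $nids[] = $result->nid;
    }
  }
  return implode('+', array_unique($nids));
}

// Two instances of course 10, course 10 itself, and course 20.
$results = array(
  (object) array('nid' => 101, 'type' => 'course_instance', 'node_data_field_course_field_course_nid' => 10),
  (object) array('nid' => 102, 'type' => 'course_instance', 'node_data_field_course_field_course_nid' => 10),
  (object) array('nid' => 10, 'type' => 'training_course'),
  (object) array('nid' => 20, 'type' => 'professional_course'),
);
$arg = jb_solr_example_unique_arg($results);
echo $arg, "\n";
```

Four rows collapse into the argument "10+20", so the second view lists each course once.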
Fourth Step / Facets
Apache solr makes facets out of the box, but since we’ve created custom index fields, we now have to tell solr to make facets for them:
- First things first, declare your facets to the world:
Using hook_apachesolr_facets(), we declare a facet, using the exact name of the field we created when we did the custom indexing.
<?php
/**
 * Implementation of hook_apachesolr_facets().
 */
function jb_solr_apachesolr_facets() {
  $facets = array();
  $facets['ss_category'] = array(
    'info' => t('facet: By Category'),
    'facet_field' => 'ss_category',
  );
  // …more of the same
  return $facets;
}
?>
This adds the facets to the "enabled filters" page, admin/settings/apachesolr/enabled-filters.
- Enable the newly-born facets at the above link:
After we enable the facets, Drupal tells us to go to the blocks page and place the new facet blocks in some region. We actually prefer context, so we add the blocks there.
Oh, and to have the blocks actually appear on the page, in the view part of hook_block(), we use apachesolr_facet_block() to print the block to the page.
<?php
/**
 * Implementation of hook_block().
 */
function jb_solr_block($op = 'list', $delta = 0, $edit = array()) {
  switch ($op) {
    case 'view':
      $block = NULL;
      // The original snippet did not show where $response and $query come
      // from. Fetching them from the apachesolr module's static caches, as
      // below, is an assumption about the setup.
      $response = apachesolr_static_response_cache();
      $query = apachesolr_current_query();
      // Add custom facet blocks here - this makes them appear on the search
      // results page.
      if (!empty($response) && $delta == 'ss_start') {
        $block = apachesolr_facet_block($response, $query, 'ss_start', $delta, $delta, t('Filter by Start date'), 'jb_solr_facet');
      }
      return $block;
  }
}
?>
- Create some arguments for views:
For views to be able to deal with the new facets, we need to declare arguments for them. In hook_views_data_alter(), we add a section for each new index field, using apachesolr_views_handler_argument as the handler.
This done, we add the argument in the view, and now it can filter on our new facets.
Sounds easy, right?
Well, not really, but here’s a little tip to make it easier for you: you can shorten the 2 minute delay in processing the indexed data by lowering the maxTime value (given in milliseconds, inside the autoCommit section) in the example/solr/conf/solrconfig.xml file.
Happy searching!