Introducing Federated Search
Mon, 01/07/2019 - 13:38
Ken Rickard and Avi Schwab
Jan 7, 2019
Search API Federated Solr is Palantir.net’s open source solution to federated search.
Last year, Google announced that the Google Search Appliance (GSA) would be discontinued. This announcement means that enterprise clients needing a simple yet customizable search application for their internal properties will be left without a solution sometime in 2019.
At the request of an existing client, Palantir has worked for the past year to produce a replacement for the GSA and other federated search applications using open-source tools. We abstracted this project into a reusable product to index and serve data across disparate data sources, Drupal and otherwise, and we’re now happy to share it with the community.
What is Federated Search?
Federated Search is being released publicly as an open source solution to a common problem. It works out-of-the-box, and can also be customized. There are three main parts to the product:
- Content indexing via Drupal integration (provided)
- Result serving via React application (provided)
- Data storage in a Solr backend (required; we can recommend SearchStax as an option.)
How was Federated Search built?
Every search application, no matter what the implementation, has three main parts: the source, the index, and the results.
Working from the results backward, we began with identifying a schema in which all of our source data would be stored. A basic review of search pages across the internet reveals a fairly common set of features. A title, some descriptive text, and a link are the absolute minimum for displaying search results. Some extra metadata like an image, date, and type are also useful to give the user a richer experience and some filter criteria. Finally, since we’re searching across sites, we’ll need some data about where the item comes from.
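To make the shape of such a schema concrete, here is a minimal sketch in Python. The class name, field names, and example values are assumptions chosen for illustration; they are not the actual field names used in the Search API Federated Solr schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SearchDocument:
    """One search result in a shared index. Field names are illustrative only."""

    # The minimum needed to display a result.
    title: str
    description: str
    url: str
    # Where the item comes from, since results span several sites.
    site: str
    # Optional metadata for richer display and filter criteria.
    image: Optional[str] = None
    date: Optional[str] = None  # ISO 8601 string, e.g. "2019-01-07"
    content_type: Optional[str] = None


def to_index_fields(doc: SearchDocument) -> dict:
    """Drop unset optional fields so only real values are sent to the index."""
    return {key: value for key, value in asdict(doc).items() if value is not None}


# Hypothetical example item.
example = SearchDocument(
    title="Introducing Federated Search",
    description="An open source approach to searching across sites.",
    url="https://example.com/blog/federated-search",
    site="example.com",
    content_type="blog",
)
```

Calling `to_index_fields(example)` returns only the five populated fields, so items without an image or date still fit the same structure.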
With that schema in mind, and knowing Drupal would be our data source, we identified a need to get data from some unknown structure in Drupal (because every site might have vastly different content types) into a fixed set of buckets. Since much of the terminology is the same, the Metatag module quickly came to mind. Metatag allows users to take data from Drupal fields using tokens and output it into specific meta tags on the site. With that same pattern in mind, we built Search API Field Map. This module allows us to use tokens to set a pattern for each bundle, and the values produced by all of those patterns get indexed into the same field in our index.
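The idea of per-bundle token patterns feeding one shared field can be sketched roughly as follows. This is a toy illustration and not the module's code: the bundle names, the patterns, and the `render` helper are hypothetical, and the `[node:...]` syntax only imitates the look of Drupal tokens.

```python
import re

# Hypothetical per-bundle patterns that all feed one destination field.
PATTERNS = {
    "article": "[node:summary]",
    "event": "[node:location] - [node:teaser]",
}

# Matches tokens of the form [node:name]; a simplification of real token syntax.
TOKEN = re.compile(r"\[node:([a-z_]+)\]")


def render(pattern: str, values: dict) -> str:
    """Replace each [node:name] token with the matching value, or '' if absent."""
    return TOKEN.sub(lambda match: str(values.get(match.group(1), "")), pattern)


def mapped_description(bundle: str, values: dict) -> str:
    """Resolve the bundle's pattern into the single shared 'description' field."""
    return render(PATTERNS[bundle], values)
```

With this sketch, an article and an event with entirely different source fields both end up producing a value for the same destination field, which is the behavior the paragraph above describes.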
At Palantir, search is part of every project. We’ve implemented numerous custom and complex search configurations, and almost every time we lean on Apache Solr for our backend. Solr is a CMS-agnostic search index that has a well-supported and robust existing toolchain for Drupal. Search API and Search API Solr provided a solid groundwork from which to build our source plugins, so then the last step was getting our data out. Solr comes out of the box with “Response Writers” that cover almost every known data format, so our options were wide open.
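As a rough illustration of getting data out, the sketch below builds a Solr query URL that requests JSON through the `wt` parameter and reads the documents from a JSON response body. The core URL and the `site` filter field are assumptions for the example; the front end described in this post may query Solr differently.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url: str, text: str, site: str = "", rows: int = 10) -> str:
    """Build a Solr /select URL that asks for a JSON response via wt=json."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if site:
        # fq filters the result set; 'site' is an assumed field name.
        params.append(("fq", f'site:"{site}"'))
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


def extract_results(raw_json: str) -> list:
    """Pull the matching documents out of a Solr JSON response body."""
    return json.loads(raw_json)["response"]["docs"]
```

The URL can be fetched with any HTTP client, and `extract_results` then returns the list of stored documents for display.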
We started with an existing framework to provide the query handlers and basic front-end components, then extended it with our own set of component packs to build out the user interface. Search API Federated Solr provides the React application as a Drupal library, adds a search block, and surfaces some custom per-site configuration for the search application.
A Flexible, Open Source Search Solution
With Drupal, Solr, and React working together, we’re able to index data from completely arbitrary sources, standardize it, and then output it in an easily consumable way. This approach means more flexibility for site administrators and a cleaner experience for users.
A number of commercial applications exist to provide this functionality, but our solution provides a number of benefits:
- Keeping the data source tightly coupled with Drupal allows for maximum customization and access to the source content.
- Providing a decoupled front-end allows us to surface results anywhere, even outside of Drupal.
- Being built on 100% open-source code allows for community improvement and sharing.
How can you use this or download the code?
Between the Drupal modules and React code, there’s a lot going on to make this application work, and even with those, you’ll still need to bring your own Solr backend to index the data. Luckily, we’ve put all those pieces together into a fully functional demo box using Palantir’s open source Vagrant environment and build tasks.
If you’d like to inspect the pieces individually, here they are:
- Federated Search Demo (GitHub)
- Search API Federated Solr Handbook (D8, D7)
- Search API Federated Solr (GitHub, Drupal.org)
- Search API Field Map (GitHub, Drupal.org)
- Federated Solr React (GitHub)
Palantir plans to maintain these projects as a cohesive unit moving forward, and pull requests or D.o issues on the projects above are always welcome.
Does it have to be a Drupal site?
No! While we provide everything needed to index a Drupal 8 or Drupal 7 site, there’s no reason you can’t configure an additional data source to send content to the same Solr index, as long as it conforms to the required schema. The front-end is also CMS-agnostic, so you could search Drupal sites from WordPress, another CMS, or even from a statically generated site.
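For instance, a non-Drupal source could prepare documents for the shared index along these lines. This is a sketch under stated assumptions: the required field list and the core URL are hypothetical, and the real schema is whatever your Solr index defines. It builds a POST to Solr's JSON update handler without sending it.

```python
import json
from urllib.request import Request

# Assumed required fields; the real list is whatever the shared schema defines.
REQUIRED_FIELDS = ("id", "title", "url", "site")


def missing_fields(doc: dict) -> list:
    """Return the required field names that are absent or empty in doc."""
    return [name for name in REQUIRED_FIELDS if not doc.get(name)]


def build_update_request(base_url: str, docs: list) -> Request:
    """Build (but do not send) a JSON POST to the Solr /update handler."""
    for doc in docs:
        missing = missing_fields(doc)
        if missing:
            raise ValueError(f"document is missing required fields: {missing}")
    return Request(
        url=f"{base_url.rstrip('/')}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would submit the documents. Checking the fields first keeps a non-conforming source from putting incomplete items into the index.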