Query for nodes which reference multiple terms in Drupal 7
Have you ever needed to run a custom query that returns all the nodes which reference two or more taxonomy terms? I have, and most of the time it was very simple because the requirement was for nodes which reference either term, rather than ALL the terms.
When we are dealing with an OR type of select query, it couldn't be easier:
$query = db_select('node', 'n');
$query->join('taxonomy_index', 'ti', 'ti.nid = n.nid');
$query->condition('ti.tid', array(1, 2));
$query->fields('n', array('nid'));
$result = $query->execute();
The above example keeps things simple by assuming the term reference maintains an index in the taxonomy_index
table and that we already have our term IDs.
Things get a bit more complicated when we are trying to query the nodes which reference both term 1 AND term 2. After banging my head against the wall for a bit, I came up with a solution inspired by how Views generates its query:
$tids = array(1, 2);
$query = db_select('node', 'n');
foreach ($tids as $key => $tid) {
$query->innerJoin('taxonomy_index', 'ti_' . $key, 'ti_' . $key . '.nid = n.nid AND ti_' . $key . '.tid = ' . $tid);
}
foreach ($tids as $key => $tid) {
$query->condition('ti_' . $key . '.tid', $tid);
}
$query->fields('n', array('nid'));
$result = $query->execute();
So basically the solution is to create a join for each term ID we want to filter by. A crucial part of this is that the join needs to happen on the node ID between the two tables (as expected) but also on the term ID we are using for the filter matching the record in the taxonomy_index
table. This will ensure that the two joins are not the same but that they reflect the relation between the node record and each individual term ID. Are you still following? Then, we just add our conditions on the newly created joins.
Caveats
This should work, and depending on the size of your dataset, it should not have too much of a performance impact. However, as no joins should be done if not absolutely necessary, investigate the possibility of querying for the nodes that reference the first term and then filtering the rest out using PHP. This can be a viable option if you know that usual result sets are not too big so you are not running array_filter()
on 2000 terms. And of course, when possible, cache the query appropriately.