Updated: Has taxonomy term ID (with depth) query performance
Problem/Motivation
When a view uses taxonomy term ID with depth, either as a view argument or as a view filter, the resulting MySQL query can contain quite a few subqueries, joins, and sorts. This is very slow, especially when MySQL cannot cache the subqueries. MySQL then uses temporary tables, which can be really slow when they are written to disk for large databases.
In the example below, this is the resulting query of a view using taxonomy term id with depth 3 as argument:
SELECT node.sticky AS node_sticky,
       nodequeue_nodes_node.position AS nodequeue_nodes_node_position,
       field_data_field_published_date.field_published_date_value AS field_data_field_published_date_field_published_date_value,
       node.nid AS nid
FROM node node
LEFT JOIN nodequeue_nodes nodequeue_nodes_node
  ON node.nid = nodequeue_nodes_node.nid
  AND nodequeue_nodes_node.qid = '1'
LEFT JOIN field_data_field_published_date field_data_field_published_date
  ON node.nid = field_data_field_published_date.entity_id
  AND ( field_data_field_published_date.entity_type = 'node'
        AND field_data_field_published_date.deleted = '0' )
WHERE (( ( node.status = '1' )
         AND ( node.nid IN (SELECT tn.nid AS nid
                            FROM taxonomy_index tn
                            LEFT OUTER JOIN taxonomy_term_hierarchy th
                              ON th.tid = tn.tid
                            LEFT OUTER JOIN taxonomy_term_hierarchy th1
                              ON th.parent = th1.tid
                            LEFT OUTER JOIN taxonomy_term_hierarchy th2
                              ON th1.parent = th2.tid
                            LEFT OUTER JOIN taxonomy_term_hierarchy th3
                              ON th2.parent = th3.tid
                            WHERE ( ( tn.tid = '37' )
                                    OR ( th1.tid = '37' )
                                    OR ( th2.tid = '37' )
                                    OR ( th3.tid = '37' ) )) ) ))
ORDER BY node_sticky DESC,
         nodequeue_nodes_node_position DESC,
         field_data_field_published_date_field_published_date_value DESC
LIMIT 10 OFFSET 0;
The problem here is the subquery that finds the nids to show on this index. In my site's database this query returns 5425 records in 1 second. It is slow because it joins 5 tables, one of which, taxonomy_index, has 103k records. Depth is terribly inefficient in databases that store hierarchy with parent references; the nested set model would be much better, but that would require a huge rewrite of Drupal.
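To make the cost of depth concrete, here is a rough Python sketch that rebuilds the shape of the subquery above for an arbitrary depth. It is reconstructed from the generated SQL only and is not the Views handler code; the function name depth_subquery and the :tid placeholder are made up for illustration.

```python
def depth_subquery(depth: int) -> str:
    """Build the nid subquery for a term match up to `depth` levels deep.

    Hypothetical helper: it mirrors the shape of the generated SQL shown
    above, using a :tid placeholder instead of a literal term ID.
    """
    lines = ["SELECT tn.nid AS nid", "FROM taxonomy_index tn"]
    conditions = ["tn.tid = :tid"]
    if depth > 0:
        # First join attaches the hierarchy row of the term on the node.
        lines.append(
            "LEFT OUTER JOIN taxonomy_term_hierarchy th ON th.tid = tn.tid"
        )
        last = "th"
        # Each further level walks one step up the parent chain.
        for level in range(1, depth + 1):
            alias = f"th{level}"
            lines.append(
                f"LEFT OUTER JOIN taxonomy_term_hierarchy {alias} "
                f"ON {last}.parent = {alias}.tid"
            )
            conditions.append(f"{alias}.tid = :tid")
            last = alias
    lines.append("WHERE " + " OR ".join(conditions))
    return "\n".join(lines)


if __name__ == "__main__":
    print(depth_subquery(3))
```

For depth 3 this produces four joins against taxonomy_term_hierarchy on top of taxonomy_index, which is where the five tables come from. Each extra level of depth adds one more self-join and one more OR condition.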
Proposed resolution
I believe the depth query can be achieved with no extra joins, using a few additional queries and some PHP instead. I suggest loading the tree of terms first and then checking that the tid is in the resulting set of taxonomy terms, such as:
SELECT node.sticky AS node_sticky,
       nodequeue_nodes_node.position AS nodequeue_nodes_node_position,
       field_data_field_published_date.field_published_date_value AS field_data_field_published_date_field_published_date_value,
       node.nid AS nid
FROM node node
LEFT JOIN nodequeue_nodes nodequeue_nodes_node
  ON node.nid = nodequeue_nodes_node.nid
  AND nodequeue_nodes_node.qid = '1'
LEFT JOIN field_data_field_published_date field_data_field_published_date
  ON node.nid = field_data_field_published_date.entity_id
  AND ( field_data_field_published_date.entity_type = 'node'
        AND field_data_field_published_date.deleted = '0' )
INNER JOIN taxonomy_index
  ON node.nid = taxonomy_index.nid
  AND taxonomy_index.tid IN ( '37', '38', '39', '40',
                              '41', '42', '43', '44',
                              '45', '46', '47', '48',
                              '49', '50', '51', '52',
                              '35524', '53', '54', '56',
                              '57', '58', '59')
WHERE (( ( node.status = '1' ) ))
ORDER BY node_sticky DESC,
         nodequeue_nodes_node_position DESC,
         field_data_field_published_date_field_published_date_value DESC
LIMIT 10 OFFSET 0;
This query is much more efficient and, as far as I can work out, will return exactly the same results.
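One way to sanity-check that claim is a small, self-contained comparison in Python against an in-memory SQLite database. This is only a sketch under assumptions: the tables are cut down to the columns the queries touch, the fixture data is invented, and descendant_tids() is a hypothetical helper standing in for whatever the patch does in PHP to load the term tree.

```python
import sqlite3

# Depth-3 join form, reduced to the parts that decide which nids match.
DEPTH_JOIN_SQL = """
SELECT node.nid FROM node
WHERE node.nid IN (
  SELECT tn.nid FROM taxonomy_index tn
  LEFT OUTER JOIN taxonomy_term_hierarchy th ON th.tid = tn.tid
  LEFT OUTER JOIN taxonomy_term_hierarchy th1 ON th.parent = th1.tid
  LEFT OUTER JOIN taxonomy_term_hierarchy th2 ON th1.parent = th2.tid
  LEFT OUTER JOIN taxonomy_term_hierarchy th3 ON th2.parent = th3.tid
  WHERE tn.tid = :tid OR th1.tid = :tid OR th2.tid = :tid OR th3.tid = :tid
)
ORDER BY node.nid
"""


def build_fixture():
    """Create invented sample data: a chain 37 > 38 > 39 > 40 > 41, plus 99."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (nid INTEGER PRIMARY KEY);
        CREATE TABLE taxonomy_index (nid INTEGER, tid INTEGER);
        CREATE TABLE taxonomy_term_hierarchy (tid INTEGER, parent INTEGER);
    """)
    conn.executemany("INSERT INTO node VALUES (?)", [(n,) for n in range(1, 7)])
    conn.executemany(
        "INSERT INTO taxonomy_term_hierarchy VALUES (?, ?)",
        [(37, 0), (38, 37), (39, 38), (40, 39), (41, 40), (99, 0)],
    )
    # Node 2 carries two terms from the same subtree.
    conn.executemany(
        "INSERT INTO taxonomy_index VALUES (?, ?)",
        [(1, 37), (2, 38), (2, 39), (3, 39), (4, 40), (5, 41), (6, 99)],
    )
    return conn


def descendant_tids(conn, root_tid, max_depth):
    """Return root_tid plus every term at most max_depth levels below it."""
    tids = {root_tid}
    frontier = {root_tid}
    for _ in range(max_depth):
        if not frontier:
            break
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT tid FROM taxonomy_term_hierarchy WHERE parent IN ({marks})",
            list(frontier),
        )
        # Drop terms already seen so multi-parent terms are visited once.
        frontier = {tid for (tid,) in rows} - tids
        tids |= frontier
    return tids


def compare(conn, root_tid=37):
    """Run the depth-3 join form and both term-set forms; return their nids."""
    tids = sorted(descendant_tids(conn, root_tid, 3))
    marks = ",".join("?" * len(tids))
    in_subquery = (
        "SELECT node.nid FROM node WHERE node.nid IN "
        f"(SELECT tn.nid FROM taxonomy_index tn WHERE tn.tid IN ({marks})) "
        "ORDER BY node.nid"
    )
    inner_join = (
        "SELECT node.nid FROM node INNER JOIN taxonomy_index ti "
        f"ON node.nid = ti.nid AND ti.tid IN ({marks}) ORDER BY node.nid"
    )
    by_join = [r[0] for r in conn.execute(DEPTH_JOIN_SQL, {"tid": root_tid})]
    by_in = [r[0] for r in conn.execute(in_subquery, tids)]
    by_inner = [r[0] for r in conn.execute(inner_join, tids)]
    return by_join, by_in, by_inner


if __name__ == "__main__":
    for label, result in zip(("depth join", "IN subquery", "INNER JOIN"),
                             compare(build_fixture())):
        print(label, result)
```

On this fixture the depth join and the IN subquery over the precomputed tids return the same nids. The INNER JOIN form returns node 2 twice because that node is tagged with two terms in the set, so that variant would probably need DISTINCT, or the IN subquery used in the original report, to match the original results exactly.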
Remaining tasks
Patch views_handler_argument_term_node_tid_depth.inc
Patch views_handler_filter_term_node_tid_depth.inc
Original report by jamiecuthill
I have a view that emulates the taxonomy index page but uses depth (set to 3 in this case) to pull in nodes assigned to children of the current term. This view also has some complex ordering based on sticky, nodequeue position and a date field.
The generated query looks something like this:
SELECT node.sticky AS node_sticky, nodequeue_nodes_node.position AS nodequeue_nodes_node_position, field_data_field_published_date.field_published_date_value AS field_data_field_published_date_field_published_date_value, node.nid AS nid FROM node node LEFT JOIN nodequeue_nodes nodequeue_nodes_node ON node.nid = nodequeue_nodes_node.nid AND nodequeue_nodes_node.qid = '1' LEFT JOIN field_data_field_published_date field_data_field_published_date ON node.nid = field_data_field_published_date.entity_id AND (field_data_field_published_date.entity_type = 'node' AND field_data_field_published_date.deleted = '0') WHERE (( (node.status = '1') AND (node.nid IN (SELECT tn.nid AS nid FROM taxonomy_index tn LEFT OUTER JOIN taxonomy_term_hierarchy th ON th.tid = tn.tid LEFT OUTER JOIN taxonomy_term_hierarchy th1 ON th.parent = th1.tid LEFT OUTER JOIN taxonomy_term_hierarchy th2 ON th1.parent = th2.tid LEFT OUTER JOIN taxonomy_term_hierarchy th3 ON th2.parent = th3.tid WHERE ( (tn.tid = '37') OR (th1.tid = '37') OR (th2.tid = '37') OR (th3.tid = '37') ))) )) ORDER BY node_sticky DESC, nodequeue_nodes_node_position DESC, field_data_field_published_date_field_published_date_value DESC LIMIT 10 OFFSET 0;
The problem here is the subquery that finds the nids to show on this index. In my site's database this query returns 5425 records in 1 second. It is slow because it joins 5 tables, one of which, taxonomy_index, has 103k records. Depth is terribly inefficient in databases that store hierarchy with parent references; the nested set model would be much better, but that would require a huge rewrite of Drupal.
Now I believe the depth query can be achieved with no extra joins, using a few additional queries and some PHP instead. I suggest loading the tree of terms first and then checking that the tid is in the resulting set of taxonomy terms, such as:
SELECT node.sticky AS node_sticky, nodequeue_nodes_node.position AS nodequeue_nodes_node_position, field_data_field_published_date.field_published_date_value AS field_data_field_published_date_field_published_date_value, node.nid AS nid FROM node node LEFT JOIN nodequeue_nodes nodequeue_nodes_node ON node.nid = nodequeue_nodes_node.nid AND nodequeue_nodes_node.qid = '1' LEFT JOIN field_data_field_published_date field_data_field_published_date ON node.nid = field_data_field_published_date.entity_id AND (field_data_field_published_date.entity_type = 'node' AND field_data_field_published_date.deleted = '0') WHERE (( (node.status = '1') AND (node.nid IN (SELECT tn.nid AS nid FROM taxonomy_index tn WHERE ( (tn.tid IN ('37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '35524', '53', '54', '56', '57', '58', '59')) ))) )) ORDER BY node_sticky DESC, nodequeue_nodes_node_position DESC, field_data_field_published_date_field_published_date_value DESC LIMIT 10 OFFSET 0;
This query is much more efficient and, as far as I can work out, will return exactly the same results.
I have attached a patch with my code changes for this and I would really like some feedback as to whether it's worth considering.