Problem/Motivation
This issue is to propose two significant optimizations to \Drupal\taxonomy\Plugin\views\argument\IndexTidDepth
and \Drupal\taxonomy\Plugin\views\filter\TaxonomyIndexTidDepth
.
- the subselect the code generates is a multi-level left-join which is inefficient on MySQL because it cannot make use of indexes
- in particular, the left join generates many duplicates
- it generates a
WHERE ... IN ( subselect )
clause, which is inefficient on MySQL if the subselect returns anything but a trivially small result set
Here is the complete query HEAD generates for a single TID query with a depth of 3.
SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.nid = node__field_promote_states.entity_id AND node__field_promote_states.field_promote_states_value = 'landing' WHERE (((node__field_promote_states.field_promote_states_value IS NULL)) AND (node_field_data.nid IN (SELECT tn.nid AS nid FROM taxonomy_index tn LEFT OUTER JOIN taxonomy_term__parent th ON th.entity_id = tn.tid LEFT OUTER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE (tn.tid = '353') OR (th1.entity_id = '353')))) AND ((taxonomy_index.status = '1') AND (node_field_data.status = '1') AND (node_field_data.type IN ('article', 'recipe'))) ORDER BY node_field_data_created DESC LIMIT 6 OFFSET 5;
+-------------------------+--------+
| node_field_data_created | nid |
+-------------------------+--------+
| 1594378800 | 110686 |
| 1594360800 | 110446 |
| 1593946800 | 110321 |
| 1593342000 | 109946 |
| 1593241200 | 109896 |
| 1592823600 | 109591 |
+-------------------------+--------+
6 rows in set (0.49 sec)
Proposed resolution
The patch changes the code in two aspects:
taxonomy hierarchy subselect with union of specific inner joins instead of left join
Existing code
The existing code generates subselect following this pattern (tid 353, +3 levels):
explain SELECT tn.nid AS nid FROM taxonomy_index tn LEFT OUTER JOIN taxonomy_term__parent th ON th.entity_id = tn.tid LEFT OUTER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id LEFT OUTER JOIN taxonomy_term__parent th2 ON th1.parent_target_id = th2.entity_id WHERE (tn.tid = '353') OR (th1.entity_id = '353') OR (th2.entity_id = '353');
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+
| 1 | SIMPLE | tn | NULL | index | term_node | term_node | 14 | NULL | 124367 | 100.00 | Using index |
| 1 | SIMPLE | th | NULL | ref | PRIMARY | PRIMARY | 4 | instyle.tn.tid | 1 | 100.00 | NULL |
| 1 | SIMPLE | th1 | NULL | ref | PRIMARY | PRIMARY | 4 | instyle.th.parent_target_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | th2 | NULL | ref | PRIMARY | PRIMARY | 4 | instyle.th1.parent_target_id | 1 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+
The code in head has to scan all of the taxonomy_index rows.
New code
The patch generates much more efficient code using a UNION of specific inner joins, the replacement for the example above is:
explain SELECT tn.nid AS nid FROM taxonomy_index tn WHERE tn.tid = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE th1.entity_id = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id INNER JOIN taxonomy_term__parent th2 ON th1.parent_target_id = th2.entity_id WHERE th2.entity_id = '353';
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+
| 1 | PRIMARY | tn | NULL | ref | PRIMARY,term_node,bundle | term_node | 4 | const | 102 | 100.00 | Using index |
| 2 | UNION | th1 | NULL | ref | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |
| 2 | UNION | th | NULL | ref | PRIMARY,parent_target_id | parent_target_id | 4 | const | 5 | 100.00 | Using where; Using index |
| 2 | UNION | tn | NULL | ref | term_node | term_node | 4 | instyle.th.entity_id | 17 | 100.00 | Using index |
| 3 | UNION | th2 | NULL | ref | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |
| 3 | UNION | th1 | NULL | ref | PRIMARY,parent_target_id | parent_target_id | 4 | const | 5 | 100.00 | Using where; Using index |
| 3 | UNION | th | NULL | ref | PRIMARY,parent_target_id | parent_target_id | 4 | instyle.th1.entity_id | 223 | 100.00 | Using index |
| 3 | UNION | tn | NULL | ref | term_node | term_node | 4 | instyle.th.entity_id | 17 | 100.00 | Using index |
| NULL | UNION RESULT | <union1,2,3> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+
Far fewer rows are scanned and indexes are used.
using an INNER join instead of a WHERE IN (...) for the taxonomy filter
if the outer query constructed by views is inefficient (as can easily be the case with any non-trivial LEFT-join), the query as a whole will still remain inefficient because the efficient subselect is used in a WHERE IN (...) which MySQL only uses as a filter on the outer query result.
This can be made much more efficient by using an INNER join. Here's an example for how the patch will change the SQL (reformatted for readability again)
Existing code + the above fix
SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.nid = node__field_promote_states.entity_id AND node__field_promote_states.field_promote_states_value = 'landing' WHERE (((node__field_promote_states.field_promote_states_value IS NULL)) AND (node_field_data.nid IN (SELECT tn.nid AS nid FROM taxonomy_index tn WHERE tn.tid = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE th1.entity_id = '353'))) AND ((taxonomy_index.status = '1') AND (node_field_data.status = '1') AND (node_field_data.type IN ('article', 'recipe'))) ORDER BY node_field_data_created DESC LIMIT 6 OFFSET 5;
+-------------------------+--------+
| node_field_data_created | nid |
+-------------------------+--------+
| 1594378800 | 110686 |
| 1594360800 | 110446 |
| 1593946800 | 110321 |
| 1593342000 | 109946 |
| 1593241200 | 109896 |
| 1592823600 | 109591 |
+-------------------------+--------+
6 rows in set (0.08 sec)
New code
SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.nid = node__field_promote_states.entity_id AND node__field_promote_states.field_promote_states_value = 'landing' INNER JOIN (SELECT tn.nid AS nid FROM taxonomy_index tn WHERE tn.tid = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE th1.entity_id = '353') node_taxonomy_depth ON node_taxonomy_depth.nid = node_field_data.nid WHERE ((node__field_promote_states.field_promote_states_value IS NULL)) AND ((taxonomy_index.status = '1') AND (node_field_data.status = '1') AND (node_field_data.type IN ('article', 'recipe'))) ORDER BY node_field_data_created DESC LIMIT 6 OFFSET 5;
+-------------------------+--------+
| node_field_data_created | nid |
+-------------------------+--------+
| 1594378800 | 110686 |
| 1594360800 | 110446 |
| 1593946800 | 110321 |
| 1593342000 | 109946 |
| 1593241200 | 109896 |
| 1592823600 | 109591 |
+-------------------------+--------+
6 rows in set (0.01 sec)
EXPLAIN is not so useful for this one because the join via query makes for complicated reading - but you can clearly see the performance improvement when running the query.
Overall gains
Use MySQL profiling we can compare the original query versus the final query...
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | 0.49241525 | SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.n |
| 2 | 0.00789575 | SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid
FROM
node_field_data node_field_data
LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid
LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.n |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
This shows that original query took 0.49241525 seconds to execute and the new query takes 0.00789575 seconds. This is 98% decrease in the time take to run the query!
Performance testing notes
node_field_data contains 34299 rows
taxonomy_index contains 129814 rows
taxonomy_term__parent contains 6712 rows
node__field_promote_states contains 96159 rows
Remaining tasks
User interface changes
N/a
API changes
N/a
Data model changes
N/a
Release notes snippet
Views that use either the taxonomy depth filter or taxonomy depth argument are massively more performant.