Channel: Issues for Drupal core

Content: Has taxonomy term ID (with depth) query performance

Problem/Motivation

When a view uses taxonomy term ID with depth, either as an argument or as a filter, the generated MySQL query can contain quite a few subqueries, joins, and sorts. This is very slow, especially when MySQL cannot cache the subqueries. MySQL then falls back to temporary tables, which can be very slow when they are written to disk on large databases.

In the example below, this is the resulting query of a view using taxonomy term id with depth 3 as argument:

SELECT node.sticky                                                AS node_sticky,
       nodequeue_nodes_node.position                              AS
       nodequeue_nodes_node_position,
       field_data_field_published_date.field_published_date_value AS
       field_data_field_published_date_field_published_date_value,
       node.nid                                                   AS nid
FROM   node node
       LEFT JOIN nodequeue_nodes nodequeue_nodes_node
              ON node.nid = nodequeue_nodes_node.nid
                 AND nodequeue_nodes_node.qid = '1'
       LEFT JOIN field_data_field_published_date field_data_field_published_date
              ON node.nid = field_data_field_published_date.entity_id
                 AND ( field_data_field_published_date.entity_type = 'node'
                       AND field_data_field_published_date.deleted = '0' )
WHERE  (( ( node.status = '1' )
          AND ( node.nid IN (SELECT tn.nid AS nid
                             FROM   taxonomy_index tn
                                    LEFT OUTER JOIN taxonomy_term_hierarchy th
                                                 ON th.tid = tn.tid
                                    LEFT OUTER JOIN taxonomy_term_hierarchy th1
                                                 ON th.parent = th1.tid
                                    LEFT OUTER JOIN taxonomy_term_hierarchy th2
                                                 ON th1.parent = th2.tid
                                    LEFT OUTER JOIN taxonomy_term_hierarchy th3
                                                 ON th2.parent = th3.tid
                             WHERE  ( ( tn.tid = '37' )
                                       OR ( th1.tid = '37' )
                                       OR ( th2.tid = '37' )
                                       OR ( th3.tid = '37' ) )) ) ))
ORDER  BY node_sticky DESC,
          nodequeue_nodes_node_position DESC,
          field_data_field_published_date_field_published_date_value DESC
LIMIT  10 offset 0;

The problem here is the subquery that finds the nids to show on this index. In my site's database this query returns 5425 records in 1 second. It is slow because it joins 5 tables, one of which is taxonomy_index with 103k records. Depth is terribly inefficient in databases that store hierarchy with parent references; the nested set model would be much better, but that would require a huge rewrite of Drupal.

Proposed resolution

I believe the depth query can be achieved without the extra hierarchy joins, at the cost of a few additional queries and some PHP. I suggest loading the tree of terms first and then checking that the tid is in the resulting set of taxonomy terms, such as:

SELECT node.sticky                                                AS node_sticky,
       nodequeue_nodes_node.position                              AS
       nodequeue_nodes_node_position,
       field_data_field_published_date.field_published_date_value AS
       field_data_field_published_date_field_published_date_value,
       node.nid                                                   AS nid
FROM   node node
       LEFT JOIN nodequeue_nodes nodequeue_nodes_node
              ON node.nid = nodequeue_nodes_node.nid
                 AND nodequeue_nodes_node.qid = '1'
       LEFT JOIN field_data_field_published_date field_data_field_published_date
              ON node.nid = field_data_field_published_date.entity_id
                 AND ( field_data_field_published_date.entity_type = 'node'
                       AND field_data_field_published_date.deleted = '0' )
       INNER JOIN taxonomy_index
               ON node.nid = taxonomy_index.nid
                  AND taxonomy_index.tid IN ( '37', '38', '39', '40',
                                              '41', '42', '43', '44',
                                              '45', '46', '47', '48',
                                              '49', '50', '51', '52',
                                              '35524', '53', '54', '56',
                                              '57', '58', '59' )
WHERE  (( ( node.status = '1' ) ))
ORDER  BY node_sticky DESC,
          nodequeue_nodes_node_position DESC,
          field_data_field_published_date_field_published_date_value DESC
LIMIT  10 offset 0; 

This query is much more efficient and, as far as I can work out, will return exactly the same results.
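
The idea of resolving the term set first and then filtering with IN can be sketched outside Drupal. The following is only an illustration under stated assumptions, not the patch itself: it uses Python with an in-memory SQLite database instead of PHP and MySQL, the helper name descendant_tids is hypothetical, and the rows are invented toy data (a chain 37 > 38 > 39 > 40 > 41 plus an unrelated term 99). It compares the nids selected by the depth-3 self-join subquery with those selected by an IN list built from the term tree.

```python
import sqlite3
from collections import deque


def descendant_tids(hierarchy_rows, root_tid, max_depth):
    """Return root_tid plus all terms at most max_depth levels below it.

    hierarchy_rows is an iterable of (tid, parent) pairs, the shape of
    taxonomy_term_hierarchy rows. Hypothetical helper, not a Drupal API.
    """
    children = {}
    for tid, parent in hierarchy_rows:
        children.setdefault(parent, []).append(tid)

    found = {root_tid}
    queue = deque([(root_tid, 0)])
    while queue:
        tid, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend below the configured depth
        for child in children.get(tid, []):
            if child not in found:
                found.add(child)
                queue.append((child, depth + 1))
    return found


# Toy data (invented for this sketch), loaded into an in-memory database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE taxonomy_term_hierarchy (tid INTEGER, parent INTEGER);
    CREATE TABLE taxonomy_index (nid INTEGER, tid INTEGER);
    INSERT INTO taxonomy_term_hierarchy VALUES
        (37, 0), (38, 37), (39, 38), (40, 39), (41, 40), (99, 0);
    INSERT INTO taxonomy_index VALUES
        (1, 37), (2, 38), (3, 39), (4, 40), (5, 41), (6, 99), (7, 38), (7, 39);
""")

# Current approach: walk up to three parents with self-joins.
joined_nids = {row[0] for row in db.execute("""
    SELECT tn.nid
    FROM taxonomy_index tn
    LEFT OUTER JOIN taxonomy_term_hierarchy th  ON th.tid = tn.tid
    LEFT OUTER JOIN taxonomy_term_hierarchy th1 ON th.parent = th1.tid
    LEFT OUTER JOIN taxonomy_term_hierarchy th2 ON th1.parent = th2.tid
    LEFT OUTER JOIN taxonomy_term_hierarchy th3 ON th2.parent = th3.tid
    WHERE tn.tid = 37 OR th1.tid = 37 OR th2.tid = 37 OR th3.tid = 37
""")}

# Proposed approach: resolve the term set first, then filter with IN (...).
rows = db.execute("SELECT tid, parent FROM taxonomy_term_hierarchy").fetchall()
tids = sorted(descendant_tids(rows, root_tid=37, max_depth=3))
placeholders = ", ".join("?" for _ in tids)
in_rows = db.execute(
    f"SELECT nid FROM taxonomy_index WHERE tid IN ({placeholders})", tids
).fetchall()
in_nids = {row[0] for row in in_rows}

print(tids)                        # terms within three levels of term 37
print(joined_nids == in_nids)      # whether both approaches select the same nodes
print(len(in_rows), len(in_nids))  # raw taxonomy_index rows vs distinct nodes
```

In this toy data, node 7 is tagged with two terms from the set, so taxonomy_index contributes two matching rows for it. That is harmless inside an IN subquery, but when taxonomy_index is joined directly to node it may be worth checking whether the view needs DISTINCT to avoid duplicate rows.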

Remaining tasks

Patch views_handler_argument_term_node_tid_depth.inc
Patch views_handler_filter_term_node_tid_depth.inc

Original report by jamiecuthill

I have a view that emulates the taxonomy index page but uses depth (set to 3 in this case) to pull in nodes assigned to children of the current term. This view also has some complex ordering based on sticky, nodequeue position, and a date field.

The generated query looks something like this:
SELECT node.sticky AS node_sticky, nodequeue_nodes_node.position AS nodequeue_nodes_node_position, field_data_field_published_date.field_published_date_value AS field_data_field_published_date_field_published_date_value, node.nid AS nid FROM node node LEFT JOIN nodequeue_nodes nodequeue_nodes_node ON node.nid = nodequeue_nodes_node.nid AND nodequeue_nodes_node.qid = '1' LEFT JOIN field_data_field_published_date field_data_field_published_date ON node.nid = field_data_field_published_date.entity_id AND (field_data_field_published_date.entity_type = 'node' AND field_data_field_published_date.deleted = '0') WHERE (( (node.status = '1') AND (node.nid IN (SELECT tn.nid AS nid FROM taxonomy_index tn LEFT OUTER JOIN taxonomy_term_hierarchy th ON th.tid = tn.tid LEFT OUTER JOIN taxonomy_term_hierarchy th1 ON th.parent = th1.tid LEFT OUTER JOIN taxonomy_term_hierarchy th2 ON th1.parent = th2.tid LEFT OUTER JOIN taxonomy_term_hierarchy th3 ON th2.parent = th3.tid WHERE ( (tn.tid = '37') OR (th1.tid = '37') OR (th2.tid = '37') OR (th3.tid = '37') ))) )) ORDER BY node_sticky DESC, nodequeue_nodes_node_position DESC, field_data_field_published_date_field_published_date_value DESC LIMIT 10 OFFSET 0;

The problem here is the subquery that finds the nids to show on this index. In my site's database this query returns 5425 records in 1 second. It is slow because it joins 5 tables, one of which is taxonomy_index with 103k records. Depth is terribly inefficient in databases that store hierarchy with parent references; the nested set model would be much better, but that would require a huge rewrite of Drupal.

Now I believe the depth query can be achieved without the extra hierarchy joins, at the cost of a few additional queries and some PHP. I suggest loading the tree of terms first and then checking that the tid is in the resulting set of taxonomy terms, such as:
SELECT node.sticky AS node_sticky, nodequeue_nodes_node.position AS nodequeue_nodes_node_position, field_data_field_published_date.field_published_date_value AS field_data_field_published_date_field_published_date_value, node.nid AS nid FROM node node LEFT JOIN nodequeue_nodes nodequeue_nodes_node ON node.nid = nodequeue_nodes_node.nid AND nodequeue_nodes_node.qid = '1' LEFT JOIN field_data_field_published_date field_data_field_published_date ON node.nid = field_data_field_published_date.entity_id AND (field_data_field_published_date.entity_type = 'node' AND field_data_field_published_date.deleted = '0') WHERE (( (node.status = '1') AND (node.nid IN (SELECT tn.nid AS nid FROM taxonomy_index tn WHERE ( (tn.tid IN ('37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '35524', '53', '54', '56', '57', '58', '59')) ))) )) ORDER BY node_sticky DESC, nodequeue_nodes_node_position DESC, field_data_field_published_date_field_published_date_value DESC LIMIT 10 OFFSET 0;

This query is much more efficient and, as far as I can work out, will return exactly the same results.

I have attached a patch with my code changes for this and I would really like some feedback as to whether it's worth considering.

