Taxonomy Term Reference Field tables are missing a critical unique index.
Each such table has a common set of fields:
entity_type
bundle
deleted
entity_id
revision_id
language
delta
taxonomy_vocabulary_NN_tid
where NN is the internal numeric id of the vocabulary in question.
The characteristic MySQL create syntax for these tables is:
CREATE TABLE `field_data_taxonomy_vocabulary_11` (
`entity_type` varchar(128) NOT NULL DEFAULT '' COMMENT 'The entity type this data is attached to',
`bundle` varchar(128) NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
`deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
`entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
`revision_id` int(10) unsigned DEFAULT NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
`language` varchar(32) NOT NULL DEFAULT '' COMMENT 'The language for this data item.',
`delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
`taxonomy_vocabulary_11_tid` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`language`),
KEY `entity_type` (`entity_type`),
KEY `bundle` (`bundle`),
KEY `deleted` (`deleted`),
KEY `entity_id` (`entity_id`),
KEY `revision_id` (`revision_id`),
KEY `language` (`language`),
KEY `taxonomy_vocabulary_11_tid` (`taxonomy_vocabulary_11_tid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Data storage for field 6 (taxonomy_vocabulary_11)';
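This schema has a problem: it allows a single entity to hold multiple references to the same term (i.e., the same term id) for the same value of the language field. The primary key distinguishes rows by delta, not by term id, so nothing stops the same term from being stored at several deltas.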
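To make this concrete, here is a minimal sketch of two inserts that the schema above accepts without complaint. The values (bundle 'article', entity id 42, term id 7) are hypothetical and chosen only for illustration; 'und' is the language-neutral code Drupal 7 uses.

```sql
-- Hypothetical values: node 42 references term 7 twice.
-- The rows differ only in delta, so the primary key
-- (entity_type, entity_id, deleted, delta, language) is satisfied both times.
INSERT INTO field_data_taxonomy_vocabulary_11
  (entity_type, bundle, deleted, entity_id, revision_id, language, delta, taxonomy_vocabulary_11_tid)
VALUES
  ('node', 'article', 0, 42, 42, 'und', 0, 7),
  ('node', 'article', 0, 42, 42, 'und', 1, 7);
```

Both rows are stored, and the node then shows the same term twice.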
I encountered this shortcoming in practice. My site utilizes a number of taxonomies, some flat, some a deep tree, some very large. During my prototyping of node displays, I noticed that terms within multiple vocabularies were being displayed multiple times for many nodes.
Upon inspecting the DB, sure enough, I found duplicate data rows whose only difference was the value of the 'delta' field, which is merely an array index. For any given combination of the entity_id, language, and taxonomy_vocabulary_NN_tid fields, there should never be more than one row. But there were many, across many vocabularies.
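A query along the following lines should list the affected combinations before anything is changed. It is a sketch against the vocabulary 11 table shown above; the column aliases copies and deltas are my own, and the table and column names need adjusting for each vocabulary.

```sql
-- List every (entity_id, language, tid) combination stored more than once,
-- together with the deltas at which the duplicates sit.
SELECT entity_id,
       language,
       taxonomy_vocabulary_11_tid,
       COUNT(*) AS copies,
       GROUP_CONCAT(delta ORDER BY delta) AS deltas
FROM field_data_taxonomy_vocabulary_11
-- NULL term ids are skipped: a unique index permits multiple NULLs anyway.
WHERE taxonomy_vocabulary_11_tid IS NOT NULL
GROUP BY entity_id, language, taxonomy_vocabulary_11_tid
HAVING COUNT(*) > 1;
```

An empty result means the table holds no duplicates for that grouping.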
To rectify the situation for my site, I ran ALTER TABLE queries of the form:
ALTER IGNORE TABLE field_data_taxonomy_vocabulary_66 ADD UNIQUE INDEX (entity_id, language, taxonomy_vocabulary_66_tid), FORCE;
against the MySQL 5.5.xx DB that my site sits on.
Because of the IGNORE keyword, MySQL silently deletes the duplicate rows, i.e., the ones conflicting with the new unique index, and keeps only the first row of each conflicting set. The corresponding create syntax becomes:
CREATE TABLE `field_data_taxonomy_vocabulary_11` (
`entity_type` varchar(128) NOT NULL DEFAULT '' COMMENT 'The entity type this data is attached to',
`bundle` varchar(128) NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
`deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
`entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
`revision_id` int(10) unsigned DEFAULT NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
`language` varchar(32) NOT NULL DEFAULT '' COMMENT 'The language for this data item.',
`delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
`taxonomy_vocabulary_11_tid` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`language`),
UNIQUE KEY `entity_id_2` (`entity_id`,`language`,`taxonomy_vocabulary_11_tid`),
KEY `entity_type` (`entity_type`),
KEY `bundle` (`bundle`),
KEY `deleted` (`deleted`),
KEY `entity_id` (`entity_id`),
KEY `revision_id` (`revision_id`),
KEY `language` (`language`),
KEY `taxonomy_vocabulary_11_tid` (`taxonomy_vocabulary_11_tid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Data storage for field 6 (taxonomy_vocabulary_11)';
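Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so the statement above fails on newer servers. A rough equivalent, sketched here as an assumption rather than something I ran on my site, is to delete the duplicates explicitly and then add the index. The aliases dup and keep are mine; the index name entity_id_2 matches the one MySQL generated above.

```sql
-- Step 1: for each (entity_id, language, tid) combination, keep the row
-- with the lowest delta and delete every row with a higher delta.
-- Rows that share the same delta (differing only in entity_type or deleted)
-- are NOT removed; step 2 then fails with a duplicate-key error
-- instead of dropping data silently.
DELETE dup
FROM field_data_taxonomy_vocabulary_11 AS dup
JOIN field_data_taxonomy_vocabulary_11 AS keep
  ON  keep.entity_id = dup.entity_id
  AND keep.language = dup.language
  AND keep.taxonomy_vocabulary_11_tid = dup.taxonomy_vocabulary_11_tid
  AND keep.delta < dup.delta;

-- Step 2: add the unique index without IGNORE.
ALTER TABLE field_data_taxonomy_vocabulary_11
  ADD UNIQUE INDEX entity_id_2 (entity_id, language, taxonomy_vocabulary_11_tid);
```

Either way, back up the table first, since the removed rows cannot be recovered afterwards.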
The attached screenshot "data duplication.png" shows the musical instruments associated with 5 nodes before I added the unique index to the offending table. Note that 4 of them display duplicate data.
The second screenshot, "no duplicates.png", shows the same nodes after the fix, with the duplicates gone.
Attachment | Size |
---|---|
data duplication.png | 138.52 KB |
no duplicates.png | 79.72 KB |
field-data-taxonomy-vocabulary-11-table-fields.png | 111.36 KB |