Problem/Motivation
Drupal core ships with cache tags support - awesome.
To make sure, on cache read, that the tags of a cached item have not been invalidated, a CacheTagsChecksumInterface service is used (eventually), and D8 core provides a database-backed implementation of it.
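As a rough mental model of the database variant: it keeps one row per tag with an invalidation counter, and the checksum of a set of tags is the sum of their counters. That sum is stored with the cache item on write and recomputed on read. The sketch below is a toy Python model under that assumption; the class and method names are hypothetical and are not Drupal's PHP API, and the cachetags table is modeled as a dict.

```python
class InMemoryTagsChecksum:
    """Toy model of a DB-backed cache tags checksum service.

    `cachetags` stands in for the cachetags table: tag -> invalidation count.
    Hypothetical names; this is not Drupal's actual PHP API.
    """

    def __init__(self):
        self.cachetags = {}

    def invalidate_tags(self, tags):
        # Upsert: the first invalidation inserts a row, later ones increment it.
        # Nothing in this model ever deletes a row.
        for tag in tags:
            self.cachetags[tag] = self.cachetags.get(tag, 0) + 1

    def current_checksum(self, tags):
        # A tag without a row counts as 0 invalidations.
        return sum(self.cachetags.get(tag, 0) for tag in tags)

    def is_valid(self, checksum, tags):
        # Checked on cache read: the item is stale if any of its tags was
        # invalidated after the checksum was stored.
        return checksum == self.current_checksum(tags)


checksums = InMemoryTagsChecksum()
stored = checksums.current_checksum(["node:1", "node_list"])  # saved with the item
print(checksums.is_valid(stored, ["node:1", "node_list"]))  # True
checksums.invalidate_tags(["node:1"])
print(checksums.is_valid(stored, ["node:1", "node_list"]))  # False
print(len(checksums.cachetags))  # 1
```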
My issue is with the internals of the cache tags subsystem. As currently implemented, DatabaseCacheTagsChecksum makes the list of cache tags on the platform grow endlessly: every invalidated tag gets a row in the cachetags table, and those rows are never removed.
Consider the following scenario with highly volatile custom entities: add 100k instances, delete them, then add 100k new ones.
The system ends up with 200k cache tags in the table, 100k of which will never be used again. They simply stay there, clutter the database and cause overall slowdowns. Now imagine this process repeating for a while...
Even after a full cache clear (which happens only rarely), all tags are still kept.
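A quick simulation of the scenario above, using the same simplified model (a dict of tag to counter). The entity type name `volatile_entity` is made up, and the sketch assumes that every instance's tag is invalidated at least once during its lifetime (for example on update or delete) and that IDs are never reused, so the second batch writes new rows instead of reusing old ones.

```python
def simulate(batch_size=100_000):
    cachetags = {}  # stand-in for the cachetags table: tag -> invalidations
    cache_bin = {}  # stand-in for a cache bin: cid -> (data, checksum)

    def invalidate(tag):
        cachetags[tag] = cachetags.get(tag, 0) + 1

    # Batch 1 uses IDs 1..N; assume each tag is invalidated at least once
    # (e.g. on update or delete) before the entity is deleted.
    for entity_id in range(1, batch_size + 1):
        invalidate(f"volatile_entity:{entity_id}")

    # Batch 2 gets fresh IDs N+1..2N because IDs are not reused.
    for entity_id in range(batch_size + 1, 2 * batch_size + 1):
        cache_bin[f"entity:{entity_id}"] = ("rendered", 0)
        invalidate(f"volatile_entity:{entity_id}")

    # A full cache clear empties the bin but leaves every tag row in place.
    cache_bin.clear()

    # Rows for batch 1 belong to deleted entities and will never be read again.
    dead_rows = sum(1 for tag in cachetags if int(tag.split(":")[1]) <= batch_size)
    return len(cachetags), dead_rows


total, dead = simulate()
print(total, dead)  # 200000 100000
```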
In my case, the cache tags query is the highest-throughput and most time-consuming DB query in the whole system, even though it is fast on average. I have around 40-50k valid entities in the system and around 120-130k cache tags in the table.
I think this problem affects only the DB implementation: Memcache and Redis (if they implement the interface) look up a tag in O(1), compared to O(log N) for an indexed SQL lookup, where N grows with the amount of data in the system. On top of that, they have robust garbage-collection (eviction) mechanisms under memory pressure. SQL databases have none of that.
Proposed resolution
Can we truncate the list of cache tags on the portal when the whole cache gets cleared?
I suspect this should not be a problem, since the whole cache was just invalidated anyway.
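A sketch of the truncation idea as a toy Python model. `ToyCacheBackend` and `delete_all_bins` are hypothetical names, not existing Drupal code. The key assumption is that every bin relying on the checksum service is emptied at the same time: then no stored checksum refers to the old counters, so the tag rows can be dropped with the items, and anything cached afterwards simply starts from 0.

```python
class ToyCacheBackend:
    """Toy model: one cache bin plus the cachetags counters (hypothetical)."""

    def __init__(self):
        self.bin = {}        # cid -> (data, tags, checksum)
        self.cachetags = {}  # tag -> invalidation count

    def checksum(self, tags):
        return sum(self.cachetags.get(t, 0) for t in tags)

    def set(self, cid, data, tags):
        self.bin[cid] = (data, tags, self.checksum(tags))

    def get(self, cid):
        item = self.bin.get(cid)
        if item is None or item[2] != self.checksum(item[1]):
            return None  # miss, or stale because a tag was invalidated
        return item[0]

    def invalidate(self, tags):
        for t in tags:
            self.cachetags[t] = self.cachetags.get(t, 0) + 1

    def delete_all_bins(self):
        # Proposed: with every bin emptied, no stored checksum can refer to
        # the old counters any more, so the tag rows can be truncated too.
        self.bin.clear()
        self.cachetags.clear()


backend = ToyCacheBackend()
backend.invalidate(["node:1", "node:2"])
backend.delete_all_bins()
backend.set("page:1", "rendered", ["node:1"])
print(backend.get("page:1"), len(backend.cachetags))  # rendered 0
backend.invalidate(["node:1"])
print(backend.get("page:1"))  # None
```

If some bin were not emptied, an item stored there while a tag's counter was still 0 could look valid again after the truncation, which is why the sketch clears bins and tags together.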
I think cache tags should also be deleted whenever content is deleted from the system.
In particular, we should consider deleting a cache tag entry whenever the related entity is deleted (if possible). For example, deleting the node with ID 1 would also delete the node:1 tag from cachetags. Cache items whose checksums depend on it would then be invalidated, because the tag's counter would read as 0 instead of whatever value it had before.
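A toy Python model of the per-entity deletion idea, assuming the sum-of-counters checksum where a missing row counts as 0. The `delete_tag_rows` hook and the dict-based `cachetags` table are hypothetical. Case 1 shows the intended effect. Case 2 shows a limit of a purely sum-based checksum that a patch would need to handle: an item stored while the tag's counter was still 0 compares equal again once the row is gone.

```python
cachetags = {}  # stand-in for the cachetags table: tag -> invalidations


def checksum(tags):
    # Missing rows count as 0, as assumed in the proposal.
    return sum(cachetags.get(t, 0) for t in tags)


def invalidate(tags):
    for t in tags:
        cachetags[t] = cachetags.get(t, 0) + 1


def delete_tag_rows(tags):
    # Hypothetical hook run after an entity is deleted: drop its tag rows.
    for t in tags:
        cachetags.pop(t, None)


# Case 1: item cached after the tag had already been invalidated once.
invalidate(["node:1"])
stored_after = checksum(["node:1"])            # 1
invalidate(["node:1"])                         # node 1 deleted -> 2
delete_tag_rows(["node:1"])                    # proposed cleanup -> row gone
print(stored_after == checksum(["node:1"]))    # False: 1 != 0, item is stale

# Case 2: item cached while the tag had no row yet (counter 0).
stored_before = checksum(["node:2"])           # 0
invalidate(["node:2"])                         # node 2 deleted -> 1
delete_tag_rows(["node:2"])                    # row gone -> reads as 0 again
print(stored_before == checksum(["node:2"]))   # True: the stale item looks valid
```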
Any other ideas are welcome.
Remaining tasks
Discussion, decision, patch...
User interface changes
None.
API changes
TBD. None expected.
Data model changes
TBD. None expected.
Release notes snippet
TBD.