Problem/Motivation
We have a site where some menus have numeric machine names (e.g. the menu title is 'Transport', but its machine name is '583'). I don't know the reason for this (probably they were migrated or created via code to match some legacy system), but such menus can also be created easily through the menu UI.
We noticed occasional problems with these menus: they disappear from pages where they would normally appear. After much debugging, I found that the problem comes from corrupted entries in cache_menu (cid begins with active-trail:....), which affect menus with number-like machine names.
We have hit this issue on multiple Drupal 9.x versions. I was able to replicate and debug it on Drupal 9.5.11, and then replicate it on 10.1.6 as well.
In the following steps to reproduce this bug, I used two menus, which I called 2 and 4. Note that the bug is independent of these values.
Steps to reproduce
Phase 1. Initial corruption of the cached entries
1. install a fresh Drupal site (standard profile is enough)
2. create a basic page (e.g. /node/1)
3. create two menus with numeric machine names (call them 2 and 4, with any titles you want) and one link in each:
   2
     link 2.1 (pointing to any url)
   4
     link 4.1 (pointing to any url)
4. place two blocks (e.g. in content area) to display these two menus on /node/1 (i.e. Restrict to certain pages set to /node/1)
5. go to /node/1 and confirm the menus are showing
6. check the cache_menu table and look at the active-trail entry for that page:
drush sql-query "select data from cache_menu where cid='active-trail:route:entity.node.canonical:route_parameters:a:1:{s:4:\"node\";s:1:\"1\";}'"
You should see something like this:
a:4:{s:4:"main";a:2:{s:54:"menu_link_content:53f35910-253f-4c3f-9089-abd5884416a3";s:54:"menu_link_content:53f35910-253f-4c3f-9089-abd5884416a3";s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:4;a:1:{s:0:"";s:0:"";}}
Note that the last two entries in this serialized data are i:2 and i:4 (i.e. the machine names of the menus, but converted to integer values).
7. rebuild the cache using drush cr, or at least clear these bins:
drush cc bin menu render page dynamic_page_cache
8. send multiple simultaneous requests for the /node/1 page, either in the browser by pressing F5 repeatedly in quick succession, or (better) via commands:
wget -qO /dev/null https://drupal.sandbox.local/node/1 &
wget -qO /dev/null https://drupal.sandbox.local/node/1 &
wget -qO /dev/null https://drupal.sandbox.local/node/1 &
wget -qO /dev/null https://drupal.sandbox.local/node/1 &
wget -qO /dev/null https://drupal.sandbox.local/node/1 &
wget -qO /dev/null https://drupal.sandbox.local/node/1 &
9. check again the cache_menu table:
drush sql-query "select data from cache_menu where cid='active-trail:route:entity.node.canonical:route_parameters:a:1:{s:4:\"node\";s:1:\"1\";}'"
This time you will see something like this:
a:6:{s:4:"main";a:1:{s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:0;a:1:{s:0:"";s:0:"";}i:1;a:1:{s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:3;a:1:{s:0:"";s:0:"";}}
Note that i:2 and i:4 have been renumbered into i:0 and i:1 and also duplicated as i:2 and i:3. The renumbering is done by array_merge (here https://github.com/drupal/core/blob/11.x/lib/Drupal/Core/Cache/CacheColl...), as documented at https://www.php.net/manual/en/function.array-merge.php ("Values in the input arrays with numeric keys will be renumbered with incrementing keys starting from zero in the result array."). In effect, two copies of the active trail (one built by the current request, the other cached microseconds earlier by a concurrent request) were merged together: the menus with text-like names were kept as a single entry each, while the menus with numeric names were renumbered and duplicated.
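The renumbering can be reproduced outside Drupal with a few lines of PHP run from the shell. This is only a sketch of the array_merge behaviour described above, not the CacheCollector code itself; merged_keys is a hypothetical helper name, the sample array mimics the active-trail entry from step 6, and the php CLI is assumed to be installed.

```shell
# merged_keys: merge two identical copies of an active-trail map, as two
# concurrent requests would produce them, and print the resulting keys.
# Assumption: the php CLI is available (the same one drush runs on).
merged_keys() {
  command -v php >/dev/null 2>&1 || { echo "php CLI not found" >&2; return 1; }
  php -r '
    // Menus "main" and "account" have string keys; "2" and "4" become integer keys.
    $trail = ["main" => ["" => ""], "account" => ["" => ""], "2" => ["" => ""], "4" => ["" => ""]];
    // String keys collapse to one entry; integer keys are appended and renumbered from 0.
    echo implode(" ", array_keys(array_merge($trail, $trail))), "\n";
  '
}
```

Calling merged_keys prints `main account 0 1 2 3`, the same key layout as the corrupted entry in step 9.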
If the cached entry does not show the renumbered keys, clear the cache again and rerun the wget commands (adding a few more simulates a busier site).
Depending on the site load (i.e. number of simultaneous requests that don't find the active-trail cache entry) and the number of menus with numeric machine names, you might see tens or hundreds of such entries.
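The burst of parallel requests from step 8 can be scripted instead of pasted line by line. A minimal sketch, assuming a POSIX shell and wget; fire_requests is a hypothetical helper name, and the URL in the usage line is the sandbox host used in the steps above.

```shell
# fire_requests URL [COUNT]: start COUNT requests in the background
# (default 6, as in step 8), then wait until all of them have finished.
fire_requests() {
  url=$1
  count=${2:-6}
  i=0
  while [ "$i" -lt "$count" ]; do
    wget -qO /dev/null "$url" &
    i=$((i + 1))
  done
  wait
}
```

For example, `fire_requests https://drupal.sandbox.local/node/1 30` sends 30 requests at once, which makes it more likely that several of them miss the active-trail cache entry together.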
Phase 2. Additional corruption of the cached entries
Moreover, when entries in the dynamic page cache expire, the problem grows further: each request that hits this case copies the numeric entries again. To replicate this, run the following steps multiple times:
1. Use the steps from Phase 1 to corrupt the cache.
2. Clear the page caches (don't clear the entire cache, as that would undo the previous step):
drush cc bin dynamic_page_cache page
3. load the /node/1 page (just one page request is enough, no need for parallel ones):
wget -qO /dev/null https://drupal.sandbox.local/node/1
4. view the cached entry
drush sql-query "select data from cache_menu where cid='active-trail:route:entity.node.canonical:route_parameters:a:1:{s:4:\"node\";s:1:\"1\";}'"
You will see this:
a:7:{s:4:"main";a:1:{s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:0;a:1:{s:0:"";s:0:"";}i:1;a:1:{s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:3;a:1:{s:0:"";s:0:"";}i:4;a:1:{s:0:"";s:0:"";}}
5. Repeat the previous three steps and you will see this:
a:8:{s:4:"main";a:1:{s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:0;a:1:{s:0:"";s:0:"";}i:1;a:1:{s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:3;a:1:{s:0:"";s:0:"";}i:4;a:1:{s:0:"";s:0:"";}i:5;a:1:{s:0:"";s:0:"";}}
6. Repeat the previous step and you will see the cached entry accumulate more and more numeric entries.
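To track the growth without reading the serialized string by eye, the integer keys can be counted. A rough sketch, assuming grep with -o and -E support; count_numeric_keys is a hypothetical helper name, and it relies on the observation that in the entries shown here only top-level menu keys have the form i:N followed by an array.

```shell
# count_numeric_keys: read a serialized active-trail entry on stdin and
# print how many integer keys (i:N;a:...) it contains.
count_numeric_keys() {
  grep -oE 'i:[0-9]+;a:' | wc -l | tr -d ' '
}
```

Piping the output of the drush sql-query command from step 4 into count_numeric_keys should print 2 on this test site while the entry is healthy (one key per numeric menu); any larger number indicates duplicated entries.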
Phase 3. Disappearing blocks due to corrupted data
Eventually, if the site runs long enough with a corrupted cache, the bad entries start to overwrite the correct entries of the numeric menus, causing various problems (e.g. menus disappearing from site pages). This can be replicated like this:
1. edit the basic content type to allow adding these nodes to menu 2 and 4
2. edit node/1 and add a menu entry for it in menu 4
3. edit menu 4 to look like this (link 4.1 is a child of the node 1 link):
   4
     node 1 (added in step 2 before)
       link 4.1 (pointing to any url)
4. edit the block that displays menu 4 and set Initial visibility level to 2
5. clear the cache (drush cr)
6. visit /node/1 and check that link 4.1 (from menu 4) shows on that page
7. Look at the cached entry:
drush sql-query "select data from cache_menu where cid='active-trail:route:entity.node.canonical:route_parameters:a:1:{s:4:\"node\";s:1:\"1\";}'"
You will see something like this (with different UUIDs):
a:4:{s:4:"main";a:1:{s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:4;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}}
Notice the value for i:4 (menu 4) is
i:4;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}
because the current page is in the active trail of the menu.
8. corrupt the cache using steps 7-8 from Phase 1 above, but with more wget commands (I used 30)
9. visit /node/1 and check that link 4.1 has disappeared from the page (you might need to run the previous step multiple times to make it happen, or just use more wget commands).
10. Look at the cached entry:
drush sql-query "select data from cache_menu where cid='active-trail:route:entity.node.canonical:route_parameters:a:1:{s:4:\"node\";s:1:\"1\";}'"
You will see something like this:
a:12:{s:4:"main";a:1:{s:0:"";s:0:"";}s:7:"account";a:1:{s:0:"";s:0:"";}i:0;a:1:{s:0:"";s:0:"";}i:1;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}i:2;a:1:{s:0:"";s:0:"";}i:3;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}i:4;a:1:{s:0:"";s:0:"";}i:5;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}i:6;a:1:{s:0:"";s:0:"";}i:7;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}i:8;a:1:{s:0:"";s:0:"";}i:9;a:2:{s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:54:"menu_link_content:e7935b9c-1992-4752-931c-e017efba0be0";s:0:"";s:0:"";}}
Note that the same values were copied over and over again. Also notice that the entry for i:4 has changed to
i:4;a:1:{s:0:"";s:0:"";}
(i.e. a trail holding only the empty root entry), which means the current page is no longer in the active trail of the menu, so the menu block is hidden instead of shown.
11. clear the cache (drush cr or drush cc bin menu dynamic_page_cache page)
12. visit /node/1 and check that link 4.1 (from menu 4) shows again on that page
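Why key 4 ends up with the wrong value can be seen by merging several copies of a two-menu map. This is an illustrative sketch only: trail_by_key is a hypothetical helper name, "link" stands in for the menu_link_content UUID, a single three-way array_merge stands in for the merges that accumulate across requests, and the php CLI is assumed to be installed.

```shell
# trail_by_key: merge three copies of a map in which menu 2 has an empty
# trail and menu 4 has a link in its trail, then label each resulting key.
trail_by_key() {
  command -v php >/dev/null 2>&1 || { echo "php CLI not found" >&2; return 1; }
  php -r '
    $trail = ["2" => ["" => ""], "4" => ["link" => "link", "" => ""]];
    // Integer keys are renumbered 0..5, so key 4 no longer belongs to menu 4.
    $merged = array_merge($trail, $trail, $trail);
    $labels = [];
    foreach ($merged as $key => $value) {
      $labels[] = $key . "=" . (isset($value["link"]) ? "trail" : "empty");
    }
    echo implode(" ", $labels), "\n";
  '
}
```

Calling trail_by_key prints `0=empty 1=trail 2=empty 3=trail 4=empty 5=trail`: after renumbering, key 4 holds a copy of menu 2's trail, which matches the alternating pattern in the corrupted entry from step 10.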
Proposed resolution
Apply the provided patch, which changes MenuActiveTrail.php to use the set method instead of modifying the storage property directly.
Remaining tasks
Test the patch
User interface changes
None
API changes
None
Data model changes
None
Release notes snippet
Fix cache_menu bug affecting menus with numeric machine names