From #1847768-84: Remove ip_address() by Damien Tournoud:
I specifically mentioned accessing the current request from the side... If you do that, there is no caching possible at the sub-request level anymore... If your assumption is that everyone is going to play nice and that you can predict the outcome of everything centrally, you are designing a web framework (like Symfony), not a web platform like Drupal.
My (effulgentia's) interpretation of the problem:
- Much of the Symfony-related integration done for D8 was to enable caching of partial HTML responses (blocks and _content controllers).
- To do that effectively, blocks and _content responses need to identify all of the request data that they depend on. For example, do they vary by user role, by user, by client IP address, etc.? The implementation details of how these dependencies get identified are still being worked out, but are mostly irrelevant to the higher-level discussion needed in this issue.
- However, in Drupal, we have hooks. For example, hook_node_view(). These hooks don't know which blocks and _content controllers end up (indirectly) invoking them. For example, there can be many different blocks and _content controllers that render nodes. Neither node.module nor any particular implementation of hook_node_view() knows what all of them are.
- If one of these hook implementations ends up calling Drupal::request() to get information from the request that wasn't listed as an explicit dependency of the block or _content controller, and uses that information to affect what gets displayed, then the cache gets poisoned: a later request might be served the cached version even though it contains output not suitable for that request. One example of this is #914382: Contextual links incompatible with render cache, which found contextual links poisoning the forum block cache. Another example is comment_node_view(), which adds a link showing how many comments are new to the user (but only if history.module is enabled). We can likely find more examples in core, and many more in contrib. #1605290: Enable entity render caching with cache tag support partially addresses the issue for entities, but there are likely many non-entity hooks to consider as well.
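To make the poisoning scenario concrete, here is a minimal, language-neutral sketch in Python. It is not Drupal code: Request, RenderCache, node_view_hook() and build_forum_block() are all hypothetical stand-ins, and the cache key scheme is an assumption made only for illustration. The block declares that it varies by role only, while a hook it indirectly invokes reads the user from the request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for the current request."""
    user: str
    role: str
    client_ip: str


def node_view_hook(request):
    # Stand-in for a hook implementation that reads per-user request data
    # which the calling block never declared as a dependency.
    return f"3 new comments for {request.user}"


def build_forum_block(request):
    # The block does not know which hooks run while it renders.
    return "Forum block | " + node_view_hook(request)


class RenderCache:
    """Toy cache keyed only by the dependencies a block declares."""

    def __init__(self):
        self._store = {}

    def render(self, block_id, depends_on, request, build):
        key = (block_id,) + tuple(getattr(request, name) for name in depends_on)
        if key not in self._store:
            self._store[key] = build(request)
        return self._store[key]


cache = RenderCache()
alice = Request(user="alice", role="authenticated", client_ip="203.0.113.1")
bob = Request(user="bob", role="authenticated", client_ip="203.0.113.2")

# The block declares that it varies by role only.
first = cache.render("forum", ("role",), alice, build_forum_block)
second = cache.render("forum", ("role",), bob, build_forum_block)

# bob shares alice's role, so he is served alice's per-user output.
print(second)
```

Because both requests produce the same cache key, the second call never rebuilds the block, and output computed for one user is served to another.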
Options
- Put a warning on Drupal::request() (that's already in HEAD), fix all core use cases where cache poisoning occurs (like the contextual links and comment history examples), and leave it up to contrib developers to find and fix their own cache poisoning issues.
- Make cache poisoning impossible by cleansing the $request that represents the subrequest, so that it contains only the information listed as a dependency of that block / _content controller. That way, when a hook calls Drupal::request(), it only has access to the subset of request information on which the cache can vary. For example, if the block / _content controller didn't list IP address as a dependency, then for any hooks that run within that subrequest context, calling Drupal::request()->getClientIp() will return NULL. This necessitates giving modules a way to alter the dependency list of subrequests, but that much is easy. However, it puts the burden on module developers to figure out which blocks and _content routes need to be altered.
- Whether or not we do the $request cleansing, we can also try to come up with a context mediator between the module implementing a hook and the blocks / _content controllers that indirectly invoke that hook. For example, a module implementing hook_node_view() with per-user information could somehow identify that there's a per-user dependency for any block / _content subrequest that depends on a $node, and then let the mediator be responsible for altering the routes accordingly. We probably need a bit more analysis on whether there are use cases without as clear a relationship between hook and route parameter, and how to mediate those.
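The second and third options can be sketched together, again in Python rather than Drupal code and under loud assumptions: CleansedRequest and ContextMediator are hypothetical names, the dictionary-based request is a simplification, and nothing here reflects an agreed design. The cleansed request exposes only declared dependencies (anything else reads as None), and the mediator lets a hook implementation declare a dependency against a route parameter such as a node.

```python
class CleansedRequest:
    """Hypothetical view of a request limited to declared dependencies."""

    def __init__(self, full_request, allowed):
        # Copy only the values the subrequest is allowed to vary on.
        self._data = {k: v for k, v in full_request.items() if k in allowed}

    def get(self, name):
        # Undeclared request data is simply absent, so it reads as None.
        return self._data.get(name)


class ContextMediator:
    """Hypothetical registry mapping route parameters to extra dependencies."""

    def __init__(self):
        self._by_param = {}

    def declare(self, route_param, dependency):
        # Called by a module whose hook needs extra request context
        # whenever a subrequest depends on the given route parameter.
        self._by_param.setdefault(route_param, set()).add(dependency)

    def dependencies_for(self, route_params, declared):
        # Merge the block's own dependencies with those hooks declared.
        deps = set(declared)
        for param in route_params:
            deps |= self._by_param.get(param, set())
        return deps


full = {"user": "alice", "role": "authenticated", "client_ip": "203.0.113.1"}

# Cleansing only: the block declared "role", so the IP is not visible.
cleansed = CleansedRequest(full, {"role"})
ip_seen_by_hook = cleansed.get("client_ip")

# With a mediator: a hook on nodes declares a per-user dependency once,
# and every subrequest that takes a node picks it up.
mediator = ContextMediator()
mediator.declare("node", "user")
deps = mediator.dependencies_for(["node"], {"role"})
mediated = CleansedRequest(full, deps)
user_seen_by_hook = mediated.get("user")
```

In this sketch the module never has to enumerate blocks or routes itself; whether every hook has such a clear route parameter to attach to is exactly the open question raised above.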
Discuss!