Channel: Issues for Drupal core

Introduce #post_render_cache callback to allow for personalization without breaking the render cache



Definition of "personalization"

There are many kinds of personalization. Personalization is based on one or more contexts: user, geographical location of the user, time, a specific node ("X new comments" on this node for user X), organic group, and so on.

Let's look at an example. Usually in Drupal, a certain page contains some content that is the same for everybody (e.g. viewing node 2118703 on d.o), but some parts of the page are unique for each user (e.g. the comment "new" markers). Because the comment "new" markers are different for every user (and even different on every page load for every user), we cannot cache the resulting HTML. If we did, the same, and hence wrong, "new" marker would be shown to each user. So that means we have to generate the *entire* page for each user on every single page load. That is very expensive. The consequence is that each page load is slower: because the entire page has to be built, there is not a single blob of HTML that can be cached and reused.

That is a very specific example, but it's a problem that we frequently encounter in Drupal core. We have the comment "new" and "updated" markers. The "X new comments" links on node teasers. The "edit" link on comments that is user-specific when "edit own comments" permission is enabled. Contextual links are page-specific (due to the ?destination=<current path> query string). Some kind of "Login block" on many sites (see the "Logged in as ." at the top of this page if you're logged in). And so on, and so on.

So, effectively, we're saying "screw caching" in much of Drupal core. Often, the personalization is nested somewhere deep in something that's otherwise perfectly cacheable (e.g. the "edit" links on comments are user-specific, but the rendered node is not; because comments are rendered inside of the node, this prevents any caching of nodes!).

That's what we're trying to solve.

Roughly, there are two types of personalized content:

  1. Primary content (e.g. the rendered blog post node)
  2. Secondary content (e.g. "new" content marker on comments, contextual links metadata, comment "edit" links, toolbar menu tree …)

For primary content, it's essential that it's part of the delivered HTML: otherwise the user would have to wait for it to show up (which is bad UX and bad perceived performance). For a lot of secondary content (like the ones above), it is okay for there to be a slight delay for it to show up. Then there are things in between, such as a "related content" block and a "navigation" block. For those, it may or may not be acceptable to show up with a delay. It depends on the site.

Besides that, there is a personalization data size concern: does the personalization require a lot of data (which is probably always true for primary content) or very little (which is often but not always true of secondary content: it's not true for the toolbar menu tree)? Because of the greatly varying sizes of different personalized things, different delivery mechanisms may be suitable.

Problem/Motivation

Short problem description

In the current state of drupal_render() and the render cache, any personalization breaks the render cache, unless you do it in JavaScript.

Full problem description

The only way to personalize without breaking the render cache with the current API is to implement a solution like the one in #1991684: Node history markers (comment & node "new" indicator, "x new comments" links) forces render caching to be per user:

  1. embed universal truths as data- attributes in the HTML
  2. attach JavaScript that will do the personalization
  3. let the JavaScript make requests to retrieve user-specific data if necessary
  4. apply heuristics and client-side caching (localStorage/sessionStorage) to minimize the number of HTTP requests
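The four steps above can be sketched roughly as follows. This is an illustrative sketch only, not actual Drupal core code: the function names (`makeCache`, `lastReadTimestamp`, `isNew`) are hypothetical, and a plain object stands in for localStorage so the logic is self-contained.

```javascript
// Step 1: a universal truth is embedded in the markup, e.g.
//   <article data-comment-timestamp="1380000000">…</article>
// The per-user "last read" timestamp is what needs to be personalized.

// Step 4: a tiny cache wrapper; in a browser, `storage` would be localStorage.
function makeCache(storage) {
  return {
    get: (key) => (key in storage ? storage[key] : null),
    set: (key, value) => { storage[key] = value; },
  };
}

// Steps 2-3: resolve the user's last-read timestamp for a node, talking to
// the server (here a stand-in function; in practice an HTTP request) only
// on a cache miss.
function lastReadTimestamp(nid, cache, fetchFromServer) {
  const key = 'lastread:' + nid;
  let value = cache.get(key);
  if (value === null) {
    value = fetchFromServer(nid);
    cache.set(key, value);
  }
  return value;
}

// Step 2, applied: decide whether a comment gets a "new" marker by comparing
// the embedded data- attribute value against the cached per-user timestamp.
function isNew(commentTimestamp, nid, cache, fetchFromServer) {
  return commentTimestamp > lastReadTimestamp(nid, cache, fetchFromServer);
}
```

Note how repeated calls for the same node hit the cache, so the server is contacted at most once per node per session.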

However, for some sites/hosting setups/use cases, this is bad: some sites prefer to avoid the HTTP requests and embed the necessary data in the page itself, yet other sites prefer to avoid JavaScript altogether.

So far, no other solution was possible, because we didn't have any way to react to data being loaded from the render cache.

Proposed resolution

Short proposed resolution description

Introduce #post_render_cache callbacks.

Requirements

Any solution must meet these six requirements:

  1. Conceptually easy to reason about. In other words: sufficiently simple DX. (Points 2, 3 and 4 aid in this.)
  2. It should be possible to use the same mechanism to either replace a placeholder in the markup, or attach additional JavaScript settings/libraries, or both.
  3. Must continue to work when multiple render cacheable things are merged into one (i.e. #cache is set on an element, but also on its child or grandchild or …). In other words: nested #post_render_cache callbacks must continue to work even when they're no longer nested, after having been stored in/retrieved from the render cache.
  4. Even when not using render caching (i.e. an element that does not have #cache set), #post_render_cache callbacks must be executed. This allows (contrib module) developers to inject/alter render arrays with personalized data without breaking the render cache, and more importantly, without having to implement the same functionality in two ways: one implementation for when render caching is disabled, another for when it is enabled.
  5. Must have a unique/random identifier for each placeholder, to guarantee the #post_render_cache callback will never accidentally replace user-generated content.
  6. The default solution should be optimized to work out-of-the-box on any hosting. Roughly, there are two broad types of hosting: shared hosting and enterprise hosting (with a reverse proxy such as Varnish in front of the web servers). There are more types, but these are the extremes. Drupal core should accommodate both cases out of the box, and if that is impossible, it should favor shared hosting.

Three desirable "personalized page" delivery mechanisms

After careful deliberation and prototyping, I've come up with three "personalized page" delivery mechanisms that are needed to cover all use cases and requirements.

Two out of three are impossible today.

They can all be implemented using a single new #post_render_cache callback!

Where "personalized page" is defined as a page that contains something-specific (user-specific, request-specific, end user location-specific …) content.

Finally: a soft assumption is that in the final version of Drupal 8, JSON/AJAX requests will not require a full bootstrap, i.e. will be cheaper than HTML requests.

1. "Only HTML"

At first sight, this is the most desirable delivery mechanism: no JavaScript!

In a nutshell: store placeholders in the render cache, then have some kind of #post_render_cache callback that receives the rendered HTML, finds the placeholders and replaces them with the personalized content.
This is the only mechanism that is acceptable for primary content (e.g. rendered nodes), because it guarantees the personalized content is available as soon as the user sees the page. It is suitable for large chunks of content.
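A minimal sketch of the placeholder find-and-replace step follows. In Drupal this would happen in PHP inside the render pipeline; it is shown in JavaScript purely for illustration, and the placeholder format and function names are assumptions, not the actual implementation. It also shows why requirement 5 (a unique token) matters: only an exact token match is ever replaced.

```javascript
// Hypothetical placeholder format. The unique token guarantees the callback
// never accidentally replaces user-generated content that merely looks like
// a placeholder.
function makePlaceholder(token) {
  return '<drupal-render-cache-placeholder token="' + token + '">' +
         '</drupal-render-cache-placeholder>';
}

// A #post_render_cache-style callback receives the cached HTML, finds the
// placeholder by its token, and substitutes freshly built personalized
// content before the page is delivered.
function replacePlaceholder(html, token, personalizedHtml) {
  return html.split(makePlaceholder(token)).join(personalizedHtml);
}
```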

Once we start thinking it through, some significant downsides arise:

  1. Inherently incompatible with a reverse proxy-based setup (Varnish, CDN …), because any page that contains personalized content cannot be cached by the reverse proxy.
  2. Inherently less scalable/higher server hosting cost, because each page needs to be completely built on the server side.
  3. Inherently higher minimum page load time, because the server inherently needs more time to generate the page (to find and replace the placeholders).

So, while this seems better in general, it is in fact worse for the majority of sites out there. On shared hosting, sites would be more likely to fail when slashdotted. On enterprise hosting, the reverse proxy is barely effective.

Only when a number of highly specific requirements are met is this the better delivery mechanism:

  1. The site has already been optimized very heavily for performance: it contains only the essential JavaScript, has very few images, the theme removes all Drupal contrib modules' CSS and reimplements it and loads in <1 s on powerful devices connected to fast networks.
  2. Your site has very important visitors that access the site via high-latency networks on low-CPU power devices and you need to serve the page in <1 s on these devices as well.
  3. Increased server hosting cost is acceptable.

You see, unless you already fulfill requirement number 1, you probably wouldn't even see the benefit of going "only HTML". Without requirement number 2, there is no compelling reason to go "only HTML". And requirement number 3 is a consequence.

IMHO it's clear that this is a bad default for Drupal. On the other hand, it is crucial that we at least support it.

2. JavaScript + HTTP requests + localStorage/sessionStorage

This is the mechanism we've used so far to prevent personalized content from breaking the render cache (because it's the only method supported by Drupal core today):

In a nutshell: personalized content is hidden or non-personalized initially, and contains a placeholder or identifier. Some #attached JavaScript then finds these placeholders or identifiers and renders the personalized content. To do that, it may have to either talk to localStorage (fast & cheap) or … perform an HTTP request to talk to the server (slow & expensive).
This mechanism is not acceptable for primary content (e.g. rendered nodes) because it may depend on a round trip to the server (and hence network latency) to retrieve the content. Hence it should only be used for secondary content (e.g. "new" content marker). It is suitable for large chunks of content. It is also suitable for metadata or tiny chunks of content.

The downside is extremely obvious: HTTP requests. These keep a mobile device's antenna awake longer than necessary, thus draining the battery more. Not only that, but on every device, more HTTP requests mean a slower website, especially on a mobile device.

However, the upside is that we can leverage heuristics and client-side caching to avoid having the server do any work at all: only talk to the server when heuristics don't allow you to avoid doing so, and even in that case, first check whether the needed data isn't already in localStorage/sessionStorage, to avoid HTTP requests. When done well, this can actually decrease server load.

It is much better than "only HTML" for enterprise hosting, because this is inherently compatible with reverse proxy-based setups.

The big downside of this approach is not just "HTTP requests" but that if you have multiple personalized things on the page, that results in multiple HTTP requests. Particularly for shared hosting, that could become problematic: they're typically less optimized, which means that the minimum response time will be higher, resulting in less-than-stellar perceived performance and higher server load.

3. JavaScript + drupalSettings

In a nutshell: personalized content is hidden or non-personalized initially, and contains a placeholder or identifier. The attached JavaScript then finds these placeholders or identifiers and renders the personalized content. To do that, it merely has to look up the data in drupalSettings, which was added to the page's <HEAD> using some kind of #post_render_cache callback.
This mechanism is not acceptable for primary content (e.g. rendered nodes) because it would cause an excessively large drupalSettings JavaScript declaration in the HTML <HEAD>. Hence it should only be used for secondary content (e.g. "new" content marker). It is not suitable for large chunks of content. It is only suitable for metadata or tiny chunks of content.
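The client side of this mechanism could look roughly like the following sketch. The `history` settings key and the function name are assumptions for illustration, not actual Drupal API; the point is that rendering is a pure lookup, with no HTTP request.

```javascript
// Mechanism 3: all personalization metadata was embedded in drupalSettings
// (here a hypothetical "history" key) by a #post_render_cache callback, so
// the attached JavaScript only needs a lookup.
function newCommentCount(drupalSettings, nid) {
  const history = drupalSettings.history || {};
  return history[nid] !== undefined ? history[nid].newComments : 0;
}
```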

The downside is extremely obvious: inherently incompatible with reverse proxy setups, i.e. enterprise hosting.

However, the upside is that no additional HTTP requests are necessary, since all the necessary information is already in the HTML response itself. Since we don't want to break the render cache, we just #attach the necessary information.

This means everything in the <BODY> can be render cached. And only the <HEAD> needs to be generated dynamically. (Note: this distinction is exactly what #2068471: Normalize Controller/View-listener behavior with a Page object aims to formalize.)

This is clearly faster than the "Only HTML" approach, because there's no "let's do many string replacements on a huge blob of HTML" going on.

Why not just use ESI

ESI does not scale very well when handling large numbers of ESI tags, which is exactly the personalized content in Drupal core that makes render caching difficult: contextual links, comment edit links, comment author classes etc. can all appear up to hundreds of times per page each. Each cache miss from ESI tags (or uncacheable requests) means a separate call back to Drupal. Given entities might appear dozens or hundreds of times on a single page, with multiple personalized elements per entity this could lead to thousands of requests against Drupal to render a single page. ESI is great for caching different areas of content with different TTLs and cache keys (roughly equivalent to block level rather than within a block), and the approach here is compatible with using ESI, ensuring a much higher cache hit rate when using the client side replacement.

Conclusion

Delivery mechanisms 1 and 3 are impossible in today's Drupal 8, because we don't have something like #post_render_cache. Hence mechanism 2 is what we have today: it was the only possible solution that didn't break the render cache.

Delivery mechanism 1 is not very desirable: it's only suitable for extremely optimized, high-budget sites.

Delivery mechanism 2 is ideal for enterprise hosting.

Delivery mechanism 3 is ideal for shared hosting.

To accommodate both shared and enterprise hosting, we should default to delivery mechanism 3 and make it easy for enterprise hosting to override it to delivery mechanism 2. And in fact, that is trivial: alter away all #post_render_cache functions that are responsible for attaching personalized data in drupalSettings!
The only requirement is that the accompanying JavaScript first checks whether the necessary information is available in drupalSettings, and if it is not, then it should make the necessary HTTP request.
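That check could be as simple as the following sketch (illustrative names and a hypothetical `history` settings key, not actual Drupal JavaScript):

```javascript
// Mechanism 3 with fallback to mechanism 2: prefer the data embedded in
// drupalSettings; only if an enterprise host has altered away the embedding
// #post_render_cache callbacks does the client fall back to an HTTP request
// (represented here by a stand-in function).
function getHistory(nid, drupalSettings, fetchFromServer) {
  const embedded = drupalSettings.history && drupalSettings.history[nid];
  if (embedded !== undefined) {
    return embedded; // mechanism 3: data shipped with the page
  }
  return fetchFromServer(nid); // mechanism 2: ask the server
}
```

With this single check, the same JavaScript supports both hosting extremes without any code changes.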

And again: they can all be implemented using a single new #post_render_cache callback!

Note: delivery mechanism 2 is also ideal when the number of personalized pieces of content per page is low, i.e. when there would only be a few HTTP requests over time, because the necessary data would usually be in localStorage. So sites would have the ability to evaluate both approaches, with almost zero changes in the code, to find the best balance of "embed information in drupalSettings" versus "leverage localStorage". It is even possible to use a hybrid approach, where you use delivery mechanism 3 on a landing page to prime the client-side cache for when the user navigates to other pages that are personalized using mechanism 2!

Proposal

I propose to ship Drupal with personalized page delivery mechanism 3 implemented by default. As explained in the above conclusion, it is then easy to have the same JavaScript also support delivery mechanism 2. Combined, they achieve the most critical requirements.

That being said, delivery mechanism 1 should also be possible to achieve through #post_render_cache. There are valid use cases for it (sites that may not use JavaScript, or whose users are connected through very high-latency networks).

Remaining tasks

  1. Comprehensive write-up to explain all considerations (see above)
  2. Validate feasibility by implementing #post_render_cache, supporting all three identified "personalized page" delivery mechanisms
  3. Comprehensive test coverage to validate correctness/ensure lack of side effects (!)
  4. Implement for concrete use cases in Drupal core
  5. Build consensus

User interface changes

None.

API changes

  • New #post_render_cache callback.
  • New #type = render_cache_placeholder element type.
  • Private API change: drupal_render_cache_get() now returns the cached $element rather than its output (`$element['#markup']`), to allow #post_render_cache callbacks to run.

Thanks

Thanks go to catch and amateescu, with whom I had several discussions about this :)

Frequently Asked Questions

Why not simply move #post_render to run after the render cache instead of introducing a new callback?
  1. Makes it impossible to have a post-render callback whose results are cached in the render cache.
  2. Would break existing semantics: #post_render callbacks receive the rendered #children as a string and the element itself, #post_render_cache needs to receive the element itself, the context, and optionally a token to uniquely identify the placeholder. As you can see, very, very different.
With which "personalized page" delivery mechanisms can I use the new #type = render_cache_placeholder element type?
Only with mechanism 1, because only that mechanism is suited for primary content. (See the "in a nutshell" parts for each mechanism as to why that is.) Furthermore, it's only designed for "replace with chunk of HTML".
Mechanisms 2 and 3 are about conditionally attaching CSS/JS and getting personalization metadata to the client, and then having some attached JavaScript use that metadata. Mechanisms 2 and 3 are intended only for use cases like "new/updated" content markers, flags/bookmarks, links, and so on.
