Channel: Issues for Drupal core

Add entity caching to core

Updated: Comment #260

Problem/Motivation

Loading an entity from the database is slow. Configurable fields have their own cache handling, but entities can now have up to 4 base tables, from which multiple translations might be loaded.

Content entities are optimized for the case where they are initialized with all values (base fields and configurable fields), which are then stored in the internal ->values array. The necessary field classes are only created when the fields are accessed through the API. Even though configurable fields are cached, they need to be set on already created entity objects, which forces all of those field objects to be initialized.

Proposed resolution

The basic idea is to generalize the concept of the configurable field cache and apply it to the entity itself and all values within.

For now, the entity is built just as before: we query the different base tables, create the entity objects, then call the methods to load the field items/values. #2137801: Refactor entity storage to load field values before instantiating entity objects will eventually try to call the field methods before constructing the entity objects.

The entity itself now implements the PrepareCacheInterface, so that we can ask it to return all cacheable data. This also allows returning values added to entities that are not defined fields, so we can cache them too. Those are typically values added in hook_entity_load() implementations. Internally, it has more or less the same code as we currently have for configurable fields, just applied to all fields. We then save those values in the persistent cache. (Explicitly kept to highlight the difference to the old implementation)

Now that PrepareCacheInterface has been removed, no special processing is necessary. Instead, the $entity object can be stored in the cache directly. The cache serializes it, which calls __serialize() on it, and that already ensures that the entity is optimized and that no unnecessary definitions/objects are serialized. The only exceptions are language objects, which are currently added as an optimization. @todo: Open issue to add langcode() that should make that unnecessary.

When loading the entity from the persistent cache, we can just unserialize. At this point, only the field item objects that are explicitly requested are created. With the entity key cache (#2182239: Improve ContentEntityBase::id() for better DX) and the render cache, this means almost none, and some of the remaining ones will hopefully go away soon. Loading the values only on first access would only be useful for cache misses with this system, in which case it is very likely that they will be accessed anyway, as dependent caches will likely also have cache misses.
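As a rough illustration of the two paragraphs above, the following sketch shows an object that serializes only its raw values and rebuilds field objects lazily after unserializing. SketchEntity and SketchFieldItemList are hypothetical stand-ins written for this example, not Drupal classes, and the real entity classes do considerably more.

```php
<?php
// Hypothetical stand-in for a field item list object (not a Drupal class).
final class SketchFieldItemList {
  public static $created = 0;
  public $values;

  public function __construct(array $values) {
    $this->values = $values;
    self::$created++;
  }
}

// Hypothetical entity: raw values are kept, field objects are built lazily.
final class SketchEntity {
  private $values;
  private $fields = [];

  public function __construct(array $values) {
    $this->values = $values;
  }

  // Creates the field object only on first access.
  public function get(string $name): SketchFieldItemList {
    if (!isset($this->fields[$name])) {
      $this->fields[$name] = new SketchFieldItemList($this->values[$name] ?? []);
    }
    return $this->fields[$name];
  }

  // Serializes only raw values; instantiated field objects are left out.
  public function __serialize(): array {
    $values = $this->values;
    foreach ($this->fields as $name => $field) {
      $values[$name] = $field->values;
    }
    return ['values' => $values];
  }

  // Restores raw values and starts with no field objects.
  public function __unserialize(array $data): void {
    $this->values = $data['values'];
    $this->fields = [];
  }
}

$entity = new SketchEntity(['title' => ['Hello'], 'body' => ['Text']]);
$entity->get('title');            // One field object is created here.
$cached = serialize($entity);     // What a cache backend would store.

$restored = unserialize($cached);
$created_before = SketchFieldItemList::$created;
$title = $restored->get('title')->values;  // Only 'title' is instantiated.
$created_after = SketchFieldItemList::$created;
```

In this sketch the serialized string contains no field objects at all, and reading one field after unserializing creates exactly one field object.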

Because some hook_entity_load() implementations are fairly dynamic (comment statistics, translation metadata), the patch introduces a hook_entity_storage_load() which is cached, so implementations can choose which one they want.
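A minimal sketch of how a module might split its work between the two hooks. The module name "mymodule", the parameter lists, and the property names are assumptions made for this example, and plain stdClass objects stand in for entities so that the snippet runs outside Drupal.

```php
<?php
// Cached hook: in this sketch it runs only on a persistent-cache miss,
// so whatever it attaches is stored along with the entity.
function mymodule_entity_storage_load(array $entities, string $entity_type): void {
  foreach ($entities as $entity) {
    $entity->mymodule_label_length = strlen($entity->label);
  }
}

// Uncached hook: runs on every load, so it suits dynamic values.
function mymodule_entity_load(array $entities, string $entity_type): void {
  foreach ($entities as $entity) {
    $entity->mymodule_loaded_at = microtime(true);
  }
}

$node = new stdClass();
$node->label = 'Hello';

// Cache miss: the cached hook runs, the entity is stored, then the
// uncached hook runs on the live object.
mymodule_entity_storage_load([1 => $node], 'node');
$cache_item = serialize($node);
mymodule_entity_load([1 => $node], 'node');

// Cache hit: only the uncached hook runs on the unserialized copy.
$from_cache = unserialize($cache_item);
$has_dynamic_value_in_cache = isset($from_cache->mymodule_loaded_at);
mymodule_entity_load([1 => $from_cache], 'node');
```

The value attached in the storage hook survives the round trip through the cache, while the dynamic value is never stored and is recomputed on each load.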

The flow of loadMultiple now looks like this:

- Get already loaded entities from the static cache.
-- For remaining entities, check the persistent cache
--- Create entity objects for entities loaded from the persistent cache as explained above.
-- Load entities that were not found in the persistent cache from the database.
--- Create entities with the base field values
--- Load configurable field values and add them to entity objects
--- Call storage load hook.
--- Put entities loaded from storage into the persistent cache.
-- Call hook_entity_load() on entities from database/persistent cache. This allows modules to attach non-cacheable (dynamic/personalized?) values to an entity. Also calls ::postLoad().
-- Put entities from persistent cache and database into static cache
- Combine all requested entities together, ensure order, return
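
The flow above can be sketched roughly as follows. SketchStorage is a hypothetical class written for this example: plain arrays stand in for the static cache, the persistent cache backend, and the database, and two private methods stand in for the storage load hook and hook_entity_load(). It is not the code from the patch.

```php
<?php
final class SketchStorage {
  private $staticCache = [];
  private $persistentCache = [];  // id => serialized entity
  private $database;              // id => array of raw values
  public $databaseLoads = 0;

  public function __construct(array $database) {
    $this->database = $database;
  }

  public function loadMultiple(array $ids): array {
    // Entities that are already in the static cache.
    $entities = array_intersect_key($this->staticCache, array_flip($ids));
    $remaining = array_diff($ids, array_keys($entities));

    // Check the persistent cache and unserialize the hits.
    $loaded = [];
    foreach ($remaining as $id) {
      if (isset($this->persistentCache[$id])) {
        $loaded[$id] = unserialize($this->persistentCache[$id]);
      }
    }

    // Load the rest from the "database", run the cached hook, then store.
    foreach (array_diff($remaining, array_keys($loaded)) as $id) {
      if (!isset($this->database[$id])) {
        continue;
      }
      $this->databaseLoads++;
      $entity = (object) $this->database[$id];
      $this->storageLoadHook($entity);
      $this->persistentCache[$id] = serialize($entity);
      $loaded[$id] = $entity;
    }

    // Run the uncached hook, then fill the static cache.
    foreach ($loaded as $id => $entity) {
      $this->loadHook($entity);
      $this->staticCache[$id] = $entity;
    }

    // Combine everything and return it in the requested order.
    $entities += $loaded;
    $ordered = [];
    foreach ($ids as $id) {
      if (isset($entities[$id])) {
        $ordered[$id] = $entities[$id];
      }
    }
    return $ordered;
  }

  public function resetStaticCache(): void {
    $this->staticCache = [];
  }

  // Stand-in for the cached storage load hook: counts its own runs.
  private function storageLoadHook(object $entity): void {
    $entity->storageHookRuns = ($entity->storageHookRuns ?? 0) + 1;
  }

  // Stand-in for hook_entity_load(): counts its own runs.
  private function loadHook(object $entity): void {
    $entity->loadHookRuns = ($entity->loadHookRuns ?? 0) + 1;
  }
}

$storage = new SketchStorage([1 => ['title' => 'A'], 2 => ['title' => 'B']]);
$first = $storage->loadMultiple([2, 1]);   // Miss: both come from the database.
$storage->resetStaticCache();
$second = $storage->loadMultiple([2, 1]);  // Served from the persistent cache.
```

After the second call the database counter is unchanged, and the storage hook counter stored with each entity still reads 1, because that hook only ran before the entity was written to the persistent cache.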

Additional changes in this patch:
- Unify the code that ensures that entities are returned in the requested order into a single helper method
- Remove cache handling from all somethingFieldItems() methods; they no longer need to worry about this.
- Drop the doSomethingFieldItems() pattern and rename those methods to somethingFieldItems(), as the wrapper methods only cared about caching, which they no longer have to do.
- Comment and user storage controllers have special data manipulation that needs to run before entity objects are created. They now override mapFromStorageRecords() instead of postLoad(), as that now receives entity objects.

Remaining tasks

User interface changes

-

API changes

- Refactoring of loadMultiple(), which should not affect anyone.
- Introduction of hook_entity_load_uncached(); modules that attach non-cacheable values in hook_entity_load() need to switch to that.
- Removal of the doSomethingFieldItems() methods. This would only affect subclasses of FieldableStorageControllerBase, which I do not think exist; non-database storage controllers will use completely different patterns anyway, as those methods were always database-specific.

Original report by @catch

This is a long delayed followup to http://drupal.org/node/111127#comment-1494790

Reasons I'm posting this issue:

1. Despite converting lots of things to fields, caching at the node level rather than the field level gives a 10% performance improvement on a single node view with only the default profile and database caching. So the field cache does not 'do the same job' as an entity level cache as was suggested by Dries and bjaspan in that issue - it can't remove the cost of node_load() itself, the query building, the field_attach_load() calls etc.

2. I posted a proof-of-concept module at http://drupal.org/project/entitycache which implements node, taxonomy term and comment caching from contrib. This works, when hacked into the default profile, all but a couple of tests pass, and the entire .module is 14k - some of which is dead code for handling users which should probably be abandoned, some of which could probably be tidied up (pretty much all the NodeController / TaxonomyTermController overrides don't need to be there if we can call back to those classes directly), so potentially < 10k.

So, I'm proposing that we add this module to core, enabled in the default install profile. By having it as a separate module, we're able to maintain all the cache clearing code in one central place, so it doesn't have the messiness of putting node cache clears in taxonomy module, or taxonomy cache clears in image module or whatever. However we'll get real performance benefits in core rather than having to schlep off to contrib to get them.

We have real performance issues in Drupal 7, and this is one possible way to ameliorate those without changing any APIs (it adds a total of one hook). I'm not posting a patch here, since the code is in CVS, and this is more of an architecture/policy decision at this stage. Abusing status though to get it in the real issue queue.

