Preface
This is quite long (sorry), so as a preface, here's the conclusion:
I believe it's clear that we need an API addition for the entity loading functions to simply say "After loading this entity (no matter how it is loaded) do not save it to the cache before returning it".
This is because the current API forces you to flush entities from the cache (sometimes very aggressively), when this should not be necessary.
I propose adding a new argument to the load functions in Drupal 7+ as follows:
@param $cache
Whether to save the entity to the cache after loading.
entity_load($entity_type, $ids = FALSE, $conditions = array(), $reset = FALSE, $cache = TRUE)
entity_load_unchanged($entity_type, $id, $cache = TRUE)
Storing the loaded entity in the cache is then made conditional upon the $cache argument.
Overview
The following issue was triggered by contrib modules, but it all points to a deficiency in the core API since the advent of the pluggable entity controller mechanisms in Drupal 7.
With entity, entitycache and apachesolr on our Drupal 7 sites, I was seeing the following behaviour:
apachesolr wants to index a (potentially) large number of things, so to avoid the static cache memory requirements ballooning during the request it follows the traditional approach of loading each one with:
entity_load(..., TRUE); // i.e. $reset = TRUE
which calls:
entity_get_controller($entity_type)->resetCache()
(note that the $ids argument to resetCache() is empty; see also http://api.drupal.org/comment/44628#comment-44628 )
This is calling the Entity API module's EntityAPIController::resetCache(), which has support for entitycache, and so calls:
EntityCacheControllerHelper::resetEntityCache($this, $ids)
(where $ids = NULL, of course)
which, because there are no $ids, calls:
cache_clear_all('*', 'cache_entity_' . $controller->entityType, TRUE)
which, as we are using the database for our cache storage, issues a TRUNCATE on the database table for entitycache's persistent cache bin for that type; and does so upon every entity_load() call within that indexing loop.
This entire sequence was potentially repeating every ten minutes (being our cron interval), whenever there were content changes to index (which was extremely frequent in this instance).
The problem is that this aspect of the entity load API, because it is modelled on the node_load() of earlier versions of Drupal, just doesn't mesh with the concept of pluggable cache implementations (and especially with the possibility of a persistent cache). Previously the $reset flag was an okay method of preventing the memory requirements ballooning during a given request when a lot of nodes were being loaded; but with Drupal 7 we have at least two distinct use cases, and this single $reset argument is no longer sufficient to cover them.
Case 1
You genuinely do want to flush the entire cache bin for the entity type in question before loading an entity.
Obviously in this case, $reset does exactly what you want (and as this is the stated and expected purpose of the $reset argument, we can't change this behaviour in current versions of Drupal).
However, flushing a cache bin can be performed more explicitly by calling entity_get_controller($entity_type)->resetCache(), so for future versions of Drupal I would suggest that the $reset argument be removed entirely. Using it would almost never be the correct thing to do; and in the rare circumstances that it is needed, resetCache() can simply be used instead.
Case 2
This is the (far) more likely situation.
You want to load the entity, but do not expect to refer to it subsequently, and therefore do not want it statically cached in memory.
In this case the chances are that you are looping through a potentially large list of entities (e.g. apachesolr doing an indexing cron run), and you want to ensure that the memory limits will not be exceeded.
Here we are actually not interested in flushing any currently-cached entities. We are only interested in the entity we are about to load; and specifically we want to ensure that, if it was not previously in memory, it does not remain in memory once we have finished with it.
So $reset is not at all appropriate here; and yet it's still the most obvious API for achieving the primary goal of not exceeding the memory limits.
What is actually required is a way to say "load this entity, but do not cache it in memory".
(In fact, this would have been a sensible API option on earlier versions of Drupal as well, even without the possibility of a persistent load cache; but as in that instance a penalty is only ever incurred when a single request has already loaded a specific node and then loads it again with the $reset flag, it probably didn't seem like much of an issue.)
The other entertaining thing about using $reset when looping over entity_load() calls is that, after firstly flushing the cache and generating (or perhaps re-generating) an entity, it caches it. Then the next iteration of the loop flushes the cache, generates an entity, and caches it; etc. So not only are we flushing the cache every time, but we are also pointlessly expending the effort & bandwidth to put each of those objects into the cache.
In the days when the static PHP memory cache was the only cache, this carried no overhead -- the act of caching the object simply meant that PHP kept a reference to it, and flushing the cache just deleted those references. Things are quite different now, however -- we can be writing to a database, or sending it to a memcache server, or pretty much anything.
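To make the cost of that loop concrete, here is a minimal stand-alone sketch. It is a toy model only: ToyEntityStore and its counters are hypothetical names invented for illustration, not Drupal's entity controller, and the array simply stands in for a persistent cache bin.

```php
<?php
// Toy model only: ToyEntityStore is hypothetical and is NOT Drupal's
// entity controller. It just counts how often the "persistent bin" is hit.
class ToyEntityStore {
  public $persistent = array(); // Stands in for a cache_entity_* bin.
  public $flushes = 0;          // Full-bin clears (the TRUNCATE described above).
  public $writes = 0;           // Entities written into the bin.

  public function load($id, $reset = FALSE) {
    if ($reset) {
      // No IDs are passed along, so the whole bin is cleared.
      $this->persistent = array();
      $this->flushes++;
    }
    if (!isset($this->persistent[$id])) {
      // Cache miss: rebuild the entity, then write it to the bin.
      $this->persistent[$id] = (object) array('id' => $id);
      $this->writes++;
    }
    return $this->persistent[$id];
  }
}

// An indexing-style loop that passes $reset = TRUE on every load.
$store = new ToyEntityStore();
foreach (range(1, 100) as $id) {
  $store->load($id, TRUE);
}

// Every iteration flushed the bin and then wrote one entity back into it.
printf("flushes=%d writes=%d cached=%d\n",
  $store->flushes, $store->writes, count($store->persistent));
```

In this model, 100 loads cost 100 full flushes and 100 writes, and only the last entity is still cached at the end, so nothing loaded earlier in the loop (or by earlier requests) benefits.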
Workaround 1
For entity types with revisions, I've noticed that there is currently a workaround which enables the "load but do not cache" behaviour (although, as when using $reset, the existing cache will not be used to obtain the entity).
When you explicitly load an entity revision (with entity_revision_load() or by passing the revision id in the conditions to entity_load()), Drupal does not cache the result before returning it. No doubt it was considered that the chances of other code needing to refer to some non-current revision were slim, and therefore caching it was a bad idea.
However, as there is no check to see if the revision being passed is the current revision for the entity, you can query the current revision of an entity and then load the revision instead of using its entity_id, and the result is not cached before it is returned.
By doing this when looping through lots of entities, you avoid flushing the entity cache on every load.
Here's some example code for using the revision workaround:
/**
 * Load an entity *without* writing it to the entity load cache,
 * if possible. (If we have to cache it, flush it from the cache
 * afterwards.)
 */
function entity_load_and_do_not_cache($entity_type, $entity_id) {
  $entity = NULL;
  $info = entity_get_info($entity_type);
  if (!empty($info['entity keys']['revision'])) {
    // Revisions are enabled, so we can use entity_revision_load()
    // to avoid caching the result.
    $query = new EntityFieldQuery();
    $results = $query
      ->entityCondition('entity_type', $entity_type)
      ->propertyCondition($info['entity keys']['id'], $entity_id)
      ->execute();
    if (!empty($results[$entity_type][$entity_id])) {
      $result = $results[$entity_type][$entity_id];
      $revision_id = $result->{$info['entity keys']['revision']};
      // entity_revision_load() returns a single entity, or FALSE.
      $entity = entity_revision_load($entity_type, $revision_id);
    }
  }
  else {
    // Revisions are not enabled for this entity type, so we must
    // load the entity normally, and then flush it from the cache.
    // entity_load() returns an array keyed by ID, so unwrap the entity.
    $entities = entity_load($entity_type, array($entity_id));
    $entity = $entities ? reset($entities) : NULL;
    entity_get_controller($entity_type)->resetCache(array($entity_id));
  }
  // Normalise a failed load (FALSE or unset) to NULL.
  return $entity ? $entity : NULL;
}
So this is a useful workaround for the present, but obviously it is still not ideal:
When you load an entity revision, the cache is not used at all. Drupal does not save revisions to cache, so it also will not attempt to load them from cache.
So in the use case in which we want to preserve the existing cache (but simply not expand it), the entity_revision_load() workaround is quite likely to be doing more work than required if a persistent cache is involved.
In most situations, we would be very happy to load the entity from cache if it was already cached, thus saving all the unnecessary effort of generating the object.
Workaround 2
That brings us to the second workaround (also used above when revisions are not available), which is the two-step process of loading the entity normally, and then flushing it from the load cache once we're finished with it (I notice that apachesolr-1.x-dev has recently implemented this to avoid using $reset).
This approach has the desirable benefit of utilising the load cache to obtain the entity in the first place, but it is still not what we want:
- If the entity was not in cache, we are then unnecessarily sending it to the cache
- Then we are flushing it from the cache even if it had been there before we began (in which case it should actually be left alone).
In the case of previously-cached entities, even though we gain the benefit of the cache when loading the object ourselves, we are then denying that benefit to the next request for that object.
Conclusion
Either workaround is far better than using $reset, but neither one is a really good solution to the problem.
I believe it's clear that we need an API addition for all the entity loading functions to simply say "After loading this entity (no matter how it is loaded) do not save it to the cache before returning it".
At minimum, this is needed for:
- entity_load() -- which may or may not load from the cache.
- entity_load_unchanged() -- which obviously won't load from the cache.
(and ideally also for the type-specific analogs such as node_load())
I propose adding a new argument to the load functions in Drupal 7 as follows:
@param $cache
Whether to save the entity to the cache after loading.
entity_load($entity_type, $ids = FALSE, $conditions = array(), $reset = FALSE, $cache = TRUE)
entity_load_unchanged($entity_type, $id, $cache = TRUE)
Storing the loaded entity in the cache is then made conditional upon the $cache argument.
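As a rough illustration of the intended semantics (and of how they differ from the load-then-flush workaround), here is a stand-alone toy model. ToyCachingStore and its methods are hypothetical names invented for this sketch; it is not a patch against Drupal's entity controller, and the $cache flag simply mirrors the proposed argument.

```php
<?php
// Toy model only: a hypothetical store, not Drupal's entity controller.
// The $cache flag mirrors the argument proposed above.
class ToyCachingStore {
  public $bin = array(); // Stands in for the entity load cache.
  public $writes = 0;    // Number of entities written to the cache.

  public function load($id, $cache = TRUE) {
    if (isset($this->bin[$id])) {
      // Cache hit: use it, and leave the cache untouched.
      return $this->bin[$id];
    }
    // Cache miss: rebuild the entity.
    $entity = (object) array('id' => $id);
    if ($cache) {
      $this->bin[$id] = $entity;
      $this->writes++;
    }
    return $entity;
  }

  public function evict($id) {
    unset($this->bin[$id]);
  }
}

// Proposed behaviour. Entity 1 was cached earlier; entity 2 was not.
$proposed = new ToyCachingStore();
$proposed->load(1);
$proposed->load(1, FALSE); // Served from cache; the entry is kept.
$proposed->load(2, FALSE); // Built, returned, never written.

// Load-then-flush workaround, starting from the same state.
$workaround = new ToyCachingStore();
$workaround->load(1);
foreach (array(1, 2) as $id) {
  $workaround->load($id);
  $workaround->evict($id);
}

printf("proposed: cached=%d writes=%d\n", count($proposed->bin), $proposed->writes);
printf("workaround: cached=%d writes=%d\n", count($workaround->bin), $workaround->writes);
```

In this model the proposed flag leaves the previously cached entity in place and performs no extra write, whereas the workaround writes entity 2 needlessly and evicts entity 1 even though it was cached before we began.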
As previously mentioned, I think the $reset argument could be removed in Drupal 8. If we wanted to minimise the risk of problems with code being ported to Drupal 8, the $cache argument could be inverted to $do_not_cache (in order that any code originally using $reset = TRUE would then be using the somewhat-analogous $do_not_cache = TRUE, and there would be fewer porting errors). I generally prefer positive meanings (cache) to negative ones (do_not_cache) in variable names, but I'd be very happy with either one :)