Problem/Motivation
Drupal currently has a file usage API, which is critical to things like private file access and garbage collection of files.
Unfortunately, it depends on each module recording and removing its own usage entries, so it cannot be treated as a source of truth.
Core has a large number of bugs that stem from invalid file usage data, and these have led to security issues and critical data loss.
Some examples include:
- #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary
- #2461845: Private files that are no longer attached to an entity should not suddenly become accessible to people who couldn't see them before
- #1452100: Private file download returns access denied, when file attached to revision other than current
- #2708411: [PP-1] editor.module's editor_file_reference filter not tracking file usage correctly for translations
Automatic file deletion in core had to be disabled because of persistent data loss issues that proved impossible to resolve:
#2821423: Dealing with unexpected file deletion due to incorrect file usage (see also all the linked and related issues of that issue, many of which are unresolved).
In addition, this API is limited to file usage, but Drupal's data model allows much richer entity relationships.
For example, the file usage API may record that a media entity makes use of a file. But content editors need to know if any other content entities make use of that media entity before they can decide if the file is in fact no longer in use.
Proposed resolution
Adapt the Entity Usage module for core as a low-level API.
It provides the following features:
- A configurable usage API allowing an entity type to be flagged as a source or target of a usage record
- Support for revisions
- Support for translations
- Usage is calculated at save time, so it is cheap to query at run time
- A plugin-based API allowing usage to be determined in a myriad of ways, e.g. via an entity reference, via a link in HTML, via an image tag, via an inline block in Layout Builder and many more
- The ability to rebuild the entire usage dataset via batch (à la node access rebuild) to ensure the data is accurate
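
To make the save-time calculation and plugin-based detection in the list above more concrete, here is a minimal, language-neutral sketch in Python. It is not the Entity Usage module's actual API: `UsageIndex`, the two plugin functions, the dict-based entity shape and the `data-entity-id` attribute are all hypothetical names chosen for illustration, and revisions and translations are left out for brevity.

```python
import re
from collections import defaultdict


# Hypothetical "plugins": each receives an entity (a plain dict here) and
# returns the (target_type, target_id) pairs it is able to detect.
def entity_reference_plugin(entity):
    return [tuple(ref) for ref in entity.get("references", [])]


def html_image_plugin(entity):
    # Assumption: embedded images carry a data-entity-id attribute.
    ids = re.findall(r'<img[^>]*data-entity-id="(\d+)"', entity.get("body", ""))
    return [("file", int(i)) for i in ids]


PLUGINS = [entity_reference_plugin, html_image_plugin]


class UsageIndex:
    """Usage records computed at save time and queried cheaply later."""

    def __init__(self):
        self._by_source = {}                 # source -> set of targets
        self._by_target = defaultdict(set)   # target -> set of sources

    def on_save(self, entity):
        source = (entity["type"], entity["id"])
        # Drop the records from the previous save so stale usage disappears.
        for target in self._by_source.pop(source, set()):
            self._by_target[target].discard(source)
        # Ask every plugin which targets this entity uses.
        targets = set()
        for plugin in PLUGINS:
            targets.update(plugin(entity))
        self._by_source[source] = targets
        for target in targets:
            self._by_target[target].add(source)

    def sources_of(self, target_type, target_id):
        # Query time is a plain lookup; no entity content is parsed here.
        return sorted(self._by_target.get((target_type, target_id), set()))

    def rebuild(self, entities):
        # Recompute everything from scratch, similar in spirit to a batch rebuild.
        self._by_source.clear()
        self._by_target.clear()
        for entity in entities:
            self.on_save(entity)
```

Because all of the detection work happens in `on_save`, answering "what uses file 42?" never has to inspect entity content, and `rebuild` offers a way to recover if the records ever drift.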
With Entity Usage, a content editor can traverse the entity relationship to ascertain that e.g. image file A is attached to media entity B which is referenced from block content C which is used inline in the layout of node D. This allows the content editor to get meaningful usage data.
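
The traversal described above can be sketched as a walk over "used by" records. This is only an illustration under assumed data: the `USED_BY` mapping and the `usage_chains` helper are hypothetical and do not reflect how the Entity Usage module stores or exposes its data.

```python
from collections import deque

# Hypothetical usage records: each target maps to the entities that use it.
USED_BY = {
    "file:A": ["media:B"],
    "media:B": ["block_content:C"],
    "block_content:C": ["node:D"],
}


def usage_chains(target, used_by):
    """Return every chain from target up to an entity with no further users."""
    chains = []
    queue = deque([[target]])
    while queue:
        path = queue.popleft()
        # Ignore entities already on this path so circular usage cannot loop.
        parents = [p for p in used_by.get(path[-1], []) if p not in path]
        if not parents:
            chains.append(path)
        for parent in parents:
            queue.append(path + [parent])
    return chains


for chain in usage_chains("file:A", USED_BY):
    print(" -> ".join(chain))
```

A target that nothing uses comes back as a single-element chain, which is the signal a content editor would need before treating the file as no longer in use.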
Remaining tasks
Agree that this would be a useful feature to add to core, then move to the core issue queue