Problem/Motivation
Per #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols), there is currently no uniqueness constraint on the uri field of the file_managed table enforced in the database schema in D8, because of limits on key lengths under utf8mb4 with lowest-common-denominator MySQL configurations. Instead, we rely on a Drupal (application-level) constraint in \Drupal\file\Plugin\Validation\Constraint\FileUriUnique.
However, this constraint is not currently enforced in file_save_upload(), as the validate() method is never explicitly called on the $file object.
Coupled with the way new filenames are chosen in file_destination() and file_create_filename() (by checking file_exists() on a candidate name), this can lead to a race condition in which more than one entry is created in the file_managed table with the same value for the uri field.
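The check-then-write pattern described above can be sketched in plain PHP. This is a simplified, single-process simulation, not the Drupal implementation: pick_filename() is a hypothetical stand-in for the naming logic, and the $disk and $file_managed arrays stand in for the filesystem and the database table.

```php
<?php

/**
 * Hypothetical stand-in for the naming logic: returns the first candidate
 * name that does not yet exist on "disk" (here, an in-memory array).
 */
function pick_filename(array $disk, string $basename): string {
  $dot = strrpos($basename, '.');
  $name = $dot === FALSE ? $basename : substr($basename, 0, $dot);
  $ext = $dot === FALSE ? '' : substr($basename, $dot);
  $candidate = $basename;
  $counter = 0;
  while (isset($disk[$candidate])) {
    $candidate = $name . '_' . $counter++ . $ext;
  }
  return $candidate;
}

// Assumption for the demo: foo.txt is already present on disk.
$disk = ['foo.txt' => TRUE];
$file_managed = [];

// Two requests each choose a name before either has written anything.
$name_a = pick_filename($disk, 'foo.txt');
$name_b = pick_filename($disk, 'foo.txt');

// (Slow validation would run here, widening the window.)

// Both requests now write the file and record a row.
foreach ([$name_a, $name_b] as $name) {
  $disk[$name] = TRUE;
  $file_managed[] = ['uri' => 'public://' . $name];
}

$uris = array_column($file_managed, 'uri');
$duplicates = count($uris) - count(array_unique($uris));
echo $duplicates, " duplicate uri value(s)\n";
```

Because neither request sees the other's file when it checks for existence, both settle on the same candidate name and both rows end up with the same uri.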
When this happens, there is a risk of data loss: for example, garbage collection may delete temporary files which have the same uri as other (permanent) files.
Slow implementations of hook_file_validate() (e.g. in the ClamAV module) can exacerbate the race condition by increasing the likelihood that multiple parallel(ish) processes select the same filename before anything is written to disk.
Proposed resolution
Enforce the unique constraint on each file's uri to avoid multiple rows with the same value in the file_managed table.
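As a rough illustration of the rule being enforced, the following self-contained sketch rejects a row whose uri is already recorded. It is not the patch itself: save_if_unique() and the in-memory $file_managed array are hypothetical, and a check like this is not atomic across real concurrent processes; it only shows the intended outcome.

```php
<?php

/**
 * Hypothetical helper: records a row only if no existing row has the same
 * uri. Returns a list of error messages (empty on success).
 */
function save_if_unique(array &$file_managed, array $row): array {
  foreach ($file_managed as $existing) {
    if ($existing['uri'] === $row['uri']) {
      return ["The file {$row['uri']} already exists."];
    }
  }
  $file_managed[] = $row;
  return [];
}

$file_managed = [];

// The first upload is recorded; the second, with the same uri, is refused.
$first = save_if_unique($file_managed, ['uri' => 'public://2017-04/foo_0.txt', 'status' => 0]);
$second = save_if_unique($file_managed, ['uri' => 'public://2017-04/foo_0.txt', 'status' => 1]);

print_r($second);
```

With the check in place, only one row per uri is stored, and the second caller receives an error instead of silently creating a duplicate.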
Remaining tasks
- Review the patch.
- Commit the patch.
User interface changes
There may be a new error message presented to users if they try to upload a file and the race condition occurs. However, it actually seems that changes since the initial report of this bug make it less likely that this will happen in the UI (see #52 for example).
API changes
None.
Data model changes
None.
Release notes snippet
Use "Problem/Motivation" section above?
Original report by manarth
First discovered in the ClamAV module: #2864506: Long ajax process causing files to disappear.
A slow implementation of hook_file_validate() can cause inappropriate entries in the file_managed table, causing an in-use file to be treated as orphaned and therefore deleted on a cron run. This causes permanent loss of data.
More significantly, the upload appears to succeed and the data is available for a period of time, whilst the data-loss happens on cron, between 6 hours and 3 months after the data was uploaded. This can make the cause of the issue harder to recognise, and reduce the likelihood that the original file is still available for recovery.
Steps to reproduce:
Not possible to manually reproduce since 8.6.x (see #52) - see the new test instead: SaveUploadTest.php::testDuplicate()
- Prepare a Drupal install with a content-type that contains a file field.
- Enable a module which has a slow implementation of hook_file_validate(); either use the example code provided below, or install this sandbox (machine name: slow_file_upload).
- Go to the node-add page for the relevant content-type
- Click the "browse" button, and choose a file to upload
- Click the form's "Save and publish" button quickly, before the AJAX upload + validate process has completed
The outcome is that two entries are added to the file_managed table, and one entry is added to the file_usage table. Here is an example:
mysql> select * from file_managed;
+-----+--------------------------------------+----------+------+----------+----------------------------+------------+----------+--------+------------+------------+
| fid | uuid                                 | langcode | uid  | filename | uri                        | filemime   | filesize | status | created    | changed    |
+-----+--------------------------------------+----------+------+----------+----------------------------+------------+----------+--------+------------+------------+
|   1 | 3bcdd711-988e-4906-b8c6-0b22df271b93 | en       |    1 | foo.txt  | public://2017-04/foo_0.txt | text/plain |        9 |      0 | 1492187343 | 1492187343 |
|   2 | 0a7a9874-7e73-43da-8eeb-a7caa03ae63d | en       |    1 | foo.txt  | public://2017-04/foo_0.txt | text/plain |        9 |      1 | 1492187346 | 1492187346 |
+-----+--------------------------------------+----------+------+----------+----------------------------+------------+----------+--------+------------+------------+
2 rows in set (0.00 sec)
mysql> select * from file_usage;
+-----+--------+------+----+-------+
| fid | module | type | id | count |
+-----+--------+------+----+-------+
|   2 | file   | node |  1 |     1 |
+-----+--------+------+----+-------+
1 row in set (0.00 sec)
One entry is correctly recorded in the file_managed table, with a corresponding file_usage entry; the second is an orphaned file. However, both entries share the same uri. When the cron cleanup process deletes orphaned files, the valid, in-use file is deleted, causing permanent data loss.
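To make the at-risk state concrete, here is a small self-contained sketch that takes the two file_managed rows from the example output and reports any uri shared by a temporary (status 0) row and a permanent (status 1) row. find_at_risk_uris() is a hypothetical helper written for this illustration, not part of Drupal.

```php
<?php

// The two file_managed rows from the example (only the relevant columns).
$file_managed = [
  ['fid' => 1, 'uri' => 'public://2017-04/foo_0.txt', 'status' => 0],
  ['fid' => 2, 'uri' => 'public://2017-04/foo_0.txt', 'status' => 1],
];

/**
 * Hypothetical check: returns every uri that appears on both a temporary
 * (status 0) row and a permanent (status 1) row.
 */
function find_at_risk_uris(array $rows): array {
  $statuses = [];
  foreach ($rows as $row) {
    $statuses[$row['uri']][(int) $row['status']] = TRUE;
  }
  $at_risk = [];
  foreach ($statuses as $uri => $seen) {
    if (isset($seen[0], $seen[1])) {
      $at_risk[] = $uri;
    }
  }
  return $at_risk;
}

$at_risk = find_at_risk_uris($file_managed);
print_r($at_risk);
```

Any uri this reports is one where cleaning up the temporary row would also remove the file that the permanent row points to.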
The issue was originally reported in the context of the ClamAV module; however, I've been able to reproduce it using nothing more than a sleep() call in a new debug module:
/**
 * Implements hook_file_validate().
 */
function mymodule_file_validate(Drupal\file\FileInterface $file) {
  $errors = [];
  if (!isset($_POST['op']) || ($_POST['op'] != 'Save')) {
    // Add a delay if we're not actually saving the node.
    sleep(30);
  }
  return $errors;
}
This particularly affects the ClamAV module, where certain virus-scan configurations can add 10 seconds to the file-validate process, but this issue could affect any scenario where validation invokes a slow external process.