First discovered in the ClamAV module: #2864506: Long ajax process causing files to disappear.
A slow implementation of hook_file_validate() can cause inappropriate entries in the file_managed table, causing an in-use file to be treated as orphaned and therefore deleted on a cron run. This causes permanent loss of data.
More significantly, the upload appears to succeed and the data remains available for a period of time; the data loss happens later on cron, between 6 hours and 3 months after the data was uploaded. This makes the cause of the issue harder to recognise, and reduces the likelihood that the original file is still available for recovery.
Steps to reproduce:
- Prepare a Drupal install with a content-type that contains a file field.
- Enable a module which has a slow implementation of hook_file_validate() (example code provided below for test purposes)
- Go to the node-add page for the relevant content-type
- Click the "browse" button, and choose a file to upload
- Click the form's "Save and publish" button quickly, before the AJAX upload + validate process has completed
The outcome is that two entries are added to the file_managed table, and one entry is added to the file_usage table. Here is an example:
mysql> select * from file_managed;
+-----+--------------------------------------+----------+------+----------+----------------------------+------------+----------+--------+------------+------------+
| fid | uuid                                 | langcode | uid  | filename | uri                        | filemime   | filesize | status | created    | changed    |
+-----+--------------------------------------+----------+------+----------+----------------------------+------------+----------+--------+------------+------------+
|   1 | 3bcdd711-988e-4906-b8c6-0b22df271b93 | en       |    1 | foo.txt  | public://2017-04/foo_0.txt | text/plain |        9 |      0 | 1492187343 | 1492187343 |
|   2 | 0a7a9874-7e73-43da-8eeb-a7caa03ae63d | en       |    1 | foo.txt  | public://2017-04/foo_0.txt | text/plain |        9 |      1 | 1492187346 | 1492187346 |
+-----+--------------------------------------+----------+------+----------+----------------------------+------------+----------+--------+------------+------------+
2 rows in set (0.00 sec)
mysql> select * from file_usage;
+-----+--------+------+----+-------+
| fid | module | type | id | count |
+-----+--------+------+----+-------+
|   2 | file   | node |  1 |     1 |
+-----+--------+------+----+-------+
1 row in set (0.00 sec)
One entry (fid 2) is correctly recorded in the file_managed table, with a corresponding file_usage entry; the other (fid 1) is an orphaned temporary record (status 0) with no file_usage entry. However, both entries share the same uri. When the cron cleanup process deletes orphaned files, it removes the file stored at that uri, so the valid, in-use file is deleted, causing permanent data loss.
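To check whether a site already has entries in this state, a query along the following lines may help. This is only a sketch: it assumes MySQL and the file_managed columns shown in the example above, and it simply lists every uri that is shared by at least one temporary (status 0) row and at least one permanent (status 1) row.

```sql
-- Sketch, assuming MySQL and the file_managed schema shown above.
-- Lists URIs referenced by both a temporary (status = 0) and a
-- permanent (status = 1) row, i.e. files at risk of deletion on cron.
SELECT uri,
       COUNT(*)        AS row_count,
       SUM(status = 0) AS temporary_rows,
       SUM(status = 1) AS permanent_rows
FROM file_managed
GROUP BY uri
HAVING SUM(status = 0) > 0
   AND SUM(status = 1) > 0;
```

Against the example data above, this should return a single row for public://2017-04/foo_0.txt. Any uri it reports points at a file that is worth backing up before the next cron run.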
The issue was originally reported in the context of the ClamAV module; however, I've been able to reproduce it using nothing more than a sleep(10) call in a new debug module:
/**
 * Implements hook_file_validate().
 */
function mydebug_file_validate(Drupal\file\FileInterface $file) {
  $errors = array();
  // Add a delay of 10 seconds.
  sleep(10);
  return $errors;
}
This particularly affects the ClamAV module, where certain virus-scan configurations can add 10 seconds to the file-validate process, but this issue could affect any scenario where validation invokes a slow external process.