Problem/Motivation
On Drupal sites where users upload files, clean filenames are hard to achieve because Drupal accepts most filenames as valid input. However, many Drupal installs override this behavior; for example, Thunder has been shipping a similar solution for a very long time. As @dww writes in #132:
Content creators name their files all kinds of weird and unfortunate things. Even if everything is in English, it's still great to convert spaces to dashes, remove weird punctuation, special characters, etc. Plus, many users find themselves in a mix of case-sensitive and case-insensitive filesystems, so it's great to have Drupal automatically convert everything to lowercase. I constantly build custom functionality into my sites to cleanup filenames on upload so as to prevent the editors from making a mess of things. It'd be much better if core did that for me automatically.
Follow-up to #1842718: Use new Transliteration functionality in core for machine names and #567832: Transliteration in core.
@catch mentioned in #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) that we may want to transliterate filenames in core, so that we can have a database-level unique constraint on the URI field. (However, this would presumably mean that transliteration could not be optional.)
Proposed resolution
Add a checkbox to the File system form to opt in to filename transliteration. Transliteration for filenames will be disabled by default, because it's not useful for some languages, such as Japanese.
If the new system.file:filename_transliteration configuration option is set to TRUE:
- Transliterate the filename.
- Replace whitespace with "-".
- Remove remaining characters other than 0-9A-Za-z or "-" or "." or "_".
- Collapse multiple consecutive non-alphabetical characters, i.e. "---" or "..." or "___" will be replaced with "-" or "." or "_" respectively.
- Force lowercase to prevent issues on case-insensitive file systems.
Note that some of the above rules, for example forcing lowercase, mean that we can only apply this to uploaded files, because applying it more broadly would break JS and CSS aggregation.
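To make the order of the rules above concrete, here is a minimal sketch in Python. It is an illustration only, not the proposed core patch: the function name sanitize_filename is hypothetical, and the transliteration step is approximated with Unicode NFKD decomposition plus dropping non-ASCII characters, which is much cruder than Drupal's transliteration service. It also assumes "collapse" means reducing a run of the same separator to a single one.

```python
import re
import unicodedata


def sanitize_filename(filename: str) -> str:
    """Hypothetical helper that applies the listed rules in order."""
    # 1. Transliterate. ASSUMPTION: approximated by decomposing accented
    #    characters (NFKD) and discarding anything that is still non-ASCII.
    name = (
        unicodedata.normalize("NFKD", filename)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # 2. Replace each whitespace character with "-".
    name = re.sub(r"\s", "-", name)
    # 3. Remove remaining characters other than 0-9A-Za-z, "-", "." and "_".
    name = re.sub(r"[^0-9A-Za-z._-]", "", name)
    # 4. Collapse runs of the same separator ("---", "...", "___") to one.
    name = re.sub(r"([-._])\1+", r"\1", name)
    # 5. Force lowercase.
    return name.lower()
```

With this sketch, sanitize_filename("Mon  Été (final).JPG") returns "mon-ete-final.jpg" and sanitize_filename("report___v2...pdf") returns "report_v2.pdf". Collapsing runs after removing invalid characters matters, because removal can bring separators next to each other.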
Remaining tasks
Agree on exactly what should occur when system.file:filename_transliteration is TRUE.
Apart from transliterating, what else should we do? Currently we:
- Replace whitespace with "-".
- Remove remaining characters other than 0-9A-Za-z or "-" or "." or "_".
- Collapse multiple consecutive non-alphabetical characters, i.e. "---" or "..." or "___" will be replaced with "-" or "." or "_" respectively.
- Force lowercase to prevent issues on case-insensitive file systems.
User interface changes
If the configuration system.file:filename_transliteration is set to TRUE, filenames are transliterated on upload.
API changes
None
Data model changes
- New configuration key system.file:filename_transliteration
- File entity labels are updated if core munges or renames the uploaded file in any way.
Release notes snippet
@todo