Channel: Issues for Drupal core

Use new Transliteration functionality in core for file names

Problem/Motivation

On Drupal sites where users upload files, clean filenames are hard to achieve because Drupal accepts most filenames as valid input. However, many Drupal installs override this behavior; for example, Thunder has shipped a similar solution for a very long time. As @dww writes in #132:

Content creators name their files all kinds of weird and unfortunate things. Even if everything is in English, it's still great to convert spaces to dashes, remove weird punctuation, special characters, etc. Plus, many users find themselves in a mix of case-sensitive and case-insensitive filesystems, so it's great to have Drupal automatically convert everything to lowercase. I constantly build custom functionality into my sites to cleanup filenames on upload so as to prevent the editors from making a mess of things. It'd be much better if core did that for me automatically.

Follow-up to #1842718: Use new Transliteration functionality in core for machine names and #567832: Transliteration in core.

@catch mentioned in #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) that we may want to transliterate filenames in core, so that we can have a database-level unique constraint on the URI field. (However, this would presumably mean that transliteration of the field could not be optional.)

Proposed resolution

Add a checkbox to the File system form to opt in to filename transliteration. Transliteration for filenames will be disabled by default, because it is not useful for some languages, such as Japanese.

If the new system.file:filename_transliteration configuration option is set to TRUE:

  • Transliterate the filename.
  • Replace whitespace with -
  • Remove remaining characters other than 0-9A-Za-z or - or . or _
  • Collapse multiple consecutive non-alphabetical characters, i.e. --- or ... or ___ will be replaced with - or . or _
  • Force lowercase to prevent issues on case-insensitive file systems.

Note that some of the above rules, for example forcing lowercase, mean that we can only apply this to uploaded files; applying it more widely would break JS and CSS aggregation.
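
To make the rules above concrete, here is a minimal sketch in Python (not Drupal's PHP and not the actual patch). The function name sanitize_filename and its transliterate parameter are hypothetical, and the transliteration step is only a crude stand-in for core's Transliteration functionality: it decomposes accented characters with Unicode NFKD and drops anything that is still not ASCII.

```python
import re
import unicodedata


def sanitize_filename(filename: str, transliterate: bool = True) -> str:
    """Hypothetical helper applying the rules listed in the issue summary.

    `transliterate` stands in for system.file:filename_transliteration.
    """
    if not transliterate:
        # Option disabled (the proposed default): leave the name untouched.
        return filename

    # 1. Transliterate. ASSUMPTION: NFKD decomposition plus dropping
    #    non-ASCII characters is only an approximation of what a real
    #    transliteration service would do.
    name = unicodedata.normalize("NFKD", filename)
    name = name.encode("ascii", "ignore").decode("ascii")

    # 2. Replace each whitespace character with "-".
    name = re.sub(r"\s", "-", name)

    # 3. Remove remaining characters other than 0-9A-Za-z, "-", "." and "_".
    name = re.sub(r"[^0-9A-Za-z._-]", "", name)

    # 4. Collapse runs of the same separator: "---" -> "-", "..." -> ".",
    #    "___" -> "_".
    name = re.sub(r"([._-])\1+", r"\1", name)

    # 5. Force lowercase.
    return name.lower()
```

With this sketch, "Été  2021 (final)!!.JPG" becomes "ete-2021-final.jpg", and "my   file___v2...txt" becomes "my-file_v2.txt". The order of the steps matters: collapsing separators after the removal step also cleans up runs that only appear once other characters have been stripped.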

Remaining tasks

Agree on exactly what should occur when system.file:filename_transliteration is TRUE.

Apart from transliterating what else should we do? Currently we:

  • Replace whitespace with -
  • Remove remaining characters other than 0-9A-Za-z or - or . or _
  • Collapse multiple consecutive non-alphabetical characters, i.e. --- or ... or ___ will be replaced with - or . or _
  • Force lowercase to prevent issues on case-insensitive file systems.

User interface changes

If the configuration system.file:filename_transliteration is set to TRUE, filenames are transliterated on upload.

API changes

None

Data model changes

  • New configuration key system.file:filename_transliteration
  • File entity labels are updated if core munges or renames the uploaded file in any way.

Release notes snippet

@todo

