Channel: Issues for Drupal core

Provide options to sanitize filenames (transliterate, lowercase, replace whitespace, etc)


Problem/Motivation

On Drupal sites where users upload files, clean filenames are hard to achieve because Drupal accepts most filenames as valid input. Many Drupal installs override this behavior; for example, Thunder has shipped a similar solution for a very long time. As @dww writes in #132:

Content creators name their files all kinds of weird and unfortunate things. Even if everything is in English, it's still great to convert spaces to dashes, remove weird punctuation, special characters, etc. Plus, many users find themselves in a mix of case-sensitive and case-insensitive filesystems, so it's great to have Drupal automatically convert everything to lowercase. I constantly build custom functionality into my sites to cleanup filenames on upload so as to prevent the editors from making a mess of things. It'd be much better if core did that for me automatically.

Follow-up to #1842718: Use new Transliteration functionality in core for machine names and #567832: Transliteration in core.

@catch mentioned in #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) we may want to transliterate filenames in core, so that we can have a database-level unique constraint on the URI field in the database. (However, this would presumably mean that transliteration of the field could not be optional).

Proposed resolution

Add a fieldset to the File system form (/admin/config/media/file-system) to opt-in for various kinds of filename sanitization:

  • Transliterate the filename. Disabled by default, because it's not useful for some languages like Japanese.
  • Replace whitespace with dash (-) or underscore (_).
  • Replace any characters other than 0-9A-Za-z or - or _ or . with dash (-) or underscore (_).
  • Remove duplicate separator characters, i.e. -- or .. or __ will be replaced with - or . or _.
  • Convert to lowercase (to prevent issues on case-insensitive file systems).

On all files uploaded through the API and UI (i.e. not aggregated JS/CSS files, etc), the filename will be sanitized in whatever ways the site is configured to enforce.
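The options above can be sketched as a small pipeline. This is a minimal illustration in Python, not the code from the patch (which is PHP): the function name sanitize_filename, the replacement parameter, and the order of the steps are assumptions made for the example, and the NFKD/ASCII step is only a crude stand-in for Drupal's transliteration service. The keyword names mirror the proposed setting names.

```python
import re
import unicodedata


def sanitize_filename(
    filename,
    *,
    transliterate=False,
    replace_whitespace=False,
    replace_non_alphanumeric=False,
    dedupe_separators=False,
    lowercase=False,
    replacement="-",
):
    """Apply the opt-in sanitization steps to a filename (hypothetical helper)."""
    if replacement not in ("-", "_"):
        raise ValueError("replacement must be '-' or '_'")

    name = filename
    if transliterate:
        # Crude stand-in: decompose accented characters and drop anything
        # that is not ASCII. Scripts such as Japanese are simply lost here,
        # which is why the option is off by default.
        decomposed = unicodedata.normalize("NFKD", name)
        name = decomposed.encode("ascii", "ignore").decode("ascii")
    if replace_whitespace:
        # Every whitespace character (not only spaces) is replaced.
        name = re.sub(r"\s", replacement, name)
    if replace_non_alphanumeric:
        # Keep only 0-9, A-Z, a-z, dash, underscore and dot.
        name = re.sub(r"[^0-9A-Za-z_.-]", replacement, name)
    if dedupe_separators:
        # Collapse runs of the same separator into a single one.
        name = re.sub(r"-{2,}", "-", name)
        name = re.sub(r"\.{2,}", ".", name)
        name = re.sub(r"_{2,}", "_", name)
    if lowercase:
        name = name.lower()
    return name


if __name__ == "__main__":
    print(
        sanitize_filename(
            "My  Résumé (final).PDF",
            transliterate=True,
            replace_whitespace=True,
            replace_non_alphanumeric=True,
            dedupe_separators=True,
            lowercase=True,
        )
    )
```

With every option left at its default the filename is returned unchanged, which matches the opt-in nature of the proposal.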

We listen to the \Drupal\file\Event\FileUploadSanitizeNameEvent and change the filename according to the configuration.
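The listener approach can be pictured roughly as follows. This is a generic Python sketch of the event pattern, not Drupal's PHP API: the SanitizeNameEvent class, the Dispatcher class, and the lowercase_listener function are all hypothetical names invented for illustration. The point is only that the event carries a mutable filename and each listener may rewrite it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SanitizeNameEvent:
    """Hypothetical event object carrying the filename being uploaded."""
    filename: str


class Dispatcher:
    """Minimal stand-in for an event dispatcher."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[SanitizeNameEvent], None]] = []

    def add_listener(self, listener: Callable[[SanitizeNameEvent], None]) -> None:
        self._listeners.append(listener)

    def dispatch(self, event: SanitizeNameEvent) -> SanitizeNameEvent:
        # Listeners run in registration order and may modify the event.
        for listener in self._listeners:
            listener(event)
        return event


def lowercase_listener(event: SanitizeNameEvent) -> None:
    # One example of a configured rule: convert the filename to lowercase.
    event.filename = event.filename.lower()


dispatcher = Dispatcher()
dispatcher.add_listener(lowercase_listener)
result = dispatcher.dispatch(SanitizeNameEvent("Report.PDF"))
```

Contributed or custom code could register further listeners in the same way to apply site-specific rules.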

Note: These sanitization settings only impact new file uploads; files already uploaded to a site are not affected.

Remaining tasks

  1. Decide if all whitespace or only spaces are converted. Update code and UI text to match. All whitespace as of #159.
  2. Design and dispatch the appropriate event(s) where core's sanitization is done. Complete via #187.
  3. Decide if the FileUploadEvent should also let listeners munge the destination. If so, update the Event accordingly. No (via @alexpott in #192.2).
  4. Decide if we give the user enough feedback about renaming their file upon upload. No (via @alexpott in #203). If not, implement the appropriate feedback (Complete via #214).
  5. Finish cleaning up the UI text, settings names, etc: Done via #219.
    • What's the right label for the 'strip' or 'remove' option for replace_non_alphanumeric? For now: "- Strip -". Previous: "- Remove (do not replace) -". Other? Option D: remove it entirely, via #219.
    • Should the options for the two selects (replace_*) be capitalized? Sure. ;) Via #194
    • Any other concerns/edits/fixes?
  6. Finish writing Kernel test cases for (all?) the combinations of settings. Done via #213 and #219.
  7. Upload screenshot(s) of final settings UI and update the summary. Done via #219.
  8. Final code cleanup and fixes. No more @todo via #219.
  9. Update/fix the change record. Done via #221
  10. Add tests for REST integration
  11. Decide if we're allowing transliteration as an option(!). See #272.
  12. Decide if we should let people pick a transliteration language. See #270 and #272.
  13. Usability review:
    • Decide if we want a UI at all, or if sites should simply configure this via settings.php (like they do for the public + private files dir paths). See #272.
    • If yes to a UI:
      • Decide whether we want to be able to have different replacement characters for whitespace, non-alphanumeric characters, and transliteration unknown characters. See #180, #246, #258, #261, and others...
      • If we have it as a separate / shared setting, decide if the "replacement character" setting should be "near" the checkboxes that provide replacement or at the end of the fieldset (see #263 for comparisons)
      • Decide if "replace non-alphanumeric" should be "near" transliterate option. See #241 and #242.
      • Decide if we should let people pick a transliteration language. See #270 and #272.
  14. Avoid the announcement of redundant text by a screenreader (ref #343)
  15. Final string/UI review + signoff
  16. Final reviews + RTBC.
  17. Commit.

User interface changes

New "Sanitize filenames" detail element with options on FileSystemForm (/admin/config/media/file-system).

Screenshots of the new settings UI:

Sanitize filenames element closed:
[Screenshot: Transliteration is not enabled]

Sanitize filenames element open:
[Screenshot: Transliteration is enabled]

Options under "Replacement character" setting:
[Screenshot: Transliteration options]

API changes

None

Data model changes

New configuration mapping file.settings:filename_sanitization, with options for:

  • transliterate: Transliterate
  • replace_whitespace: Replace whitespace
  • replace_non_alphanumeric: Replace non-alphanumeric characters
  • dedupe_separators: Remove duplicate dots (..), underscores (__) or dashes (--)
  • lowercase: Convert to lowercase

Release notes snippet

[#2972665]

