Problem/Motivation
Our transliteration service is a best attempt using relatively simple userspace code. However, it obviously does not cover as many cases as the far more powerful ICU transliterator.
Proposed resolution
Write a wrapper for the relevant functionality in the intl extension, move the transliteration service registration into CoreServiceProvider, and register the correct implementation depending on whether the extension is enabled.
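A minimal sketch of the selection step, assuming the new wrapper lives in a class called IntlTransliteration (a hypothetical name; only PhpTransliteration exists today). The helper function is also hypothetical and only illustrates the decision that CoreServiceProvider would make.

```php
<?php

/**
 * Picks the transliteration class to register.
 *
 * IntlTransliteration is a hypothetical name for the new intl-based wrapper.
 */
function select_transliteration_class(bool $intl_available): string {
  return $intl_available
    ? 'Drupal\Core\Transliteration\IntlTransliteration'
    : 'Drupal\Core\Transliteration\PhpTransliteration';
}

// Inside CoreServiceProvider::register() the result would be handed to the
// container, for example: $container->register('transliteration', $class);
$class = select_transliteration_class(extension_loaded('intl'));
```

Keeping the check in one place means the rest of core keeps asking for the transliteration service and never needs to know which implementation is behind it.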
Remaining tasks
Decide what to add to the status report and, more generally, how to communicate this new capability.
Coding this is a bit tricky, but here is what I can add to help. The C++ version of removeDiacritics can be found at https://stackoverflow.com/a/13071166/308851, and the crux of the matter is Transliterator *accentsConverter = Transliterator::createInstance("NFD; [:M:] Remove; NFC", UTRANS_FORWARD, status); which rhymes quite well with https://www.php.net/manual/en/transliterator.createfromrules.php, but that is just documentation and there isn't a lot of it. We need to dig deeper: the PHP source code calls utrans_openU instead of Transliterator::createInstance. Digging deeper still, the ICU source shows that utrans_openU is a simple wrapper around that. Yay! Our tour of C++ land confirms that removeDiacritics can be built around Transliterator::createFromRules('NFD; [:M:] Remove; NFC').
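A rough sketch of what removeDiacritics could look like with the intl extension. Note one assumption: the compound ID is passed to Transliterator::create(), the PHP call that takes a transliterator ID, because that is the form I am sure accepts this string; whether createFromRules() accepts it unchanged has not been checked here. The function name remove_diacritics is made up for the example.

```php
<?php

/**
 * Removes combining marks (diacritics) using ICU through the intl extension.
 *
 * Returns NULL when intl is unavailable or the transliterator cannot be built.
 */
function remove_diacritics(string $string): ?string {
  if (!extension_loaded('intl')) {
    return NULL;
  }
  // Compound ID: decompose, drop every mark ([:M:]), then recompose.
  $transliterator = Transliterator::create('NFD; [:M:] Remove; NFC');
  if ($transliterator === NULL) {
    return NULL;
  }
  $result = $transliterator->transliterate($string);
  return $result === FALSE ? NULL : $result;
}
```

With intl enabled, remove_diacritics('café') returns 'cafe', and because Arabic vowel marks are also in the Unicode mark category, the same call should strip them from the text in the original report.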
The transliterate method surely just wraps https://www.php.net/manual/en/transliterator.create.php. I expect trouble with the optional arguments: the method accepts langcode, and although I am not quite sure, I believe ICU expects a locale, so the two may not map cleanly. I posted https://stackoverflow.com/q/62314532/308851 but I am not holding my breath.
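To make the open question concrete, here is a hedged sketch of a transliterate wrapper. The function name intl_transliterate is hypothetical, the parameter list mirrors the existing transliterate method as I understand it, and the ICU ID 'Any-Latin; Latin-ASCII' is my assumption for a general-purpose default. The langcode argument is deliberately ignored, since how it should map to an ICU ID or locale is exactly what is unresolved.

```php
<?php

/**
 * Sketch of a transliterate() wrapper on top of ICU.
 *
 * $langcode is accepted for signature compatibility but ignored here; mapping
 * it to an ICU transliterator ID or locale is still an open question.
 */
function intl_transliterate(string $string, string $langcode = 'en', string $unknown_character = '?', ?int $max_length = NULL): ?string {
  if (!extension_loaded('intl')) {
    return NULL;
  }
  // Assumed default: convert any script to Latin, then Latin to ASCII.
  $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
  if ($transliterator === NULL) {
    return NULL;
  }
  $result = $transliterator->transliterate($string);
  if ($result === FALSE) {
    return NULL;
  }
  // Replace whatever ICU could not bring into the ASCII range.
  $result = preg_replace('/[^\x00-\x7F]/u', $unknown_character, $result);
  if ($result === NULL) {
    return NULL;
  }
  return $max_length === NULL ? $result : substr($result, 0, $max_length);
}
```

The unknown-character replacement and the length cut are done in PHP after ICU has run, so those two optional arguments at least do not depend on anything ICU-specific.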
Testing should be easy: amend the component test PhpTransliterationTest with the class name to be tested. The testbot is https://git.drupalcode.org/project/drupalci_environments/-/blob/producti... ready.
User interface changes
API changes
Data model changes
Release notes snippet
Original report
When removing diacritics, search_simplify() does not remove Arabic diacritics. How to test: add this text to any article: "السُّلَّامُ عَلَيْكُمْ وَرَحْمَةُ اللهِ وَبَرَكَاتُهُ" then search for "السلام". Learning from this article, I developed an ugly patch to fix it. It may be better to move it to PhpTransliteration.php, but I am not sure.