Problem/Motivation
Drupal aims to eliminate a class of security bugs by distinguishing 3 classes of text:
- Text that is already valid safe markup is represented by an object implementing MarkupInterface.
- A string containing plain text that is automatically escaped when passed to the template system.
- A string containing unsafe markup that should not be passed to the template system as it would be escaped. Instead, filter it with Xss::filter() or Xss::filterAdmin().
It's currently hard to understand how to use Token::replace() correctly in these 3 cases:
- The comment on the return string instructs the caller to rely on Twig autoescaping. However the return string is markup so this is wrong and will lead to double escaping.
- The return string is unsafe, even if the input was safe - this is potentially surprising and should be made clearer.
- The correct code for escaping tokens in plain text is awkward, requiring 3 nested function calls and leading to unnecessary conversions. This is easily forgotten, and is even wrong in core, see #3264453: Incorrect usage of Token::replace() can corrupt plain text that resembles HTML
This issue has been classed as a bug because the incorrect documentation and difficult usage led to bugged code in core and contrib.
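The plain-text case can be illustrated with a minimal, self-contained sketch. The functions replace_as_markup() and markup_to_plain() are hypothetical stand-ins written for this example, not Drupal's actual implementation; they only assume that token replacement returns markup with escaped token values, and that converting markup back to plain text strips tags and decodes entities.

```php
<?php
// Hypothetical stand-in for a token replacer that returns markup:
// every token value is HTML-escaped before substitution.
function replace_as_markup(string $markup, array $values): string {
  $replacements = [];
  foreach ($values as $token => $value) {
    $replacements[$token] = htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
  }
  return strtr($markup, $replacements);
}

// Hypothetical stand-in for converting markup back to plain text:
// strip tags, then decode entities.
function markup_to_plain(string $markup): string {
  return html_entity_decode(strip_tags($markup), ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$plain = 'Use <b> for bold, says [user:name]';
$values = ['[user:name]' => 'Tom & Ann'];

// Escape step forgotten: "<b>" in the plain text is treated as a tag
// and stripped, so the text is corrupted.
$wrong = markup_to_plain(replace_as_markup($plain, $values));

// Three nested calls: escape the plain text, replace tokens,
// then convert the resulting markup back to plain text.
$right = markup_to_plain(
  replace_as_markup(
    htmlspecialchars($plain, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
    $values
  )
);
```

In this sketch $right preserves the literal "<b>" while $wrong loses it, which is the kind of corruption the related issue describes. A dedicated plain-text function would hide the nesting from callers.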
Proposed resolution
- Fix comments to Token::replace().
- Create a new function replacePlain().
Remaining tasks
Deferred to a separate issue: #3264453: Incorrect usage of Token::replace() can corrupt plain text that resembles HTML
User interface changes
None
API changes
See "proposed resolution" and change record.
Data model changes
None
Comment from original issue
@Berdir #2567257-210: hook_tokens() $sanitize option incompatible with Html sanitisation requirements.
Posting this here for now, will probably open a follow-up issue.
I've been working on updating token.module and adjusting functionality/tests for this. Which is probably something we should have done before committing this, to make sure that this works for more than the few use cases that core has.
I think there is at least one example there why supporting some sort of sanitize => FALSE is useful and important.
The basic use case is when you have user-provided, unsafe input and want it to remain unsafe and un-escaped, because you then rely on autoescape.
One example in token.module is the block label, it has this code:
function token_block_view_alter(&$build, BlockPluginInterface $block) {
  $config = $block->getConfiguration();
  $label = $config['label'];
  if ($label != '<none>') {
    // The label is automatically escaped, avoid escaping it twice.
    $build['#configuration']['label'] = \Drupal::token()->replace($label, array(), array('sanitize' => FALSE));
  }
}
The problem is that now the block label tokens are escaped twice. There's a test that is creating a node with a ' in it, and right now, that is getting escaped twice (which is exactly what this code is testing), since we force-escape all token return values and then escape the whole string again.
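The double escaping described here can be reproduced in isolation. The following is only a sketch: replace_tokens_escaped() and autoescape() are hypothetical stand-ins for force-escaped token replacement and template autoescaping, and the label and token value are made up for illustration.

```php
<?php
// Hypothetical stand-in for token replacement that force-escapes
// every token value.
function replace_tokens_escaped(string $text, array $values): string {
  $replacements = [];
  foreach ($values as $token => $value) {
    $replacements[$token] = htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
  }
  return strtr($text, $replacements);
}

// Hypothetical stand-in for template autoescaping of a plain string.
function autoescape(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$label = 'Latest: [node:title]';

// First escape: the apostrophe in the token value becomes an entity.
$replaced = replace_tokens_escaped($label, ['[node:title]' => "Tom's node"]);

// Second escape: the ampersand of that entity is escaped again, so a
// browser would display the entity text instead of an apostrophe.
$output = autoescape($replaced);
```

Under these assumptions $replaced already contains an entity for the apostrophe, and $output contains that entity with its ampersand escaped a second time.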
I don't see a proper way to fix this right now. What technically works is using PlainTextOutput::renderFromHtml() but clearly it is not correct to use that in non-plaintext output.
We don't have to pass it to hook implementations, but I really think we need a flag to prevent auto-escaping. We even document:
The caller is responsible for choosing the right escaping / sanitization
but don't actually allow the caller to do that, at least not for token values. But if the token input is untrusted and will be escaped later, we must treat token replacements as untrusted too or we are guaranteed to have double-escaping problems?
(See also the following comments about some discussion and why various things don't work IMHO)