Problem/Motivation
Drupal 8 aims to eliminate a class of security bugs by automatically escaping text. There are 3 classes of text:
- Text that is already valid safe markup, represented by an object implementing MarkupInterface.
- A string containing plain text, which is automatically escaped when passed to the template system.
- A string containing unsafe markup, which should not be passed to the template system because it would be escaped. Instead, it can be filtered with Xss::filter() or Xss::filterAdmin().
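The three classes can be modelled without Drupal in a few lines. This is a minimal sketch assuming a simplified template layer: MarkupInterface, SafeMarkup, escape_html() and autoescape() below are hypothetical stand-ins, not Drupal's real classes, and Xss::filter() is not modelled. The third case only shows why unsafe markup must not be handed to the template system.

```php
<?php
// Hypothetical stand-in for Drupal's MarkupInterface (only __toString() is modelled).
interface MarkupInterface
{
    public function __toString(): string;
}

// Hypothetical value object for text that is already safe markup.
final class SafeMarkup implements MarkupInterface
{
    public function __construct(private string $markup)
    {
    }

    public function __toString(): string
    {
        return $this->markup;
    }
}

// HTML-escapes plain text, including both kinds of quotes.
function escape_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Simplified model of template autoescaping: safe markup is printed as-is,
// any plain string is escaped.
function autoescape(MarkupInterface|string $value): string
{
    return $value instanceof MarkupInterface ? (string) $value : escape_html($value);
}

// Class 1: safe markup passes through unchanged.
echo autoescape(new SafeMarkup('<em>Fish &amp; chips</em>')), "\n"; // <em>Fish &amp; chips</em>
// Class 2: plain text is escaped, which is what we want.
echo autoescape('Fish & chips'), "\n"; // Fish &amp; chips
// Class 3: unsafe markup is escaped too, so its tags appear as literal text.
echo autoescape('<em>Fish &amp; chips</em>'), "\n"; // &lt;em&gt;Fish &amp;amp; chips&lt;/em&gt;
```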
Token::replace() doesn't properly follow this system.
- The comment on the return value instructs the caller to escape the returned string or rely on Twig autoescaping. However, the returned string is unsafe markup, so both approaches are wrong and will lead to double escaping.
- If the $text parameter is safe markup, then the return value is definitely safe markup, so it should also be returned as MarkupInterface. Otherwise it is double-escaped when the string is automatically escaped later.
- The comment on the $text parameter requires the caller to escape the text if necessary. However, it's easy for a developer not to notice this, especially because escaping is normally automatic.
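The double escaping can be reproduced outside Drupal. In this sketch, replace_tokens() is a hypothetical simplification of Token::replace() that escapes every token value and returns a plain string, and autoescape() stands in for Twig autoescaping; neither is a real Drupal API.

```php
<?php
// Hypothetical simplification of Token::replace(): every token value is
// HTML-escaped, but the result is returned as a plain string.
function replace_tokens(string $text, array $values): string
{
    $replacements = [];
    foreach ($values as $token => $value) {
        $replacements[$token] = htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }
    return strtr($text, $replacements);
}

// Simplified template autoescaping: plain strings are always escaped.
function autoescape(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$title = replace_tokens('Article: [node:title]', ['[node:title]' => 'Fish & chips']);
echo $title, "\n";             // Article: Fish &amp; chips
echo autoescape($title), "\n"; // Article: Fish &amp;amp; chips (double-escaped)
```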
Using Token::replace() correctly is awkward, as the caller may need special processing on both the input and the return value. Developers have got used to auto-escaping, so the problems appear to be extremely widespread:
- Core code, see #25 and #27. A quick glance suggests that the majority of the calls that replace tokens in Drupal core are wrong.
- Contrib code, see comment #12, which shows a possible workaround, but only by using the @internal class \Drupal\Core\Render\Markup.
However, the bug is not very noticeable, because it is only visible for strings that contain special characters such as & and quotes.
Proposed resolution
Option 1
Easy, back-compatible
- Change Token::replace() to return MarkupInterface if $text implements MarkupInterface.
- Fix the comment on the return value of Token::replace().
- Fix all calls to Token::replace() in core to follow the rules.
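A minimal sketch of the Option 1 behaviour, using hypothetical stand-ins (MarkupInterface, SafeMarkup, replace_tokens(), autoescape()) rather than Drupal's real classes, and assuming token values are always HTML-escaped: when the input is safe markup, the result is wrapped as safe markup so that autoescaping leaves it alone.

```php
<?php
// Hypothetical stand-ins for MarkupInterface and a safe-markup object.
interface MarkupInterface
{
    public function __toString(): string;
}

final class SafeMarkup implements MarkupInterface
{
    public function __construct(private string $markup)
    {
    }

    public function __toString(): string
    {
        return $this->markup;
    }
}

function escape_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Simplified template autoescaping: safe markup is printed as-is.
function autoescape(MarkupInterface|string $value): string
{
    return $value instanceof MarkupInterface ? (string) $value : escape_html($value);
}

// Option 1, sketched: token values are still escaped, but when $text is
// safe markup the result is returned as safe markup too.
function replace_tokens(MarkupInterface|string $text, array $values): MarkupInterface|string
{
    $result = strtr((string) $text, array_map('escape_html', $values));
    return $text instanceof MarkupInterface ? new SafeMarkup($result) : $result;
}

$page_title = new SafeMarkup('Article: [node:title]');
$title = replace_tokens($page_title, ['[node:title]' => 'Fish & chips']);
echo autoescape($title), "\n"; // Article: Fish &amp; chips
```

For a plain string input the return type is unchanged, so the caller is still handed unsafe markup; this is why the option also requires fixing the call sites in core.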
Option 2
Fix the confusing interface: deprecate Token::replace() and create new methods:
- Token::replaceSafeMarkup(): output is safe markup; input is safe markup, or plain text that will be automatically escaped.
- Token::replacePlainText(): output is plain text; input is plain text, or contains safe markup that will be automatically converted.
- Token::replaceUnsafeMarkup(): output is unsafe markup; input is unsafe markup.
// PLAIN TEXT
// Before
use Drupal\Component\Render\PlainTextOutput;
use Drupal\Component\Utility\Html;
PlainTextOutput::renderFromHtml($token_service->replace(Html::escape($text)));
// After
$token_service->replacePlainText($text);
// SAFE MARKUP
// Before
use Drupal\Component\Render\HtmlEscapedText;
use Drupal\Core\Render\Markup;
$title = $this->token->replace(new HtmlEscapedText($page_title), $data);
return Markup::create($title);
// After
return $this->token->replaceSafeMarkup($page_title, $data);
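One way the three Option 2 methods could share a single implementation is sketched below. TokenSketch, SafeMarkup and the private helpers are hypothetical, the sketch assumes token values are always HTML-escaped during replacement, and strip_tags() plus html_entity_decode() serve only as a rough stand-in for PlainTextOutput::renderFromHtml().

```php
<?php
// Hypothetical stand-ins for MarkupInterface and a safe-markup object.
interface MarkupInterface
{
    public function __toString(): string;
}

final class SafeMarkup implements MarkupInterface
{
    public function __construct(private string $markup)
    {
    }

    public function __toString(): string
    {
        return $this->markup;
    }
}

// Hypothetical sketch of the three proposed methods.
final class TokenSketch
{
    private static function escape(string $text): string
    {
        return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }

    // Shared helper: takes markup, escapes token values, returns (unsafe) markup.
    private function replace(string $markup, array $values): string
    {
        $escaped = array_map(fn (string $value): string => self::escape($value), $values);
        return strtr($markup, $escaped);
    }

    // Safe markup is used as-is; plain text is escaped into markup first.
    private static function toMarkup(MarkupInterface|string $text): string
    {
        return $text instanceof MarkupInterface ? (string) $text : self::escape($text);
    }

    public function replaceSafeMarkup(MarkupInterface|string $text, array $values): MarkupInterface
    {
        return new SafeMarkup($this->replace(self::toMarkup($text), $values));
    }

    public function replacePlainText(MarkupInterface|string $text, array $values): string
    {
        $markup = $this->replace(self::toMarkup($text), $values);
        // Rough HTML-to-text conversion: drop tags, then decode entities.
        return html_entity_decode(strip_tags($markup), ENT_QUOTES, 'UTF-8');
    }

    public function replaceUnsafeMarkup(string $text, array $values): string
    {
        // The caller must still filter the result, e.g. with Xss::filter().
        return $this->replace($text, $values);
    }
}

$token = new TokenSketch();
$values = ['[node:title]' => 'Fish & chips'];
echo $token->replacePlainText('Order: [node:title]', $values), "\n";  // Order: Fish & chips
echo $token->replaceSafeMarkup('Order: [node:title]', $values), "\n"; // Order: Fish &amp; chips
```

Each public method fixes both what it accepts and what it returns, so the caller no longer has to pre-process the input or post-process the output.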
Remaining tasks
Decide which option to pursue. Create a patch and fix the tests.
User interface changes
None
API changes
Option 1: Change the return type of Token::replace().
Option 2: Deprecate Token::replace().
Data model changes
None
Comment from original issue
I propose that @Berdir's problem will be solved if Token::replace() returns MarkupInterface, as that will prevent autoescaping of already-escaped text.
@Berdir #2567257-210: hook_tokens() $sanitize option incompatible with Html sanitisation requirements.
Posting this here for now, will probably open a follow-up issue.
I've been working on updating token.module and adjusting its functionality and tests for this, which is probably something we should have done before committing this, to make sure that it works for more than the few use cases that core has.
I think there is at least one example there why supporting some sort of sanitize => FALSE is useful and important.
The basic use case is when you have user-provided, unsafe input and want it to remain unsafe and unescaped, because you then rely on autoescaping.
One example in token.module is the block label, it has this code:
function token_block_view_alter(&$build, BlockPluginInterface $block) {
  $config = $block->getConfiguration();
  $label = $config['label'];
  if ($label != '<none>') {
    // The label is automatically escaped, avoid escaping it twice.
    $build['#configuration']['label'] = \Drupal::token()->replace($label, array(), array('sanitize' => FALSE));
  }
}
The problem is that now the block label tokens are escaped twice. There's a test that is creating a node with a ' in it, and right now, that is getting escaped twice (which is exactly what this code is testing), since we force-escape all token return values and then escape the whole string again.
I don't see a proper way to fix this right now. What technically works is using PlainTextOutput::renderFromHtml() but clearly it is not correct to use that in non-plaintext output.
We don't have to pass it to hook implementations, but I really think we need a flag to prevent auto-escaping. We even document:
The caller is responsible for choosing the right escaping / sanitization
but don't actually allow the caller to do that, at least not for token values. But if the token input is untrusted and will be escaped later, we must treat token replacements as untrusted too, or we are guaranteed to have double-escaping problems?
(See also the following comments about some discussion and why various things don't work IMHO)