Channel: Issues for Drupal core

Drupal\Component\Utility\Html::normalize() leaves messy </body></html> in certain situations


Problem/Motivation

Under certain circumstances, Drupal\Component\Utility\Html::normalize() (_filter_htmlcorrector() in Drupal 7) appends a stray </body></html> to the resulting HTML. This happens when the HTML ends in the middle of an attribute, for example:

<p>Here <img alt="ao

This will produce output like:

You can reproduce on Drupal 7 or 8 by following these steps:

Drupal 8

  1. Install Drupal 8.3.1 with the standard profile
  2. Go to /admin/structure/types/manage/article/display/teaser and configure the "Body" field to trim at 20 characters
  3. Go to /admin/config/content/formats/manage/basic_html and both (a) enable the "Correct faulty and chopped off HTML" filter and (b) disable the "Restrict images to this site" filter (only necessary for the example HTML, not necessary to trigger the bug)
  4. Go to /node/add/article and use this HTML as the body (be sure to click the "Source" button in the WYSIWYG toolbar before pasting it, otherwise you're adding text not HTML):
    Here <img alt="aoeunhteoas unthoaesn theoausnth oaesntheo asnthoae" src="http://flowjournal.org/wp-content/uploads/2011/12/Im-Not-Here.png"  /> it is
    
  5. Go to /node and observe output like in the screenshot

Drupal 7

  1. Install Drupal 7.54 with the standard profile
  2. Go to /admin/structure/types/manage/article/display/teaser and configure the "Body" field to trim at 20 characters
  3. Go to /admin/config/content/formats/filtered_html and add <img> to the "Allowed HTML tags" (under "Limit allowed HTML tags")
  4. Go to /node/add/article and use this HTML as the body:
    Here <img alt="aoeunhteoas unthoaesn theoausnth oaesntheo asnthoae" src="http://flowjournal.org/wp-content/uploads/2011/12/Im-Not-Here.png"  /> it is
    
  5. Go to /node and observe output like in the screenshot

Proposed resolution

Remove broken tags at the end of the text before parsing it into the DOM. Views already does this to avoid the same problem when it trims fields. Here's the Views code:

$value = rtrim(preg_replace('/(?:<(?!.+>)|&(?!.+;)).*$/us', '', $value));
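
To make the effect of that expression easier to see, here is a minimal sketch that wraps it in a standalone function and runs it on the truncated strings from this issue. The function name strip_trailing_fragment() is hypothetical and is not part of Drupal core or Views; only the regular expression and the rtrim() call come from the Views line quoted above. The NULL check is an added assumption about how a caller might want to handle a PCRE failure.

```php
<?php

/**
 * Hypothetical helper, for illustration only (not a Drupal API).
 *
 * Drops everything from the last unterminated "<" or "&" to the end of the
 * string, then trims trailing whitespace.
 */
function strip_trailing_fragment(string $value): string {
  // Same pattern as the Views line: a "<" with no later ">" or a "&" with
  // no later ";" marks the start of a broken tag or entity.
  $stripped = preg_replace('/(?:<(?!.+>)|&(?!.+;)).*$/us', '', $value);

  // preg_replace() returns NULL on failure (for example invalid UTF-8 with
  // the /u modifier). Falling back to the input is an assumption made here.
  if ($stripped === NULL) {
    $stripped = $value;
  }

  return rtrim($stripped);
}

$examples = [
  // Cut off in the middle of an attribute.
  '<p>Here <img alt="ao',
  // Cut off in the middle of an entity.
  'Fish &amp; chips &am',
  // Complete markup is left alone.
  '<p>ok</p>',
];

foreach ($examples as $input) {
  echo var_export(strip_trailing_fragment($input), TRUE), PHP_EOL;
}
```

Two things to keep in mind when reading the result: rtrim() also removes the whitespace that preceded the broken tag, and a literal unescaped "<" in running text (such as "1 < 2") is treated the same way as a broken tag, so the text is cut at that point.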

Remaining tasks

  1. Port patch to Drupal 8
  2. Write automated tests
  3. Review
  4. Backport final patch to Drupal 7

User interface changes

None.

API changes

None.

Data model changes

None.

Original summary

_filter_htmlcorrector() leaves in any fragmentary tags that are passed to it, which can break the rest of the page. This is most likely to happen when using the "Field can contain HTML" filter in the Views module, but it can also occur any other time a developer trims a string that contains HTML and passes it to this function.

Example (trimmed to 250 chars):

Lorem ipsum dolor sit amet, consectetur adipiscing elit. <strong>Aliquam posuere enim</strong>. Sed ultrices semper tortor. Pellentesque cenim consectetur. Nulla sed risus eu ipsum venenatis <a class="sample" href="http://www.example.com/partial/path

Output is identical to input, breaking any HTML that follows on the page. Ideal output would be:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. <strong>Aliquam posuere enim</strong>. Sed ultrices semper tortor. Pellentesque cenim consectetur. Nulla sed risus eu ipsum venenatis 

Patch attached.

