Issue #8125 resolved

Plain text README rendering bug

Kaz Nishimura
created an issue

In the rendered plain text README file, a '>' character is followed by a stray ';' character.

See https://bitbucket.org/kazssym/prcs/src

Comments (5)

  1. Michael Frauenholtz staff

    Hi Kaz,

    When we parse README files, we have to convert certain characters to make them safe for display on the site. One example is converting > into &gt;. Since your text has that character directly after your URL, the &gt is parsed as though it were part of the URL, which leaves the trailing ; in the text. Because of the security measures we must apply when parsing READMEs, we can't change this behavior. To fix the issue in this case, either remove the > or make sure it is not directly next to the URL.

    Cheers,
    Michael

  2. Kaz Nishimura reporter

    I added > so that URL parsing would terminate before it, since > is not allowed in URLs and quoting URLs this way in plain text is not rare. What a paradox.

    Why can't you apply the two conversions at the same time, then? You actually have an option to fix it.

    Edit: Use of angle brackets around a URI is suggested in RFC 3986, Appendix C.

  3. Kaz Nishimura reporter

    If I have interpreted it correctly, the current behavior can also break URLs that contain &. Each & will be converted to &amp; first, and the URL will then be terminated incorrectly after &amp, leaving the ; and the following characters as plain text.

  4. Michael Frauenholtz staff

    Hi Kaz and roman,

    After looking into this more, we were able to fix this. Originally we had separated our HTML escaping and URL parsing for performance reasons. Now, specifically in the case of READMEs that don't use any markup, we've determined that performance is acceptable when the two are combined, which gives the desired output. Sorry for the hasty decision not to fix it.

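
The behavior discussed in this thread can be reproduced with a minimal sketch. This is not Bitbucket's actual code: the URL pattern (URL_RE), the function names, and the example.com input are all assumptions made for illustration. In particular, the pattern is assumed to stop at ';', which is what makes the stray ';' appear when escaping runs before URL detection.

```python
import html
import re

# Hypothetical URL pattern (an assumption, not Bitbucket's real one):
# a scheme followed by characters other than whitespace, '<', '>', '"' or ';'.
URL_RE = re.compile(r'https?://[^\s<>";]+')


def escape_then_linkify(text):
    """Escape HTML first, then look for URLs in the escaped text.

    This mirrors the reported behavior: '&gt' or '&amp' is swallowed
    into the URL and the trailing ';' is left outside the link.
    """
    escaped = html.escape(text, quote=False)
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), escaped
    )


def linkify_and_escape(text):
    """Find URLs in the raw text and escape each piece as it is emitted.

    This is one way to combine the two conversions in a single pass.
    """
    out = []
    pos = 0
    for m in URL_RE.finditer(text):
        # Escape the plain text between the previous URL and this one.
        out.append(html.escape(text[pos:m.start()], quote=False))
        # Escape the URL itself for use in both the attribute and the label.
        url = html.escape(m.group(0), quote=True)
        out.append('<a href="{0}">{0}</a>'.format(url))
        pos = m.end()
    # Escape whatever follows the last URL.
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)


if __name__ == "__main__":
    sample = "<http://example.com/a?x=1&y=2>"
    print(escape_then_linkify(sample))
    print(linkify_and_escape(sample))
```

With the first function, the link ends after &amp and the rest of the URL is left as plain text; with the second, the whole URL is linked and the closing > is escaped outside the link.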