Markdown not extracting text after bullet line with trailing spaces

Issue #695 resolved
ysavourel created an issue

When there is the following markup:

- __Some header__^^^    
^Some additional text

(Where ^ is a space)

Which shows as:

  • Some header
    Some additional text

The "Some additional text" part does not get extracted at all.

@ssikuro: Maybe this is one of the issues you have already found.

Comments (11)

  1. ysavourel reporter

    Another case (probably related to the same bug):

    **27\. Some heading**  
    Then some words with *stars* and more words.
    

    "27. Some heading" and "stars* and more words." get extracted. The part between the second ** of the heading and the start of the paragraph is ignored. This seems to indicate some kind of parsing state problem.

  2. ysavourel reporter

    It looks like there is a general pattern:

    When some kind of Markdown markup ends a line (possibly with trailing spaces), the line that follows the line break does not get extracted. For example:

    ![image](http://img.png)
    Additional text
    

    Also causes the issue.

  3. Kuro Kurosaka

    The first two cases involve a hard line break. In its intermediate step, the Markdown filter generates a HARD_LINE_BREAK token that contains the paragraph text, whereas it should generate a HARD_LINE_BREAK token with just the newlines in its content, and a TEXT token with the actual paragraph text.

    For the third case (image markdown), a SOFT_LINE_BREAK token with the "Additional text" content is generated, instead of a SOFT_LINE_BREAK token and a TEXT token.
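
The expected token split described above can be illustrated with a minimal Python sketch. This is not the Okapi Markdown filter's code: the `tokenize` function and its regular expression are hypothetical, and only the token names HARD_LINE_BREAK, SOFT_LINE_BREAK, and TEXT come from the comment. It assumes a hard break is two or more trailing spaces (or a trailing backslash) before the newline, and it leaves inline markup inside TEXT for simplicity.

```python
import re

# Hypothetical tokenizer; only the token names come from the discussion above.
# A hard break is 2+ spaces or a backslash immediately before the newline;
# any other newline is treated as a soft break.
BREAK = re.compile(r"( {2,}|\\)?\n")

def tokenize(text):
    """Split text into (kind, content) tokens, keeping break tokens text-free."""
    tokens = []
    pos = 0
    for m in BREAK.finditer(text):
        if m.start() > pos:
            # Everything before the break is ordinary text.
            tokens.append(("TEXT", text[pos:m.start()]))
        kind = "HARD_LINE_BREAK" if m.group(1) else "SOFT_LINE_BREAK"
        # The break token carries only the break marker and the newline.
        tokens.append((kind, m.group(0)))
        pos = m.end()
    if pos < len(text):
        # The text after the last break becomes its own TEXT token.
        tokens.append(("TEXT", text[pos:]))
    return tokens
```

With this split, the text after the break always ends up in its own TEXT token, so it remains available for extraction in both the hard-break and the image cases.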

  4. Kuro Kurosaka

    @ysavourel Is this still happening with the latest snapshot build of the markup filter? I can't reproduce it.

  5. Kuro Kurosaka

    @ysavourel, I used M36 and ran tikal.sh -x bullet-para.md, and the "Text that comes under the bullet...." part was extracted. Are you sure it wasn't extracted? However, the merged document lacks the trailing spaces at the end of the previous line (just after "bullet__"). This is bad because losing the two or more trailing spaces changes the meaning of the Markdown file.

  6. Kuro Kurosaka

    hard-breaks.md demonstrates a simpler case of improper handling of hard line breaks. Hard line breaks are completely ignored in the XLIFF: neither the two trailing spaces, nor the backslash, nor a placeholder that would represent them appears in it.
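
One way to keep hard line breaks across extraction and merge is to swap each one for a placeholder and restore it afterwards. The sketch below only illustrates that idea. The `extract` and `merge` functions and the placeholder format are hypothetical, and nothing here reflects how Okapi or XLIFF actually encodes inline codes.

```python
import re

# Hypothetical placeholder format; not an actual Okapi or XLIFF inline code.
PLACEHOLDER = "<x id='hb{}'/>"
# A hard break: 2+ trailing spaces or a trailing backslash before the newline.
HARD_BREAK = re.compile(r"( {2,}|\\)\n")

def extract(markdown):
    """Replace each hard break with a numbered placeholder.

    Returns the placeholder text and the list of original break markers.
    """
    markers = []

    def repl(match):
        markers.append(match.group(0))
        return PLACEHOLDER.format(len(markers) - 1)

    return HARD_BREAK.sub(repl, markdown), markers

def merge(text, markers):
    """Restore the original hard-break markers from their placeholders."""
    for i, marker in enumerate(markers):
        text = text.replace(PLACEHOLDER.format(i), marker)
    return text
```

Because the original marker (spaces or backslash) is stored rather than normalized, the merged file keeps the same kind of hard break as the source.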
