Markdown filter: extra spaces and empty lines added/removed
When a .md file is processed by tikal.sh -x to generate a .md.xlf file and then merged without changes by tikal.sh -m, the resulting .out.md file differs from the original .md file. The output tends to include extra spaces and empty lines.
Two sample files, taken from a comment on issue #686, are attached.
$ ./tikal.sh -x space1.md
$ ./tikal.sh -m space1.md.xlf
$ diff space1.md space1.out.md
1,8c1,10
< * **PointCloudType** - type of point cloud loaded into a Revit document. Each PointCloudType maps to a single file or identifier (depending upon the type of Point Cloud Engine which governs it).
< * **PointCloudInstance** - an instance of a point cloud in a location in the Revit project.
< * **PointCloudFilter** - a filter determining the volume of interest when extracting points.
< * **PointCollection** - a collection of points obtained from an instance and a filter.
< * **PointIterator** - an iterator for the points in a PointCollection.
< * **CloudPoint** - an individual point cloud point, representing an X, Y, Z location in the coordinates of the cloud, and a color.
< * **PointCloudOverrides** - and its related settings classes specify graphic overrides that are stored by a view to be applied to a PointCloudInstance element, or a scan within the element.
< ### Point cloud file paths
---
> * **PointCloudType** - type of point cloud loaded into a Revit document. Each PointCloudType maps to a single file or identifier (depending upon the type of Point Cloud Engine which governs it).
> * **PointCloudInstance** - an instance of a point cloud in a location in the Revit project.
> * **PointCloudFilter** - a filter determining the volume of interest when extracting points.
> * **PointCollection** - a collection of points obtained from an instance and a filter.
> * **PointIterator** - an iterator for the points in a PointCollection.
> * **CloudPoint** - an individual point cloud point, representing an X, Y, Z location in the coordinates of the cloud, and a color.
> * **PointCloudOverrides** - and its related settings classes specify graphic overrides that are stored by a view to be applied to a PointCloudInstance element, or a scan within the element.
>
> ### Point cloud file paths
>
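A quick way to confirm that a round trip changed only whitespace is to compare the two texts with all whitespace removed. The following is a minimal sketch; the function name `whitespace_only_difference` is hypothetical and is not part of Okapi or Tikal.

```python
def whitespace_only_difference(original: str, merged: str) -> bool:
    """Return True when the texts differ, but only in whitespace.

    Hypothetical helper for inspecting a round trip; not part of Okapi.
    """
    if original == merged:
        return False  # identical texts: nothing differs at all
    # str.split() with no argument splits on any run of whitespace,
    # so joining the pieces removes spaces, tabs, and newlines alike.
    return "".join(original.split()) == "".join(merged.split())


if __name__ == "__main__":
    before = "* item\n### Heading\n"
    after = "* item\n\n### Heading\n\n"
    print(whitespace_only_difference(before, after))
```

To use it on the files above, read space1.md and space1.out.md into strings and pass them in. A result of True means the merge altered spacing only, not the text itself.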
Comments (9)
-
-
reporter - attached empty-line-test.md
$ ./tikal.sh -x empty-line-test.md
$ ./tikal.sh -m empty-line-test.md.xlf
This generates empty-line-test.out.md, which has only one empty line between non-empty lines. Extra empty lines are removed.
However, this behavior may be correct from the point of view of the Markdown spec. A run of more than one empty line seems to be interpreted as a single empty line and rendered as such. The GitHub rendering of empty-line-test.md is consistent with this.
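The observed effect, where a run of empty lines comes back as a single empty line, can be modelled as below. This is only a sketch of the visible behavior, not the Markdown filter's code, and `collapse_blank_runs` is a hypothetical name. It also ignores fenced code blocks, where blank lines are significant.

```python
def collapse_blank_runs(text: str) -> str:
    """Collapse each run of blank lines into a single blank line.

    Models the round-trip result described above; not Okapi's code.
    A line counts as blank when it holds only whitespace.
    """
    kept = []
    previous_blank = False
    for line in text.splitlines(keepends=True):
        blank = line.strip() == ""
        if blank and previous_blank:
            continue  # drop the second and later blank lines of a run
        kept.append(line)
        previous_blank = blank
    return "".join(kept)


if __name__ == "__main__":
    print(repr(collapse_blank_runs("first\n\n\n\nsecond\n")))
```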
-
reporter - attached simple-space-test.md
In this test, each line starts with a different number of spaces. Within a Markdown paragraph unit, i.e. a group of lines with no empty line between them, the leading spaces are completely removed. This is justifiable from the point of view of Markdown syntax because a renderer removes them as well: leading spaces mean nothing when the lines are rendered as a paragraph. See the screenshot of the GitHub rendering of this Markdown file.
From each Markdown paragraph, here meaning a line that follows an empty line and is followed by an empty line, 1-3 leading spaces are removed and 4-6 leading spaces are reduced to 4. Removing 1, 2, or 3 leading spaces can be justified because they are semantically equivalent in Markdown. But normalizing 4, 5, or 6 spaces to 4 is not justifiable because they have different semantics: 4 leading spaces mark the beginning of a code block, and any spaces after the fourth are rendered as they are. See the screenshot.
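The rule argued for above can be stated compactly: up to 3 leading spaces on a paragraph's first line may be dropped, while 4 or more must be kept verbatim because they start indented code. The sketch below expresses that rule only. It is not what the filter currently does, `normalize_leading_spaces` is a hypothetical name, and tabs and list nesting are deliberately ignored.

```python
def normalize_leading_spaces(line: str) -> str:
    """Drop 1-3 leading spaces; keep 4 or more exactly as written.

    Sketch of the semantics-preserving rule discussed above (assumption:
    the line opens a paragraph and is not inside a list). Not Okapi's code.
    """
    stripped = line.lstrip(" ")
    indent = len(line) - len(stripped)
    if indent <= 3:
        return stripped  # 0-3 spaces carry no meaning for a paragraph
    return line  # 4+ spaces: indented code, every space is content


if __name__ == "__main__":
    for sample in ["  two spaces", "    four spaces", "      six spaces"]:
        print(repr(normalize_leading_spaces(sample)))
```

Under this rule a line with 6 leading spaces keeps all 6, so the 2 spaces after the code indent still appear in the rendered code block.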
-
reporter There are several technical limitations in fixing this issue. Because of these, this issue will not be completely resolved.
Within HTML
The HTML filter, which the Markdown filter uses to process HTML elements and blocks, changes the number and kind of whitespace characters, since the amount of whitespace carries no meaning in HTML elements except within the pre element. So any Markdown document that includes HTML blocks with newlines or multiple spaces cannot be restored exactly from the .xlf file. For example,
<p>This paragraph was originally made of two lines and there were extra spaces here.</p>
will become:
<p>This paragraph was originally made of two lines and there were extra spaces here.</p>
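The whitespace change described here resembles ordinary HTML whitespace collapsing, where any run of spaces, tabs, and newlines becomes one space. The sketch below illustrates that effect with a regular expression. It is an approximation and not the HTML filter's implementation, and the position of the line break and the extra spaces in the sample input are assumed for illustration.

```python
import re


def collapse_html_whitespace(fragment: str) -> str:
    """Replace each run of spaces, tabs, and newlines with one space.

    Approximates HTML whitespace collapsing outside <pre>; not Okapi's code.
    """
    return re.sub(r"[ \t\r\n]+", " ", fragment)


if __name__ == "__main__":
    # Assumed input: the break and the extra spaces are placed for illustration.
    source = (
        "<p>This paragraph was originally made of two lines\n"
        "and there were    extra spaces here.</p>"
    )
    print(collapse_html_whitespace(source))
```

Because the collapsed form no longer records where the newline or the extra spaces were, the merge step has nothing to restore them from.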
Extra Newlines at EOF
A newline will be inserted at the end of the file if the original file ends without one. This is necessary because Flexmark loses the information about the end of the file in its subtree on certain occasions. For example, if a list like the one below is at the end of the file and there is no newline at the end:
* list item 1
* list item 2, not followed by a newline
Flexmark makes two paragraph nodes, one for "list item 1" and another for "list item 2, not followed by a newline". Usually a paragraph node includes a newline, but not when it appears as a list item. Because of that, the filter must add a newline after each list item. We could further analyze the top node to see whether a newline is there and whether the node we are dealing with is the last node, but that would further complicate the code, and performance would suffer. This was judged to be an acceptable limitation.
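Given this limitation, a round-trip comparison may need to tolerate one added newline at the end of the file. A minimal sketch of such a comparison follows. Both function names are hypothetical, and this is a checking aid rather than anything the filter does.

```python
def ensure_trailing_newline(text: str) -> str:
    """Append a newline when non-empty text does not already end with one."""
    if text and not text.endswith("\n"):
        return text + "\n"
    return text


def equal_ignoring_final_newline(original: str, merged: str) -> bool:
    """Compare two texts, treating a missing final newline as present.

    Hypothetical helper for round-trip checks; not part of Okapi.
    """
    return ensure_trailing_newline(original) == ensure_trailing_newline(merged)


if __name__ == "__main__":
    original = "* list item 1\n* list item 2, not followed by a newline"
    merged = original + "\n"
    print(equal_ignoring_final_newline(original, merged))
```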
-
reporter -
assigned issue to
https://bitbucket.org/okapiframework/okapi/issues/704 blocks this issue.
-
assigned issue to
-
reporter - changed version to M35
-
- changed status to resolved
Fixing issue 687 partially to deal with extra/missing blank lines.
→ <<cset 8de06a33747b>>
-
reporter - changed status to open
The status of this issue was incorrectly changed to resolved. The recent pull request #226 resolves this issue only partially: it handles the extra/removed empty lines in certain situations but does not resolve the extra/reduced spaces aspect of the issue. I am changing it back to open.
-
- removed responsible
Another case for this issue:
If there are several spaces before the hash, a new line is added for each such line (using tikal).