Tags of empty elements are labeled as opening tags
Original [issue 59](https://code.google.com/p/okapi/issues/detail?id=59) created by @ysavourel on 2009-04-26T14:31:49.000Z:
The HTML filter labels the tags of empty elements as 'opening' while they should be 'placeholder' (or have closing). Example:
<p>t1 <br> t2 <img src='x'> t3</p>
will give 2 'opening' codes.
This causes havoc and fatal errors when merging XLIFF or recombining segmented text.
Comments (7)
-
Account Deleted - changed status to open
Comment [2.](https://code.google.com/p/okapi/issues/detail?id=59#c2) originally posted by @ysavourel on 2009-04-27T17:25:19.000Z:
-
Account Deleted Comment [3.](https://code.google.com/p/okapi/issues/detail?id=59#c3) originally posted by @ysavourel on 2009-04-27T18:17:59.000Z:
What needs to be done here is to look up the list of HTML tags and check their properties (Jericho has a table like this, I believe). This only matters for start tags, which in non-well-formed HTML can be either opening or standalone.
I curse the day HTML browsers became so forgiving :-)
-
Account Deleted Comment [4.](https://code.google.com/p/okapi/issues/detail?id=59#c4) originally posted by @ysavourel on 2009-04-27T18:27:05.000Z:
There is a table of HTML 4 elements and their tag notation in: http://www.w3.org/TR/REC-html40/index/elements.html
Note that several 'start' tags actually never have an end tag (F), while some elements have an optional end tag. But the browsers are so forgiving that they allow things like <br></br>.
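As a rough illustration of the table lookup, here is a minimal sketch that hard-codes the elements the HTML 4 index marks as empty with a forbidden end tag. The class name `Html4EmptyElements` and the method `isEmptyElement` are hypothetical and are not part of Okapi or Jericho; a real filter would more likely rely on the parser's own tables.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration only; not an Okapi or Jericho class. */
final class Html4EmptyElements {
    // Elements listed as empty (E) with a forbidden end tag (F) in the HTML 4 element index.
    private static final Set<String> EMPTY = Set.of(
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param");

    private Html4EmptyElements() {}

    /** Returns true if HTML 4 forbids an end tag for the named element. */
    static boolean isEmptyElement(String name) {
        // HTML element names are case-insensitive, so normalize before the lookup.
        return EMPTY.contains(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"br", "IMG", "b", "p"}) {
            String kind = isEmptyElement(name) ? "placeholder candidate" : "not an empty element";
            System.out.println(name + " -> " + kind);
        }
    }
}
```

With this lookup, the `<br>` and `<img>` tags from the original example would both be treated as placeholder candidates, while `<b>` and `<p>` would not.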
-
Account Deleted Comment [5.](https://code.google.com/p/okapi/issues/detail?id=59#c5) originally posted by @ysavourel on 2009-04-27T18:50:36.000Z:
So what about cases like <p> - they are not required to have end tags - are they then standalone if they don't have an end mate?
Setting this information correctly will be difficult - you will have to have full context to know if a tag has been ended or not.
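To show why full context is needed, here is a minimal sketch that scans a whole sequence of tags and reports which start tags are eventually closed. It assumes the tags have already been reduced to simple tokens (`"b"` for a start tag, `"/b"` for an end tag); the class `MatchedStartTags` and its token format are hypothetical, not how the filter represents tags.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch; the token format ("b", "/b") is an assumption made for this example. */
final class MatchedStartTags {
    private MatchedStartTags() {}

    /** Returns a flag per token: true only for start tags that have a later matching end tag. */
    static boolean[] findMatchedStarts(List<String> tokens) {
        boolean[] matched = new boolean[tokens.size()];
        Deque<Integer> open = new ArrayDeque<>(); // indexes of start tags not yet closed
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (!token.startsWith("/")) {
                open.push(i);
                continue;
            }
            String name = token.substring(1);
            // Look for the nearest still-open start tag with the same name.
            Integer match = null;
            for (Integer idx : open) {
                if (tokens.get(idx).equals(name)) {
                    match = idx;
                    break;
                }
            }
            if (match != null) {
                // Start tags opened after the match were never closed: drop them as unmatched.
                while (!open.peek().equals(match)) {
                    open.pop();
                }
                open.pop();
                matched[match] = true;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("p", "br", "b", "/b", "img", "/p");
        boolean[] matched = findMatchedStarts(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(tokens.get(i) + " matched=" + matched[i]);
        }
    }
}
```

The point of the sketch is only that whether a start tag is "opening" or "standalone" cannot be decided when the tag is first seen; it depends on what follows.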
-
Account Deleted Comment [6.](https://code.google.com/p/okapi/issues/detail?id=59#c6) originally posted by @ysavourel on 2009-04-27T18:56:46.000Z:
Well <p> should not be inline :) so the question is moot for that one. But I see what you mean, maybe some inline codes like may be alone. I think then they should be flagged as 'opening'. The only ones that should be placeholders are probably the ones that are empty (per the table).
-
Account Deleted - changed status to resolved
Comment [7.](https://code.google.com/p/okapi/issues/detail?id=59#c7) originally posted by @ysavourel on 2009-04-27T19:45:55.000Z:
Logic now looks like this using Jericho HTML tables:
    // is this an empty tag?
    if (startTag.isSyntacticalEmptyElementTag()) {
        codeType = TextFragment.TagType.PLACEHOLDER;
    } else if (startTag.isEndTagRequired()) {
        codeType = TextFragment.TagType.OPENING;
    } else {
        codeType = TextFragment.TagType.PLACEHOLDER;
    }
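The same decision can be tried out without Jericho on the classpath. This is a minimal sketch in which two booleans stand in for the results of `isSyntacticalEmptyElementTag()` and `isEndTagRequired()`; the class `StartTagClassifier` and its local `TagType` enum are hypothetical stand-ins for illustration, not the Okapi types.

```java
/** Hypothetical stand-in for the filter logic; the booleans replace the Jericho StartTag calls. */
final class StartTagClassifier {
    /** Local stand-in for TextFragment.TagType, reduced to the two values used here. */
    enum TagType { OPENING, PLACEHOLDER }

    private StartTagClassifier() {}

    /**
     * Mirrors the branch order quoted above: a syntactically empty tag is a placeholder,
     * a tag whose end tag is required is opening, and anything else is a placeholder.
     */
    static TagType classify(boolean syntacticalEmptyElementTag, boolean endTagRequired) {
        if (syntacticalEmptyElementTag) {
            return TagType.PLACEHOLDER;
        }
        return endTagRequired ? TagType.OPENING : TagType.PLACEHOLDER;
    }

    public static void main(String[] args) {
        System.out.println("syntactically empty tag: " + classify(true, false));
        System.out.println("end tag required:        " + classify(false, true));
        System.out.println("end tag not required:    " + classify(false, false));
    }
}
```

One consequence worth noting: under this branch order, any start tag whose end tag is not required (forbidden or merely optional) ends up as a placeholder, not only the elements marked empty in the table.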
Comment [1.](https://code.google.com/p/okapi/issues/detail?id=59#c1) originally posted by @ysavourel on 2009-04-26T18:09:01.000Z:
I've changed the XLIFF reader and merger so isolated open/close codes can be merged without their corresponding paired tag. So this bug is less critical. But I think it is still important to set those empty elements to 'placeholders'.