Tags of empty elements are labeled as opening tags

Issue #59 resolved
Former user created an issue

Original [issue 59](https://code.google.com/p/okapi/issues/detail?id=59) created by @ysavourel on 2009-04-26T14:31:49.000Z:

The HTML filter labels the tags of empty elements as 'opening' while they should be 'placeholder' (or have closing). Example:

    <p>t1 <br> t2 <img src='x'> t3</p>

will give 2 'opening' codes.

This causes havoc and fatal errors when merging XLIFF or when recombining segmented text.
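
To make the expected result concrete, here is a minimal, self-contained sketch (not the filter's actual code). It assumes a hand-written table of the elements that HTML 4.01 declares as empty and a simple regular expression for start tags; the class name `EmptyElementExample` and the method `expectedTypes` are hypothetical. Applied to the content of the paragraph above, it reports the type each inline start tag should get.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: not part of Okapi or Jericho.
class EmptyElementExample {

    // Elements declared EMPTY in HTML 4.01 (assumed table; the real filter
    // would rely on the parser's own element tables).
    static final Set<String> EMPTY_ELEMENTS = Set.of(
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param");

    // Matches start tags only; end tags begin with "</" and are skipped.
    static final Pattern START_TAG =
        Pattern.compile("<([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Returns "name:type" for every start tag found in the given content.
    static List<String> expectedTypes(String content) {
        List<String> result = new ArrayList<>();
        Matcher m = START_TAG.matcher(content);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            String type = EMPTY_ELEMENTS.contains(name) ? "placeholder" : "opening";
            result.add(name + ":" + type);
        }
        return result;
    }

    public static void main(String[] args) {
        // Inline content of the paragraph from the example above.
        System.out.println(expectedTypes("t1 <br> t2 <img src='x'> t3"));
    }
}
```

With this table, both `<br>` and `<img>` come out as 'placeholder' rather than 'opening'.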

Comments (7)

  1. Former user Account Deleted

    Comment [1.](https://code.google.com/p/okapi/issues/detail?id=59#c1) originally posted by @ysavourel on 2009-04-26T18:09:01.000Z:

    I've changed the XLIFF reader and merger so that isolated open/close codes can be merged without their corresponding paired tag, which makes this bug less critical. But I think it is still important to set those empty elements to 'placeholder'.

  2. Former user Account Deleted

    Comment [3.](https://code.google.com/p/okapi/issues/detail?id=59#c3) originally posted by @ysavourel on 2009-04-27T18:17:59.000Z:

    What needs to be done here is to look up the list of HTML tags and check their properties (Jericho has a table like this, I believe). This is only needed for start tags, which in non-well-formed HTML can be either opening or standalone.

    I curse the day HTML browsers became so forgiving :-)

  3. Former user Account Deleted

    Comment [5.](https://code.google.com/p/okapi/issues/detail?id=59#c5) originally posted by @ysavourel on 2009-04-27T18:50:36.000Z:

    So what about cases like <p>? They are not required to have end tags. Are they then standalone if they don't have a matching end tag?

    Setting this information correctly will be difficult - you will have to have full context to know if a tag has been ended or not.

  4. Former user Account Deleted

    Comment [6.](https://code.google.com/p/okapi/issues/detail?id=59#c6) originally posted by @ysavourel on 2009-04-27T18:56:46.000Z:

    Well <p> should not be inline :) so the question is moot for that one. But I see what you mean, maybe some inline codes like may be alone. I think they should then be flagged as 'opening'. The only ones that should be placeholders are probably the ones that are empty (per the table).

  5. Former user Account Deleted

    Comment [7.](https://code.google.com/p/okapi/issues/detail?id=59#c7) originally posted by @ysavourel on 2009-04-27T19:45:55.000Z:

    Logic now looks like this using Jericho HTML tables:

        // is this an empty tag?
        if (startTag.isSyntacticalEmptyElementTag()) {
            codeType = TextFragment.TagType.PLACEHOLDER;
        } else if (startTag.isEndTagRequired()) {
            codeType = TextFragment.TagType.OPENING;
        } else {
            codeType = TextFragment.TagType.PLACEHOLDER;
        }
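
    The decision above can be tried out without Jericho or Okapi on the classpath. The sketch below mirrors the same three branches, but takes the two properties as plain booleans. The class `StartTagClassifier`, its `TagType` enum and the flag values used in `main` are assumptions for illustration, not values read from Jericho's tables.

    ```java
    // Hypothetical stand-in for the filter logic; not Okapi's actual class.
    class StartTagClassifier {

        enum TagType { OPENING, PLACEHOLDER }

        // Mirrors the three branches of the logic quoted above.
        static TagType classify(boolean syntacticallyEmpty, boolean endTagRequired) {
            if (syntacticallyEmpty) {
                return TagType.PLACEHOLDER;   // e.g. <br/>
            } else if (endTagRequired) {
                return TagType.OPENING;       // the element must be closed later
            } else {
                return TagType.PLACEHOLDER;   // end tag forbidden or optional
            }
        }

        public static void main(String[] args) {
            // Flag values below are assumed for illustration.
            System.out.println("<br/> " + classify(true, false));
            System.out.println("<b>   " + classify(false, true));
            System.out.println("<br>  " + classify(false, false));
        }
    }
    ```

    Note that, as written, any start tag whose end tag is not required falls into the last branch, so elements with an optional end tag are also treated as placeholders, not only the empty ones.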
