Text unit name not using ID attribute in HTML filter

Issue #65 resolved
Former user created an issue

Original [issue 65](https://code.google.com/p/okapi/issues/detail?id=65) created by @ysavourel on 2009-05-07T14:35:29.000Z:

Some HTML elements, such as `<p>`, may have an `id` attribute. Its value should be carried over to the extracted text unit as its name (i.e., `resname` in XLIFF), so that some utilities can take advantage of it (alignment, update, etc.).
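To make the requested mapping concrete, here is a minimal Python sketch of the output side only. It assumes XLIFF 1.2, where `<trans-unit>` accepts an optional `resname` attribute. The helper `to_trans_unit` is hypothetical and not part of Okapi's API; it only illustrates where the HTML `id` value would end up.

```python
import xml.etree.ElementTree as ET

def to_trans_unit(unit_id, text, name=None):
    """Build an XLIFF 1.2 <trans-unit>; the HTML id, if any, becomes resname."""
    tu = ET.Element("trans-unit", {"id": str(unit_id)})
    if name:  # assumption: name is the id attribute of the source element
        tu.set("resname", name)
    source = ET.SubElement(tu, "source")
    source.text = text
    return tu

# <p id="intro">Welcome to Okapi.</p> -> text unit named "intro"
print(ET.tostring(to_trans_unit(1, "Welcome to Okapi.", name="intro"), encoding="unicode"))
```

A paragraph without an `id` simply produces a `<trans-unit>` with no `resname`, so downstream tools can tell named and unnamed units apart.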

Comments (6)

  1. Former user Account Deleted

    Comment [3.](https://code.google.com/p/okapi/issues/detail?id=65#c3) originally posted by @ysavourel on 2009-05-22T22:32:23.000Z:

    I am having a hard time visualizing an algorithm if we assume an ID can appear on ANY element (except the above). How far away can the text be from the element and still have the ID set to TextUnit.name? What if there is no closing tag for an element that has an id? In that case we would keep an id value hanging around for a long time.

    The only compromise I can think of is to have a finite list of content elements such as p, h1, dd, dt, etc.: the same elements that we consider complex TextUnits. That way the elements are *close* to the text they wrap, and the number of possibilities is reduced considerably.
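The whitelist compromise can be sketched as below, assuming Python's standard `html.parser` rather than Okapi's actual HTML filter. The set `NAMEABLE` and the class `IdNamingParser` are hypothetical names chosen for illustration. Only elements in the whitelist may name a text unit; a new whitelisted element, or the end of input, closes any element still open, so an id never outlives the text it wraps (this addresses the unclosed-element concern above).

```python
from html.parser import HTMLParser

# Hypothetical whitelist of content elements whose id may name a text unit.
NAMEABLE = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "dd", "dt", "td", "th"}

class IdNamingParser(HTMLParser):
    """Collect (name, text) pairs; ids on non-whitelisted elements are ignored."""

    def __init__(self):
        super().__init__()
        self.units = []       # list of (id or None, text)
        self._current = None  # (id, text_parts) of the open whitelisted element

    def handle_starttag(self, tag, attrs):
        if tag in NAMEABLE:
            # A new content element ends any still-open one (e.g. an unclosed <p>).
            self._flush()
            self._current = (dict(attrs).get("id"), [])

    def handle_endtag(self, tag):
        if tag in NAMEABLE:
            self._flush()

    def handle_data(self, data):
        # Text inside inline children such as <b> still belongs to the open unit.
        if self._current is not None:
            self._current[1].append(data)

    def close(self):
        super().close()
        self._flush()  # never keep an id hanging around past the end of input

    def _flush(self):
        if self._current is not None:
            elem_id, parts = self._current
            text = " ".join("".join(parts).split())  # normalize whitespace
            if text:
                self.units.append((elem_id, text))
            self._current = None

parser = IdNamingParser()
parser.feed('<div id="main"><h1 id="title">Okapi</h1>'
            '<p id="intro">Hello <b>world</b><p>No id here</div>')
parser.close()
print(parser.units)
```

Here the `id` on `<div>` is deliberately dropped because `div` is not in the list, and the unclosed `<p id="intro">` is terminated by the next `<p>`. Treating whitelisted elements as non-nesting keeps the bookkeeping to a single open unit; nested cases (e.g. `<li>` containing `<p>`) would need a stack instead.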
