TextFragment.getCodedText() problem

Issue #321 on hold
Former user created an issue

Original issue 321 created by aurelien.tomass... on 2013-03-28T12:54:19.000Z:

While testing my application with JUnit, I wrote a unit test that manipulates a TextFragment. [The goal is to create a tree from a TextFragment, manipulate it, and regenerate a new TextFragment from this tree.]

I found a problem creating a tree with a "high depth" of elements:
I want to create a TextFragment corresponding to this code:

<b><b><b><b><b><b><b><b><b><b>Content</b></b></b></b></b></b></b></b></b></b>

To create this one, I just create a new TextFragment, append 10 OPENING codes "<b>", append the text "Content", and then append 10 CLOSING codes "</b>".

With the Eclipse debugger, I can check the TextFragment structure, and each code is correct (cf. attachment "sc1.png"): 10 opening codes and 10 closing codes.

Now I want to use the coded-text string.
I call TextFragment.getCodedText() and inspect the resulting String.

Since each code is encoded as two characters in this String, I expect 20 characters for the 10 opening tags, then the text "Content", then 20 characters for the closing tags. In each pair, the first character is the marker that gives the code type.
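
The two-character layout described above can be sketched in plain Java, without the Okapi library. This is only an illustration: the class and method names are hypothetical, the values 57601 (opening) and 57603 (isolated) are the ones quoted in this thread, and the closing value 57602 and the index base are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the coded-text layout: each inline code is a marker character
// followed by one index character. Not Okapi code; values are assumptions.
final class CodedTextSketch {
    static final char MARKER_OPENING = (char) 57601;  // value quoted in the thread
    static final char MARKER_CLOSING = (char) 57602;  // assumed value
    static final char MARKER_ISOLATED = (char) 57603; // value quoted in the thread
    static final int INDEX_BASE = 0xE110;             // hypothetical base for index characters

    static boolean isMarker(char c) {
        return c == MARKER_OPENING || c == MARKER_CLOSING || c == MARKER_ISOLATED;
    }

    // Returns one label per inline code, in order of appearance.
    static List<String> markerTypes(String codedText) {
        List<String> types = new ArrayList<>();
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            if (!isMarker(c)) {
                continue; // ordinary text character
            }
            types.add(c == MARKER_OPENING ? "OPENING"
                    : c == MARKER_CLOSING ? "CLOSING" : "ISOLATED");
            i++; // skip the index character that follows the marker
        }
        return types;
    }

    // Builds a stand-in coded text: n opening pairs, the text, n closing pairs.
    static String nested(int n, String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(MARKER_OPENING).append((char) (INDEX_BASE + i));
        }
        sb.append(text);
        for (int i = 0; i < n; i++) {
            sb.append(MARKER_CLOSING).append((char) (INDEX_BASE + n + i));
        }
        return sb.toString();
    }
}
```

With 10 nested pairs around "Content", the string built by nested() has 20 + 7 + 20 = 47 characters, the opening markers sit at the even indices 0 to 18, and the closing markers start at index 27.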

With the Eclipse debugger, I check the type character of each opening code, i.e. the characters at indices 0, 2, 4, 6 ... 18. Their values are in the attachment "sc2.png": all are OPENING markers except the character at index 10 (i.e. code 5 in "sc1.png"), which is an ISOLATED marker.

According to http://okapi.opentag.com/devguide/gettingstarted.html#readingDocument, an opening tag without a closing one is an isolated tag. Here, code 5 has a closing code but is still treated as isolated.

Among the closing tags, the last one is also treated as isolated, while the others are CLOSING markers.

NB: I use version 0.19 of the Okapi Framework.

Comments (11)

  1. Former user Account Deleted

    Comment 1. originally posted by @ysavourel on 2013-03-28T15:41:31.000Z:

    I tested nesting 11 <b> tags and all seems to work fine:

    // fmt is a GenericContent object
    TextFragment tf = new TextFragment();
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append(TagType.OPENING, "b", "<b>");
    tf.append("Content");
    tf.append(TagType.CLOSING, "b", "</b>");
    tf.append(TagType.CLOSING, "b", "</b>");
    tf.append(TagType.CLOSING, "b", "</b>");
    tf.append(TagType.CLOSING, "b", "</b>");
    tf.append(TagType.CLOSING, "b", "</b>");
    tf.append(TagType.CLOSING, "b", "</b>");
    tf.append(TagType.CLOSING, "b", "</b>");
    tf.append(TagType.CLOSING, "b", "</b>");
    tf.append(TagType.CLOSING, "b", "</b>");
    tf.append(TagType.CLOSING, "b", "</b>");
    tf.append(TagType.CLOSING, "b", "</b>");
    assertEquals("<b><b><b><b><b><b><b><b><b><b><b>Content</b></b></b></b></b></b></b></b></b></b></b>", tf.toText());
    assertEquals("<1><2><3><4><5><6><7><8><9><10><11>Content</11></10></9></8></7></6></5></4></3></2></1>", fmt.setContent(tf).toString());

    What you see may be the result of how the tags were added: paired codes must have the same type ("b" in the example above).
    Maybe there is a typo in the test code?

    If you cannot find what is wrong, please provide the unit test so we can debug it.
    Thanks,
    -ys

  2. Former user Account Deleted

    Comment 4. originally posted by aurelien.tomass... on 2013-03-28T16:05:47.000Z:

    In fact, the problem is not the structure of the TextFragment, but the result of the method getCodedText().

    When this method is called, each code is encoded as two characters: the first gives the type of the code ("opening", "closing" or "isolated"), and the second is an ID-like index.

    If you display the Unicode value of the marker of code 5 (the character at index 10) in the generated String, it says the code is "isolated", which it is not. In a JUnit test, you can just add these assertions:

    assertEquals(57601, (int) tf.getCodedText().charAt(0)); /* 57601 is the Unicode value of the opening marker */
    assertEquals(57601, (int) tf.getCodedText().charAt(2));
    assertEquals(57601, (int) tf.getCodedText().charAt(4));
    ...
    assertEquals(57601, (int) tf.getCodedText().charAt(8));
    assertEquals(57601, (int) tf.getCodedText().charAt(10)); /* Fails here: the value found is 57603, the isolated marker */
    assertEquals(57601, (int) tf.getCodedText().charAt(12));
    ...

  3. Former user Account Deleted

    Comment 5. originally posted by @ysavourel on 2013-03-28T16:57:37.000Z:

    The toString() call on fmt (a GenericContent object) in the test uses getCodedText(). So if there were a placeholder instead of an opening code, we would see it.

    In any case, if I extend the test like this:

    String ct = tf.getCodedText();
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(0));
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(2));
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(4));
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(6));
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(8));
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(10));
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(12));
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(14));
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(16));
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(18));
    assertEquals(TextFragment.MARKER_OPENING, ct.charAt(20));
    // Content goes here
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(29));
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(31));
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(33));
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(35));
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(37));
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(39));
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(41));
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(43));
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(45));
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(47));
    assertEquals(TextFragment.MARKER_CLOSING, ct.charAt(49));

    It passes.
    I'm guessing something in your code causes one of the codes to be seen as a placeholder before you run the asserts.
    It could be many things. We would need the full code to see what is wrong.

    cheers,
    -yves

  4. Former user Account Deleted

    Comment 6. originally posted by aurelien.tomass... on 2013-03-29T08:32:16.000Z:

    OK, I tried your example and it works correctly.
    But if I build the TextFragment the way my tree algorithm does, the construction can be written like this:

        TextFragment tf = new TextFragment();
        tf.append("Content"); // the deepest child
        for (int i=0; i<10; i++){
            TextFragment tf2 = new TextFragment();
            tf2.append(TagType.OPENING, "b", "<b>");
            tf2.append(tf);
            tf2.append(TagType.CLOSING, "b", "</b>");
            tf = tf2;
        }

    There, JUnit displays:
    Failed tests: okapidDegub: expected:<57601> but was:<57603>

  5. Former user Account Deleted

    Comment 7. originally posted by @ysavourel on 2013-03-29T12:15:42.000Z:

    Thanks for the code. That explains a lot.
    The tf2.append(tf) call inserts one TF into another. That operation may trigger a re-balancing of the codes, which is when the IDs of the closing codes are matched with their opening counterparts. And, as you can see, the problem appears when there is one more opening code than closing codes.
    There is probably some side effect at that point that prevents the proper matching. The documentation for TF.append(TagType, String, String) should probably mention that the auto-pairing of closing/opening codes needs to be done before any re-balancing.
    Maybe there are ways to fix this. I'll try to look at it in the coming days.
    -ys
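
The matching of closing codes with their opening counterparts mentioned in this comment can be pictured with a small stack-based model. This is a sketch under stated assumptions, not Okapi's implementation: the class PairingSketch, its Code type and the balance() method are all hypothetical, and the ID numbering scheme is chosen only for the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of pairing inline codes by type. Not Okapi code.
final class PairingSketch {
    enum Kind { OPENING, CLOSING, ISOLATED }

    static final class Code {
        Kind kind;
        final String type;
        int id;
        Code(Kind kind, String type) { this.kind = kind; this.type = type; }
    }

    // Gives each opening code a fresh ID, gives each closing code the ID of the
    // innermost unmatched opening code of the same type, and marks every code
    // left without a partner as ISOLATED.
    static void balance(List<Code> codes) {
        Deque<Code> open = new ArrayDeque<>();
        int nextId = 0;
        for (Code c : codes) {
            if (c.kind == Kind.OPENING) {
                c.id = ++nextId;
                open.push(c);
            } else if (c.kind == Kind.CLOSING) {
                Code match = null;
                for (Code o : open) { // iterates from the innermost opening code outward
                    if (o.type.equals(c.type)) { match = o; break; }
                }
                if (match != null) {
                    open.remove(match);
                    c.id = match.id;
                } else {
                    c.kind = Kind.ISOLATED;
                    c.id = ++nextId;
                }
            } else {
                c.id = ++nextId;
            }
        }
        for (Code o : open) {
            o.kind = Kind.ISOLATED; // opening code that never found a closing partner
        }
    }

    // Builds n opening codes followed by n closing codes, all of the same type.
    static List<Code> nested(int n, String type) {
        List<Code> codes = new ArrayList<>();
        for (int i = 0; i < n; i++) codes.add(new Code(Kind.OPENING, type));
        for (int i = 0; i < n; i++) codes.add(new Code(Kind.CLOSING, type));
        return codes;
    }
}
```

In this model, 10 opening and 10 closing codes of type "b" pair up fully (the first closing code gets ID 10, the last one ID 1), while an extra opening code with no partner ends up ISOLATED, which matches the rule quoted from the developer's guide in the original report.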

  6. Former user Account Deleted

    Comment 8. originally posted by @ysavourel on 2013-03-29T14:35:26.000Z:

    This is a tricky situation. We could change the TF.insert() code so that the closing markers are set to -1 and re-balanced during the insert, but that would break code in other places.
    The issue here is that each TF has its own set of IDs, so when we append or insert one TF with codes into another, we have to find a way to adjust the IDs if they overlap.
    In your test code that happens when the inserted code equals the number of pairs divided by 2 plus 1.
    There are several ways to work around the problem.

    One is to force the IDs:

    TextFragment tf = new TextFragment();
    tf.append("Content"); // the deepest child
    for ( int i=0; i<10; i++ ) {
        TextFragment tf2 = new TextFragment();
        tf2.append(TagType.OPENING, "b", "<b>", 10-i);
        tf.insert(0, tf2);
        tf2 = new TextFragment();
        tf2.append(TagType.CLOSING, "b", "</b>", 10-i);
        tf.insert(-1, tf2);
    }
    assertEquals("<1><2><3><4><5><6><7><8><9><10>Content</10></9></8></7></6></5></4></3></2></1>", fmt.setContent(tf).toString());

    The other one is to assign a unique 'type' to each pair of codes:

    TextFragment tf = new TextFragment();
    tf.append("Content"); // the deepest child
    for ( int i=0; i<10; i++ ) {
        TextFragment tf2 = new TextFragment();
        tf2.append(TagType.OPENING, "b"+i, "<b>");
        tf2.append(tf);
        tf2.append(TagType.CLOSING, "b"+i, "</b>");
        tf = tf2;
    }
    assertEquals("<1><2><3><4><5><6><7><8><9><10>Content</10></9></8></7></6></5></4></3></2></1>", fmt.setContent(tf).toString());

    For now I don't think we can change TF.insert() to allow your code to work, because it would break several filters. But we'll try to see if we can improve this.

    -ys
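
The ID overlap described in this comment (each fragment numbering its own codes independently) can be illustrated with plain lists of IDs. This is a sketch only: the names are hypothetical, and shifting the incoming IDs is just one conceivable adjustment, not what TF.insert() actually does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of two independent ID spaces colliding on merge.
final class IdOverlapSketch {
    // True when the two ID lists share at least one value.
    static boolean overlaps(List<Integer> a, List<Integer> b) {
        Set<Integer> seen = new HashSet<>(a);
        for (int id : b) {
            if (seen.contains(id)) {
                return true;
            }
        }
        return false;
    }

    // Shifts every incoming ID past the largest ID already used by the target.
    static List<Integer> shiftPast(List<Integer> target, List<Integer> incoming) {
        int max = 0;
        for (int id : target) {
            max = Math.max(max, id);
        }
        List<Integer> shifted = new ArrayList<>();
        for (int id : incoming) {
            shifted.add(id + max);
        }
        return shifted;
    }
}
```

For example, an outer fragment using ID 1 and an inner fragment using IDs 1, 2, 3 overlap on ID 1. After shifting, the inner IDs become 2, 3, 4 and no longer collide.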

  7. Former user Account Deleted

    Comment 9. originally posted by @ysavourel on 2013-03-29T14:36:39.000Z:

    One more thing: there was also a bug (the balancing was incorrectly reset when it should not have been).
    I don't think it affected your example, but it was nice to catch.
    -ys

  8. Former user Account Deleted

    Comment 10. originally posted by aurelien.tomass... on 2013-03-29T14:52:25.000Z:

    Thanks a lot for the investigation!
    In my case I think it will be easier to use unique 'type' values. Thanks for the tips and for your work,

    cheers,
    Aurelien

  9. Former user Account Deleted
    • changed status to open

    Comment 11. originally posted by @ysavourel on 2013-04-15T12:22:36.000Z:

    I'm keeping this issue open (with a lower priority),
    as it would be nice to allow the following to work:

    TextFragment tf = new TextFragment();
    tf.append("Content"); // the deepest child
    for ( int i=0; i<10; i++ ) {
        TextFragment tf2 = new TextFragment();
        tf2.append(TagType.OPENING, "b", "<b>");
        tf2.append(tf);
        tf2.append(TagType.CLOSING, "b", "</b>");
        tf = tf2;
    }
