Duplicate Segments in merged file

Issue #1014 resolved
Devesh kumar created an issue

I have been trying to find a solution for -ERR:REF-NOT-FOUND- being appended at random places in my translated file,

and I found that this issue has been resolved in version 1.41.0-SNAPSHOT (https://bitbucket.org/okapiframework/okapi/pull-requests/451/996-core-reference-identifier-values).

I used the latest source code from the dev branch and built that version locally.

However, I noticed another issue: duplicate segments appear at random places in the merged file.

I have attached the original file, the translated file, and a comparison report.

Version: 1.41.0-SNAPSHOT

Comments (17)

  1. Denis Konovalyenko

    @Devesh Kumar, the round-trip of Artura _ McLaren Automotive.html did not produce any duplicates (for more information, please refer to the attached 1014.zip)… Do you think you could provide additional details about how the document is processed (the HTML filter parameters, at least)? It would also be really helpful if you could reduce the original file content as much as possible (ideally to 1-2 segments).

  2. Devesh kumar reporter

    Left side: original file content (Artura _ McLaren Automotive.html)

    Right side: merged file, with duplicate segments highlighted in red (file name: merged_file_with_duplicate.html)

    A few examples:

    <a href="https://cars.mclaren.com/us-en/artura" class="language-link js-language-link cta" data-locale="en">

    <a href="https://mclarencars.cn/cn-zh" class="language-link js-language-link cta" data-locale="zh">

    Italian</a>

    French</a>

    The lines above appear twice in the merged file (see the attached files_and_comparison.zip).

    Merged file name: merged_file_with_duplicate.html

  3. Devesh kumar reporter

    @Denis Konovalyenko

    I am getting the warnings below while doing the translation. What do they mean?

    "#Reference dp NOT FOUND" and

    "The extra target code id='2' does not have corresponding data. (item id='tu105', name='')"

    2021-01-12 13:22:40.269 WARN 17446 --- [nio-8732-exec-1] c.matecat.converter.core.XliffProcessor : Missing producer version in input XLIFF
    2021-01-12 13:22:42.141 INFO 17446 --- [nio-8732-exec-1] n.s.o.c.pipelinedriver.PipelineDriver : Input: /tmp/3640111358274881623/pack/manifest.rkm
    2021-01-12 13:22:42.340 INFO 17446 --- [nio-8732-exec-1] n.s.o.s.rainbowkit.postprocess.Merger : Merging: mcleren.html
    2021-01-12 13:22:43.483 WARN 17446 --- [nio-8732-exec-1] n.s.o.filters.html.HtmlSkeletonWriter : Reference 'dp45' not found.
    2021-01-12 13:22:44.374 WARN 17446 --- [nio-8732-exec-1] n.s.o.filters.html.HtmlSkeletonWriter : Reference 'dp46' not found.
    2021-01-12 13:22:45.532 WARN 17446 --- [nio-8732-exec-1] n.s.o.filters.html.HtmlSkeletonWriter : Reference 'dp70' not found.
    2021-01-12 13:22:45.609 WARN 17446 --- [nio-8732-exec-1] n.sf.okapi.common.resource.TextUnitUtil : The extra target code id='2' does not have corresponding data. (item id='tu105', name='')
    2021-01-12 13:22:46.725 WARN 17446 --- [nio-8732-exec-1] n.s.o.filters.html.HtmlSkeletonWriter : Reference 'dp189' not found.
    2021-01-12 13:22:47.681 WARN 17446 --- [nio-8732-exec-1] n.s.o.filters.html.HtmlSkeletonWriter : Reference 'dp203' not found.
    2021-01-12 13:22:48.632 WARN 17446 --- [nio-8732-exec-1] n.s.o.filters.html.HtmlSkeletonWriter : Reference 'dp246' not found.
    2021-01-12 13:22:49.428 WARN 17446 --- [nio-8732-exec-1] n.s.o.filters.html.HtmlSkeletonWriter : Reference 'dp256' not found.
    2021-01-12 13:22:49.435 WARN 17446 --- [nio-8732-exec-1] n.sf.okapi.common.resource.TextUnitUtil : The extra target code id='3' does not have corresponding data. (item id='tu171', name='')
    2021-01-12 13:22:50.131 WARN 17446 --- [nio-8732-exec-1] n.s.o.filters.html.HtmlSkeletonWriter : Reference 'dp344' not found.

    Item id = tu105

    <trans-unit id="tu105" xml:space="preserve">
                    <source xml:lang="en">
                        <bx id="1" />
                        Configure
                        <ex id="1" />
                    </source>
                    <seg-source>
                        <mrk mid="0" mtype="seg">
                        </mrk>
                        <mrk mid="1" mtype="seg">
                            <bx id="1" />
                        </mrk>
                        <mrk mid="2" mtype="seg">
                            Configure
                            <ex id="1" />
                        </mrk>
                        <mrk mid="3" mtype="seg">
                        </mrk>
                    </seg-source>
                    <target xml:lang="hi">
                        <mrk mid="0" mtype="seg">
                            <sid id="1">
                            </sid>
                        </mrk>
                        <mrk mid="1" mtype="seg">
                            <sid id="1">
                                <bx id="1" />
                            </sid>
                        </mrk>
                        <mrk mid="2" mtype="seg">
                            Configure
                            <ex id="1" />
                            <sid id="1">
                                Configure
                                <ex id="1" />
                            </sid>
                        </mrk>
                        <mrk mid="3" mtype="seg">
                            <sid id="1">
                            </sid>
                        </mrk>
                    </target>
                </trans-unit>
    

    Item id = tu171

    <trans-unit id="tu171" xml:space="preserve">
                    <source xml:lang="en">
                        <ex id="1" />
                        <ex id="2" />
                    </source>
                    <seg-source>
                        <mrk mid="0" mtype="seg">
                            <ex id="1" />
                        </mrk>
                        <mrk mid="1" mtype="seg">
                            <ex id="2" />
                        </mrk>
                    </seg-source>
                    <target xml:lang="hi">
                        <mrk mid="0" mtype="seg">
                            <sid id="1">
                                <ex id="1" />
                            </sid>
                        </mrk>
                        <mrk mid="1" mtype="seg">
                            <ex id="2" />
                            <sid id="1">
                                <ex id="2" />
                            </sid>
                        </mrk>
                    </target>
                </trans-unit>
    

  4. Jim Hargrave (OLD)

    I think I found the problem. You are using the RainbowKitStep. The merger used in that step is using very old code. I have updated RainbowKitStep and pushed to dev. Can you give it a try with the latest code? Thanks!

  5. Devesh kumar reporter

    @Jim Hargrave (OLD), where can I find the updated RainbowKitStep code (any specific commits)? Do I need to change only this package?

  6. Devesh kumar reporter

    I was using Okapi version 0.35.
    I also found a few issues in the top layer: with parallelStream(), multiple threads were accessing and manipulating the same document, and this was causing the reference-not-found error (-ERR:REF-NOT-FOUND-). Adding a synchronized block fixed it.
    @Denis Konovalyenko, thanks for the suggestion to check for issues in the top layers.
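
For readers who hit the same symptom, here is a minimal sketch of the kind of fix described in this comment. It is not the reporter's code and it does not use any Okapi API: `Document`, `sampleSegments`, `mergeWithLock`, and `mergeSequentially` are hypothetical names, and `toUpperCase()` is only a placeholder for per-segment work. It assumes the root cause was several `parallelStream()` workers mutating one non-thread-safe object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SharedDocumentMerge {

    // Hypothetical stand-in for a document object that is not thread-safe.
    static final class Document {
        private final List<String> segments = new ArrayList<>();

        void append(String segment) {
            segments.add(segment);
        }

        List<String> segments() {
            return segments;
        }
    }

    // Builds "segment-0" .. "segment-(count-1)" as test input.
    static List<String> sampleSegments(int count) {
        return IntStream.range(0, count)
                .mapToObj(i -> "segment-" + i)
                .collect(Collectors.toList());
    }

    // Per-segment work runs in parallel; every write to the shared
    // document happens inside one synchronized block.
    static Document mergeWithLock(List<String> segments) {
        Document doc = new Document();
        Object lock = new Object();
        segments.parallelStream().forEach(s -> {
            String merged = s.toUpperCase(); // placeholder for real work
            synchronized (lock) {
                doc.append(merged);
            }
        });
        return doc;
    }

    // Alternative: a sequential stream needs no lock and keeps input order.
    static Document mergeSequentially(List<String> segments) {
        Document doc = new Document();
        segments.stream().map(String::toUpperCase).forEach(doc::append);
        return doc;
    }

    public static void main(String[] args) {
        List<String> input = sampleSegments(10_000);
        System.out.println(mergeWithLock(input).segments().size());
        System.out.println(mergeSequentially(input).segments().size());
    }
}
```

Without the `synchronized` block, concurrent `ArrayList.add` calls can lose entries or throw, which is one way output can end up with missing or misplaced pieces. Note that the locked variant does not preserve segment order; if order matters, the sequential variant (or collecting results with `map(...).collect(...)` instead of mutating shared state) is the safer choice.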

  7. Denis Konovalyenko

    @Devesh Kumar, thank you for getting back with the information on the root cause of the issue. Do you think it can be closed then?

  8. Devesh kumar reporter

    Yes. Thank you for helping me out, and apologies for taking up so much of your time on this issue. I will try version 1.41 once it is released on Maven.
