LanguageTool calls are not optimized

Issue #234 wontfix
Former user created an issue

Original issue 234 created by marcin.milkow... on 2012-05-22T11:24:38.000Z:

The LanguageTool server expects to check large chunks of text; otherwise, checking can be really slow. A JLanguageTool object is created on each HTTP query, which adds around 40 ms of overhead per query.

What steps will reproduce the problem?
1. Start CheckMate for a big file.
2. Start the LT GUI from the command line; you will see lots of tiny HTTP queries.

What is the expected output? What do you see instead?

The expected behavior would be to batch many requests into one (it's enough to use \n\n to separate them). This should speed up CheckMate's checks. It is especially important in the upcoming version of LanguageTool, which contains spellchecking based on Hunspell: creating suggestions takes a lot of time. I understand this makes LT integration slightly harder: you first need to run your own rules on the segments, then pass one large chunk to the HTTP server, and finally sort the results by segment number (mapped onto line numbers).
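
The batching described above can be sketched as follows. This is only an illustration under stated assumptions: `SegmentBatch` is a hypothetical helper (it is not part of Okapi or LanguageTool), and it assumes the server reports each match as a character offset into the submitted text. It joins the segments with `\n\n`, records where each segment starts, and maps a reported offset back to a segment index with a binary search.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Okapi or LanguageTool.
final class SegmentBatch {
    static final String SEPARATOR = "\n\n";

    private final String text;   // all segments joined by SEPARATOR
    private final int[] starts;  // start offset of each segment within text

    SegmentBatch(List<String> segments) {
        StringBuilder sb = new StringBuilder();
        starts = new int[segments.size()];
        for (int i = 0; i < segments.size(); i++) {
            if (i > 0) {
                sb.append(SEPARATOR);
            }
            starts[i] = sb.length();
            sb.append(segments.get(i));
        }
        text = sb.toString();
    }

    // The single large chunk to send in one request.
    String text() {
        return text;
    }

    // Index of the segment that contains the given offset of the joined text.
    // An offset that falls inside a separator maps to the preceding segment.
    int segmentAt(int offset) {
        if (offset < 0 || offset >= text.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        int idx = Arrays.binarySearch(starts, offset);
        // When not found, binarySearch returns -(insertionPoint) - 1;
        // the containing segment is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    // Offset relative to the start of the containing segment.
    int localOffset(int offset) {
        return offset - starts[segmentAt(offset)];
    }

    public static void main(String[] args) {
        SegmentBatch batch = new SegmentBatch(
                Arrays.asList("This is an test.", "Second segment."));
        int reported = 25; // assumed offset of a match in the joined text
        System.out.println("segment " + batch.segmentAt(reported)
                + ", offset " + batch.localOffset(reported));
    }
}
```

Sorting the results by `segmentAt` and then by `localOffset` gives the per-segment order that can be mapped onto line numbers.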

Moreover, the suggestions from the HUNSPELL_RULE are not displayed by CheckMate, either on the screen or in the Quality Check Report. This is unexpected as well.

What version of the product are you using? On what operating system?

Snapshot 0.17, Windows 7 64-bit

Comments (20)

  1. Former user Account Deleted

    Comment [4.](https://code.google.com/p/okapi/issues/detail?id=234#c4) originally posted by mihn... on 2012-07-25T23:58:20.000Z:

    I have also tried calling the JLanguageTool API in the languagetool jar directly. That is blazing fast. Some use cases might still prefer a service, but it might be worth having a plugin that calls the jar directly (once we have the plugin infrastructure in place?).

  2. Former user Account Deleted

    Comment 5. originally posted by @ysavourel on 2013-02-20T22:11:12.000Z:

    Will port the code into a separate step.

  3. Former user Account Deleted

    Comment 7. originally posted by marcin.milkow... on 2013-03-22T17:59:01.000Z:

    I tried to use the step but I'm not sure how the results could be displayed. It seems that the documents get checked but how do I get the list of warnings and errors?

  4. Former user Account Deleted

    Comment 8. originally posted by @ysavourel on 2013-03-22T18:03:43.000Z:

    Try to add it in a pipeline just before creating a translation kit.
    For example:

    • Raw document to Filter Event
    • Language Tool
    • Rainbow Translation Kit Creation

    This is still very preliminary. I need to change the Quality Check step as well as CheckMate so that they can take advantage of the new step.

  5. Former user Account Deleted

    Comment 9. originally posted by marcin.milkow... on 2013-03-22T18:09:27.000Z:

    I did so, but when I use a bilingual document, the resulting document is hardly useful. And I don't see any errors or warnings (though it is much, much faster than before).

  6. Former user Account Deleted

    Comment 10. originally posted by marcin.milkow... on 2013-03-22T18:10:12.000Z:

    I have a talk tomorrow on automatic translation QA and I wanted to mention this but I'm somewhat puzzled how it is supposed to work.

  7. Former user Account Deleted

    Comment 11. originally posted by @ysavourel on 2013-03-22T18:11:45.000Z:

    Currently the only effect is that you'll get annotations in the resulting XLIFF.
    The next thing to do is to take advantage of those annotations (the old system didn't use annotations).

  8. Former user Account Deleted

    Comment 12. originally posted by marcin.milkow... on 2013-03-22T18:15:51.000Z:

    OK, now I see. But there's a slight glitch in the XLIFF:

    <target xml:lang="pl-pl" its:locQualityIssueComment="Mówimy &lt;suggestion>pełniącego funkcję&lt;/suggestion> lub &lt;suggestion>odgrywającego rolę&lt;/suggestion>, a nie „pełnić rolę”." its:locQualityIssueSeverity="2" its:locQualityIssueType="uncategorized">Na rysunku nie widać serwera baz danych, pełniącego rolę zaplecza informacyjnego, lecz jego obecność jest bardzo prawdopodobna.</target>

    The <suggestion> tags should either be fully escaped or removed altogether. Leaving just ">" unescaped is inconsistent.
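
    The two options mentioned here (removing the markers, or escaping them fully) can be sketched as below. This is an illustration only: `IssueComment` and its methods are hypothetical names, not Okapi or LanguageTool API, and the sample message is an ASCII stand-in for the Polish one above.

    ```java
    // Hypothetical helper for illustration; not part of Okapi or LanguageTool.
    final class IssueComment {

        // Removes <suggestion> and </suggestion> markers but keeps their text.
        static String stripSuggestionTags(String message) {
            return message.replaceAll("</?suggestion>", "");
        }

        // Escapes the characters that are special in a double-quoted
        // XML attribute value, so that '<' and '>' are treated alike.
        static String escapeAttribute(String value) {
            StringBuilder sb = new StringBuilder(value.length());
            for (int i = 0; i < value.length(); i++) {
                char c = value.charAt(i);
                switch (c) {
                    case '&':
                        sb.append("&amp;");
                        break;
                    case '<':
                        sb.append("&lt;");
                        break;
                    case '>':
                        sb.append("&gt;");
                        break;
                    case '"':
                        sb.append("&quot;");
                        break;
                    default:
                        sb.append(c);
                }
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            String message = "Use <suggestion>a</suggestion> here, not \"an\".";
            // Option 1: drop the markers, then escape what is left.
            System.out.println(escapeAttribute(stripSuggestionTags(message)));
            // Option 2: keep the markers, fully escaped.
            System.out.println(escapeAttribute(message));
        }
    }
    ```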

  9. Former user Account Deleted

    Comment 13. originally posted by @ysavourel on 2013-03-22T19:45:32.000Z:

    Actually I don't think we want XML tags in the comment.
    (I had not noticed there were cases like that).
    As for the &lt; and '>', it is consistent: they are both seen as '<' and '>' when parsed. But I think we should re-format the message.

    There is still a lot of work to do.

    The idea is to allow various steps to add those annotations so that other steps (or an application like CheckMate) can use them.
    We should also be able to pass such annotations to some original file formats, such as HTML5, which supports ITS LQI. Other applications can then take advantage of that too. See, for example, http://www.w3.org/International/multilingualweb/lt/wiki/images/e/e6/VistaTEC_Harnessing_Metadata_Slides.pdf

  10. Former user Account Deleted

    Comment 14. originally posted by marcin.milkow... on 2013-03-22T20:07:13.000Z:

    Yes, we started to support these tags but many rules don't have them yet. CheckMate using this plugin would be great, as it's now quite slow...

  11. Former user Account Deleted

    Comment 15. originally posted by @ysavourel on 2013-03-22T20:22:44.000Z:

    BTW: Having getMessage() return "Mówimy <suggestion>pełniącego funkcję</suggestion> lub <suggestion>odgrywającego rolę</suggestion>, a nie „pełnić rolę”." is a bit strange.

    There is a getSuggestedReplacements() method to get the suggestions. Is what we get in the message and what we get with the method always the same?

  12. Former user Account Deleted

    Comment 16. originally posted by marcin.milkow... on 2013-03-22T20:24:42.000Z:

    Yes. Actually, suggestions are represented as strings in the message, including the suggestion tags.
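
    Since the suggestions are embedded in the message as tagged strings, they can be pulled out of it for comparison with the list returned by getSuggestedReplacements(). The sketch below assumes only the `<suggestion>...</suggestion>` convention shown in this thread; `SuggestionExtractor` is a hypothetical name, not LanguageTool API.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper for illustration; not part of Okapi or LanguageTool.
    final class SuggestionExtractor {
        // Reluctant match, so each tag pair yields its own suggestion.
        private static final Pattern SUGGESTION =
                Pattern.compile("<suggestion>(.*?)</suggestion>", Pattern.DOTALL);

        // Returns the suggestion texts in the order they appear in the message.
        static List<String> extract(String message) {
            List<String> result = new ArrayList<>();
            Matcher m = SUGGESTION.matcher(message);
            while (m.find()) {
                result.add(m.group(1));
            }
            return result;
        }

        public static void main(String[] args) {
            String message = "Use <suggestion>fulfil a function</suggestion>"
                    + " or <suggestion>play a role</suggestion>.";
            System.out.println(extract(message));
        }
    }
    ```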

  13. Former user Account Deleted

    Comment 17. originally posted by @ysavourel on 2013-06-05T18:51:32.000Z:

    Status: stable, but one remaining issue: how to pass strings that have inline codes.

  14. Jim Hargrave (OLD)
    • edited description
    • changed status to wontfix

    This is probably resolved with the latest LanguageTool version (this code is now outside of Okapi, in its own repo).
