LanguageTool calls are not optimized
Original issue 234 created by marcin.milkow... on 2012-05-22T11:24:38.000Z:
The LanguageTool server expects to check large chunks of text; with many small requests, checking can be very slow. On each HTTP query, a JLanguageTool object is created, which adds around 40 ms of overhead.
What steps will reproduce the problem?
1. Start CheckMate for a big file.
2. Start the LT GUI from the command line; you will see lots of tiny HTTP queries.
What is the expected output? What do you see instead?
The expected behavior would be to batch many requests into one (separating the segments with \n\n is enough). This should speed up CheckMate's checks. It is especially important for the upcoming version of LanguageTool, which includes hunspell-based spell checking: generating suggestions takes a lot of time. I understand this makes LT integration slightly harder, since you would first need to run your own rules on the segments, then pass one large chunk to the HTTP server, and finally sort the results by segment number (mapped onto line numbers).
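The batching idea could look roughly like the sketch below. It is not CheckMate or LanguageTool code: the class `SegmentBatcher` and its methods are hypothetical, it assumes that no segment itself contains a blank line, and it assumes that the checker reports offsets as character positions in the text it was sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Joins segments into one request body and maps batch offsets back to segments. */
final class SegmentBatcher {
    // Blank-line separator suggested in the issue; assumes no segment contains "\n\n".
    static final String SEPARATOR = "\n\n";

    private final List<Integer> starts = new ArrayList<>();  // start offset of each segment
    private final List<Integer> lengths = new ArrayList<>(); // length of each segment
    private final String batchText;

    SegmentBatcher(List<String> segments) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments.size(); i++) {
            if (i > 0) {
                sb.append(SEPARATOR);
            }
            starts.add(sb.length());
            lengths.add(segments.get(i).length());
            sb.append(segments.get(i));
        }
        batchText = sb.toString();
    }

    /** The single text that would be sent in one request. */
    String batchText() {
        return batchText;
    }

    /**
     * Maps an offset in the batch to {segmentIndex, offsetInSegment},
     * or returns null when the offset falls inside a separator or out of range.
     */
    int[] locate(int batchOffset) {
        for (int i = starts.size() - 1; i >= 0; i--) {
            int local = batchOffset - starts.get(i);
            if (local >= 0) {
                return local < lengths.get(i) ? new int[] {i, local} : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SegmentBatcher batcher =
                new SegmentBatcher(Arrays.asList("Hello world.", "Second one."));
        int[] hit = batcher.locate(14);
        System.out.println("segment " + hit[0] + ", offset " + hit[1]); // segment 1, offset 0
    }
}
```

With something like this, each reported match position can be turned back into a segment number, which is the sorting step described above.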
Moreover, the suggestions from the HUNSPELL_RULE are not displayed by CheckMate, either on screen or in the Quality Check Report. This is unexpected as well.
What version of the product are you using? On what operating system?
Snapshot 0.17, Windows 7 64-bit
Comments (20)
-
Account Deleted Comment [2.](https://code.google.com/p/okapi/issues/detail?id=234#c2) originally posted by @ysavourel on 2012-05-23T11:50:08.000Z:
-
Account Deleted Comment [3.](https://code.google.com/p/okapi/issues/detail?id=234#c3) originally posted by marcin.milkow... on 2012-05-23T19:09:38.000Z:
This will probably also need some enhancements in LT (bitext should be parsed in a slightly different way if more than one segment is sent for checking; probably some exchange format should be defined).
-
Account Deleted Comment [4.](https://code.google.com/p/okapi/issues/detail?id=234#c4) originally posted by mihn... on 2012-07-25T23:58:20.000Z:
I have also tried calling the JLanguageTool API in the languagetool jar directly. That is blazing fast. Some use cases might still prefer a service, but it might be worth having a plugin that calls the jar directly (once we have the plugin infrastructure in place?)
-
Account Deleted Comment 5. originally posted by @ysavourel on 2013-02-20T22:11:12.000Z:
Will port the code into a separate step.
-
Account Deleted Comment 6. originally posted by @ysavourel on 2013-03-21T14:59:23.000Z:
The early beta of the new step is available
See: http://www.opentag.com/okapi/wiki/index.php?title=LanguageTool_Step
-
Account Deleted Comment 7. originally posted by marcin.milkow... on 2013-03-22T17:59:01.000Z:
I tried to use the step but I'm not sure how the results could be displayed. It seems that the documents get checked but how do I get the list of warnings and errors?
-
Account Deleted Comment 8. originally posted by @ysavourel on 2013-03-22T18:03:43.000Z:
Try to add it in a pipeline just before creating a translation kit.
For example:
- Raw document to Filter Event
- Language Tool
- Rainbow Translation Kit Creation
This is still very preliminary. I need to change the Quality Check step as well as CheckMate so that they can take advantage of the new step.
-
Account Deleted Comment 9. originally posted by marcin.milkow... on 2013-03-22T18:09:27.000Z:
I did so, but when I use a bilingual document, the resulting doc is hardly useful. And I don't see any errors or warnings (though it is much, much faster than before).
-
Account Deleted Comment 10. originally posted by marcin.milkow... on 2013-03-22T18:10:12.000Z:
I have a talk tomorrow on automatic translation QA and I wanted to mention this but I'm somewhat puzzled how it is supposed to work.
-
Account Deleted Comment 11. originally posted by @ysavourel on 2013-03-22T18:11:45.000Z:
Currently the only effect is that you'll get annotations in the resulting XLIFF.
The next thing to do is to take advantage of those annotations (the old system didn't use annotations).
-
Account Deleted Comment 12. originally posted by marcin.milkow... on 2013-03-22T18:15:51.000Z:
OK, now I see. But there's a slight glitch in the XLIFF:
<target xml:lang="pl-pl" its:locQualityIssueComment="Mówimy &lt;suggestion>pełniącego funkcję&lt;/suggestion> lub &lt;suggestion>odgrywającego rolę&lt;/suggestion>, a nie „pełnić rolę”." its:locQualityIssueSeverity="2" its:locQualityIssueType="uncategorized">Na rysunku nie widać serwera baz danych, pełniącego rolę zaplecza informacyjnego, lecz jego obecność jest bardzo prawdopodobna.</target>
The <suggestion> tags should all be escaped or removed altogether. Leaving just ">" unescaped is inconsistent.
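Either option could be handled before the message is written into the attribute. The following is only a sketch under assumptions: `IssueCommentCleaner` and its two methods are hypothetical names, not part of Okapi or LanguageTool, and the escaping covers only the characters relevant to a double-quoted XML attribute value.

```java
/** Prepares a checker message for use as an XML attribute value. */
final class IssueCommentCleaner {

    /** Removes the suggestion markup but keeps the suggested text itself. */
    static String stripSuggestionTags(String message) {
        return message.replace("<suggestion>", "").replace("</suggestion>", "");
    }

    /** Escapes the characters that are unsafe or ambiguous in a double-quoted attribute. */
    static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String message = "Use <suggestion>a</suggestion> or <suggestion>b</suggestion>, not \"c\".";
        // Option 1: drop the tags, then escape what is left.
        System.out.println(escapeAttribute(stripSuggestionTags(message)));
        // Option 2: keep the tags, but escape both angle brackets consistently.
        System.out.println(escapeAttribute(message));
    }
}
```

Stripping loses the visual boundary of each suggestion, so a caller might prefer to substitute quotation marks for the tags instead of removing them outright.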
-
Account Deleted Comment 13. originally posted by @ysavourel on 2013-03-22T19:45:32.000Z:
Actually I don't think we want XML tags in the comment.
(I had not noticed there were cases like that).
As for the &lt; and '>', it is consistent: they are both seen as '<' and '>' when parsed. But I think we should re-format the message. There is still a lot of work to do.
The idea is to allow various steps to add those annotations so that other steps (or an application like CheckMate) can use them.
We should also be able to pass such annotations to an original file format like HTML5, as it supports ITS LQI. Other applications can then take advantage of that too. See for example http://www.w3.org/International/multilingualweb/lt/wiki/images/e/e6/VistaTEC_Harnessing_Metadata_Slides.pdf
-
Account Deleted Comment 14. originally posted by marcin.milkow... on 2013-03-22T20:07:13.000Z:
Yes, we started to support these tags but many rules don't have them yet. CheckMate using this plugin would be great, as it's now quite slow...
-
Account Deleted Comment 15. originally posted by @ysavourel on 2013-03-22T20:22:44.000Z:
BTW: Having getMessage() return "Mówimy <suggestion>pełniącego funkcję</suggestion> lub <suggestion>odgrywającego rolę</suggestion>, a nie „pełnić rolę”." is a bit strange.
There is a getSuggestedReplacements() method to get the suggestions. Is what we get in the message and what we get with the method always the same?
-
Account Deleted Comment 16. originally posted by marcin.milkow... on 2013-03-22T20:24:42.000Z:
Yes. Actually, suggestions are represented as strings in the message, including the suggestion tags.
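Since the suggestions are embedded in the message as tagged strings, a consumer that only has the message text could recover them with a small parser. The sketch below is an illustration only: `SuggestionExtractor` is a hypothetical helper, it is not how getSuggestedReplacements() is implemented, and it assumes the tags are never nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Pulls the suggested replacements out of a message that embeds them as tags. */
final class SuggestionExtractor {
    // Reluctant match so that each tag pair is captured separately.
    private static final Pattern SUGGESTION =
            Pattern.compile("<suggestion>(.*?)</suggestion>", Pattern.DOTALL);

    /** Returns the tagged suggestions in the order they appear in the message. */
    static List<String> extract(String message) {
        List<String> result = new ArrayList<>();
        Matcher m = SUGGESTION.matcher(message);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String message = "Say <suggestion>a b</suggestion> or <suggestion>c</suggestion>.";
        System.out.println(extract(message)); // [a b, c]
    }
}
```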
-
Account Deleted Comment 17. originally posted by @ysavourel on 2013-06-05T18:51:32.000Z:
Status: stable, but one remaining issue: how to pass strings that have inline codes.
-
Account Deleted - changed status to open
Comment 18. originally posted by @ysavourel on 2013-06-28T12:05:47.000Z:
Resolving this issue depends on the resolution of feature request 41 in LanguageTool (http://sourceforge.net/p/languagetool/feature-requests/41/).
-
Account Deleted Comment 19. originally posted by @ysavourel on 2013-09-05T12:41:51.000Z:
https://sourceforge.net/p/languagetool/feature-requests/41/ is done.
Now, we just need to find the time to update the LanguageTool step to use that version (not in Maven yet it seems).
-
- edited description
- changed status to wontfix
This is probably resolved with the latest LanguageTool version (this code is now outside of Okapi, in its own repo).
Comment [1.](https://code.google.com/p/okapi/issues/detail?id=234#c1) originally posted by @ysavourel on 2012-05-23T11:49:41.000Z:
I've created a separate issue for the HUNSPELL_RULE. (They may get resolved separately.)