PatternSyntaxException in MIF filter
While converting a MIF file I got the following exception:
java.util.regex.PatternSyntaxException: Illegal repetition near index 529
(^[A-Z]{1}:)|(•)|(\\t)|(<[naArR ]{1}[+]*\>)|(<[naArR]{1}=[0-9]+\>)|(<\$.*?>)|(<Default ¶ Font\>)|(<(zenkaku|kanji|full-width|chinese|Indic|Farsi|Hebrew|Abjad|Alif Ba Ta|Thai) [naA]{1}[+]*\>)|(<(zenkaku|kanji|full-width|chinese|Indic|Farsi|Hebrew|Abjad|Alif Ba Ta|Thai) [naA]{1}=[0-9]+\>)|(<(kanji kazu|daiji|hira iroha|kata iroha|hira gojuon|kata gojuon)[+]*\>)|(<(kanji kazu|daiji|hira iroha|kata iroha|hira gojuon|kata gojuon)=[0-9]+\>)|(<Superscript\>)|(<Fixed_Font\>)|(<Regular\>)|(<link\>)|(<Italic\>)|(<Bold\>)|(<Menue\>)|(<{Wingdings_3}\>)|(<{Symbol}\>)
^
at java.util.regex.Pattern.error(Pattern.java:1969)
at java.util.regex.Pattern.closure(Pattern.java:3171)
at java.util.regex.Pattern.sequence(Pattern.java:2148)
at java.util.regex.Pattern.expr(Pattern.java:2010)
at java.util.regex.Pattern.group0(Pattern.java:2919)
at java.util.regex.Pattern.sequence(Pattern.java:2065)
at java.util.regex.Pattern.expr(Pattern.java:2010)
at java.util.regex.Pattern.compile(Pattern.java:1702)
at java.util.regex.Pattern.<init>(Pattern.java:1352)
at java.util.regex.Pattern.compile(Pattern.java:1054)
at net.sf.okapi.common.filters.InlineCodeFinder.compile(InlineCodeFinder.java:146)
at net.sf.okapi.filters.mif.MIFFilter.open(MIFFilter.java:256)
at net.sf.okapi.filters.mif.MIFFilter.open(MIFFilter.java:197)
at net.sf.okapi.steps.common.RawDocumentToFilterEventsStep.handleEvent(RawDocumentToFilterEventsStep.java:132)
at net.sf.okapi.common.pipeline.Pipeline.execute(Pipeline.java:117)
at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:227)
at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:199)
at net.sf.okapi.common.pipelinedriver.PipelineDriver.processBatch(PipelineDriver.java:182)
Unfortunately I cannot disclose the file, but the stack trace appears to point to the offending code.
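For anyone who wants to reproduce this without the MIF file, here is a minimal sketch that compiles only a fragment shaped like the last alternative of the pattern above. The class and method names are made up for illustration and are not part of Okapi. It assumes the unescaped `{` is what triggers the error, since Java's regex parser treats `{` as the start of a repetition quantifier.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class IllegalRepetitionDemo {
    // Hypothetical helper: returns the error description if the regex
    // fails to compile, or null if it compiles cleanly.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Unescaped braces, shaped like the (<{Symbol}\>) alternative in the failing pattern.
        System.out.println(compileError("(<{Symbol}\\>)"));
        // The same alternative with the braces escaped.
        System.out.println(compileError("(<\\{Symbol\\}\\>)"));
    }
}
```

The first call should report "Illegal repetition" and the second should return null, which matches the exception in the trace.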
Comments (8)
-
reporter -
reporter Looks like the code finder rules were updated in commit 0f1f74b3f72f38eb0513a8b5b027b3595225a424 for issue #938.
-
reporter - edited description
-
reporter What puzzles me is that, looking at the code, I cannot tell where the {Wingdings_3} rule comes from. The last two rules should have their braces escaped: \\{Wingdings_3\\} and \\{Symbol\\}.
-
reporter Found the culprit.
The problem is in net.sf.okapi.filters.mif.FontTags#toInlineCodeFinderRules(), where font names are used to build code finder rules. Apparently font names may contain reserved regular expression characters.
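One possible way to guard against this, sketched below, is to quote each font name before embedding it in a rule. This is only an illustration and not necessarily what the eventual fix does. The class `FontTagRuleSketch` and the method `toRule` are hypothetical, and the rule shape is copied from the alternatives visible in the failing pattern. `Pattern.quote` is the standard JDK call for turning arbitrary text into a literal pattern.

```java
import java.util.regex.Pattern;

public class FontTagRuleSketch {
    // Hypothetical helper: builds one code finder rule for a font tag such as <{Symbol}>.
    // Pattern.quote wraps the name in \Q...\E so regex metacharacters are matched literally.
    static String toRule(String fontName) {
        return "(<" + Pattern.quote(fontName) + "\\>)";
    }

    public static void main(String[] args) {
        String[] fontNames = {"Bold", "{Symbol}", "{Wingdings_3}"};
        for (String name : fontNames) {
            // Compiles without a PatternSyntaxException even when the name contains braces.
            Pattern rule = Pattern.compile(toRule(name));
            System.out.println(name + " -> " + rule.matcher("<" + name + ">").find());
        }
    }
}
```

Quoting the whole name, rather than escaping braces one by one, also covers other reserved characters such as `+`, `(` or `[` that a font name might contain.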
-
reporter - changed title to PatternSyntaxException in MIF filter
-
reporter Opened PR https://bitbucket.org/okapiframework/okapi/pull-requests/425 to fix this issue.
-
- changed status to resolved
The pull request #425 was merged.
the filter was created as follows:
I haven’t explicitly enabled code finder rules.