xml filter codeFinder not working

Issue #128 resolved
Former user created an issue

Original [issue 128](https://code.google.com/p/okapi/issues/detail?id=128) created by khagar... on 2010-03-13T09:13:03.000Z:

Using this xml filter:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
    xmlns:itsx="http://www.w3.org/2008/12/its-extensions"
    xmlns:okp="okapi-framework:xmlfilter-options"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="1.0">
  <!-- See ITS specification at: http://www.w3.org/TR/its/ -->
  <its:translateRule selector="//WixLocalization/@Culture" translate="yes"/>
  <its:translateRule selector="//WixLocalization/@Codepage" translate="yes"/>
  <its:translateRule selector="//resource/@Language" translate="yes"/>
  <its:translateRule selector="//resource/@LANGID" translate="yes"/>
  <its:translateRule selector="//string/@value" translate="yes"/>
  <its:translateRule selector="//menuitem/@caption" translate="yes"/>
  <its:translateRule selector="//dialog/@caption" translate="yes"/>
  <its:translateRule selector="//control/@caption" translate="yes"/>
  <okp:codeFinder useCodeFinder="yes">#v1 count=1 rule0=(&#xD;&#xA;) </okp:codeFinder>
</its:rules>

or actually any filter that is using codeFinder, I get:

{{{
ERROR: Error with utility.
java.lang.String cannot be cast to java.lang.Integer @ java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
    at net.sf.okapi.common.ParametersString.getInteger(ParametersString.java:243)
    at net.sf.okapi.common.filters.InlineCodeFinder.fromString(InlineCodeFinder.java:189)
    at net.sf.okapi.filters.xml.Parameters.getFilterOptions(Parameters.java:343)
    at net.sf.okapi.filters.xml.Parameters.load(Parameters.java:195)
    at net.sf.okapi.common.filters.FilterConfigurationMapper.getCustomParameters(FilterConfigurationMapper.java:358)
    at net.sf.okapi.common.filters.FilterConfigurationMapper.createFilter(FilterConfigurationMapper.java:198)
    at net.sf.okapi.applications.rainbow.utilities.BaseFilterDrivenUtility.processFilterInput(BaseFilterDrivenUtility.java:53)
    at net.sf.okapi.applications.rainbow.UtilityDriver.execute(UtilityDriver.java:216)
    at net.sf.okapi.applications.rainbow.MainForm.launchUtility(MainForm.java:1491)
    at net.sf.okapi.applications.rainbow.MainForm.access$5400(MainForm.java:114)
    at net.sf.okapi.applications.rainbow.MainForm$75.widgetSelected(MainForm.java:1378)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at net.sf.okapi.applications.rainbow.MainForm.run(MainForm.java:1466)
    at net.sf.okapi.applications.rainbow.Main.main(Main.java:43)

Error count: 1, Warning count: 0
Process duration: 0h 0m 0s 47ms

End process
}}}

This is on Windows 7, Java 6 update 18, Okapi 0.5.1.

Comments (4)

  1. Former user Account Deleted

    Comment [2.](https://code.google.com/p/okapi/issues/detail?id=128#c2) originally posted by khagar... on 2010-03-13T19:56:10.000Z:

    Thanks, that fixed the error message. Unfortunately, it doesn't work how I expected, i.e. it doesn't find and preserve the inline line breaks. lineBreakAsCode can't be used either, as that also preserves the "real" line breaks, which is something I don't want. I want to preserve only the inline line breaks that are escaped as &#xD;&#xA;. Is this the expected behavior of lineBreakAsCode, or should I file another bug for that?

  2. Former user Account Deleted

    Comment [3.](https://code.google.com/p/okapi/issues/detail?id=128#c3) originally posted by @ysavourel on 2010-03-13T21:56:38.000Z:

    The regex pattern does not work because, by the time the codeFinder is invoked, the &#xD;&#xA; has already been read, interpreted as 0x0D+0x0A, and normalized as a line-break.

    It is very difficult to work on such character entities in XML because the parser has already read them. Most parsers don't even keep a trace of how the line-break was represented.

    If lineBreakAsCode, which will preserve the line-breaks as &#10; in the extracted content, is not enough for you, I would suggest a search and replace before processing the file.

    For example, search for "&#xD;&#xA;" and replace it with "#xD;#xA;". That will make the codes 'normal text' from the XML viewpoint.

    Then in the .fprm file, you can use the codeFinder to protect those using something like:

    <okp:codeFinder useCodeFinder="yes">#v1 count.i=1 rule0=#xD;#xA; </okp:codeFinder>

    That will make "#xD;#xA;" inline codes.

    Then after the process you can do another search and replace to put back the "&#xD;&#xA;" from the "#xD;#xA;".

    -ys
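
The explanation and workaround in the comment above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class `EntityRoundTrip`, its method names, and the sample `<string value="..."/>` element are hypothetical, and the JDK's built-in DOM parser stands in for whatever parser the filter uses internally. The Okapi processing step itself is not shown. The sketch first shows that a parser hands over CR+LF characters rather than the `&#xD;&#xA;` text, then applies the suggested search and replace in both directions on the raw file text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityRoundTrip {
    // Character references as they appear in the raw file.
    static final String ENTITY = "&#xD;&#xA;";
    // Plain-text stand-in suggested in the comment above.
    static final String PLACEHOLDER = "#xD;#xA;";

    // Step 1: apply to the raw file text before the filter reads it.
    static String protect(String rawXml) {
        return rawXml.replace(ENTITY, PLACEHOLDER);
    }

    // Step 3: apply to the raw output text after processing.
    // Assumes the literal text "#xD;#xA;" does not otherwise occur in the file.
    static String restore(String processedXml) {
        return processedXml.replace(PLACEHOLDER, ENTITY);
    }

    // Returns the root element's "value" attribute as a standard parser reports it.
    static String parsedValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("value");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String raw = "<string value=\"one&#xD;&#xA;two\"/>";

        // The parser has already resolved the references into CR+LF characters,
        // so a pattern looking for the text "&#xD;&#xA;" has nothing to match.
        System.out.println(parsedValue(raw).equals("one\r\ntwo"));

        // After the pre-processing step the marker is ordinary text
        // that a rule such as rule0=#xD;#xA; can match.
        String guarded = protect(raw);
        System.out.println(parsedValue(guarded));

        // The post-processing step brings back the original character references.
        System.out.println(restore(guarded).equals(raw));
    }
}
```

In practice `protect` and `restore` would be applied to the whole file content, read and written with the file's own encoding, immediately before and after the Rainbow run.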
