- changed status to open
xml filter codeFinder not working
Original [issue 128](https://code.google.com/p/okapi/issues/detail?id=128) created by khagar... on 2010-03-13T09:13:03.000Z:
Using this xml filter:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
xmlns:itsx="http://www.w3.org/2008/12/its-extensions"
xmlns:okp="okapi-framework:xmlfilter-options"
xmlns:xlink="http://www.w3.org/1999/xlink" version="1.0">
<!-- See ITS specification at: http://www.w3.org/TR/its/ -->
<its:translateRule selector="//WixLocalization/@Culture" translate="yes"/>
<its:translateRule selector="//WixLocalization/@Codepage" translate="yes"/>
<its:translateRule selector="//resource/@Language" translate="yes"/>
<its:translateRule selector="//resource/@LANGID" translate="yes"/>
<its:translateRule selector="//string/@value" translate="yes"/>
<its:translateRule selector="//menuitem/@caption" translate="yes"/>
<its:translateRule selector="//dialog/@caption" translate="yes"/>
<its:translateRule selector="//control/@caption" translate="yes"/>
<okp:codeFinder useCodeFinder="yes">#v1
count=1
rule0=(&#xD;&#xA;)
</okp:codeFinder>
</its:rules>
or, in fact, with any filter that uses codeFinder, I get:
{{{
ERROR: Error with utility.
java.lang.String cannot be cast to java.lang.Integer
@ java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
at net.sf.okapi.common.ParametersString.getInteger(ParametersString.java:243)
at net.sf.okapi.common.filters.InlineCodeFinder.fromString(InlineCodeFinder.java:189)
at net.sf.okapi.filters.xml.Parameters.getFilterOptions(Parameters.java:343)
at net.sf.okapi.filters.xml.Parameters.load(Parameters.java:195)
at net.sf.okapi.common.filters.FilterConfigurationMapper.getCustomParameters(FilterConfigurationMapper.java:358)
at net.sf.okapi.common.filters.FilterConfigurationMapper.createFilter(FilterConfigurationMapper.java:198)
at net.sf.okapi.applications.rainbow.utilities.BaseFilterDrivenUtility.processFilterInput(BaseFilterDrivenUtility.java:53)
at net.sf.okapi.applications.rainbow.UtilityDriver.execute(UtilityDriver.java:216)
at net.sf.okapi.applications.rainbow.MainForm.launchUtility(MainForm.java:1491)
at net.sf.okapi.applications.rainbow.MainForm.access$5400(MainForm.java:114)
at net.sf.okapi.applications.rainbow.MainForm$75.widgetSelected(MainForm.java:1378)
at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at net.sf.okapi.applications.rainbow.MainForm.run(MainForm.java:1466)
at net.sf.okapi.applications.rainbow.Main.main(Main.java:43)
Error count: 1, Warning count: 0 Process duration: 0h 0m 0s 47ms
End process}}}
This is on Windows 7, Java 6 update 18, Okapi 0.5.1.
Comments (4)
Comment [2.](https://code.google.com/p/okapi/issues/detail?id=128#c2) originally posted by khagar... on 2010-03-13T19:56:10.000Z:
Thanks, that fixed the error message. Unfortunately, it doesn't work as I expected, i.e., it doesn't find and preserve the inline line breaks. lineBreakAsCode can't be used either, as it also preserves the "real" line breaks, which I don't want. I only want to preserve the inline line breaks that are escaped as &#xD;&#xA;. Is this the expected behavior of lineBreakAsCode, or should I file another bug for that?
Comment [3.](https://code.google.com/p/okapi/issues/detail?id=128#c3) originally posted by @ysavourel on 2010-03-13T21:56:38.000Z:
The regex pattern does not work because, by the time the codeFinder is invoked, the &#xD;&#xA; references have already been read, interpreted as 0x0D+0x0A, and normalized as a line break.
It is very difficult to work on such character references in XML because the parser has already decoded them. Most parsers don't even keep a trace of how the line break was represented.
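To illustrate the point about the parser, here is a minimal sketch using the JDK's built-in DOM parser (not Okapi's XML filter). The sample element and the class name `CharRefDemo` are hypothetical. Once the document is parsed, an attribute written with `&#xD;&#xA;` holds the actual CR and LF characters, so a pattern that looks for the literal reference text has nothing to match.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class CharRefDemo {
    // Parses a single-element document and returns its "value" attribute as the parser reports it.
    public static String parsedValue(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return root.getAttribute("value");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample in the style of the reporter's <string value="..."/> elements.
        String value = parsedValue("<string value=\"Line one&#xD;&#xA;Line two\"/>");

        // The references are gone: the value now holds a real CR (0x0D) followed by LF (0x0A).
        System.out.println(value.equals("Line one\r\nLine two"));                 // true
        // A pattern written against the reference text therefore has nothing to match.
        System.out.println(Pattern.compile("&#xD;&#xA;").matcher(value).find()); // false
    }
}
```

Okapi's filter additionally normalizes the decoded characters as a line break, so the exact characters it hands to the codeFinder may differ from what this plain DOM example shows.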
If lineBreakAsCode, which preserves the line breaks as &#10; in the extracted content, is not enough for you, I would suggest a search and replace before processing the file.
For example, search for "&#xD;&#xA;" and replace it with "#xD;#xA;". That will make the codes 'normal text' from the XML viewpoint.
Then, in the .fprm file, you can use the codeFinder to protect them with something like:
<okp:codeFinder useCodeFinder="yes">#v1
count.i=1
rule0=#xD;#xA;
</okp:codeFinder>
That will make each "#xD;#xA;" an inline code.
Then, after processing, you can do another search and replace to turn "#xD;#xA;" back into "&#xD;&#xA;".
-ys
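The three-step workaround above amounts to plain string substitution around the Okapi run. This is a minimal sketch, assuming the file is handled as in-memory text. The class and method names (`LineBreakRefRoundTrip`, `protect`, `restore`, `countCodes`) are hypothetical, and `countCodes` only mimics which spans `rule0=#xD;#xA;` would match; it is not Okapi's codeFinder implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineBreakRefRoundTrip {
    // The escaped line break as it appears in the raw file.
    static final String CHAR_REFS = "&#xD;&#xA;";
    // Placeholder from the comment: without the '&', the XML parser treats it as plain text.
    static final String PLACEHOLDER = "#xD;#xA;";
    // Same pattern as rule0 in the suggested codeFinder (it contains no regex metacharacters).
    static final Pattern CODE_RULE = Pattern.compile("#xD;#xA;");

    // Step 1: run on the raw file text before processing.
    public static String protect(String rawXml) {
        return rawXml.replace(CHAR_REFS, PLACEHOLDER);
    }

    // Step 3: run on the output file text after processing.
    // Assumes PLACEHOLDER never occurs as real text in the source.
    public static String restore(String outputXml) {
        return outputXml.replace(PLACEHOLDER, CHAR_REFS);
    }

    // Step 2, illustration only: counts the spans the rule would turn into inline codes.
    public static int countCodes(String extractedText) {
        Matcher m = CODE_RULE.matcher(extractedText);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String raw = "<string value=\"Line one&#xD;&#xA;Line two\"/>";
        String prepared = protect(raw);
        System.out.println(prepared);                               // <string value="Line one#xD;#xA;Line two"/>
        System.out.println(countCodes("Line one#xD;#xA;Line two")); // 1
        System.out.println(restore(prepared).equals(raw));          // true
    }
}
```

The round trip is only safe if "#xD;#xA;" never appears as genuine text in the source, because `restore` converts every occurrence back into character references.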
- changed status to resolved
Comment [4.](https://code.google.com/p/okapi/issues/detail?id=128#c4) originally posted by @ysavourel on 2010-03-15T14:50:27.000Z:
The initial issue (count vs count.i) was doc-related: I'm marking it as fixed. The broader problem of preserving &#xD; is a different topic.
Comment [1.](https://code.google.com/p/okapi/issues/detail?id=128#c1) originally posted by @ysavourel on 2010-03-13T13:06:35.000Z:
It's a bug in the documentation:
count=1 should be count.i=1,
like this:
<okp:codeFinder useCodeFinder="yes">#v1
count.i=1
rule0=(&#xD;&#xA;)
</okp:codeFinder>
I'll fix the example in the help.
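The cause of the original ClassCastException can be modelled with a tiny parser for the `#v1` parameter text. This is a hypothetical, simplified illustration, not Okapi's `ParametersString`. Based on the stack trace and the fix above, it assumes that a `.i` suffix marks a value as an integer and that a key without a suffix is stored as a string. With `count=1` the value is stored as a String, so the cast in `getInteger` fails the same way as in the report.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedParamsSketch {
    private final Map<String, Object> values = new HashMap<>();

    // Hypothetical, simplified reader for "#v1"-style parameter text: one key=value pair per line.
    public TypedParamsSketch(String text) {
        for (String rawLine : text.split("\\r?\\n")) {
            String line = rawLine.trim();
            // Skip blank lines and the "#v1" version header.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String key = line.substring(0, eq);
            String value = line.substring(eq + 1);
            if (key.endsWith(".i")) {
                // Assumed convention: ".i" marks an integer; store it under the bare name ("count").
                values.put(key.substring(0, key.length() - 2), Integer.valueOf(value));
            } else {
                // No type suffix: the value is kept as a String.
                values.put(key, value);
            }
        }
    }

    // Unchecked cast: throws ClassCastException if the stored value is a String.
    public int getInteger(String name) {
        return (Integer) values.get(name);
    }

    public static void main(String[] args) {
        TypedParamsSketch fixed = new TypedParamsSketch("#v1\ncount.i=1\nrule0=(&#xD;&#xA;)");
        System.out.println(fixed.getInteger("count")); // 1

        TypedParamsSketch original = new TypedParamsSketch("#v1\ncount=1\nrule0=(&#xD;&#xA;)");
        try {
            original.getInteger("count");
        } catch (ClassCastException e) {
            // Same kind of failure as in the report; the exact message text depends on the JDK version.
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```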