-
assigned issue to
Output Not Escaped
I have this test RPG program:
**free
dcl-pi *N;
output char(10);
end-pi;
output = '&<>"''';
return;
I use XMLSERVICE to call the program:
call QXMLSERV.iPLUG4K(
'NA',
'*here',
'<pgm name="ESCAPETEST" lib="DROESSNER" error="on"><parm io="out"><data name="OUTPUT" type="10a"></data></parm></pgm>',
'NULL'
);
The output is:
<pgm name="ESCAPETEST" lib="DROESSNER" error="on">
<parm io="out">
<data name="OUTPUT" type="10a">&<>"'</data>
</parm>
<success><![CDATA[+++ success DROESSNER ESCAPETEST ]]></success>
</pgm>
None of the special XML characters are being escaped.
Comments (30)
-
-
Account Deactivated CALL XMLSERVICE.iPLUG4K('NA', 'here', '<?xml version=''1.0''?> <myscript> <pgm name=''XMLTEST'' lib=''JIMOXML'' error=''fast''> <parm io=''out''> <data type=''10A''><![CDATA[]]></data> </parm> </pgm> </myscript>', ?)
Would that be possible to work? XMLService can understand the CDATA passed in. Then user can choose the raw output format. The output is:
<?xml version='1.0'?><myscript><pgm name='XMLTEST' lib='JIMOXML' error='fast'><parm io='out'><data type='10A'><![CDATA[&<'">]]></data></parm><success><![CDATA[+++ success JIMOXML XMLTEST ]]></success> </pgm> </myscript>
-
reporter @JiMoJiMo That does seem to work for the example provided. Is there a reason for making the developer specify when to wrap the raw data in the CDATA tags instead of just always escaping the data? I can't see a case where I would want to pass back the raw data since there is a chance it could create invalid XML. Shouldn't it either always escape the data or wrap character data with the CDATA tag by default?
-
@JiMoJiMo Adding the <![CDATA[]]> to every char output field adds a lot of unnecessary byte size to the output, especially inside <ds>. Is there another solution?
-
Account Deactivated Not sure if changing the raw output format may beak parsers of some clients without escape chars. I will turn to another option to see if this can be control by users with a flag.
-
@JiMoJiMo Any progress on this?
-
Account Deactivated Sorry.. got busy last week. I am going to look at if we can leverage the ipc flag doCDdata to surround all values.
-
reporter @JiMoJiMo I have to also say that surrounding all values with a CDATA tag is not the way to go here. That adds unnecessary bytes to the output. The right way to fix this is to actually escape the reserved characters in the output so that valid XML is always returned. I was very surprised that this wasn't already being done.
-
Account Deactivated -------------------------- cdata to turn global CDATA on ---------------------------------------------------------------- CALL XMLSERVICE.iPLUG4K('NA', 'herecdata', '<?xml version=''1.0''?> <myscript> <pgm name=''XMLTEST'' lib=''JIMOXML'' error=''fast''> <parm io=''out''> <data type=''10A''></data> </parm> </pgm> </myscript>', ?)
===== output ============>
<?xml version='1.0'?> <myscript> <pgm name='XMLTEST' lib='JIMOXML' error='fast'> <parm io='out'> <data type='10A'><![CDATA[&<'">]]></data> </parm> <success><![CDATA[+++ success JIMOXML XMLTEST ]]></success> </pgm> </myscript>
There is already a control flag to turn the CData flag on/off in the original design. Form the code, '*cdata' in the control parm to control global CData enabled; '<!CDATA[]>' specified in input xml to enable specified output fields. Compared with scanning each output char and escape the performance should be better. Just a few more extra chars injected but leave controls to users without reserved char output. That might the initial intent that I tend to agree with. :-)
-
reporter @JiMoJiMo Here's an example of why I disagree with this approach. We have a use case where we send grid data back to the client. This is a data structure array in the pgm. Let's say for example, we send back 500 records of 10 fields. That would add 5000 potentially unnecessary CDATA tags. At 12 bytes each, we are looking at 60KB of unnecessary data in the output. Is there a counter argument against just correctly escaping the xml characters?
-
Account Deactivated @droessner Just had a look at procedure xmlWrkData. There is no flag/code to do escape. That is a brand new function. :-)
-
- marked as enhancement
Changing the "kind" of this issue to "enhancement" (which I think is technically correct since the existing function allows for a *cdata option for this specific reason, presumably).
I'm guessing the underlying request here would be to have xmlservice automatically escape
&
as&
as per https://www.w3.org/TR/xml/#syntax Please verify that to be correct.I think that's feasible to pursue, but we must proceed with caution when changing the default behavior to ensure that existing toolkits (or other software that may be using a non-compliant XML parser) will not be broken.
-
reporter I agree it's probably best to put this on a flag to allow for backward compatibility.
I think the service should convert:
&
to&
<
to<
>
to>
'
to'
"
to"
-
Account Deactivated Agree with a new flag. I will look into the changes for reserved char escape. Looks like one more thing need to be mentioned that the output data length like '10A' may be changed to reflect the length of the escaped output, since escape injects more chars. Also seems if cdata and escapse both specified, cdata overrides escapse option.
-
Account Deactivated @ThePrez if user requests like <data type=''1A''>, the output char is '&'. Then xmlservice escape it to '&' with 4 chars back. In theory, it's still just 1 char to user. Should xmlservice set the ouput to <data type=''4A''>? Not sure if users would parse xml output depending on the length. I am not sure which way to go.
-
I think any modern xml parser wouldn't care what the length was.
-
Account Deactivated OK. I will leave the length as is.. just escape the reserved chars. That's kind of reasonable that the length indicates the raw output.
-
Account Deactivated @droessner @bjerome I just pushed the changes into repository. Would you have a test please? A new IPC flag '*escp' is added, which turn escape on.
CALL XMLSERVICE.iPLUG512K( '*NA', '*here*escp', '<?xml version=''1.0''?> <myscript> <pgm name=''XMLTEST'' lib=''JIMOXML'' error=''fast''> <parm io=''out'' > <data type=''5A''></data> </parm> </pgm> </myscript>', ?) Return Code = 0 Output Parameter #4 (CO) = <?xml version='1.0'?> <myscript> <pgm name='XMLTEST' lib='JIMOXML' error='fast'> <parm io='out' > <data type='5A'>&<'"></data> </parm> <success><![CDATA[+++ success JIMOXML XMLTEST ]]></success> </pgm> </myscript>
-
reporter @JiMoJiMo I tested with several different RPG programs with various output and the output was always escaped correctly. I think this looks good. Thanks for the quick turn-around on this one!
-
Account Deactivated Thanks @droessner
-
Account Deactivated - changed status to resolved
Added a new IPC control flag *escp to turn xml escape on.
-
@JiMoJiMo When escaping a large string this flag seems to add a huge chunk of time before getting a response. Like 1s without
*escp
vs 15s with*escp
for only about 64KB passed to XMLSERVICE. Can you please look into this? -
Account Deactivated @bjerome Can I have your test xml sample and output sample of your program called? As mentioned in early discussion, this could leads to performance downgrade, as for each char in the output, there is 5 times comparison for <'&">.
-
@JiMoJiMo For each char in the output, does that include looking in the xml tags??
You can just take the example program and make OUTPUT io="both" and type="64000a", then input/output a 64000 character string.
-
Account Deactivated @bjerome Nope. Just scan the data of output for escape.. not scanning any tags. I will take a test to how it goes on my system.
-
Account Deactivated @bjerome Sorry I took back what I just commented. 1024 bytes increased about 300ms on my box. Looks like use *escp took 300 x 64 ms like your test.
-
Account Deactivated @bjerome @droessner Please refresh your code to see if it works now. I made typo's in declaration but rpg is so tolerant... without errors prompted when assigning values to const variables.. and comparisons ( = ) on those took more time (don't know why.. interesting..) Went bunches of wrong ways to locate the performance issue.. but it should be the right spot.. Sorry! Tried with 64k, 640k, 512k.. looks not much time increased now.
-
reporter @JiMoJiMo I installed the latest code and then tested a string of ~60k characters that contained many escape characters and the speed with and without the *escp flag was almost the same. This looks good to me.
-
Account Deactivated @droessner Thank you for the test!
-
@JiMoJiMo Thank you for addressing this issue. I verified the speed as well and looks much better!
- Log in to comment