Output Not Escaped

Issue #15 resolved
Danny Roessner created an issue

I have this test RPG program:

**free

dcl-pi *N;
  output char(10);
end-pi;

output = '&<>"''';

return;         

I use XMLSERVICE to call the program:

call QXMLSERV.iPLUG4K(
    'NA',
    '*here',
    '<pgm name="ESCAPETEST" lib="DROESSNER" error="on"><parm io="out"><data name="OUTPUT" type="10a"></data></parm></pgm>',
    'NULL'
);

The output is:

<pgm name="ESCAPETEST" lib="DROESSNER" error="on">
  <parm io="out">
    <data name="OUTPUT" type="10a">&<>"'</data>
  </parm>
  <success><![CDATA[+++ success DROESSNER ESCAPETEST ]]></success>
</pgm>

None of the special XML characters are being escaped.

Comments (30)

  1. Ji Mo Account Deactivated

    CALL XMLSERVICE.iPLUG4K('NA', 'here', '<?xml version=''1.0''?> <myscript> <pgm name=''XMLTEST'' lib=''JIMOXML'' error=''fast''> <parm io=''out''> <data type=''10A''><![CDATA[]]></data> </parm> </pgm> </myscript>', ?)

    Would that be possible to work? XMLService can understand the CDATA passed in. Then user can choose the raw output format. The output is:

    <?xml version='1.0'?><myscript><pgm name='XMLTEST' lib='JIMOXML' error='fast'><parm io='out'><data type='10A'><![CDATA[&<'">]]></data></parm><success><![CDATA[+++ success JIMOXML XMLTEST ]]></success> </pgm> </myscript>

  2. Danny Roessner reporter

    @JiMoJiMo That does seem to work for the example provided. Is there a reason for making the developer specify when to wrap the raw data in the CDATA tags instead of just always escaping the data? I can't see a case where I would want to pass back the raw data since there is a chance it could create invalid XML. Shouldn't it either always escape the data or wrap character data with the CDATA tag by default?

  3. Brian Jerome

    @JiMoJiMo Adding the <![CDATA[]]> to every char output field adds a lot of unnecessary byte size to the output, especially inside <ds>. Is there another solution?

  4. Ji Mo Account Deactivated

    Not sure if changing the raw output format may beak parsers of some clients without escape chars. I will turn to another option to see if this can be control by users with a flag.

  5. Ji Mo Account Deactivated

    Sorry.. got busy last week. I am going to look at if we can leverage the ipc flag doCDdata to surround all values.

  6. Danny Roessner reporter

    @JiMoJiMo I have to also say that surrounding all values with a CDATA tag is not the way to go here. That adds unnecessary bytes to the output. The right way to fix this is to actually escape the reserved characters in the output so that valid XML is always returned. I was very surprised that this wasn't already being done.

  7. Ji Mo Account Deactivated

    -------------------------- cdata to turn global CDATA on ---------------------------------------------------------------- CALL XMLSERVICE.iPLUG4K('NA', 'herecdata', '<?xml version=''1.0''?> <myscript>  <pgm name=''XMLTEST'' lib=''JIMOXML'' error=''fast''>    <parm io=''out''>        <data type=''10A''></data>    </parm>  </pgm> </myscript>', ?)  

    ===== output ============>

    <?xml version='1.0'?> <myscript>  <pgm name='XMLTEST' lib='JIMOXML' error='fast'> <parm io='out'> <data type='10A'><![CDATA[&<'">]]></data> </parm> <success><![CDATA[+++ success JIMOXML XMLTEST ]]></success> </pgm> </myscript>

    There is already a control flag to turn the CData flag on/off in the original design. Form the code, '*cdata' in the control parm to control global CData enabled; '<!CDATA[]>' specified in input xml to enable specified output fields. Compared with scanning each output char and escape the performance should be better. Just a few more extra chars injected but leave controls to users without reserved char output. That might the initial intent that I tend to agree with. :-)

  8. Danny Roessner reporter

    @JiMoJiMo Here's an example of why I disagree with this approach. We have a use case where we send grid data back to the client. This is a data structure array in the pgm. Let's say for example, we send back 500 records of 10 fields. That would add 5000 potentially unnecessary CDATA tags. At 12 bytes each, we are looking at 60KB of unnecessary data in the output. Is there a counter argument against just correctly escaping the xml characters?

  9. Ji Mo Account Deactivated

    @droessner Just had a look at procedure xmlWrkData. There is no flag/code to do escape. That is a brand new function. :-)

  10. Jesse G

    Changing the "kind" of this issue to "enhancement" (which I think is technically correct since the existing function allows for a *cdata option for this specific reason, presumably).

    I'm guessing the underlying request here would be to have xmlservice automatically escape & as &amp as per https://www.w3.org/TR/xml/#syntax Please verify that to be correct.

    I think that's feasible to pursue, but we must proceed with caution when changing the default behavior to ensure that existing toolkits (or other software that may be using a non-compliant XML parser) will not be broken.

  11. Danny Roessner reporter

    I agree it's probably best to put this on a flag to allow for backward compatibility.

    I think the service should convert:

    & to &amp;

    < to &lt;

    > to &gt;

    ' to &apos;

    " to &quot;

  12. Ji Mo Account Deactivated

    Agree with a new flag. I will look into the changes for reserved char escape. Looks like one more thing need to be mentioned that the output data length like '10A' may be changed to reflect the length of the escaped output, since escape injects more chars. Also seems if cdata and escapse both specified, cdata overrides escapse option.

  13. Ji Mo Account Deactivated

    @ThePrez if user requests like <data type=''1A''>, the output char is '&'. Then xmlservice escape it to '&amp' with 4 chars back. In theory, it's still just 1 char to user. Should xmlservice set the ouput to <data type=''4A''>? Not sure if users would parse xml output depending on the length. I am not sure which way to go.

  14. Ji Mo Account Deactivated

    OK. I will leave the length as is.. just escape the reserved chars. That's kind of reasonable that the length indicates the raw output.

  15. Ji Mo Account Deactivated

    @droessner @bjerome I just pushed the changes into repository. Would you have a test please? A new IPC flag '*escp' is added, which turn escape on.

    CALL XMLSERVICE.iPLUG512K( '*NA', 
    '*here*escp', 
    '<?xml version=''1.0''?> 
    <myscript>  
    <pgm name=''XMLTEST'' lib=''JIMOXML'' error=''fast''>
    <parm io=''out'' >
    <data type=''5A''></data>
    </parm>
    </pgm> 
    </myscript>', ?)   
    Return Code = 0 Output Parameter #4 (CO) = 
    <?xml version='1.0'?> 
    <myscript>  
    <pgm name='XMLTEST' lib='JIMOXML' error='fast'> 
    <parm io='out' > 
    <data type='5A'>&amp;&lt;&apos;&quot;&gt;</data> 
    </parm> 
    <success><![CDATA[+++ success JIMOXML XMLTEST ]]></success> 
    </pgm> 
    </myscript>
    
  16. Danny Roessner reporter

    @JiMoJiMo I tested with several different RPG programs with various output and the output was always escaped correctly. I think this looks good. Thanks for the quick turn-around on this one!

  17. Brian Jerome

    @JiMoJiMo When escaping a large string this flag seems to add a huge chunk of time before getting a response. Like 1s without *escp vs 15s with *escp for only about 64KB passed to XMLSERVICE. Can you please look into this?

  18. Ji Mo Account Deactivated

    @bjerome Can I have your test xml sample and output sample of your program called? As mentioned in early discussion, this could leads to performance downgrade, as for each char in the output, there is 5 times comparison for <'&">.

  19. Brian Jerome

    @JiMoJiMo For each char in the output, does that include looking in the xml tags??

    You can just take the example program and make OUTPUT io="both" and type="64000a", then input/output a 64000 character string.

  20. Ji Mo Account Deactivated

    @bjerome Nope. Just scan the data of output for escape.. not scanning any tags. I will take a test to how it goes on my system.

  21. Ji Mo Account Deactivated

    @bjerome Sorry I took back what I just commented. 1024 bytes increased about 300ms on my box. Looks like use *escp took 300 x 64 ms like your test.

  22. Ji Mo Account Deactivated

    @bjerome @droessner Please refresh your code to see if it works now. I made typo's in declaration but rpg is so tolerant... without errors prompted when assigning values to const variables.. and comparisons ( = ) on those took more time (don't know why.. interesting..) Went bunches of wrong ways to locate the performance issue.. but it should be the right spot.. Sorry! Tried with 64k, 640k, 512k.. looks not much time increased now.

  23. Danny Roessner reporter

    @JiMoJiMo I installed the latest code and then tested a string of ~60k characters that contained many escape characters and the speed with and without the *escp flag was almost the same. This looks good to me.

  24. Brian Jerome

    @JiMoJiMo Thank you for addressing this issue. I verified the speed as well and looks much better!

  25. Log in to comment