Invalid Character in XML Error Prevents Order Creation

Issue #65 closed
Jeffrey Olmstead
created an issue

I have a user who has placed an Order via PayPal which feeds back to the Nwazet.Commerce.OrderService.CreateOrder method (so it seems it could also happen in Stripe). This individual is from the country of Lithuania and the street address they provided has an invalid character in it (at least as far as my American keyboard and ultimately XML is concerned). As a result, the following error is thrown:

System.ArgumentException: '', hexadecimal value 0x1A, is an invalid character.

This error stems from:

Nwazet.Commerce.Models.OrderPart.PersistCustomer()

though it could stem from any of these methods. So my question is whether there is a single method / override we can apply to escape the invalid characters.

There is a discussion here: http://stackoverflow.com/questions/8331119/escape-invalid-xml-characters-in-c-sharp about invalid characters. I am wondering if we can use:

SecurityElement.Escape(text); (from System.Security)

to escape all our incoming text and pull these invalid characters out. Could it be implemented at the "ToAttr" method so it is automatically applied against all entries?

Comments (12)

  1. Jeffrey Olmstead reporter

    Should have added this, here is the stack trace between the Nwazet.Commerce code and the actual reported error:

    System.ArgumentException: '', hexadecimal value 0x1A, is an invalid character.
       at System.Xml.XmlEncodedRawTextWriter.InvalidXmlChar(Int32 ch, Char* pDst, Boolean entitize)
       at System.Xml.XmlEncodedRawTextWriter.WriteAttributeTextBlock(Char* pSrc, Char* pSrcEnd)
       at System.Xml.XmlEncodedRawTextWriter.WriteString(String text)
       at System.Xml.XmlEncodedRawTextWriterIndent.WriteString(String text)
       at System.Xml.XmlWellFormedWriter.WriteString(String text)
       at System.Xml.XmlWriter.WriteAttributeString(String prefix, String localName, String ns, String value)
       at System.Xml.Linq.ElementWriter.WriteStartElement(XElement e)
       at System.Xml.Linq.ElementWriter.WriteElement(XElement e)
       at System.Xml.Linq.XElement.WriteTo(XmlWriter writer)
       at System.Xml.Linq.XNode.GetXmlString(SaveOptions o)
       at Nwazet.Commerce.Models.OrderPart.PersistCustomer()
    
  2. Bertrand Le Roy repo owner

    As far as I can tell, SecurityElement.Escape is only dealing with angle brackets, ampersands, and quotes, so that wouldn't help at all here. As far as I can tell, 0x1A really is an invalid character.

  3. Jeffrey Olmstead reporter

    I agree it is an invalid character and actually just finished trying SecurityElement.Escape, it didn't work. I am trying to set up a unit test that fails with an Invalid Character but it is hard to make an invalid character to even test. I wish PayPal would just strip them out (so I wouldn't have to try to) but then again I assume Stripe too could have invalid characters entered. Or, perhaps, this is just a one off scenario that I shouldn't even be trying to resolve. Will let you know what I find (and if you have further ideas I would love them).

  4. Jeffrey Olmstead reporter

    My client did and it is a legitimate order. It is only character in the street address which is coming in as an invalid character. Even though I can't paste it in here, it was a forward arrow --> (all one character) that seemingly was the issue. Here is an article: https://seattlesoftware.wordpress.com/2008/09/11/hexadecimal-value-0-is-an-invalid-character/ that relates to the type of error and parses out all of these invalid characters. What I don't know is if this is something that the XML extensions should be taking care of (i.e. in Orchard native) or if it is something that each module should take care of (i.e. Nwazet.Commerce or my PayPal module)...

  5. Jeffrey Olmstead reporter

    Unfortunately, I don't know what charset it is coming from. Here is an image of what the Orchard log shows (since you can't type arrows in BitBucket):

    Hex-Error.jpg

    Here is how the IPN looks when viewed in a browser on PayPal website:

    Hex-Error-PayPal-IPN-Web-Browser.jpg

    But that same IPN when pasted into Notepad++ reveals the following:

    Hex-Error-Notepad++.jpg

    And finally in plain old Notepad it shows up as the arrow:

    Hex-Error-Notepad.jpg

    So I know there is something there but no way to determine exactly what it is (other than an arrow). Part of me wonders if the XML parser calls it a 0x1A simply because it doesn't know what type of character it is. I too am concerned about failing gracefully as I have workflows tied to the order and none of them went out on this Order since it didn't publish properly. I happened to see it in the error logs.

  6. Bertrand Le Roy repo owner

    OK, so how about sanitizing strings when they come in by just removing characters in that range, and also making sure that even if something of the same sort happens, the order correctly goes through and nothing else fails?

  7. Jeffrey Olmstead reporter

    I agree, just wondering if this is something Orchard.Framework\ContentManagement\XmlHelper.cs should be doing for everyone of if this is something each module developer should concern themselves with. I have written a unit test to check for this:

    public class InvalidCharacters {
        public string TestString { get; set; }
    }
    
    [Test]
    public void StringWithInvalidCharactersToAttribute() {
        var testStringFormat = "test{0}string";
        var testString = String.Format(testStringFormat, Convert.ToString((char)0x1A));
        var cleanString = String.Format(testStringFormat, "");
    
        var invalidCharacters = new InvalidCharacters() {
            TestString = testString
        };
    
        XElement el = new XElement("data").With(invalidCharacters)
            .ToAttr(s => s.TestString);
        var xmlString = el.ToString(SaveOptions.None);
    
        Assert.That(xmlString, Is.EqualTo(String.Format("<data TestString=\"{0}\" />", cleanString)));
    }
    

    The error is thrown on this line: var xmlString = el.ToString(SaveOptions.None);

    If we apply it in XmlHelper.cs then I believe we could do one check here:

            public static XElement Attr<T>(this XElement el, string name, T value) {
                el.SetAttributeValue(name, ToString(value));
                return el;
            }
    

    But of course, there may be other points of issue and there may also be other ways people set XML attributes so I believe I am talking myself into sanitizing it just for my module. Any final thoughts you have are appreciated.

  8. Bertrand Le Roy repo owner

    I'd only change Xmlhelper after thoroughly checking for performance regression: this is an API that you'll see with a high frequency in profiles. But ideally, yes.

    I think the most important fix would be that any unhandled exception in order confirmation code would not prevent the order from going through if payment has already been processed. This seems like the most important.

  9. Jeffrey Olmstead reporter

    I haven't implemented the unhandled exception in order confirmation yet, but here is the code I am using in PayPal module to strip out invalid characters:

    public static class CommonService {
        // Resource: https://seattlesoftware.wordpress.com/2008/09/11/hexadecimal-value-0-is-an-invalid-character/
        public static string SanitizeXmlString(string xml) {
            if (xml == null) {
                return null;
            }
    
            System.Text.StringBuilder buffer = new System.Text.StringBuilder(xml.Length);
    
            foreach (char c in xml) {
                if (IsLegalXmlChar(c)) {
                    buffer.Append(c);
                }
            }
    
            return buffer.ToString();
        }
    
        /// <summary>
        /// Whether a given character is allowed by XML 1.0.
        /// </summary>
        public static bool IsLegalXmlChar(int character) {
            return
            (
                    character == 0x9 /* == '\t' == 9   */          ||
                    character == 0xA /* == '\n' == 10  */          ||
                    character == 0xD /* == '\r' == 13  */          ||
                (character >= 0x20 && character <= 0xD7FF) ||
                (character >= 0xE000 && character <= 0xFFFD) ||
                (character >= 0x10000 && character <= 0x10FFFF)
            );
        }
    }
    

    It is inefficient I am sure iterating over each character... don't see another way though. It does work though so that is a plus. I agree that if it can be deemed to be efficient and working it should make it's way to Nwazet.Commerce and finally to Orchard.Core. Get's me a patch I can put in place for today though.

  10. Log in to comment