Native result produced from YAML loading is not JCF compliant

Issue #562 wontfix
Pawel Veselov created an issue

This has come out of investigations of 531 and IMHO warrants its own home.

YAML generally supports mappings that have keys that are themselves any other node (including mappings and collections). It’s also possible to reference nodes arbitrarily, which makes it possible to have infinite recursion if direct traversal were attempted.

Right now, SnakeYAML uses default JCF Map, Collection and Set implementation classes (along with classes that represent scalar values) when constructing Native data for the application. The keys of these maps, along with the elements of the sets, can be instances of any of these classes as well. However, such approach violates the terms of these implementations, specifically:

  • Any key of a JCF map must be immutable in terms of result of equal() or hashCode method. Having non-scalar keys automatically violate these, as modifying such keys will change the outcome of either equal or hashCode, as default implementations consider their contents for the outcome. Immutability requirement imposed by https://docs.oracle.com/javase/8/docs/api/java/util/Map.html
  • Any value of JCF set must be immutable in the same terms, and therefore is afflicted in the same way if default implementation of JCF classes is used for the elements. Imposed by https://docs.oracle.com/javase/8/docs/api/java/util/Set.html

Based on the difference in requirements between YAML and JCF, I believe it’s generally impossible to have default implementation of JCF [properly] represent the Native data of a YAML document.

The change that I’m suggesting here is to generally move from using default implementations, and have SnakeYAML supply its own. I believe that it will be sufficient if the library simply replaces implementation of hashCode to return a constant, one per instance, value, and equals to only consider the strict object equality.

Comments (10)

  1. Alexander Maslov

    This is not true and I don’t think there is anything like JCF-compliance.

    Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map. A special case of this prohibition is that it is not permissible for a map to contain itself as a key. While it is permissible for a map to contain itself as a value, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a map.

    and

    Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

    Are Notes which need to be taken with a “great care”. This Note is not for YAML, but for any developer using Collections. You can still create Sets/Maps, add keys to those and then do something to those objects. Hashcode and equals result might change and behavior of your application will we unpredictable, but this does not make anything like “not JCF compliant”. It just states that “great care” or understanding of JCF didn’t happen.

    SnakeYAML is trying to take it in account. That’s why we have 2 step constructions for objects and “delayed map/set filling” in case of recursion.

    We DO have problems connected to hashCode() and recursion, as I have mentioned in pull request #37 , but going with this suggestion is NO GO from my side.

  2. Pawel Veselov reporter

    @Alexander Maslov

    The way I read these JavaDocs is - “take great care with mutable objects to not lead to situations where their hashes/equality changes while inside sets/maps, because if you do - the behavior is undefined”. Mutability doesn’t require that hash codes or equality changes.

    In this case, is it OK that SnakeYAML effectively tells the user - “here, we converted this YAML into these objects, we took great care with doing so, but if you use the result, the behavior of your application will be unpredictable, because blame JDK”?

    That’s what I disagree with here, I don’t think it’s OK to produce result that fall into category of unpredictable behavior when the user didn’t do anything to deserve so, or is required to override the default implementation to avoid it.

    Once the user starts modifying the native result, it’s going to be easy to run into a problem that is related to mutable keys/values, and they won’t be able to blame JDK, because JDK told everybody what’s not supported (equals “unspecified behavior”), but the structures were created by SnakeYAML code.

    As to PR#37, yes, you said that you disagree with this, but can you suggest any other feasible approach, or event point in a direction?

  3. Alexander Maslov

    Yes, I think that the approach could be seen as a modifying solution of “detect cycles in directional graph”. Maybe it is overkill here… maybe.

    Maybe something a bit more simplified as: refactor “collection-filling” and do some heuristics while adding keys/elements.u
    Maybe even go as far as forbid collection usage as keys.

    Unfortunately, I don’t have a general solution… hopefully yet. And any other suggestions are welcome.

    But the one proposed here I don’t like.

  4. Pawel Veselov reporter

    This is why I forked this out into a separate issue. The root cause, to me, seems like the fact that JCF is abused. If that is resolved in general, there won’t be a problem with recursive hash code. I would like to hear your objections to those parts of my comment, unless you believe we are just at impasse in how we interpret the JavaDocs.

    If you are considering disabling the collection usage as keys, they it would probably extend to mapping usage as keys, in which case it seems that at least from that perspective, it’s easier to just can support for circular references.

  5. Alexander Maslov

    IMHO interpretation is pretty much the same, but maybe presumable JavaDoc target is different. I see a developer using JCF as “great care” taker. In your view my “developer” seems to be SnakeYaml (as it is responsible of creation of those instances) and it must somehow take “great care” while creating objects (maybe rejecting to create when this kind of “violation” is detected).

    And I do not disagree with this view.

    For some yaml documents (created by human or non-JVM systems) snakayaml is a JVM-entrypoint. It would be nice if we can catch these “violations” and report Errors, which may give an idea what is wrong and why this document is not accepted.
    This most probably will benefit everybody.

    What I don’t like - to create custom collections when it is not explicitly asked; to have a dependency of loaded domain objects to the classes provided by 3rd party parser even if it is a runtime dependency.

    What I wanted to emphasize that IMHO there is no such thing as JCF compliant.
    As developer I can do something like:

    Map<Object, Strig> map = new HashMap<>();
    List<String> list = new ArrayList<>("one", "two")
    
    map.add(list, "Wow");
    
    list.add("ten");
    
    for(entry <- map) {
      println....
    }
    

    This is wrong, but I can do it (I guess. Sorry, haven’t been writing Java for quite some time).
    And JVM is not going to complain anything, but it may blow-up my system at runtime.

    And some might be even be surprised why map.get(list) is not Wow, but null (sorry for own java interpretation :D)

    I just want to say: I am not against finding the solution(s) of this problem, it would be nice to be able to be “clever” and helpful parser.

    Actually, I am very happy your research brought up this understanding of how this is happening in JCF.
    Unfortunately, some were/are forced to learn it a hard way ;)

  6. Andrey Somov

    I think we try to solve aproblem which does not exist. SnakeYAML is used quite an amount of years and there was no problem untill they created artificial vulnarability issues.

    Please do not forget that YAML is general, it is not only about Java. Why to try to solve it for Java if all the other languages keep failing ? In Javascript, the keys cannot be anything but String. In other languages there are other surprises.

  7. Pawel Veselov reporter

    @Alexander Maslov Yes, then we disagree, because I believe that it’s not OK for any library to produce objects that are guaranteed to behave incorrectly, especially when it’s within its full power to prevent that, as the user of SnakeYAML may have absolutely no control over what YAML they are reading, and will have to duplicate some of the work of the library to resolve any JCF-related issues. Having worked with a fair amount of libraries, this is somewhat a commonplace occurrence, where edge cases create some unexpected behavior, at least in case of SnakeYAML there is a straightforward and supported mechanism to override the default behavior.

    I also see that you view the requirement to only use JDK default implementations for maps/collections as paramount, and I’m not quite sure what are the grounds for that, in my view as long as whatever’s produced answers to corresponding Map/Collection interfaces is quite acceptable. Maybe this comes from that “native” part of the YAML spec, that lays the expectations that all objects produced from the default tags are native platform objects.

    There probably more than one such caveat in JDK, some which, I’m sure, were documented as “don’t do this” retroactively, once the impact was understood, because changing the API or behavior instead was impossible due to compatibility reasons.

    And the compatibility problems is also what any changes to SnakeYAML to address this will cause as well. I’m sure there is plenty of code that expects original ArrayList or LinkedHashMap as the returned native objects, and also expects that their content equality or hashcode calculation is not modified. Removing support for map/collection keys will cause its own kinds of compatibility issues. By now, if somebody had problems related to this, they have found their way to work around them.

    @Andrey Somov And yes, this is

    • an edge case, simply because map/list keys are exuberant, js-yaml just a big “caveat” statement on those, without even explaining the impact on the round-trip
    • nobody [else] is complaining, so there is no traction or pressure to actually change anything, and any change will be disruptive, so their worth would be questionable

    So, I’ll concede pursuing this any further.

  8. Andrey Somov

    the consequences to change the default implementaion is huge, the advantage is small. (except to shut up the low quality tooling which reports false positives)

    By the way, users can always stick to their own collections - but it must be explicitly defined.

    We may create another sibling of SnakeYAML which is supposed to work with untrusted sources, so we have a freedom to have no backwards comtatibility.

  9. Log in to comment