Memory leak

Issue #462 invalid
Alexey created an issue

Hello, I have experienced high memory usage in my application when using SnakeYAML 1.23.

After acquiring a heap dump of my application, I noticed many objects of the following types that were not being collected by the GC:

  • ~2 million org.yaml.snakeyaml.error.Mark (90 MB)
  • ~330K each of org.yaml.snakeyaml.nodes.ScalarNode, org.yaml.snakeyaml.events.ImplicitTuple, org.yaml.snakeyaml.events.ScalarEvent, and org.yaml.snakeyaml.tokens.ScalarToken

I also found other people experiencing the same problem: https://github.com/circe/circe-yaml/issues/73 (which notes that downgrading to 1.17 works but 1.18 does not) and https://github.com/ua-parser/uap-scala/issues/31 .

After some investigation, I suspect it may be caused by PR #13.

I did not want to downgrade to 1.17 completely, so I just copied the Mark class from the 1.17 release, repeated the high-load testing, and acquired a heap dump again, which showed no more extra memory consumption by SnakeYAML classes.

Comments (8)

  1. Alexander Maslov

    Rather interesting behavior…
    I wonder how implementing Serializable affects garbage collection, hm… Needs more investigation.

  2. Alexander Maslov

    I see a potential memory leak around nodes (+ mark) in case of failures during document creation… I can’t yet explain/believe that Serializable is the reason.

  3. Andrey Somov

    Memory leak is a very big problem. Memory leak means that the application consumes more and more memory until it gets “out of memory”.

    The fact that the application consumed a lot of memory does not mean that there is a memory leak. The application might need a lot of memory to work.

    1. Do you re-use the same Yaml instance, or do you create an instance for every document?
    2. Marks indeed consume a lot, but they are required for good error messages. Can you switch to the Engine? The Engine can switch off Marks (the error messages become far less informative).
    3. What is the argument for concluding that making a class Serializable might cause such a consequence?
    4. Do you mean that the GC began to remove Marks only after you removed the Serializable interface?

  4. Alexey reporter

    @Andrey Somov I am using YAML documents for i18n files for my pages, so there is no reason for these objects to be kept in memory after a request is finished, especially after stopping the load test and performing a heap dump, which triggers a GC. That’s why I believe this is not regular memory usage.

    1. No. Is it thread-safe? The Javadoc for org.yaml.snakeyaml.Yaml states that “Each Thread must have its own instance”. I am using a new instance for each i18n file, which is for now about 4 Yaml instances per request thread. I can reduce it to 1 per request thread.
    2. I am planning to switch to Engine some time.
    3. That was just a theory because it was the only suspicious change between 1.17 and 1.18.
    4. I believe yes. I have two heap dumps: one with Serializable and one without.
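The reasoning above relies on the fact that a heap dump is normally preceded by a full GC, so objects with no remaining strong references should already be gone. A minimal JDK-only sketch of that reachability argument (no SnakeYAML involved — this is an illustration, not the reporter's actual test): an object that is only weakly reachable is reclaimed after a forced GC, so any instance that survives the pre-dump GC must still be strongly referenced somewhere, which is what the retained Mark instances suggest.

```java
import java.lang.ref.WeakReference;

public class GcProbe {
    // Returns true if the weakly reachable payload was reclaimed after forced GCs.
    static boolean payloadCollected() throws InterruptedException {
        WeakReference<Object> ref = new WeakReference<>(new byte[1024]);
        // Retry a few times: System.gc() is only a hint to the JVM.
        for (int i = 0; i < 50 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(10);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        // A payload that survived here would have to be strongly referenced
        // somewhere else -- the signature of a leak in a heap dump.
        System.out.println(payloadCollected() ? "collected" : "retained");
    }
}
```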

  5. Andrey Somov
    1. My question was unrelated to thread-safety. The question was whether you create an instance for every YAML stream. As far as I understand, you create an instance of Yaml for every stream (or document). That makes it even more strange.
    2. Please do! We will try to change the API to explicitly release resources when the parsing/dumping is finished. Let us do it together for the latest version, 2.1.
    3. No comment
    4. There must be something else involved.
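The instance-reuse question in point 1 is usually answered with one parser instance per thread, reused across all documents that thread handles. A minimal sketch of that pattern using only the JDK, with a hypothetical Parser class standing in for org.yaml.snakeyaml.Yaml (which, per its Javadoc, must not be shared between threads):

```java
public class PerThreadParser {
    // Stand-in for org.yaml.snakeyaml.Yaml, which is not thread-safe.
    static class Parser {
        int documentsParsed = 0;
        String load(String doc) { documentsParsed++; return doc.trim(); }
    }

    // One Parser per thread, reused for every document on that thread,
    // instead of a fresh instance per i18n file.
    static final ThreadLocal<Parser> PARSER = ThreadLocal.withInitial(Parser::new);

    public static void main(String[] args) {
        Parser p = PARSER.get();
        p.load("greeting: hello");
        p.load("farewell: bye");
        // The same instance served both documents on this thread.
        System.out.println(PARSER.get().documentsParsed); // prints 2
    }
}
```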

  6. Andrey Somov

    I have run a test with many documents (you can check out StressTest and play with it).

    The test runs in a constant heap space. If there were a memory leak, the memory consumption would gradually increase, which does not happen.
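The shape of such a constant-heap check can be sketched with the JDK alone; the allocation below is a stand-in for parsing one document, not the actual StressTest from the SnakeYAML repository. With no leak, heap growth after many iterations stays roughly constant instead of scaling with the iteration count.

```java
public class HeapStress {
    static long usedHeap() {
        // Settle the heap before measuring; System.gc() is only a hint.
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs `iterations` parse-like allocations and returns heap growth in bytes.
    static long heapGrowth(int iterations) {
        long before = usedHeap();
        for (int i = 0; i < iterations; i++) {
            // Stand-in for parsing one document: allocate and immediately drop it.
            char[] doc = ("key" + i + ": value").toCharArray();
            if (doc.length == 0) throw new AssertionError("unreachable");
        }
        return usedHeap() - before;
    }

    public static void main(String[] args) {
        // Growth that scaled with the iteration count would indicate a leak.
        System.out.println("growth bytes: " + heapGrowth(100_000));
    }
}
```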

    We need more info to understand what the problem is.
