Anchors are not retained

Issue #1089 wontfix
Gergely Czuczy created an issue

I’m trying to update YAML files that contain anchors while retaining both the anchors and the comments, but SnakeYAML does not seem to do this. The example code for testing/verifying the functionality is:

#!/usr/bin/env groovy
@Grab(group='org.yaml', module='snakeyaml', version='2.2')

import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.DumperOptions
import org.yaml.snakeyaml.LoaderOptions
import org.yaml.snakeyaml.constructor.Constructor
import org.yaml.snakeyaml.representer.Representer
import org.yaml.snakeyaml.resolver.Resolver

class YAML {
  private Object yamldata
  private Yaml yaml;

  YAML(String data) {
    // construct the yaml object
    LoaderOptions lopts = new LoaderOptions()
    lopts.setProcessComments(true)

    DumperOptions dopts = new DumperOptions()
    dopts.setDefaultFlowStyle(DumperOptions.FlowStyle.BLOCK)
    dopts.setCanonical(false)

    this.yaml = new Yaml(new Constructor(lopts),
                         new Representer(dopts),
                         dopts, lopts,
                         new Resolver())
    this.yamldata = parse(data)
    println this.yamldata.getClass()
  }

  //@NonCPS
  private Object parse(String data) {
    return this.yaml.load(data)
  }

  public String dump() {
    return this.yaml.dump(this.yamldata)
  }
}

String input = '''
# first image
image1:
  repository: &repo some.re.po
  tag: &tag 1.2.3
# second image
image2:
  repository: *repo
  tag: *tag
'''

def y = new YAML(input)
println y.dump()

(It is written in Groovy because it will be used from a Jenkins pipeline.)

The output gives back the serialized data, without any anchors or comments:

class java.util.LinkedHashMap
image1:
  repository: some.re.po
  tag: 1.2.3
image2:
  repository: some.re.po
  tag: 1.2.3

Could you please help me figure out what I’m doing wrong?

Comments (13)

  1. Andrey Somov

    It means that the images may be equal, but they are different instances.

    If SnakeYAML used object data instead of object identity, it would never output [1, 1]
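
The distinction between equal objects and identical instances can be illustrated with plain Java collections. This is only a minimal sketch of the general idea, not SnakeYAML's code: the class `IdentityAliasSketch` and its helpers `image`, `aliasesByIdentity` and `aliasesByEquality` are hypothetical names made up for this example. It counts how many items in a list would be treated as repeats (and so could be written as aliases) under each rule.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IdentityAliasSketch {

    // Hypothetical helper: builds a fresh map instance on every call.
    static Map<String, String> image(String repository, String tag) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("repository", repository);
        m.put("tag", tag);
        return m;
    }

    // Counts items that repeat an earlier item when "same" means same instance (==).
    static int aliasesByIdentity(List<?> items) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        int aliases = 0;
        for (Object item : items) {
            if (!seen.add(item)) {
                aliases++;
            }
        }
        return aliases;
    }

    // Counts items that repeat an earlier item when "same" means equals().
    static int aliasesByEquality(List<?> items) {
        Set<Object> seen = new HashSet<>();
        int aliases = 0;
        for (Object item : items) {
            if (!seen.add(item)) {
                aliases++;
            }
        }
        return aliases;
    }

    public static void main(String[] args) {
        Map<String, String> image1 = image("some.re.po", "1.2.3");
        Map<String, String> image2 = image("some.re.po", "1.2.3"); // equal, but a different instance

        List<Map<String, String>> distinct = Arrays.asList(image1, image2);
        List<Map<String, String>> shared = Arrays.asList(image1, image1);

        System.out.println(aliasesByIdentity(distinct)); // 0: no repeats by identity
        System.out.println(aliasesByEquality(distinct)); // 1: equal content counts as a repeat
        System.out.println(aliasesByIdentity(shared));   // 1: the same instance appears twice
    }
}
```

Under the identity rule, only a structure that is literally shared gets reported as a repeat, which matches the point above that equal images are still different instances.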

  2. Gergely Czuczy reporter

    In the meantime, with the low-level API I was able to serialize the parsed YAML so that it stays the same (retaining anchors and comments).

    Just FYI, the default NumberAnchorGenerator does not retain anchor names; it forcibly overrides them. I need a custom AnchorGenerator to retain the anchor names:

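    import java.text.NumberFormat;
    import org.yaml.snakeyaml.nodes.Node;
    import org.yaml.snakeyaml.serializer.AnchorGenerator;

    // Reuses the anchor name already stored on a node; generates "idNNN" otherwise.
    class RetainAnchorGenerator implements AnchorGenerator {
      private int lastAnchorId;

      public RetainAnchorGenerator() {
        this.lastAnchorId = 0;
      }

      public String nextAnchor(Node node) {
        // Keep the anchor name that came with the node, if there is one.
        if (node.getAnchor() != null) {
          return node.getAnchor();
        }

        // Otherwise fall back to numbered anchors (id001, id002, ...).
        this.lastAnchorId++;
        NumberFormat format = NumberFormat.getNumberInstance();
        format.setMinimumIntegerDigits(3);
        format.setMaximumFractionDigits(0); // issue 172
        format.setGroupingUsed(false);
        String anchorId = format.format(this.lastAnchorId);
        return "id" + anchorId;
      }
    }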
    

    Would be awesome to have this as the default functionality.

    Regarding the high-level API, my guess is that the document is converted to native Java structures, which do not carry the metadata needed to retain comments and anchors, so the serializer has no way to reproduce them.

    OTOH the lowlevel API’s ScalarNode really could use a setValue :)

  3. Andrey Somov
    1. please do not combine unrelated topics. For 3 independent issues you may create 3 independent tickets.
    2. A test is the best way to show your idea
    3. You can check whether the parsed instance contains copies or references to the same instance
    4. Sorry, I do not get the last statement at all

  4. Gergely Czuczy reporter

    Thank you, that PR seems to be addressing the retention of already existing anchors, instead of ignoring and replacing them with new generated ones.

  5. Gergely Czuczy reporter

    Will do all of those in a few minutes.

    First, the test: I’m not sure it checks the right thing. The point would be to retain the anchor, not just to keep the same value.

    I would like to give a broader context here. What we’re trying to do is propagate CI data to Helm values.yaml files, and these files have the following characteristics regarding comments and anchors:

    Comments: Comments can be used to auto-generate documentation. Not retaining them cancels the ability to auto-generate documentation for the chart’s configuration. So, comment retention is important.

    Anchors: In many places there can be anchors and aliases referring to the same setting. When CI updates the anchored value, it is important to keep the aliased nodes as aliases rather than copying the value and saving it as a resolved value. The next CI run will again update only the anchor, so if anchors and aliases have been resolved to plain values, the functionality provided by this YAML feature is lost and the deployment will be misconfigured. So anchor and alias retention is quite important: if an anchor’s value is updated, the dump must keep the anchor and keep the aliases as aliases instead of resolving them.

    Here’s the codebase where I’m trying to put this together with the lowlevel API: https://github.com/gczuczy/snakeyaml-retention

    Regarding the setValue: you can see that a quite ugly hack is needed here (which is not even working properly), because a scalar’s value cannot simply be updated. Instead, the parent MappingNode’s whole value list has to be redefined just to set a single scalar’s value: the ScalarNode has to be re-created, and the list copied with that key’s value replaced by the newly constructed ScalarNode. This could be greatly simplified with a single setValue on ScalarNode. And the hack does not work properly, because for some reason it duplicates the anchor: it looks as if the alias is resolved during parsing, and after updating (well, replacing) the scalar, the aliased node is emitted under the same anchor name but with a different value:

    Input:
    # first image
    image1:
      repository: &repo some.re.po
      tag: &tag 1.2.3
    # second image
    image2:
      repository: *repo
      tag: *tag
    some: thing
    other: thing
    
    ---
    
    Output:
    # first image
    image1:
      repository: &repo some.other.repo
      tag: &tag 1.2.3
    # second image
    image2:
      repository: &repo some.re.po
      tag: *tag
    some: thing
    other: thing
    
    ---
    

    The expected behaviour would be that, when an anchored ScalarNode is updated, the aliases referencing it are still emitted as aliases in the output, rather than having the resolved value forced upon them.

    Please let me know how I should document this here in the tracker. Should I open a new issue with this information? I think this touches on multiple smaller issues/mechanics.
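
One possible reading of the duplicated `&repo` in the output above, assuming that the composer hands out the same node instance for an alias as for its anchor: replacing the node under `image1` creates a new instance, while `image2` still points at the old one, which still carries the anchor name. The sketch below models this with stand-in types only. `Scalar`, `Entry` and `replace` are hypothetical names for this example and are not SnakeYAML classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AliasReplaceSketch {

    // Immutable stand-in for a scalar node; NOT SnakeYAML's ScalarNode.
    static final class Scalar {
        final String anchor;
        final String value;

        Scalar(String anchor, String value) {
            this.anchor = anchor;
            this.value = value;
        }
    }

    // Immutable key/value pair standing in for one entry of a mapping.
    static final class Entry {
        final String key;
        final Scalar value;

        Entry(String key, Scalar value) {
            this.key = key;
            this.value = value;
        }
    }

    // Copies a mapping, swapping in newNode wherever the value is the very same
    // instance (==) as oldNode. Other entries are reused unchanged.
    static List<Entry> replace(List<Entry> mapping, Scalar oldNode, Scalar newNode) {
        List<Entry> copy = new ArrayList<>();
        for (Entry e : mapping) {
            copy.add(e.value == oldNode ? new Entry(e.key, newNode) : e);
        }
        return copy;
    }

    public static void main(String[] args) {
        Scalar repo = new Scalar("repo", "some.re.po");
        List<Entry> image1 = Arrays.asList(new Entry("repository", repo));
        List<Entry> image2 = Arrays.asList(new Entry("repository", repo)); // alias: same instance

        Scalar updated = new Scalar(repo.anchor, "some.other.repo");

        // Updating only image1 leaves image2 pointing at the old node.
        List<Entry> newImage1 = replace(image1, repo, updated);
        System.out.println(image2.get(0).value.value); // some.re.po

        // Applying the same replacement to every mapping keeps the sharing intact.
        List<Entry> newImage2 = replace(image2, repo, updated);
        System.out.println(newImage1.get(0).value == newImage2.get(0).value); // true
    }
}
```

If that assumption holds, a replacement-based update would have to visit every mapping that references the old node, not only the one holding the anchor.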

  6. Andrey Somov
    1. I updated the test and I see nothing to be fixed.
    2. As I already said, please use the low-level API for comments.
    3. Using an anchor for a scalar would be a major unexpected change. In your example you do NOT wish to have the word “repository” to be anchored (but it will be!)
    4. Setters have proven to be a design mistake. Immutable data structures win. Setting value may make the tag totally invalid and inconsistent for ScalarNode. I do not see your use case as enough justification to open the door to become mutable.
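
The concern in point 4 about a setter leaving the tag inconsistent can be sketched with stand-in types. This is an illustration under assumptions, not SnakeYAML's implementation: `MutableScalar`, `ImmutableScalar` and the deliberately simplified `resolveTag` (integers versus everything else) are hypothetical names invented for the example. Only the two tag URIs are the standard YAML ones.

```java
public class ScalarTagSketch {
    static final String INT = "tag:yaml.org,2002:int";
    static final String STR = "tag:yaml.org,2002:str";

    // Deliberately simplified resolver: plain integers get the int tag,
    // everything else is treated as a string.
    static String resolveTag(String value) {
        return value.matches("[-+]?[0-9]+") ? INT : STR;
    }

    // Mutable stand-in: setValue changes the text but leaves the tag alone.
    static final class MutableScalar {
        String tag;
        String value;

        MutableScalar(String value) {
            this.tag = resolveTag(value);
            this.value = value;
        }

        void setValue(String newValue) {
            this.value = newValue;
        }
    }

    // Immutable stand-in: a change produces a new node, so the tag is resolved again.
    static final class ImmutableScalar {
        final String tag;
        final String value;

        ImmutableScalar(String value) {
            this.tag = resolveTag(value);
            this.value = value;
        }

        ImmutableScalar withValue(String newValue) {
            return new ImmutableScalar(newValue);
        }
    }

    public static void main(String[] args) {
        MutableScalar mutable = new MutableScalar("42");
        mutable.setValue("latest");
        System.out.println(mutable.tag); // tag:yaml.org,2002:int (stale: the value is no longer an integer)

        ImmutableScalar replaced = new ImmutableScalar("42").withValue("latest");
        System.out.println(replaced.tag); // tag:yaml.org,2002:str
    }
}
```

A setter could of course re-resolve the tag itself; the sketch only shows the kind of inconsistency that construction-time validation rules out.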
