Issue #4 new

facilitate modification of serialized YAML that preserves structure and comments

Jason Sachs
created an issue

Use case:

  • End-user creates a YAML configuration file by hand, for use with a set of python scripts.
  • Python script reads the configuration file, and makes minor changes that are written back to the file.
  • End-user wishes to continue editing by hand, and expects structure and comments of YAML file to be maintained, not canonicalized.

This was alluded to in an earlier feature request: http://pyyaml.org/ticket/114


I am not familiar with the internal structure of PyYAML, but I would guess that the only ways to implement this are:

  • to allow the resulting deserialized object to carry along a tokenized version of the original representation
  • to allow the serializer to inspect an existing output file and navigate it, instead of just blindly writing its output. (This doesn't seem like a very clean way to handle it, as it breaks the separation between input-stream deserialization and output-stream serialization.)

If the deserialized object is a "plain" python object made of dicts/lists/strings/numbers, then the original serialization (including comments) is lost.
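
As a rough illustration of the first option, here is a minimal sketch of an object that carries its original source lines along with the data. It is not PyYAML code: the class name `SourceBackedConfig` is hypothetical, it only understands flat `key: value` lines, it keeps every value as a string, and it would mishandle a `#` inside a quoted value.

```python
import re

# Hypothetical toy, not part of PyYAML: matches a flat "key: value  # comment" line.
_LINE = re.compile(
    r"^(?P<key>[A-Za-z_]\w*):(?P<pad>\s*)(?P<value>[^#\n]*?)(?P<tail>\s*(?:#.*)?)$"
)


class SourceBackedConfig:
    """Top-level scalars that remember the lines they were read from."""

    def __init__(self, text):
        self._lines = text.split("\n")
        # Map each key to the index of the line that defines it.
        self._index = {}
        for number, line in enumerate(self._lines):
            match = _LINE.match(line)
            if match:
                self._index[match.group("key")] = number

    def __getitem__(self, key):
        return _LINE.match(self._lines[self._index[key]]).group("value")

    def __setitem__(self, key, value):
        # Only existing keys are supported in this sketch; unknown keys raise KeyError.
        number = self._index[key]
        match = _LINE.match(self._lines[number])
        # Rewrite just the value; spacing and the trailing comment are kept.
        self._lines[number] = f"{key}:{match.group('pad')}{value}{match.group('tail')}"

    def dump(self):
        return "\n".join(self._lines)
```

With this shape the script only touches `cfg["port"] = 9090` and the comments, blank lines, and spacing around it come back out of `cfg.dump()` unchanged. A real implementation would have to do the same for nested mappings, sequences, and every scalar style, which is where most of the work lies.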

Comments (3)

  1. TomRitchford

    I actually have exactly this use-case, and I have an ugly hack around it which only barely solves the issue.

    I'm storing dictionaries, one per file. So when I make edits to a file, I append these as another YAML record, after a record separator ---.

    It's totally sucky, though, and worst of all, if the files get large, people could edit the original value in the top half without realizing that there are values overriding it below.

    I'd be willing to help with the work on this project...

  2. Chad Dombrova

    Instead of mixing the formatting info in with the parsed data that PyYAML returns, what about providing a separate data structure just for the extra formatting info?

    e.g.

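    # Proposed API: loadf and dumpf are hypothetical names for this suggestion.
    # loadf returns the parsed data plus a separate formatting structure.
    data, formatting = yaml.loadf(document)
    yaml.dumpf((data, formatting))

    # Round-tripping an unmodified document should reproduce it exactly.
    assert yaml.dumpf(yaml.loadf(document)) == document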
    
  3. TomRitchford

    This is probably easier for the PyYAML developers to program, but it's harder to use as an application programmer.

    Look at the use case: a program makes minor changes to a YAML document. You'd need to keep track of both the data and the formatting data, and apply parallel changes to both.

    But I'd welcome anything to fix this issue, frankly. I made a kludge that sort of works but...
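
The trade-off raised in comments 2 and 3 can be made concrete with a toy version of the separate-formatting idea. This is only a sketch under heavy assumptions: `loadf` and `dumpf` here are standalone functions borrowing the names from comment 2, not anything PyYAML provides, and they handle only flat `key: value` documents with unique keys and string values.

```python
import re

# Hypothetical toy, not part of PyYAML: matches a flat "key: value  # comment" line.
_ENTRY = re.compile(
    r"^(?P<key>[A-Za-z_]\w*):(?P<pad>\s*)(?P<value>[^#\n]*?)(?P<tail>\s*(?:#.*)?)$"
)


def loadf(document):
    """Return (data, formatting) for a flat 'key: value' document."""
    data, formatting = {}, []
    for line in document.split("\n"):
        match = _ENTRY.match(line)
        if match:
            data[match.group("key")] = match.group("value")
            # Record how to rebuild the line around the value.
            formatting.append(
                ("entry", match.group("key"), match.group("pad"), match.group("tail"))
            )
        else:
            # Comments, blank lines, and anything unrecognized are kept verbatim.
            formatting.append(("raw", line))
    return data, formatting


def dumpf(pair):
    """Rebuild the document from data, guided by the formatting records."""
    data, formatting = pair
    out = []
    for item in formatting:
        if item[0] == "entry":
            _, key, pad, tail = item
            if key in data:  # a key deleted from data drops its line
                out.append(f"{key}:{pad}{data[key]}{tail}")
        else:
            out.append(item[1])
    return "\n".join(out)
```

Changing an existing value works with `data` alone, and an unmodified document round-trips exactly. A key added only to `data` is silently left out of the output, though, because `formatting` has no record for it. That is the parallel bookkeeping comment 3 is worried about.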
