Incorrectly reformats arrays of hashes

Issue #51 resolved
Alex Harvey
created an issue

Input file:

role::startup::author::rsyslog_inputs:
  imfile:
    - ruleset: 'AEM-slinglog'
      File: '/opt/aem/author/crx-quickstart/logs/error.log'
      startmsg.regex: '^[-+T.:[:digit:]]*'
      tag: 'error'
    - ruleset: 'AEM-slinglog'
      File: '/opt/aem/author/crx-quickstart/logs/stdout.log'
      startmsg.regex: '^[-+T.:[:digit:]]*'
      tag: 'stdout'

Ruamel invocation:

import ruamel.yaml

def read_file(f):
    with open(f, 'r') as _f:
        return ruamel.yaml.round_trip_load(_f.read(), preserve_quotes=True)

def write_file(f, data):
    with open(f, 'w') as _f:
        _f.write(ruamel.yaml.dump(data, Dumper=ruamel.yaml.RoundTripDumper, block_seq_indent=2))


data = read_file('in.yaml')
write_file('out.yaml', data)

Output file:

role::startup::author::rsyslog_inputs:
  imfile:
    - ruleset: 'AEM-slinglog'
    File: '/opt/aem/author/crx-quickstart/logs/error.log'
    startmsg.regex: '^[-+T.:[:digit:]]*'
    tag: 'error'
    - ruleset: 'AEM-slinglog'
    File: '/opt/aem/author/crx-quickstart/logs/stdout.log'
    startmsg.regex: '^[-+T.:[:digit:]]*'
    tag: 'stdout'

Comments (16)

  1. Alex Harvey reporter

    Furthermore this reformatting has made the file unreadable by Ruamel:

    Traceback (most recent call last):
      File "bulk_edit.py", line 68, in <module>
        hiera = file_data(f)
      File "bulk_edit.py", line 41, in file_data
        return ruamel.yaml.round_trip_load(_f.read(), preserve_quotes=True)
      File "/Library/Python/2.7/site-packages/ruamel/yaml/main.py", line 123, in round_trip_load
        return load(stream, RoundTripLoader, version, preserve_quotes=preserve_quotes)
      File "/Library/Python/2.7/site-packages/ruamel/yaml/main.py", line 81, in load
        return loader.get_single_data()
      File "/Library/Python/2.7/site-packages/ruamel/yaml/constructor.py", line 54, in get_single_data
        node = self.get_single_node()
      File "/Library/Python/2.7/site-packages/ruamel/yaml/composer.py", line 50, in get_single_node
        document = self.compose_document()
      File "/Library/Python/2.7/site-packages/ruamel/yaml/composer.py", line 70, in compose_document
        node = self.compose_node(None, None)
      File "/Library/Python/2.7/site-packages/ruamel/yaml/composer.py", line 105, in compose_node
        node = self.compose_mapping_node(anchor)
      File "/Library/Python/2.7/site-packages/ruamel/yaml/composer.py", line 164, in compose_mapping_node
        item_value = self.compose_node(node, item_key)
      File "/Library/Python/2.7/site-packages/ruamel/yaml/composer.py", line 105, in compose_node
        node = self.compose_mapping_node(anchor)
      File "/Library/Python/2.7/site-packages/ruamel/yaml/composer.py", line 164, in compose_mapping_node
        item_value = self.compose_node(node, item_key)
      File "/Library/Python/2.7/site-packages/ruamel/yaml/composer.py", line 103, in compose_node
        node = self.compose_sequence_node(anchor)
      File "/Library/Python/2.7/site-packages/ruamel/yaml/composer.py", line 133, in compose_sequence_node
        while not self.check_event(SequenceEndEvent):
      File "/Library/Python/2.7/site-packages/ruamel/yaml/parser.py", line 116, in check_event
        self.current_event = self.state()
      File "/Library/Python/2.7/site-packages/ruamel/yaml/parser.py", line 448, in parse_block_sequence_entry
        token.id, token.start_mark)
    ruamel.yaml.parser.ParserError: while parsing a block collection
      in "<byte string>", line 262, column 5:
            - ruleset: 'AEM-slinglog'
            ^
    expected <block end>, but found '?'
      in "<byte string>", line 263, column 5:
            File: '/opt/aem/author/crx-quick ... 
            ^
    

    Showing the input file at line 262-263 (with vim line numbers turned on):

    260 role::startup::author::rsyslog_inputs:
    261   imfile:
    262     - ruleset: 'AEM-slinglog'
    263     File: '/opt/aem/author/crx-quickstart/logs/error.log'
    264     startmsg.regex: '^[-+T.:[:digit:]]*'
    265     tag: 'error'
    266     - ruleset: 'AEM-slinglog'
    267     File: '/opt/aem/author/crx-quickstart/logs/stdout.log'
    268     startmsg.regex: '^[-+T.:[:digit:]]*'
    269     tag: 'stdout'
    
  2. Ruamel/Anthon van der Neut repo owner

    The output is indeed not valid YAML. What round-trip parameters did you use? Please give me the minimal program that you used to read and write, so I don't have to guess the arguments to round_trip_load/round_trip_dump.

  3. Ruamel/Anthon van der Neut repo owner

    As indicated here, it is a problem if you have block_seq_indent set to the same level as indent. Your "sample" output is inconsistently indented (2 for mappings, 4 for sequences); ruamel.yaml only has one value (which in this case should be 4, to allow for the block sequence indent).

    It is a design decision to "normalize" the indentation and some other things, so try

    role::startup::author::rsyslog_inputs:
        imfile:
          - ruleset: 'AEM-slinglog'
            File: '/opt/aem/author/crx-quickstart/logs/error.log'
            startmsg.regex: '^[-+T.:[:digit:]]*'
            tag: 'error'
          - ruleset: 'AEM-slinglog'
            File: '/opt/aem/author/crx-quickstart/logs/stdout.log'
            startmsg.regex: '^[-+T.:[:digit:]]*'
            tag: 'stdout'
    

    as input and specify indent=4, block_seq_indent=2 for round_trip_dump()

    Not warning about block_seq_indent being too big, and generating incorrect output, should be considered a bug, though, so I'll leave this open until fixed.

  4. Alex Harvey reporter

    That's not a fix however, because the files are supposed to be indented with 2 spaces. I need to preserve the original formatting, which is:

    role::startup::author::rsyslog_inputs:
      imfile:
        - ruleset: 'AEM-slinglog'
          File: '/opt/aem/author/crx-quickstart/logs/error.log'
          startmsg.regex: '^[-+T.:[:digit:]]*'
          tag: 'error'
        - ruleset: 'AEM-slinglog'
          File: '/opt/aem/author/crx-quickstart/logs/stdout.log'
          startmsg.regex: '^[-+T.:[:digit:]]*'
          tag: 'stdout'
    

    I think "indent" probably should not have any effect inside a block sequence?

  5. Ruamel/Anthon van der Neut repo owner

    Your input file is inconsistently indented: 4 spaces for the elements of the sequence and 2 spaces for the mapping. ruamel.yaml doesn't allow you to do that (neither does PyYAML). You have to decide on what level of indentation you want first.

    If you go for a 2-space indent, then you cannot "float" the dash before the elements, as there is not enough space to do so (because of the space that has to follow the dash).

    Although it would be possible to have different indentation values for mappings and sequences (and even for individual collections in a YAML), the code base doesn't easily allow for that (I tried) and it is currently not a priority to support such inconsistently indented files.
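
    The space constraint described above can be written as a one-line check. This is a minimal sketch: `dash_fits` and its parameter names are hypothetical and not part of ruamel.yaml. It only encodes the observation that the dash and the space after it need two columns inside the indent.

```python
def dash_fits(indent, dash_offset):
    """Return True if a dash placed `dash_offset` columns in, plus the
    mandatory space after it, still fits within `indent` columns."""
    return dash_offset + 2 <= indent


# (indent, dash_offset) pairs: 2/0 fits, 2/2 does not, 4/2 fits.
for indent, dash_offset in [(2, 0), (2, 2), (4, 2)]:
    print(indent, dash_offset, dash_fits(indent, dash_offset))
```

    The pair (2, 2) corresponds to the settings from the original report, and (4, 2) to the values suggested earlier in this thread.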

  6. Ruamel/Anthon van der Neut repo owner

    As indicated ruamel.yaml normalizes the indentation to one indent value for all mapping and sequence items. This example has two, so this is not a bug in the library, it is a documented feature.

    Supporting different indents for mappings and sequences is a feature request.

  7. Ruamel/Anthon van der Neut repo owner

    After two failed attempts, the second one ending in the same position, and a few hours of staring at the screen, I finally realised what kept me from solving the issue: an extra indent level already on the "stack", caused by the mapping with keys ruleset etc.

    Once I had seen that I reimplemented part of the new API to allow for three values to be passed in:

    import sys
    import ruamel.yaml
    
    yaml_str = """\
    role::startup::author::rsyslog_inputs:
      imfile:
        - ruleset: 'AEM-slinglog'
          File: '/opt/aem/author/crx-quickstart/logs/error.log'
          startmsg.regex: '^[-+T.:[:digit:]]*'
          tag: 'error'
        - ruleset: 'AEM-slinglog'
          File: '/opt/aem/author/crx-quickstart/logs/stdout.log'
          startmsg.regex: '^[-+T.:[:digit:]]*'
          tag: 'stdout'
    """
    
    yaml = ruamel.yaml.YAML()
    yaml.preserve_quotes = True
    # yaml.indent(mapping=2, sequence=4, offset=2)  # default indent is two
    yaml.indent(sequence=4, offset=2)
    
    data = yaml.load(yaml_str)
    yaml.dump(data, sys.stdout)
    

    Gets you back your input.
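
    To see how three separate values interact, here is a toy block-style emitter using only the standard library. It is an illustrative sketch, not ruamel.yaml's implementation: the function `emit`, its parameters, and the shortened sample data are assumptions made for this example. It places the dash `offset` columns in from the parent key and aligns the remaining keys of each item `sequence` columns in.

```python
def emit(node, col=0, mapping=2, sequence=2, offset=0):
    """Toy emitter: return the lines for `node`, whose content starts at `col`."""
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, dict):
                lines.append(" " * col + f"{key}:")
                lines += emit(value, col + mapping, mapping, sequence, offset)
            elif isinstance(value, list):
                lines.append(" " * col + f"{key}:")
                # A sequence is laid out relative to its parent key's column.
                lines += emit(value, col, mapping, sequence, offset)
            else:
                lines.append(" " * col + f"{key}: '{value}'")
    elif isinstance(node, list):
        dash_col = col + offset       # column of the dash
        content_col = col + sequence  # column of an item's continuation lines
        gap = max(1, content_col - dash_col - 1)  # at least one space after the dash
        for item in node:
            item_lines = emit(item, content_col, mapping, sequence, offset)
            lines.append(" " * dash_col + "-" + " " * gap + item_lines[0].lstrip())
            lines += item_lines[1:]
    else:
        lines.append(" " * col + f"'{node}'")
    return lines


# Shortened version of the data from the report (hypothetical subset of keys).
data = {"role::startup::author::rsyslog_inputs": {"imfile": [
    {"ruleset": "AEM-slinglog", "tag": "error"},
    {"ruleset": "AEM-slinglog", "tag": "stdout"},
]}}

good = "\n".join(emit(data, mapping=2, sequence=4, offset=2))
bad = "\n".join(emit(data, mapping=2, sequence=2, offset=2))
print(good)
print("---")
print(bad)
```

    With mapping=2, sequence=4, offset=2 the toy reproduces the layout of the input file. With sequence=2, offset=2 the continuation keys land in the same column as the dash, which is the misaligned shape shown in the original report.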

    Sorry for the long delay in getting this implemented.

  8. Alex Harvey reporter

    @Ruamel/Anthon van der Neut Thanks so much for implementing the feature. I guess that means I can upgrade to the latest Ruamel and revert the ugly hack I had in place for this. You refer to a "new API", and the syntax has changed. I guess I would need to do a bit of refactoring to use the new API?

  9. Ruamel/Anthon van der Neut repo owner

    You only need the new API for writing out, you could leave reading as it is for now. If you need help with that let me know (is your source online somewhere?)

    I am not aware of anything you could do with the old API that is no longer working. Even adding representers for your own objects and subclassing YAMLObject should still work in combination with the new yaml = YAML(); yaml.dump().

    It is even possible to do the above with the old API, but that requires subclassing the Emitter and RoundTripDumper, so you have an instance of the former at some point on which to set the values for the attributes best_map_indent, best_sequence_indent and sequence_dash_offset.

    This gives some background information. Adding a new parameter (as I did for block_seq_indent and some others) required too many changes, essentially making experimenting with things like the solution for the above cumbersome, as everything had to be in place before I could use it.

    With the new API I could just add attributes (like best_sequence_indent) to the Emitter instance (in its __init__.py), and use it in its methods. Changing them from the calling program/test could be done with a relatively simple:

    from ruamel.yaml import YAML

    yaml = YAML()
    yaml.emitter.best_sequence_indent = 4
    # ... rest of the program
    

    Only once that was working did I have to change main.py to give yaml.indent() the functionality that it has now, without ever having to touch dumper.py and cyaml.py. ;-)
