Retrieve data from invalid YAML

Issue #515 invalid
Elikill58 created an issue

I’m using YAML for a project. Sometimes the file becomes invalid and looks like this:

name: "A name"
values: { key: "value"
final-name: "Something"

Parsing this obviously fails with an error. But I suggest trying to recover data from the invalid file anyway. In my example, that would mean getting name and final-name.

How could we do it? I have several ideas:

  1. Read each line, and discard every line that is NOT valid.
  2. When a line is invalid, remember its indentation and ignore everything until a new key appears at that level.
  3. Try to find the missing character. In my previous example, it would be a } before “final-name”. But this solution seems … hard.
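
A minimal sketch of idea 2, assuming the file is a block-style mapping whose top-level keys start at column 0. It splits the text into top-level blocks and keeps only the blocks that a caller-supplied validator accepts. The class YamlBlockSalvager and its methods are hypothetical names, not part of SnakeYAML, and the validator is a parameter so the sketch does not depend on any particular parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of SnakeYAML.
final class YamlBlockSalvager {

    /**
     * Splits the lines into top-level blocks. A new block starts at every
     * non-blank, non-comment line that begins at column 0; indented lines,
     * blank lines and comments stay with the block that is currently open.
     */
    static List<String> splitTopLevelBlocks(List<String> lines) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            boolean startsBlock = !line.isEmpty()
                    && !Character.isWhitespace(line.charAt(0))
                    && line.charAt(0) != '#';
            if (startsBlock && current.length() > 0) {
                blocks.add(current.toString());
                current.setLength(0);
            }
            current.append(line).append('\n');
        }
        if (current.length() > 0) {
            blocks.add(current.toString());
        }
        return blocks;
    }

    /** Returns only the blocks that the validator accepts, in original order. */
    static List<String> salvage(List<String> lines, Predicate<String> parsesCleanly) {
        List<String> kept = new ArrayList<>();
        for (String block : splitTopLevelBlocks(lines)) {
            if (parsesCleanly.test(block)) {
                kept.add(block);
            }
        }
        return kept;
    }
}
```

With SnakeYAML, the validator could be a lambda that calls load on the block and returns false when a YAMLException is thrown. For the example above, the values block is an unclosed flow mapping on its own, so it would be dropped while name and final-name survive. Multi-line scalars that continue at column 0 would be split incorrectly, so this is only a starting point.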

What do you think about this?

P.S.: I have absolutely no idea how the file got corrupted, but I think this could prevent problems when a user edits the file incorrectly.

Comments (14)

  1. Elikill58 reporter

    I proposed 3 ways to solve it.

    Yes, the broken line is data, but I would rather lose 1 or 2 lines than the whole file. In the case shown in the issue post, only one line would be lost. And the only way to get that data back is to work out what the basic structure should have been.

  2. Elikill58 reporter

    I’m actually trying to work on it. For my own project I made something like this:

    private static void beSureItsGoodList(File f, List<String> lines, boolean changed) throws IOException {
        try {
            // Try to load the joined lines; this throws if the YAML is invalid.
            yaml.get().loadAs(new StringReader(String.join("\n", lines)), LinkedHashMap.class);
            if (changed) // lines were removed, so overwrite the file with the repaired content
                Files.write(f.toPath(), lines, StandardOpenOption.TRUNCATE_EXISTING);
        } catch (MarkedYAMLException e) {
            int line = e.getProblemMark().getLine(); // index of the problematic line
            if (lines.size() > line) { // only if the reported line exists in the list
                String removedLine = lines.remove(line); // drop it
                System.out.println("Fixed file " + f.getName() + " by removing line " + line + ": " + removedLine);
                beSureItsGoodList(f, lines, true); // check again until everything is fine
            }
        }
    }
    

    I ran a quick test, then tried to adapt the code to match. But your system for reading lines makes this difficult to implement:

    1. I tried to check before the error is thrown, but I can’t return a comment event or something similar, because the parser expects to receive document/mapping/block information.
    2. I don’t want to do what my own code does, i.e. wait until the error is thrown and check again: it’s not optimized, and it would be the same project that both creates and handles the error…
    3. I don’t know what to do next apart from writing bad code, simply because I don’t know what to do. Do you have any suggestions?
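
    For reference, the remove-and-retry loop from the snippet above can also be written iteratively, with a cap on the number of removals and with file writing left to the caller. This is only a sketch: LineDropRepair and dropProblemLines are hypothetical names, and the problemLine function stands in for a real parser call. It still relies on the parser reporting an error first, so it does not address point 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of SnakeYAML.
final class LineDropRepair {

    /**
     * Repeatedly asks problemLine for the zero-based index of the first line
     * the parser rejects and removes that line. Stops when no problem is
     * reported, when the reported index is out of range, or when maxRemovals
     * lines have been removed. The list "lines" is modified in place and the
     * removed lines are returned in removal order.
     */
    static List<String> dropProblemLines(List<String> lines,
                                         Function<List<String>, OptionalInt> problemLine,
                                         int maxRemovals) {
        List<String> removed = new ArrayList<>();
        while (removed.size() < maxRemovals) {
            OptionalInt problem = problemLine.apply(lines);
            if (!problem.isPresent()) {
                break; // the parser accepted the remaining lines
            }
            int index = problem.getAsInt();
            if (index < 0 || index >= lines.size()) {
                break; // the error points outside the list; nothing to remove
            }
            removed.add(lines.remove(index));
        }
        return removed;
    }
}
```

    With SnakeYAML, problemLine could wrap the load call and return the line of getProblemMark() from a caught MarkedYAMLException, as in the snippet above. Because the loop can stop early, the caller should run problemLine once more on the result before saving it.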

  3. Elikill58 reporter

    I’m trying to remove the line that breaks the YAML file. For example, the code I gave lets me fix a file like this:

    some:
      thing: "value"
      another: "value2"
         - 0.0004554
    some2:
      thing: "value"
      another: "value2"
    

    In this case, my code removes the - 0.0004554 line, saves the file, tries again, and it works fine. The earlier problem seems to have been that the writing stream was closed too early, so the save just failed silently.
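
    One way to make a silently lost save easier to notice, sketched here under assumptions: write the repaired lines to a temporary file in the same directory, move it over the target, then read the target back and compare. SafeSave and writeLines are hypothetical names, UTF-8 is an assumed encoding, and only java.nio.file calls are used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper for illustration.
final class SafeSave {

    /**
     * Writes the lines to a temporary file next to the target, moves it over
     * the target, and verifies the result by reading it back. Throws
     * IOException if any step fails or if the content does not match.
     */
    static void writeLines(Path target, List<String> lines) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "yaml-", ".tmp");
        try {
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            try {
                Files.move(tmp, target,
                        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fall back to a plain replace when the file system cannot move atomically.
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            }
            List<String> written = Files.readAllLines(target, StandardCharsets.UTF_8);
            if (!written.equals(lines)) {
                throw new IOException("Saved file does not match expected content: " + target);
            }
        } finally {
            Files.deleteIfExists(tmp); // no-op when the move succeeded
        }
    }
}
```

    The temporary file is created with the default permissions of createTempFile, which may be stricter than those of the file it replaces.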
