Use Case 1 - First Line Column Headers, No Schema or Data Validation

Your delimited file looks like this. You have no existing schema that it must be validated against before proceeding, and you don't need to validate the data either: every value in the text file is taken as a string.

    ColOne ,colTwo, ColThree,ColFour, ColFive
    898sdf,,sdf,aaaaa,89343
    13534,34352,sdf,aa,89443

Your job is to convert the first row into an Avro schema, and then create an Avro file from the other rows using that same schema.
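Here is a minimal sketch of the first half of that job, turning the header row into an Avro record schema. It is not DelimToAvro's code: `HeaderToSchema` and `schemaFromHeader` are hypothetical names, and it assumes every column becomes a plain `string` field and that whitespace around column names is trimmed (the sample header has stray spaces, and Avro names may only contain letters, digits and underscores, starting with a letter or underscore).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class HeaderToSchema {

    // Builds an Avro record schema (as a JSON string) from a delimited header line.
    // Hypothetical helper. Assumptions: every column becomes a plain "string" field,
    // and surrounding whitespace is trimmed from each column name.
    static String schemaFromHeader(String header, String delim,
                                   String name, String namespace) {
        List<String> fields = new ArrayList<>();
        for (String raw : header.split(Pattern.quote(delim), -1)) {
            String col = raw.trim();
            // Avro names must match [A-Za-z_][A-Za-z0-9_]*
            if (!col.matches("[A-Za-z_][A-Za-z0-9_]*")) {
                throw new IllegalArgumentException("Not a valid Avro name: '" + col + "'");
            }
            fields.add("{\"name\":\"" + col + "\",\"type\":\"string\"}");
        }
        return "{\"type\":\"record\",\"name\":\"" + name + "\",\"namespace\":\""
                + namespace + "\",\"fields\":[" + String.join(",", fields) + "]}";
    }

    public static void main(String[] args) {
        String header = "ColOne ,colTwo, ColThree,ColFour, ColFive";
        System.out.println(schemaFromHeader(header, ",", "MyFoo", "com.foo.stuff"));
    }
}
```

Run against the sample header with the name `MyFoo` and namespace `com.foo.stuff`, it prints a single-line schema with five string fields, `ColOne` through `ColFive`. A header like `Col One` would be rejected rather than silently renamed.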

You have not been asked to validate this against a previously used schema. See other use cases for such situations.

How to use the API

This is how it is called from inside the tests. Note that, because you are creating a JSON object (the Avro schema), the API asks you to give it a name and a namespace; here they are "MyFoo" and "com.foo.stuff".

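    import java.io.File;

    import org.junit.Test;

    @Test
    public void testGetUseCase1() {
        DelimToAvro delimToAvro = new DelimToAvro();
        File goodDataFile = new File("./src/test/resources/testFile1.csv");
        // Arguments: input URI, output .avro file, record name, namespace,
        // delimiter, and a boolean option (false for this use case)
        delimToAvro.get(goodDataFile.toURI(), new File("target/deleteme.avro"), "MyFoo",
                "com.foo.stuff", ",", false);
    }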
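The test passes "," as the delimiter. One detail any delimited reader has to get right: Java's `String.split` treats its argument as a regex and, with no limit, drops trailing empty strings, so a row ending in empty columns would come back short. The sketch below (a hypothetical `splitRow` helper, not part of DelimToAvro) quotes the delimiter and passes a limit of -1 so empty fields such as the second column of `898sdf,,sdf,aaaaa,89343` are always kept.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class RowSplitter {

    // Splits one data row on a literal delimiter.
    // Pattern.quote makes regex metacharacters such as "|" or "." literal;
    // the -1 limit keeps trailing empty fields instead of discarding them.
    static String[] splitRow(String line, String delim) {
        return line.split(Pattern.quote(delim), -1);
    }

    public static void main(String[] args) {
        String[] fields = splitRow("898sdf,,sdf,aaaaa,89343", ",");
        System.out.println(fields.length);            // 5
        System.out.println(Arrays.toString(fields));  // [898sdf, , sdf, aaaaa, 89343]
    }
}
```

Without the -1, `"a,b,,".split(",")` returns only two elements, which would no longer line up with a four-column header.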

What does it produce?

It creates a binary Avro file. We'll look at the raw bytes later, but first you'll probably want to see what your data looks like:

[Image: the records shown as JSON]
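As a rough idea of what such a JSON view contains, assuming every field is a plain (non-nullable) string: each record is just an object mapping the trimmed column names to that row's values. This is an illustrative sketch with hypothetical names (`RowToJson`, `rowToJson`), not how DelimToAvro or the Avro tools produce their output.

```java
import java.util.regex.Pattern;

class RowToJson {

    // Minimal JSON string escaping: backslash, double quote, and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Pairs trimmed header names with the values of one data row.
    // Assumes all fields are plain strings, so no type wrappers are needed.
    static String rowToJson(String header, String row, String delim) {
        String[] names = header.split(Pattern.quote(delim), -1);
        String[] values = row.split(Pattern.quote(delim), -1);
        if (names.length != values.length) {
            throw new IllegalArgumentException("Row has " + values.length
                    + " fields but header has " + names.length);
        }
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < names.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(quote(names[i].trim())).append(": ").append(quote(values[i]));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        String header = "ColOne ,colTwo, ColThree,ColFour, ColFive";
        // {"ColOne": "898sdf", "colTwo": "", "ColThree": "sdf", "ColFour": "aaaaa", "ColFive": "89343"}
        System.out.println(rowToJson(header, "898sdf,,sdf,aaaaa,89343", ","));
    }
}
```

Note that the empty second column becomes an empty string, not a missing field or `null`, which matches "take anything as a string".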

But that's just the data. What makes Avro so helpful is that you never have just the data: an Avro file is always self-contained, carrying its schema along with the records. We generated that schema from the first row, above. Here is what it looks like:

[Image: the generated Avro schema, in JSON]

Of course, the real data is binary. Shown below are the contents of this same .avro file in a text editor. No, it doesn't make sense, but it's not supposed to. That's how it achieves its compact form.

[Image: the raw .avro file opened in a text editor]
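Part of that compactness comes from Avro's binary encoding. Per the Avro specification, a container file starts with the magic bytes `O`, `b`, `j`, `1`, followed by metadata that includes the schema as JSON (which is why parts of the file are readable in a text editor). After that, records carry no field names or delimiters: a string is written as its UTF-8 length, zigzag-encoded as a variable-length long, followed by the bytes. The sketch below encodes only the record bytes, assuming plain non-nullable string fields (a nullable union would add a branch index before each value); it leaves out the file header, block framing, sync markers and any compression codec, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class AvroStringEncoding {

    // Writes a long the way the Avro spec does: zigzag-encode, then emit
    // 7 bits per byte, low-order group first, high bit set on all but the last byte.
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);  // zigzag: 0->0, -1->1, 1->2, -2->3, ...
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // An Avro string is its UTF-8 byte length (as a long) followed by the bytes.
    static void writeString(String s, ByteArrayOutputStream out) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
    }

    // A record of string fields is just its field encodings back to back:
    // no field names, no delimiters, since the schema says what comes next.
    static byte[] encodeRecord(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String f : fields) {
            writeString(f, out);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] rec = encodeRecord("898sdf", "", "sdf", "aaaaa", "89343");
        StringBuilder hex = new StringBuilder();
        for (byte b : rec) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(rec.length + " bytes: " + hex.toString().trim());
    }
}
```

For the first sample row this prints 24 bytes: a length byte before each value (`0c` for the six-character `898sdf`, `00` for the empty column) and the raw characters in between.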