fastavro

The current Python avro package is packed with features but dog slow.

On a test case of about 10K records, it takes about 14 sec to iterate over all of them. In comparison, the Java avro SDK does it in about 1.9 sec.

fastavro is less feature complete than avro, but it's much faster. It iterates over the same 10K records in 2.9 sec, and if you use it with PyPy it'll do it in 1.5 sec (to be fair, the Java benchmark is doing some extra JSON encoding/decoding).

If you have Cython installed, then fastavro will be even faster. For the same 10K records it'll run in about 1.7sec.
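To reproduce timings like these on your own data, a small helper along the following lines may be enough. This is only a sketch: `time_iteration` is a hypothetical name, not part of fastavro, and it times any iterable, so a plain `range` stands in for a real record iterator here.

```python
import time


def time_iteration(records):
    """Consume an iterable and return (record_count, elapsed_seconds).

    Hypothetical helper for illustration; not part of fastavro.
    """
    start = time.perf_counter()
    count = 0
    for _ in records:
        count += 1
    return count, time.perf_counter() - start


# A range stands in for a real record iterator.
count, seconds = time_iteration(range(10000))
print('%d records in %.4f sec' % (count, seconds))
```

Passing the object returned by `iter_avro` instead of the `range` should time a full pass over a file, assuming the file is opened as shown in the Usage section.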

Usage

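from fastavro import iter_avro

# Avro files are binary, so open them in 'rb' mode.
with open('some-file.avro', 'rb') as fo:
    avro = iter_avro(fo)
    schema = avro.schema  # the schema stored in the file

    for record in avro:
        process_record(record)  # process_record is your own function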

You can also use the fastavro module from the command line to dump avro files. Each record will be dumped to standard output in one line of JSON.

python -m fastavro weather.avro
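Since each record is dumped as one line of JSON, the output can be read back line by line. The sketch below assumes exactly that one-record-per-line layout; `read_json_lines` is a hypothetical helper and the sample records are made up for illustration.

```python
import io
import json


def read_json_lines(stream):
    """Yield one parsed record per non-empty line of JSON text."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up sample standing in for the command's output.
sample = io.StringIO(
    '{"station": "A", "temp": 0}\n'
    '{"station": "B", "temp": 22}\n'
)
records = list(read_json_lines(sample))
print(records)
```

In a pipeline, the same function could be given `sys.stdin` instead of the in-memory sample.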

Limitations

  • Supports only iteration
    • No writing for you!
  • Supports only null and deflate codecs
    • avro also supports snappy
  • No reader schema
