Lars Yencken committed 40ec222

Fixes a typo

  • Parent commits 88f37ad

Files changed (1)

 
 Shorter, and easier. Now we only use as much memory as each {{{<doc>...</doc>}}} requires, meaning that you're only cpu-bound, not memory-bound. This means that vastly larger files are now practical to work with. Notice that it also used the filename to detect the bz2 compression and transparently decompressed the file on the fly.
 
-Aside: for truly large data sets, I hate XML, and suggest YAML instead. YAML has the concept of many-documents-per-file built-in, so you can iterate over documents without the fancy parsing hacks which {{{iterxml}}} has to resort to. Its data is also typed, which can safe time and code when deserializing. Check out [[http://pyyaml.org/|pyyaml]], and be sure to compile it with libyaml bindings.
+Aside: for truly large data sets, I hate XML, and suggest YAML instead. YAML has the concept of many-documents-per-file built-in, so you can iterate over documents without the fancy parsing hacks which {{{iterxml}}} has to resort to. Its data is also typed, which can save time and code when deserializing. Check out [[http://pyyaml.org/|pyyaml]], and be sure to compile it with libyaml bindings.
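
For readers who want to see the streaming idea from the context line in this hunk, here is a minimal sketch using only the Python standard library. It is not how {{{iterxml}}} is implemented, which this diff does not show. The helper names {{{open_maybe_compressed}}} and {{{iter_docs}}} are hypothetical, and the sketch assumes that every {{{<doc>}}} element is a direct child of the root element.

```python
import bz2
import xml.etree.ElementTree as ET


def open_maybe_compressed(path):
    """Open path for binary reading; decompress on the fly if it ends in .bz2."""
    # Assumption: compression is detected from the filename suffix only.
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


def iter_docs(path, tag="doc"):
    """Yield each <tag> element as soon as it has been fully parsed."""
    with open_maybe_compressed(path) as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        _, root = next(context)  # the first event is the start of the root
        for event, elem in context:
            if event == "end" and elem.tag == tag:
                yield elem
                # Detach finished children from the root so they can be
                # garbage-collected instead of piling up in memory.
                root.clear()
```

Calling {{{iter_docs("corpus.xml.bz2")}}} (a hypothetical filename) in a {{{for}}} loop would then visit one document at a time, so the number of elements held in memory should not grow with the size of the file.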