Commits

Victor Stinner committed ed703f8

Automated commit message

  • Parent commits 2fd0e06

Files changed (1)

+[[hachoir-core|Hachoir core]] features:
+ * **Autofix**: Hachoir can automatically recover from errors caused by a buggy file or a buggy parser ([[Features#Autofix|more details below]]) ;
+ * **Lazy**: Field value, size, description, absolute address, etc. are computed on demand ([[Features#Lazyparser|more details below]]) ;
+ * **No arbitrary limit**: Addresses can exceed 4 GB, an integer can be 32, 64, 128 bits or more, and there is no limit on the number of fields or on the tree depth ;
+ * **Types**: Hachoir has many predefined field types: integer, string, boolean, byte array, ... ;
+ * **Bit granularity**: Size and address are in bits, so it's easy to mix fields with size in bytes or in bits ;
+ * **Unicode**: String values are stored in Unicode (if string charset is specified) ;
+ * **Endian**: You don't have to care about endianness: you just need to set it once for the whole parser ;
+ * **No dependency**: Hachoir just needs Python 2.4 (but some frontends need more libraries) ;
+ * **API**: Data is represented as a tree of fields where each field is a Python object ([[Features#NiceAPI|more details below]]) ;
+ * **Parser**: 70 parsers are included: JPEG picture, ZIP archive, MP3 audio, ... (see the [[hachoir-parser|parser list]]) ;
+ * **Guess parser**: An algorithm automatically chooses the right parser using the file extension, a given MIME type, or the validate() function of each parser ;
+ * etc.
+
+== Nice API ==
+
+A file is split into a tree of fields. A field set is itself a field, and each field is an object. Some interesting attributes:
+ * size: size in bits
+ * value: field value
+ * description: string description
+ * address: relative address (relative to its parent's absolute address)
+ * absolute_address: address in the stream
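+
+As a rough illustration of how the relative and absolute addresses relate, here is a minimal standalone sketch. It does not use Hachoir itself: the ToyField class is hypothetical, and only the attribute names size, address and absolute_address are taken from the list above.

```python
class ToyField:
    """Hypothetical stand-in for a field; this is not the Hachoir class."""

    def __init__(self, name, size, address=0, parent=None):
        self.name = name
        self.size = size          # size in bits
        self.address = address    # address in bits, relative to the parent
        self.parent = parent

    @property
    def absolute_address(self):
        # Address in the stream: the parent's absolute address plus our offset.
        if self.parent is None:
            return self.address
        return self.parent.absolute_address + self.address


root = ToyField("root", size=128)
header = ToyField("header", size=64, address=64, parent=root)
height = ToyField("height", size=32, address=32, parent=header)
```

+In this toy tree, height starts 32 bits into header, and header starts 64 bits into the stream, so the absolute address of height is 96 bits.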
+
+== Lazy parser ==
+
+Almost all features of Hachoir are lazy: Hachoir reads or computes a piece of information only when it is asked for, or when it is needed to read or compute something else.
+
+Examples:
+ * The value and description of a field are read/created on first access ;
+ * The fields of a field set are created on demand, or when Hachoir cannot otherwise determine the size of the field set ;
+ * etc.
+
+Because of this laziness, Hachoir is able to read big files, such as a 10 GB FAT32 partition, and to create a complex and deep field tree.
+
+The "secret" is the Python keyword **yield**. For more information, see:
+ * [[http://en.wikipedia.org/wiki/Coroutine|Coroutine]] (on Wikipedia)
+ * [[http://www.python.org/dev/peps/pep-0255/|PEP 255]]: //Simple Generators//
+ * [[http://www.python.org/dev/peps/pep-0342/|PEP 342]]: //Coroutines via Enhanced Generators//
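+
+The idea of creating fields on demand with yield can be sketched without Hachoir. The LazyFieldSet class below is a hypothetical toy, not the real implementation: it only shows that a generator lets the field set stop as soon as the requested field has been produced.

```python
class LazyFieldSet:
    """Toy field set: fields come from a generator, only when needed."""

    def __init__(self, names):
        self._names = names
        self._generator = self._create_fields()
        self._fields = {}      # fields created so far, by name
        self.created = []      # creation order, kept for demonstration

    def _create_fields(self):
        for name in self._names:
            self.created.append(name)
            yield name, "value of %s" % name

    def __getitem__(self, name):
        # Run the generator just far enough to reach the requested field.
        while name not in self._fields:
            try:
                key, value = next(self._generator)
            except StopIteration:
                raise KeyError(name)
            self._fields[key] = value
        return self._fields[name]


fieldset = LazyFieldSet(["width", "height", "flags", "data"])
value = fieldset["height"]
```

+After this lookup only width and height have been created; flags and data stay untouched until something asks for them.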
+
+== Many ways to explore field set ==
+
+Hachoir allows exploring the field sets in different ways.
+
+Iterate on each field:
+{{{
+#!python
+for field in fieldset:
+  print "%s=%s" % (field.name, field.value)
+}}}
+
+Get a field using its path:
+{{{
+#!python
+# Get a field
+header = png["header"]
+height = header["height"].value  # path relative to the header field set
+width = png["/header/width"].value
+
+# Get parents
+assert png == header.parent == header[".."]
+grandfather = field["../.."]
+grandfather = field["..."]  # alternative syntax
+assert field["../../.."] == field["...."]
+}}}
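+
+The path syntax above can be modelled with a small standalone resolver. This is only a sketch of the lookup rules shown in the example (a leading "/" for an absolute path, ".." for the parent, and one extra dot per further ancestor); the Node class is hypothetical and is not how Hachoir implements it.

```python
class Node:
    """Hypothetical tree node used to illustrate path lookup."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self

    def root(self):
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def __getitem__(self, path):
        # A leading "/" makes the path absolute (start from the root).
        node = self.root() if path.startswith("/") else self
        for part in path.split("/"):
            if not part:
                continue
            if len(part) >= 2 and set(part) == {"."}:
                # ".." climbs one level, "..." two levels, and so on.
                for _ in range(len(part) - 1):
                    node = node.parent
            else:
                node = node.children[part]
        return node


png = Node("png")
header = Node("header", png)
width = Node("width", header)
```

+With this toy tree, width["../.."] and width["..."] both return png, and png["/header/width"] returns width. Climbing above the root is not handled in this sketch.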
+
+== Autofix ==
+
+Hachoir is able to fix (some) parser errors: errors from invalid/truncated input files or from parser code.
+ * If a field is bigger than it should be, it's truncated or removed ;
+ * On parser error, the field is removed ;
+ * When parsing is done, the field set is filled with padding if needed.
+
+Other autofixes:
+ * GenericString tries to fix a truncated UTF-16-LE string (by adding a null byte at the end)
+
+=== Too big ===
+
+{{{
+#!python
+class Header(FieldSet):
+   def createFields(self):
+      yield UInt32(self, "width")
+      yield UInt32(self, "height")
+      yield UInt32(self, "flags")
+      yield UInt32(self, "extra_header")
+
+class Parser(Parser):
+   def createFields(self):
+      ...
+      yield Header(self, "header", size=12*8)
+      yield UInt32(self, "important_value")
+      ...
+}}}
+
+Hachoir will not create the integer /header/extra_header: the header is declared as 12 bytes, so parsing stops at /header/flags. /important_value is still available.
+
+=== Too small ===
+
+{{{
+#!python
+class Header(FieldSet):
+   def createFields(self):
+      yield UInt32(self, "width")
+      yield UInt32(self, "height")
+
+class Parser(Parser):
+   def createFields(self):
+      ...
+      yield Header(self, "header", size=12*8)
+      yield UInt32(self, "important_value")
+      ...
+}}}
+
+Hachoir will add a padding field, /header/raw[0] (4 bytes), and /important_value is available.
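+
+The arithmetic behind the "too big" and "too small" cases can be checked with a toy model. The fit_fields function below is hypothetical and much simpler than whatever Hachoir really does (for instance, it only drops overflowing fields and never truncates them); it just reconciles a list of field sizes with a declared size, all in bits.

```python
def fit_fields(field_sizes, declared_size):
    """Return (kept, padding): kept field names, and padding bits needed
    so that the kept fields plus padding fill declared_size exactly."""
    kept = []
    used = 0
    for name, size in field_sizes:
        if used + size > declared_size:
            break              # this field would overflow: stop here
        kept.append(name)
        used += size
    return kept, declared_size - used


too_big = [("width", 32), ("height", 32), ("flags", 32), ("extra_header", 32)]
too_small = [("width", 32), ("height", 32)]

big_result = fit_fields(too_big, 12 * 8)
small_result = fit_fields(too_small, 12 * 8)
```

+With a declared size of 12*8 = 96 bits, the first case keeps three fields and needs no padding, while the second keeps both fields and needs 32 bits (4 bytes) of padding.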
+
+=== Catch error ===
+
+{{{
+#!python
+class Image(FieldSet):
+   def createFields(self):
+      yield UInt32(self, "width")
+      if not(10 <= self["width"].value <= 1000):
+         raise ParserError("Invalid width")
+      yield UInt32(self, "height")
+      if not(10 <= self["height"].value <= 1000):
+         raise ParserError("Invalid height")
+      yield RawBytes(self, "data", self["width"].value * self["height"].value)
+
+class Parser(Parser):
+   def createFields(self):
+      ...
+      yield Image(self, "image")
+      yield UInt32(self, "important_value")
+      ...
+}}}
+
+If width is invalid, Hachoir will stop parsing just after /image/width, so you cannot read the image data, but you still get its width. If height is invalid, Hachoir will stop after /image/height, so you get both width and height.
+
+This example is trivial but it shows that even on error, you're able to read the beginning of a file. Errors can be:
+ * Stream error: end of stream, disk error, ...
+ * Field error: attempt to create an empty string, invalid value, ...
+ * Parser error: buggy parser, !ParserError raised by the parser, ...
+
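+The "stop at the first error but keep what was already parsed" behaviour can be sketched with plain generators. Everything below is a standalone toy: the ParserError class is redefined locally so the sketch runs alone, and the parse helper is hypothetical rather than part of Hachoir.

```python
class ParserError(Exception):
    """Local stand-in for the parser error type, so the sketch runs alone."""


def create_fields(width, height):
    yield "width", width
    if not (10 <= width <= 1000):
        raise ParserError("Invalid width")
    yield "height", height
    if not (10 <= height <= 1000):
        raise ParserError("Invalid height")
    yield "data", width * height


def parse(width, height):
    """Collect field names until the generator ends or raises."""
    fields, error = [], None
    try:
        for name, _value in create_fields(width, height):
            fields.append(name)
    except ParserError as exc:
        error = str(exc)
    return fields, error


bad_width = parse(5, 100)
bad_height = parse(100, 5000)
valid = parse(100, 100)
```

+Fields yielded before the exception are kept, which mirrors the description above: an invalid width leaves only width readable, an invalid height leaves width and height.
+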
+In the Image example above, it's not possible to read /important_value after an error, because the size of /image is unknown. To be able to read it, you have to know the size of /image and specify it as follows:
+{{{
+#!python
+class Parser(Parser):
+   def createFields(self):
+      ...
+      yield UInt32(self, "imagelen", "Image size in bytes")
+      ...
+      yield Image(self, "image", size=self["imagelen"].value*8)
+      yield UInt32(self, "important_value")
+      ...
+}}}