

Issue #19 closed

Getting offset of some entry

Anonymous created an issue


I have been taking a look at OleFileIO_PL to use it in a project. However, I noticed that it doesn't return (or I cannot find how), via any property or method, the offset where an element is located in the file. It's possible to retrieve the size but not the actual offset where something is. For example:

    ole.listdir()
    [['\x01CompObj'], ['\x05DocumentSummaryInformation'], ['\x05SummaryInformation'], ['1Table'], ['Data'], ['WordDocument']]
    ole.get_size('1Table')
    9661

For the entry '1Table', we can get the size, but I cannot find a way to retrieve the offset where the entry's data is located.

Am I missing something perhaps?

Comments (6)

  1. Philippe Lagadec repo owner

    Technically you can easily find the offset of the first sector of a stream in the file (I can provide sample code later if you need it). But an OLE file is like a file system: it can be fragmented, so a stream can be stored in non-contiguous sectors. In most real-life cases, streams appear as contiguous blocks of data, but relying on that will not work all the time.

    Moreover, streams smaller than 4KB are stored in the MiniFAT, larger streams in the main FAT. Both have different structures.

    Bottom line: if you need to extract stream data, it is better to use the normal API with openstream (see the documentation). Otherwise, please explain what you need to do and I'll see if there is a solution.
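
    The fragmentation point above can be illustrated with a minimal sketch that follows a stream's sector chain through a FAT and checks whether it is contiguous. It works on a plain Python list and does not call OleFileIO_PL; the names `sector_chain`, `is_contiguous` and `sector_offset` are hypothetical, and `fat` is assumed to be a list where `fat[i]` gives the sector that follows sector `i`.

```python
ENDOFCHAIN = 0xFFFFFFFE  # end-of-chain marker defined by the OLE2 format


def sector_chain(fat, first_sector):
    """Return the ordered list of sector indexes used by a stream."""
    chain = []
    sect = first_sector
    while sect != ENDOFCHAIN:
        # Guard against out-of-range entries and looping chains
        if sect >= len(fat) or len(chain) > len(fat):
            raise ValueError("corrupt FAT chain")
        chain.append(sect)
        sect = fat[sect]
    return chain


def is_contiguous(chain):
    """True if every sector directly follows the previous one."""
    return all(b == a + 1 for a, b in zip(chain, chain[1:]))


def sector_offset(sect, sector_size=512):
    """File offset of a regular sector: sector 0 starts right after the
    header, which occupies the space of one sector."""
    return (sect + 1) * sector_size
```

    Only when `is_contiguous()` returns True can the stream be treated as a single block of `size` bytes starting at `sector_offset(chain[0])`. Otherwise each sector in the chain has to be located separately.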

  2. Joxean Koret


    Original poster here. I want to use it for half-intelligent fuzzing: instead of fuzzing (mutating) the raw binary without considering its structure, I would like to fuzz specific fields, streams, the contents of some streams, etc. This is why I want to have the offset of any entry in the container plus its size (assuming it's not fragmented, as you say).

    Another possible solution could be to enhance OleFileIO_PL to support writing OLE2 files too. I know it's planned, but I'm afraid it will take a long time.


  3. Philippe Lagadec repo owner

    OK, you can try something like this for streams larger than 4KB:

    import OleFileIO_PL

    ole = OleFileIO_PL.OleFileIO("test.doc")
    for stream in ole.listdir():
        # _find() is an internal method: it returns the directory entry index (sid)
        sid = ole._find(stream)
        direntry = ole.direntries[sid]
        size = direntry.size
        sect_start = direntry.isectStart
        if size >= ole.minisectorcutoff:
            # Stream in the main FAT: sector 0 starts right after the header
            offset = ole.sectorsize * (sect_start+1)
            print("%s: sid=%d, size=%d, sect_start=%d, offset=%X" % (
                repr('/'.join(stream)), sid, size, sect_start, offset))
        else:
            # Small stream: sect_start is a mini sector index, not a file sector
            print("%s: sid=%d, size=%d, sect_start=%d, located in MiniFAT" % (
                repr('/'.join(stream)), sid, size, sect_start))
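
    For streams below the 4KB cutoff, the start sector is an index into the mini stream rather than into the file. The following is a rough sketch of how such an index could be mapped to a file offset. It is not part of OleFileIO_PL: `mini_sector_offset` is a hypothetical helper, and `ministream_chain` is assumed to be the ordered list of regular sectors holding the mini stream (the chain that starts at the root entry's first sector), which the caller has to obtain separately.

```python
MINI_SECTOR_SIZE = 64  # mini sector size used by the OLE2 format


def mini_sector_offset(mini_sect, ministream_chain, sector_size=512,
                       mini_sector_size=MINI_SECTOR_SIZE):
    """Return the file offset of a mini sector.

    ministream_chain: regular sector indexes that hold the mini stream,
    in order (assumed to be supplied by the caller).
    """
    # Position of the mini sector inside the mini stream
    pos = mini_sect * mini_sector_size
    # Which regular sector of the chain holds it, and where inside it
    index, within = divmod(pos, sector_size)
    if index >= len(ministream_chain):
        raise ValueError("mini sector lies outside the mini stream")
    # Regular sector 0 starts right after the header
    return (ministream_chain[index] + 1) * sector_size + within
```

    As with large streams, this only gives the position of the first mini sector. A small stream spanning several mini sectors can itself be fragmented within the mini stream.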

    As for the support for writing OLE2 files, I am currently working on it: see issue #6. Full writing support will not be finished soon, but at least it should be possible to overwrite streams with data of the same size in the near future (which should be useful for your fuzzing, I guess).

  4. Philippe Lagadec repo owner

    The latest version 0.32 can now overwrite streams (>4K) with data of same size, see the OleFileIO.write_stream() method. (use write_mode=True when opening the file) For example:

    ole = OleFileIO_PL.OleFileIO('test.doc', write_mode=True)
    data = ole.openstream('WordDocument').read()
    data = data.replace(b'foo', b'bar')
    ole.write_stream('WordDocument', data)
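
    Since the overwritten data must keep the same size, a fuzzing mutation has to preserve the length of the stream. A minimal sketch of such a mutation, independent of OleFileIO_PL, might look like this; `mutate_same_size` and its parameters are hypothetical names chosen for illustration.

```python
import random


def mutate_same_size(data, count=4, seed=None):
    """Return a copy of data with up to `count` bytes changed.

    The result always has the same length as the input.
    """
    rng = random.Random(seed)  # seedable, so a test case can be reproduced
    buf = bytearray(data)
    if not buf:
        return bytes(buf)
    for _ in range(count):
        pos = rng.randrange(len(buf))
        # XOR with a non-zero value always changes the byte at pos
        buf[pos] ^= rng.randrange(1, 256)
    return bytes(buf)
```

    The returned bytes could then be passed to `write_stream()` in place of the `data.replace()` result in the example above.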