Philippe Lagadec committed 342738a

oletools v0.04: Fixed bug in rtfobj, added documentation for rtfobj

  • Parent commits a5bba47

Files changed (3)

 - **pyxswf**: a tool to detect, extract and analyze Flash objects (SWF) that may
   be embedded in files such as MS Office documents (e.g. Word, Excel) and RTF,
   which is especially useful for malware analysis.
+- **rtfobj**: a tool and python module to extract embedded objects from RTF files.
 - and a few others (coming soon)
 
 News
 ----
 
+- 2013-04-18 v0.04: fixed bug in rtfobj, added documentation for rtfobj
 - 2012-11-09 v0.03: Improved pyxswf to extract Flash objects from RTF
 - 2012-10-29 v0.02: Added oleid
 - 2012-10-09 v0.01: Initial version of olebrowse and pyxswf
 For more info, see [http://www.decalage.info/python/pyxswf](http://www.decalage.info/python/pyxswf)
 
 
+rtfobj
+------
+
+rtfobj is a Python module to extract embedded objects from RTF files, such as
+OLE objects. It can be used as a Python library or a command-line tool.
+
+	Usage: rtfobj.py <file.rtf>
+
+It extracts and decodes all the data blocks encoded as hexadecimal in the RTF document, and saves them as files named "object_xxxx.bin", xxxx being the location of the object in the RTF file.
+
+Usage as python module: rtf_iter_objects(filename) is an iterator which yields a tuple (index, object) providing the index of each hexadecimal stream in the RTF file, and the corresponding decoded object. Example:
+
+	import rtfobj    
+	for index, data in rtfobj.rtf_iter_objects("myfile.rtf"):
+        print 'found object size %d at index %08X' % (len(data), index)
+
+
+For more info, see [http://www.decalage.info/python/rtfobj](http://www.decalage.info/python/rtfobj)
+
+
 How to contribute:
 ------------------
 
 
 This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files published with their own license.
 
-The python-oletools package is copyright (c) 2012, Philippe Lagadec (http://www.decalage.info)
+The python-oletools package is copyright (c) 2012-2013, Philippe Lagadec (http://www.decalage.info)
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without modification,

oletools/README.txt

 -  **pyxswf**: a tool to detect, extract and analyze Flash objects (SWF)
    that may be embedded in files such as MS Office documents (e.g. Word,
    Excel) and RTF, which is especially useful for malware analysis.
+-  **rtfobj**: a tool and python module to extract embedded objects from
+   RTF files.
 -  and a few others (coming soon)
 
 News
 ----
 
+-  2013-04-18 v0.04: fixed bug in rtfobj, added documentation for rtfobj
 -  2012-11-09 v0.03: Improved pyxswf to extract Flash objects from RTF
 -  2012-10-29 v0.02: Added oleid
 -  2012-10-09 v0.01: Initial version of olebrowse and pyxswf
 For more info, see
 `http://www.decalage.info/python/pyxswf <http://www.decalage.info/python/pyxswf>`_
 
+rtfobj
+------
+
+rtfobj is a Python module to extract embedded objects from RTF files,
+such as OLE objects. It can be used as a Python library or a command-line
+tool.
+
+::
+
+    Usage: rtfobj.py <file.rtf>
+
+It extracts and decodes all the data blocks encoded as hexadecimal in
+the RTF document, and saves them as files named "object\_xxxx.bin", xxxx
+being the location of the object in the RTF file.
+
+Usage as python module: rtf\_iter\_objects(filename) is an iterator
+which yields a tuple (index, object) providing the index of each
+hexadecimal stream in the RTF file, and the corresponding decoded
+object. Example:
+
+::
+
+    import rtfobj    
+    for index, data in rtfobj.rtf_iter_objects("myfile.rtf"):
+        print 'found object size %d at index %08X' % (len(data), index)
+
+For more info, see
+`http://www.decalage.info/python/rtfobj <http://www.decalage.info/python/rtfobj>`_
+
 How to contribute:
 ------------------
 
 thirdparty folder which contains third-party files published with their
 own license.
 
-The python-oletools package is copyright (c) 2012, Philippe Lagadec
+The python-oletools package is copyright (c) 2012-2013, Philippe Lagadec
 (http://www.decalage.info) All rights reserved.
 
 Redistribution and use in source and binary forms, with or without

oletools/rtfobj.py

 #!/usr/bin/env python
 """
-rtfobj.py - Philippe Lagadec 2012-11-09
+rtfobj.py - Philippe Lagadec 2013-04-02
 
 rtfobj is a Python module to extract embedded objects from RTF files, such as
 OLE objects. It can be used as a Python library or a command-line tool.
 rtfobj is part of the python-oletools package:
 http://www.decalage.info/python/oletools
 
-rtfobj is copyright (c) 2012, Philippe Lagadec (http://www.decalage.info)
+rtfobj is copyright (c) 2012-2013, Philippe Lagadec (http://www.decalage.info)
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without modification,
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 """
 
-__version__ = '0.01'
+__version__ = '0.02'
 
 #------------------------------------------------------------------------------
 # CHANGELOG:
 # 2012-11-09 v0.01 PL: - first version
+# 2013-04-02 v0.02 PL: - fixed bug in main
 
 #------------------------------------------------------------------------------
 # TODO:
 # - improve regex pattern for better performance?
+# - allow semicolon within hex, as found in  this sample:
+#   http://contagiodump.blogspot.nl/2011/10/sep-28-cve-2010-3333-manuscript-with.html
 
 import re, sys, string, binascii
 
 # several hex byte pairs, at least 4: (?:[0-9A-Fa-f]{2}){4,}
 # at least 4 hex byte pairs, followed by whitespace or CR/LF: (?:[0-9A-Fa-f]{2}){4,}\s*
 PATTERN = r'(?:(?:[0-9A-Fa-f]{2})+\s*)*(?:[0-9A-Fa-f]{2}){4,}'
+# improved pattern, allowing semicolons within hex:
+#PATTERN = r'(?:(?:[0-9A-Fa-f]{2})+\s*)*(?:[0-9A-Fa-f]{2}){4,}'
 
 # a dummy translation table for str.translate, which does not change anything:
 TRANSTABLE_NOCHANGE = string.maketrans('', '')
             yield m.start(), found
 
 if __name__ == '__main__':
-    if len(sys.argv<2):
+    if len(sys.argv)<2:
         sys.exit(__doc__)
     for index, data in rtf_iter_objects(sys.argv[1]):
         print 'found object size %d at index %08X' % (len(data), index)
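
The README text in this commit describes rtfobj as finding hexadecimal data blocks in an RTF file, decoding them, and saving each one as "object_xxxx.bin". The body of `rtf_iter_objects` is not shown in this diff, so the following is only a minimal Python 3 sketch of that idea, not the author's implementation. It reuses the `PATTERN` expression from the diff as a bytes regex. The names `iter_hex_objects` and `object_filename` are hypothetical, and the 8-digit hexadecimal offset in the file name is an assumption borrowed from the `%08X` format used in the print statement.

```python
import binascii
import re

# Same expression as PATTERN in the diff, compiled for bytes input.
HEX_PATTERN = re.compile(rb'(?:(?:[0-9A-Fa-f]{2})+\s*)*(?:[0-9A-Fa-f]{2}){4,}')


def iter_hex_objects(data):
    """Yield (index, decoded_bytes) for each hexadecimal run found in data.

    Hypothetical stand-in for rtf_iter_objects, working on bytes already
    read from the RTF file.
    """
    for match in HEX_PATTERN.finditer(data):
        # Remove the whitespace and line breaks that may split a hex stream.
        hex_text = b''.join(match.group().split())
        yield match.start(), binascii.unhexlify(hex_text)


def object_filename(index):
    """Build an output file name from the object's offset (assumed format)."""
    return 'object_%08X.bin' % index


if __name__ == '__main__':
    sample = b'{\\rtf1 {\\object 48656c6c6f20\r\n776f726c64 }}'
    for index, obj in iter_hex_objects(sample):
        print('found object size %d at index %08X -> %s'
              % (len(obj), index, object_filename(index)))
```

Working on bytes rather than text keeps the reported index equal to the byte offset in the file, which matters when the offset is used to name the extracted object.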