Philippe Lagadec committed 318cc0a

updated readme, moved tools details to the documentation wiki

Files changed (2)

 
 [python-oletools](http://www.decalage.info/python/oletools) is a package of python tools to analyze [Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format)](http://en.wikipedia.org/wiki/Compound_File_Binary_Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis and debugging. It is based on the [OleFileIO_PL](http://www.decalage.info/python/olefileio) parser. See [http://www.decalage.info/python/oletools](http://www.decalage.info/python/oletools) for more info.  
 
+**Quick links:** [Home page](http://www.decalage.info/python/oletools) - [Download](https://bitbucket.org/decalage/oletools/downloads) - [Documentation](https://bitbucket.org/decalage/oletools/wiki) - [Report issues](https://bitbucket.org/decalage/oletools/issues?status=new&status=open) - [Contact the author](http://decalage.info/contact) - [Repository](https://bitbucket.org/decalage/oletools) - [Updates on Twitter](https://twitter.com/decalage2)
+
 Note: python-oletools is not related to OLETools published by BeCubed Software.
 
 Tools in python-oletools:
 -------------------------
 
-- **olebrowse**: A simple GUI to browse OLE files (e.g. MS Word, Excel, Powerpoint documents), to
+- **[olebrowse](https://bitbucket.org/decalage/oletools/wiki/olebrowse)**: A simple GUI to browse OLE files (e.g. MS Word, Excel, Powerpoint documents), to
   view and extract individual data streams.
-- **oleid**: a tool to analyze OLE files to detect specific characteristics that could potentially indicate that the file is suspicious or malicious.
-- **olemeta**: a tool to extract all standard properties (metadata) from OLE files.
-- **oletimes**: a tool to extract creation and modification timestamps of all streams and storages.
-- **olevba (new)**: a tool to extract VBA Macro source code from MS Office documents (OLE and OpenXML).
-- **pyxswf**: a tool to detect, extract and analyze Flash objects (SWF) that may
+- **[oleid](https://bitbucket.org/decalage/oletools/wiki/oleid)**: a tool to analyze OLE files to detect specific characteristics that could potentially indicate that the file is suspicious or malicious.
+- **[olemeta](https://bitbucket.org/decalage/oletools/wiki/olemeta)**: a tool to extract all standard properties (metadata) from OLE files.
+- **[oletimes](https://bitbucket.org/decalage/oletools/wiki/oletimes)**: a tool to extract creation and modification timestamps of all streams and storages.
+- **[olevba](https://bitbucket.org/decalage/oletools/wiki/olevba) (new)**: a tool to extract VBA Macro source code from MS Office documents (OLE and OpenXML).
+- **[pyxswf](https://bitbucket.org/decalage/oletools/wiki/pyxswf)**: a tool to detect, extract and analyze Flash objects (SWF) that may
   be embedded in files such as MS Office documents (e.g. Word, Excel) and RTF,
   which is especially useful for malware analysis.
-- **rtfobj**: a tool and python module to extract embedded objects from RTF files.
+- **[rtfobj](https://bitbucket.org/decalage/oletools/wiki/rtfobj)**: a tool and python module to extract embedded objects from RTF files.
 - and a few others (coming soon)
 
 News
 ----
 
-- 2014-08-15 v0.06alpha: added olevba, a new tool to extract VBA Macro source code from MS Office documents (OLE and OpenXML)
-- 2013-07-24 v0.05: added new tools olemeta and oletimes
-- 2013-04-18 v0.04: fixed bug in rtfobj, added documentation for rtfobj
-- 2012-11-09 v0.03: Improved pyxswf to extract Flash objects from RTF
-- 2012-10-29 v0.02: Added oleid
-- 2012-10-09 v0.01: Initial version of olebrowse and pyxswf
+- **2014-08-16 v0.06**: added [olevba](https://bitbucket.org/decalage/oletools/wiki/olevba), a new tool to extract VBA Macro source code from MS Office documents (97-2003 and 2007+). Improved [documentation](https://bitbucket.org/decalage/oletools/wiki)
+- 2013-07-24 v0.05: added new tools [olemeta](https://bitbucket.org/decalage/oletools/wiki/olemeta) and [oletimes](https://bitbucket.org/decalage/oletools/wiki/oletimes)
+- 2013-04-18 v0.04: fixed bug in rtfobj, added documentation for [rtfobj](https://bitbucket.org/decalage/oletools/wiki/rtfobj)
+- 2012-11-09 v0.03: Improved [pyxswf](https://bitbucket.org/decalage/oletools/wiki/pyxswf) to extract Flash objects from RTF
+- 2012-10-29 v0.02: Added [oleid](https://bitbucket.org/decalage/oletools/wiki/oleid)
+- 2012-10-09 v0.01: Initial version of [olebrowse](https://bitbucket.org/decalage/oletools/wiki/olebrowse) and pyxswf
 - see changelog in source code for more info.
 
-Download:
----------
+Download and Install:
+---------------------
 
-The archive is available on [the project page](https://bitbucket.org/decalage/oletools/downloads).
+To use python-oletools from the command line as analysis tools, you may simply [download the zip archive](https://bitbucket.org/decalage/oletools/downloads) and extract the files into a directory of your choice.
 
-
-olebrowse:
-----------
-
-A simple GUI to browse OLE files (e.g. MS Word, Excel, Powerpoint documents), to
-view and extract individual data streams.
-
-	Usage: olebrowse.py [file]
-
-If you provide a file it will be opened, else a dialog will allow you to browse folders to open a file. Then if it is a valid OLE file, the list of data streams will be displayed. You can select a stream, and then either view its content in a builtin hexadecimal viewer, or save it to a file for further analysis.
-
-For screenshots and other info, see [http://www.decalage.info/python/olebrowse](http://www.decalage.info/python/olebrowse)
-
-oleid:
-------
-
-oleid is a script to analyze OLE files such as MS Office documents (e.g. Word,
-Excel), to detect specific characteristics that could potentially indicate that
-the file is suspicious or malicious, in terms of security (e.g. malware).
-For example it can detect VBA macros, embedded Flash objects, fragmentation.
-
-	Usage: oleid.py <file>
-
-Example - analyzing a Word document containing a Flash object and VBA macros:
-
-	C:\oletools>oleid.py word_flash_vba.doc
-	Filename: word_flash_vba.doc
-	OLE format: True
-	Has SummaryInformation stream: True
-	Application name: Microsoft Office Word
-	Encrypted: False
-	Word Document: True
-	VBA Macros: True
-	Excel Workbook: False
-	PowerPoint Presentation: False
-	Visio Drawing: False
-	ObjectPool: True
-	Flash objects: 1
-	
-oleid project website: [http://www.decalage.info/python/oleid](http://www.decalage.info/python/oleid)
-
-
-pyxswf:
---------
-
-pyxswf is a script to detect, extract and analyze Flash objects (SWF files) that may
-be embedded in files such as MS Office documents (e.g. Word, Excel),
-which is especially useful for malware analysis.
-
-pyxswf is an extension to [xxxswf.py](http://hooked-on-mnemonics.blogspot.nl/2011/12/xxxswfpy.html) published by Alexander Hanel.
-
-Compared to xxxswf, it can extract streams from MS Office documents by parsing
-their OLE structure properly, which is necessary when streams are fragmented.
-Stream fragmentation is a known obfuscation technique, as explained on
-[http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/](http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/)
-
-It can also extract Flash objects from RTF documents, by parsing embedded objects encoded in hexadecimal format (-f option).
-
-
-For this, simply add the -o option to work on OLE streams rather than raw files, or the -f option to work on RTF files.
-
-	Usage: pyxswf.py [options] <file.bad>
-	
-	Options:
-	  -o, --ole             Parse an OLE file (e.g. Word, Excel) to look for SWF
-	                        in each stream
-	  -f, --rtf             Parse an RTF file to look for SWF in each embedded
-	                        object
-	  -x, --extract         Extracts the embedded SWF(s), names it MD5HASH.swf &
-	                        saves it in the working dir. No addition args needed
-	  -h, --help            show this help message and exit
-	  -y, --yara            Scans the SWF(s) with yara. If the SWF(s) is
-	                        compressed it will be deflated. No addition args
-	                        needed
-	  -s, --md5scan         Scans the SWF(s) for MD5 signatures. Please see func
-	                        checkMD5 to define hashes. No addition args needed
-	  -H, --header          Displays the SWFs file header. No addition args needed
-	  -d, --decompress      Deflates compressed SWFS(s)
-	  -r PATH, --recdir=PATH
-	                        Will recursively scan a directory for files that
-	                        contain SWFs. Must provide path in quotes
-	  -c, --compress        Compresses the SWF using Zlib
-	
-Example 1 - detecting and extracting a SWF file from a Word document on Windows:
-
-	C:\oletools>pyxswf.py -o word_flash.doc
-	OLE stream: 'Contents'
-	[SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
-	        [ADDR] SWF 1 at 0x8  - FWS Header
-	
-	C:\oletools>pyxswf.py -xo word_flash.doc
-	OLE stream: 'Contents'
-	[SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
-	        [ADDR] SWF 1 at 0x8  - FWS Header
-	                [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
-
-Example 2 - detecting and extracting a SWF file from a RTF document on Windows:
-
-	C:\oletools>pyxswf.py -xf "rtf_flash.rtf"
-	RTF embedded object size 1498557 at index 000036DD
-	[SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
-	00036DD
-	        [ADDR] SWF 1 at 0xc40  - FWS Header
-	                [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
-		
-For more info, see [http://www.decalage.info/python/pyxswf](http://www.decalage.info/python/pyxswf)
-
-
-rtfobj
-------
-
-rtfobj is a Python module to extract embedded objects from RTF files, such as
-OLE objects. It can be used as a Python library or a command-line tool.
-
-	Usage: rtfobj.py <file.rtf>
-
-It extracts and decodes all the data blocks encoded as hexadecimal in the RTF document, and saves them as files named "object_xxxx.bin", xxxx being the location of the object in the RTF file.
-
-Usage as python module: rtf_iter_objects(filename) is an iterator which yields a tuple (index, object) providing the index of each hexadecimal stream in the RTF file, and the corresponding decoded object. Example:
-
-	import rtfobj
-	for index, data in rtfobj.rtf_iter_objects("myfile.rtf"):
-	    print('found object size %d at index %08X' % (len(data), index))
-
-
-For more info, see [http://www.decalage.info/python/rtfobj](http://www.decalage.info/python/rtfobj)
-
+If you plan to use python-oletools with other Python applications or your own scripts, then the simplest solution is to use "**easy_install oletools**" or "**pip install oletools**" to download and install in one go. Otherwise you may download the zip archive and run "**setup.py install**". 
 
 How to contribute:
 ------------------
 
-The code is available in [a Mercurial repository on bitbucket](https://bitbucket.org/decalage/oletools). You may use it to submit enhancements or to report any issue.
+The code is available in [a Mercurial repository on bitbucket](https://bitbucket.org/decalage/oletools). You may use it to submit enhancements (using fork and pull requests) or to report any issue.
 
-If you would like to help us improve this module, or simply provide feedback, you may also send an e-mail to decalage(at)laposte.net. 
+If you would like to help us improve this module, or simply provide feedback, you may also [contact the author](http://decalage.info/contact). 
 
-How to report bugs:
--------------------
+How to suggest improvements or report bugs:
+-------------------------------------------
 
-To report a bug or any issue, please use the [issue reporting page](https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open), or send an e-mail with all the information and files to reproduce the problem. 
+To suggest improvements, report a bug or any issue, please use the [issue reporting page](https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open), providing all the information and files to reproduce the problem. You may also [contact the author](http://decalage.info/contact).
 
 License
 -------
 
 This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files published with their own license.
 
-The python-oletools package is copyright (c) 2012-2013, Philippe Lagadec (http://www.decalage.info)
+The python-oletools package is copyright (c) 2012-2014 Philippe Lagadec (http://www.decalage.info)
+
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without modification,
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
+
+----------
+
+olevba contains modified source code from the officeparser project, published
+under the following MIT License (MIT):
+
+officeparser is copyright (c) 2014 John William Davison
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

oletools/pyxswf.py

 
 #------------------------------------------------------------------------------
 # TODO:
+# + add support for LZMA-compressed flash files (ZWS header)
+#   references: http://blog.malwaretracker.com/2014/01/cve-2013-5331-evaded-av-by-using.html
+#   http://code.metager.de/source/xref/adobe/flash/crossbridge/tools/swf-info.py
+#   http://room32.dyndns.org/forums/showthread.php?766-SWFCompression
+#   sample code: http://room32.dyndns.org/SWFCompression.py
 # - check if file is OLE
 # - support -r
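
The new TODO item concerns SWF files with a ZWS header. As a rough illustration only (this is not code from pyxswf.py), the sketch below shows how the three SWF signatures can be told apart and how a zlib-compressed CWS file can be inflated. The names `SWF_SIGNATURES`, `parse_swf_header` and `inflate_cws` are hypothetical, the code assumes Python 3, and LZMA decompression of ZWS files is deliberately left out because it needs extra header handling.

```python
import struct
import zlib

# The first three bytes of a SWF file identify its compression scheme.
SWF_SIGNATURES = {
    b"FWS": "uncompressed",
    b"CWS": "zlib",
    b"ZWS": "lzma",
}


def parse_swf_header(data):
    """Return (compression, version, declared_length), or None if not SWF."""
    if len(data) < 8:
        return None
    compression = SWF_SIGNATURES.get(data[:3])
    if compression is None:
        return None
    # Byte 3 is the SWF version; bytes 4-7 hold the length of the
    # uncompressed file as a little-endian 32-bit integer.
    version, declared_length = struct.unpack("<BI", data[3:8])
    return compression, version, declared_length


def inflate_cws(data):
    """Rebuild an uncompressed FWS file from a zlib-compressed CWS file."""
    header = parse_swf_header(data)
    if header is None or header[0] != "zlib":
        raise ValueError("not a zlib-compressed SWF")
    # Only the part after the 8-byte header is compressed.
    return b"FWS" + data[3:8] + zlib.decompress(data[8:])
```

A scanner built along these lines could report ZWS objects as detected but not yet decompressed, which would at least make LZMA-compressed Flash objects visible until full support is added.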