Anonymous committed a5bba47

pyxswf v0.02: added extraction from RTF embedded objects, with new rtfobj module

Files changed (4)

   view and extract individual data streams.
 - **oleid**: a tool to analyze OLE files to detect specific characteristics that could potentially indicate that the file is suspicious or malicious.
 - **pyxswf**: a tool to detect, extract and analyze Flash objects (SWF) that may
-  be embedded in files such as MS Office documents (e.g. Word, Excel),
+  be embedded in files such as MS Office documents (e.g. Word, Excel) and RTF,
   which is especially useful for malware analysis.
 - and a few others (coming soon)
 
 News
 ----
 
+- 2012-11-09 v0.03: Improved pyxswf to extract Flash objects from RTF
 - 2012-10-29 v0.02: Added oleid
 - 2012-10-09 v0.01: Initial version of olebrowse and pyxswf
 - see changelog in source code for more info.
 Stream fragmentation is a known obfuscation technique, as explained on
 [http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/](http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/)
 
-For this, simply add the -o option to work on OLE streams rather than raw files.
+It can also extract Flash objects from RTF documents, by parsing embedded objects encoded in hexadecimal format (-f option).
+
+
+For this, simply add the -o option to work on OLE streams rather than raw files, or the -f option to work on RTF files.
 
 	Usage: pyxswf.py [options] <file.bad>
 	
 	Options:
 	  -o, --ole             Parse an OLE file (e.g. Word, Excel) to look for SWF
 	                        in each stream
+	  -f, --rtf             Parse an RTF file to look for SWF in each embedded
+	                        object
 	  -x, --extract         Extracts the embedded SWF(s), names it MD5HASH.swf &
 	                        saves it in the working dir. No addition args needed
 	  -h, --help            show this help message and exit
 	                        contain SWFs. Must provide path in quotes
 	  -c, --compress        Compresses the SWF using Zlib
 	
-Example - detecting and extracting a SWF file from a Word document on Windows:
+Example 1 - detecting and extracting a SWF file from a Word document on Windows:
 
 	C:\oletools>pyxswf.py -o word_flash.doc
 	OLE stream: 'Contents'
 	[SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
 	        [ADDR] SWF 1 at 0x8  - FWS Header
 	                [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
-	
+
+Example 2 - detecting and extracting a SWF file from an RTF document on Windows:
+
+	C:\oletools>pyxswf.py -xf "rtf_flash.rtf"
+	RTF embedded object size 1498557 at index 000036DD
+	[SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
+	00036DD
+	        [ADDR] SWF 1 at 0xc40  - FWS Header
+	                [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
+		
 For more info, see [http://www.decalage.info/python/pyxswf](http://www.decalage.info/python/pyxswf)
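
The `-x` help text above says each carved SWF is saved as `MD5HASH.swf`. A minimal sketch of that naming scheme follows, assuming the hash is the MD5 of the carved SWF bytes. `carved_swf_name` is a hypothetical helper written for illustration, not a function from xxxswf or pyxswf, and it is written for Python 3 although the scripts in this commit target Python 2.

```python
import hashlib


def carved_swf_name(swf_bytes):
    """Return '<md5 hex digest>.swf' for the given SWF bytes (illustrative)."""
    return hashlib.md5(swf_bytes).hexdigest() + '.swf'


# Assumption: the file name is derived only from the carved bytes.
example_name = carved_swf_name(b'')
```

Naming by content hash means the same Flash object carved from two different containers maps to the same file name, which matches the two README examples where the Word and RTF samples both report `2498e9c0701dc0e461ab4358f9102bc5.swf`.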
 
 

oletools/README.txt

    suspicious or malicious.
 -  **pyxswf**: a tool to detect, extract and analyze Flash objects (SWF)
    that may be embedded in files such as MS Office documents (e.g. Word,
-   Excel), which is especially useful for malware analysis.
+   Excel) and RTF, which is especially useful for malware analysis.
 -  and a few others (coming soon)
 
 News
 ----
 
+-  2012-11-09 v0.03: Improved pyxswf to extract Flash objects from RTF
 -  2012-10-29 v0.02: Added oleid
 -  2012-10-09 v0.01: Initial version of olebrowse and pyxswf
 -  see changelog in source code for more info.
 as explained on
 `http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/ <http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/>`_
 
+It can also extract Flash objects from RTF documents, by parsing
+embedded objects encoded in hexadecimal format (-f option).
+
 For this, simply add the -o option to work on OLE streams rather than
-raw files.
+raw files, or the -f option to work on RTF files.
 
 ::
 
     Options:
       -o, --ole             Parse an OLE file (e.g. Word, Excel) to look for SWF
                             in each stream
+      -f, --rtf             Parse an RTF file to look for SWF in each embedded
+                            object
       -x, --extract         Extracts the embedded SWF(s), names it MD5HASH.swf &
                             saves it in the working dir. No addition args needed
       -h, --help            show this help message and exit
                             contain SWFs. Must provide path in quotes
       -c, --compress        Compresses the SWF using Zlib
 
-Example - detecting and extracting a SWF file from a Word document on
+Example 1 - detecting and extracting a SWF file from a Word document on
 Windows:
 
 ::
             [ADDR] SWF 1 at 0x8  - FWS Header
                     [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
 
+Example 2 - detecting and extracting a SWF file from an RTF document on
+Windows:
+
+::
+
+    C:\oletools>pyxswf.py -xf "rtf_flash.rtf"
+    RTF embedded object size 1498557 at index 000036DD
+    [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
+    00036DD
+            [ADDR] SWF 1 at 0xc40  - FWS Header
+                    [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
+
 For more info, see
 `http://www.decalage.info/python/pyxswf <http://www.decalage.info/python/pyxswf>`_
 

oletools/pyxswf.py

 #!/usr/bin/env python
 """
-pyxswf.py - Philippe Lagadec 2012-09-17
+pyxswf.py
 
 pyxswf is a script to detect, extract and analyze Flash objects (SWF) that may
 be embedded in files such as MS Office documents (e.g. Word, Excel),
 which is especially useful for malware analysis.
+
 pyxswf is an extension to xxxswf.py published by Alexander Hanel on
 http://hooked-on-mnemonics.blogspot.nl/2011/12/xxxswfpy.html
 Compared to xxxswf, it can extract streams from MS Office documents by parsing
-their OLE structure properly, which is necessary when streams are fragmented.
+their OLE structure properly (-o option), which is necessary when streams are
+fragmented.
 Stream fragmentation is a known obfuscation technique, as explained on
 http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/
 
+It can also extract Flash objects from RTF documents, by parsing embedded
+objects encoded in hexadecimal format (-f option).
+
 pyxswf project website: http://www.decalage.info/python/pyxswf
 
 pyxswf is part of the python-oletools package:
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 """
 
-__version__ = '0.01'
+__version__ = '0.02'
 
 #------------------------------------------------------------------------------
 # CHANGELOG:
 # 2012-09-17 v0.01 PL: - first version
+# 2012-11-09 v0.02 PL: - added RTF embedded objects extraction
 
 #------------------------------------------------------------------------------
 # TODO:
 # - check if file is OLE
 # - support -r
 
-import optparse, sys, os
+import optparse, sys, os, rtfobj, StringIO
 from thirdparty.xxxswf import xxxswf
 from thirdparty.OleFileIO_PL import OleFileIO_PL
 
     parser.add_option('-c', '--compress', action='store_true', dest='compress', help='Compresses the SWF using Zlib')
 
     parser.add_option('-o', '--ole', action='store_true', dest='ole', help='Parse an OLE file (e.g. Word, Excel) to look for SWF in each stream')
+    parser.add_option('-f', '--rtf', action='store_true', dest='rtf', help='Parse an RTF file to look for SWF in each embedded object')
 
 
     (options, args) = parser.parse_args()
         parser.print_help()
         return
 
+    # OLE MODE:
     if options.ole:
         for filename in args:
             ole = OleFileIO_PL.OleFileIO(filename)
                         xxxswf.disneyland(f, direntry.name, options)
                     f.close()
             ole.close()
+
+    # RTF MODE:
+    elif options.rtf:
+        for filename in args:
+            for index, data in rtfobj.rtf_iter_objects(filename):
+                if 'FWS' in data or 'CWS' in data:
+                    print 'RTF embedded object size %d at index %08X' % (len(data), index)
+                    f = StringIO.StringIO(data)
+                    name = 'RTF_embedded_object_%08X' % index
+                    # call xxxswf to scan or extract Flash files:
+                    xxxswf.disneyland(f, name, options)
+
     else:
         xxxswf.main()
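
The RTF branch above only hands an object to xxxswf when it contains `FWS` or `CWS`. The sketch below shows one way such a pre-filter could be extended to report where each candidate header sits. It assumes the standard SWF header layout (3-byte signature, 1-byte version, 32-bit little-endian file length), which comes from the SWF file format and not from this commit. `find_swf_headers` is a hypothetical name, the code is Python 3, and it is not how `xxxswf.disneyland` is implemented.

```python
import struct

SWF_SIGNATURES = (b'FWS', b'CWS')  # uncompressed and zlib-compressed SWF


def find_swf_headers(data):
    """Return sorted (offset, signature, version, declared_length) tuples."""
    hits = []
    for sig in SWF_SIGNATURES:
        pos = data.find(sig)
        while pos != -1:
            # A full header needs 8 bytes: signature, version, 32-bit length.
            if pos + 8 <= len(data):
                version, length = struct.unpack_from('<BI', data, pos + 3)
                hits.append((pos, sig, version, length))
            pos = data.find(sig, pos + 1)
    return sorted(hits)


# Hypothetical blob: 8 padding bytes, then an FWS header (version 9, length 1234).
blob = b'\x00' * 8 + b'FWS' + struct.pack('<BI', 9, 1234) + b'...'
headers = find_swf_headers(blob)
```

A three-byte signature can occur by chance in binary data, so every hit here is only a candidate; checking the version and declared length against the remaining data is one possible way to discard false positives.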
 

oletools/rtfobj.py

+#!/usr/bin/env python
+"""
+rtfobj.py - Philippe Lagadec 2012-11-09
+
+rtfobj is a Python module to extract embedded objects from RTF files, such as
+OLE objects. It can be used as a Python library or a command-line tool.
+
+Usage: rtfobj.py <file.rtf>
+
+rtfobj project website: http://www.decalage.info/python/rtfobj
+
+rtfobj is part of the python-oletools package:
+http://www.decalage.info/python/oletools
+
+rtfobj is copyright (c) 2012, Philippe Lagadec (http://www.decalage.info)
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+"""
+
+__version__ = '0.01'
+
+#------------------------------------------------------------------------------
+# CHANGELOG:
+# 2012-11-09 v0.01 PL: - first version
+
+#------------------------------------------------------------------------------
+# TODO:
+# - improve regex pattern for better performance?
+
+import re, sys, string, binascii
+
+# REGEX pattern to extract embedded OLE objects in hexadecimal format:
+# hex digit: [0-9A-Fa-f]
+# hex byte = two hex digits: [0-9A-Fa-f]{2}
+# run of at least 4 hex bytes: (?:[0-9A-Fa-f]{2}){4,}
+# full pattern: any number of hex-byte runs, each optionally followed by
+# whitespace or CR/LF, ending with a run of at least 4 hex bytes
+PATTERN = r'(?:(?:[0-9A-Fa-f]{2})+\s*)*(?:[0-9A-Fa-f]{2}){4,}'
+
+# a dummy translation table for str.translate, which does not change anything:
+TRANSTABLE_NOCHANGE = string.maketrans('', '')
+
+
+def rtf_iter_objects (filename, min_size=32):
+    """
+    Open an RTF file, extract each embedded object encoded in hexadecimal whose
+    decoded size is > min_size bytes, and yield the index of the object in the
+    RTF file together with its data in binary format.
+    This is a generator.
+    """
+    data = open(filename, 'rb').read()
+    for m in re.finditer(PATTERN, data):
+        found = m.group(0)
+        # remove all whitespace and line feeds:
+        #NOTE: with Python 2.6+, we could use None instead of TRANSTABLE_NOCHANGE
+        found = found.translate(TRANSTABLE_NOCHANGE, ' \t\r\n\f\v')
+        found = binascii.unhexlify(found)
+        #print repr(found)
+        if len(found)>min_size:
+            yield m.start(), found
+
+if __name__ == '__main__':
+    if len(sys.argv) < 2:
+        sys.exit(__doc__)
+    for index, data in rtf_iter_objects(sys.argv[1]):
+        print 'found object size %d at index %08X' % (len(data), index)
+        fname = 'object_%08X.bin' % index
+        print 'saving to file %s' % fname
+        open(fname, 'wb').write(data)
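
For readers on Python 3, the extraction step in `rtf_iter_objects` can be sketched on an in-memory buffer as below. The regular expression is the same as `PATTERN` above, rewritten as a bytes pattern. `iter_hex_objects` and the sample RTF fragment are hypothetical and exist only for illustration; the module in this commit reads from a file name and targets Python 2.

```python
import binascii
import re

# Same expression as PATTERN in rtfobj.py, as a bytes pattern for Python 3.
HEX_OBJECT = re.compile(rb'(?:(?:[0-9A-Fa-f]{2})+\s*)*(?:[0-9A-Fa-f]{2}){4,}')


def iter_hex_objects(rtf_bytes, min_size=32):
    """Yield (index, data) for each hex-encoded blob larger than min_size bytes."""
    for m in HEX_OBJECT.finditer(rtf_bytes):
        # Drop whitespace and line breaks before decoding the hex digits.
        hex_digits = m.group(0).translate(None, b' \t\r\n\f\v')
        blob = binascii.unhexlify(hex_digits)
        if len(blob) > min_size:
            yield m.start(), blob


# Hypothetical sample: a 43-byte payload, hex-encoded and wrapped over three lines.
payload = b'FWS' + bytes(range(40))
hex_text = binascii.hexlify(payload)
wrapped = b'\r\n'.join(hex_text[i:i + 32] for i in range(0, len(hex_text), 32))
prefix = b'{\\rtf1 {\\object\\objemb {\\*\\objdata '
sample_rtf = prefix + wrapped + b'}}}'

objects = list(iter_hex_objects(sample_rtf))
```

In this sample the single object is reported at the offset of its first hex digit, which is the same index that pyxswf prints and uses to build the `RTF_embedded_object_%08X` name.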