mergelogs

What is it?

This is a simple command-line tool, written in Python, that helps with the analysis of OpenStack logs. I created it to help me with an assignment for COMP40750 - Corporate Investigations, part of my M.Sc. in Digital Investigations and Forensic Computing at UCD.

This is a weekend project (actually I developed it during a flight) and the code is really ugly. Please be kind and bear with me :)

Compatibility

This library is written in pure Python and runs decently on Linux (primary development platform), Mac OS X and Windows. On Windows it may be a bit trickier to set up.

Dependencies

  • The Python programming environment, at least version 2.7 (not tested with 3.x)
  • dateutil to parse timestamps appropriately (timezone aware)
  • [optional] IPython for the interactive shell features
  • [optional] grec if you want colorized grep output
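
If dateutil or IPython are missing, one way to get them is via pip (python-dateutil and ipython are their respective package names on PyPI):

pip install --user python-dateutil ipython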

How to install it

If you have pip on your system, which is part of the Python distribution starting from version 2.7.9, you can just run:

pip install --user https://bitbucket.org/insomniacslk/mergelogs/get/master.tar.gz

The equivalent command on Windows is:

C:\Python27\Scripts\pip.exe install --user https://bitbucket.org/insomniacslk/mergelogs/get/master.tar.gz

Download it

Either download the zipball at https://bitbucket.org/insomniacslk/mergelogs/get/master.zip or get it via git on a command line:

git clone https://insomniacslk@bitbucket.org/insomniacslk/mergelogs.git

Install it

If you downloaded the zipball, unpack it somewhere on your system, then open a terminal and enter the directory it created. To install it, run the following command:

python setup.py install # as root, or add --user for local installation.
                        # If you use OSX, you need "--user --prefix=" to work around a system bug

If you are on Windows, you will probably have to run something like this:

C:\Python27\python.exe setup.py install

or add the directory containing python.exe to your system's PATH and simply call python.exe.
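
As a quick sanity check after installation, you can try importing the package from Python:

python -c "import mergelogs"    # an ImportError here means the installation did not work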

Features

The tool can:

  • merge various types of logs, including multi-line log entries, and sort them by timestamp
  • filter the logs in a specific time range
  • print some statistics, such as the number of log entries and log lines, the files that contributed to the analysis, and the holes (gaps) present in the log files
  • search and filter by expressions, including regular expressions
  • save the resulting logs to a file
  • drop into an IPython shell for further analysis; this is the most powerful way of using the tool (requires IPython to be installed)

How to use it

The following commands assume you have a UNIX-like environment. If you have Windows, replace the python command with something like C:\Python27\python.exe.

Show the usage instructions

Use --help:

python -m mergelogs --help
usage: mergelogs [-h] [--save SAVE] [--start START] [--end END] [--stats]
                 [--verbose] [--shell] [--grep GREP] [--case-sensitive]
                 [--regex]
                 files [files ...]

positional arguments:
  files             The input files or directories to merge. If an argument is
                    a directory, all the files contained at the first level
                    are considered

optional arguments:
  -h, --help        show this help message and exit
  --save SAVE       The merged and sorted log file
  --start START     The start timestamp in YYYY-mm-dd[ HH:MM:SS[.usec]] format
  --end END         The end timestamp in YYYY-mm-dd[ HH:MM:SS[.usec]] format
  --stats           Print some statistics on the logs
  --verbose         Print more verbose statistics
  --shell           Spawn an IPython shell after parsing
  --grep GREP       Filter by pattern, like grep. Case insensitive by default
                    (see --case-sensitive)
  --case-sensitive  Makes grep case sensitive
  --regex           Used with --grep, treats the pattern as a regular
                    expression

Merge and print some stats

Use --stats:

python -m mergelogs logs/* --stats

Statistics
================================================================================
Total log lines: 63150
Total log entries: 51456
Oldest entry timestamp: 2014-04-16 20:40:48.613000
Newest entry timestamp: 2014-04-22 12:04:38.208000
Total log files: 19
Discarded log files: 0
--------------------------------------------------------------------------------
Excluded log entries: 0
Log entries saved to merged.log. Time range: (None, None)

NOTE: in the DOS shell on Windows you cannot use wildcards. Either use PowerShell, or copy all the log files into a subdirectory and use that as the target.
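
If you prefer to stay in the DOS shell but still want wildcard behaviour, you can let Python expand the pattern for you. A minimal sketch (the helper script name run_mergelogs.py is hypothetical):

# run_mergelogs.py - hypothetical helper that expands the wildcard itself,
# since the DOS shell passes "logs/*" to the program literally.
import glob
import subprocess
import sys

files = glob.glob('logs/*')  # expand the pattern that the shell would not
sys.exit(subprocess.call([sys.executable, '-m', 'mergelogs', '--stats'] + files))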

Print more verbose statistics

Use --verbose to also show, for example, the holes found in the logs:

python -m mergelogs logs/* --stats --verbose
Statistics
================================================================================
Total log lines: 63150
Total log entries: 51456
Oldest entry timestamp: 2014-04-16 20:40:48.613000
Newest entry timestamp: 2014-04-22 12:04:38.208000
Total log files: 19
        logs/access.log
        logs/all.log
        logs/api.log
        logs/cinder-api.log
        logs/cinder-scheduler.log
        logs/cinder-volume.log
        logs/error.log
        logs/nova-api-metadata.log
        logs/nova-api.log
        logs/nova-cert.log
        logs/nova-compute.log
        logs/nova-conductor.log
        logs/nova-consoleauth.log
        logs/nova-dhcpbridge.log
        logs/nova-manage_csi.log
        logs/nova-manage_lbg.log
        logs/nova-network.log
        logs/nova-scheduler.log
        logs/registry.log
Discarded log files: 0
Holes (min 1h) found: 2
        2014-04-16 21:48:37.116000 ~ 2014-04-17 09:36:23.295000 (length: 11:47:46.179000)
        2014-04-17 18:15:43.977000 ~ 2014-04-21 14:24:08.113000 (length: 3 days, 20:08:24.136000)

--------------------------------------------------------------------------------
Excluded log entries: 0
Log entries saved to merged.log. Time range: (None, None)

Merge and filter logs in a time range

If you want to look for activity between 00:00 and 12:00 on 21 April 2014, run:

python -m mergelogs --start '2014-04-21' --end '2014-04-21 12:00:00' logs/* --stats --verbose
Statistics
================================================================================
Total log lines: 0
Total log entries: 0
Oldest entry timestamp: None
Newest entry timestamp: None
Total log files: 0
Discarded log files: 0
Holes (min 1h) found: 0

--------------------------------------------------------------------------------
Excluded log entries: 51456
Log entries saved to merged.log. Time range: (2014-04-21 00:00:00, 2014-04-21 12:00:00)

and check the output in merged.log. You can change the output file using --save.
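
For example, to write the result of the same query to a differently named file (the file name here is just an example):

python -m mergelogs --start '2014-04-21' --end '2014-04-21 12:00:00' --save april21-morning.log logs/*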

Filter logs by content

A grep-like functionality is embedded in the tool. Just use --grep <your-pattern>. You can also tell the parser that the pattern is a regular expression by using --regex. For example:

python -m mergelogs logs/* --grep login --stats
Statistics
================================================================================
Total log lines: 298
Total log entries: 298
Oldest entry timestamp: 2014-04-17 09:38:58
Newest entry timestamp: 2014-04-22 11:53:48
Total log files: 3
Discarded log files: 0
--------------------------------------------------------------------------------
Excluded log entries: 0
Log entries saved to merged.log. Time range: (None, None)
python -m mergelogs logs/* --grep '.*/static/.+\.png' --regex --stats
Statistics
================================================================================
Total log lines: 123
Total log entries: 123
Oldest entry timestamp: 2014-04-17 10:23:56
Newest entry timestamp: 2014-04-22 10:47:31
Useful log files: 1
--------------------------------------------------------------------------------
Excluded log entries: 0
Log entries saved to merged.log. Time range: (None, None)

and again check the output in merged.log.

Drop into the IPython shell

This is the most powerful way of using mergelogs.

You can use the shell to retrieve the same information available from the command line, and more.

Enter the shell with the following command:

python -m mergelogs logs/* --shell

Then, to print the statistics of the logs, like using --stats --verbose, run:

In [1]: print logs.in_range('2014-04-21 12:00:00', '2014-04-21 23:59:00').stats(verbose=True)
Total log lines: 31703
Total log entries: 24877
Oldest entry timestamp: 2014-04-21 14:24:08.113000
Newest entry timestamp: 2014-04-21 23:58:35.380000
Total log files: 17
        logs/access.log
        logs/all.log
        logs/api.log
        logs/cinder-api.log
        logs/cinder-scheduler.log
        logs/cinder-volume.log
        logs/error.log
        logs/nova-api-metadata.log
        logs/nova-api.log
        logs/nova-cert.log
        logs/nova-compute.log
        logs/nova-conductor.log
        logs/nova-consoleauth.log
        logs/nova-dhcpbridge.log
        logs/nova-manage_lbg.log
        logs/nova-network.log
        logs/nova-scheduler.log
Discarded log files: 0
Holes (min 1h) found: 0

Or, if you want a machine-usable data format, you can get a dictionary by passing as_dict=True:

In [2]: logs.stats(verbose=True, as_dict=True)
Out[2]:
{'holes': [(<LogEntry (timestamp=datetime.datetime(2014, 4, 16, 21, 48, 37, 116000), logfile='logs/nova-dhcpbridge.log', message='2014-04-16 21:48:37.116 19437 INFO nova.openstack.common.rpc.common [req-ec14df42-6f7f-4ef4-9ddc-43eed4fc7821 None None] Reconnecting to AMQP server on csi:5672\n'),
   <LogEntry (timestamp=datetime.datetime(2014, 4, 17, 9, 36, 23, 295000), logfile='logs/api.log', message='2014-04-17 09:36:23.295 1419 ERROR glance.store.sheepdog [-] Error in store configuration: Unexpected error while running command.\n')),
  (<LogEntry (timestamp=datetime.datetime(2014, 4, 17, 18, 15, 43, 977000), logfile='logs/cinder-volume.log', message='2014-04-17 18:15:43.977 1457 INFO cinder.service [-] Child 1576 killed by signal 15\n'),
   <LogEntry (timestamp=datetime.datetime(2014, 4, 21, 14, 24, 8, 113000), logfile='logs/api.log', message='2014-04-21 14:24:08.113 1394 ERROR glance.store.sheepdog [-] Error in store configuration: Unexpected error while running command.\n'))],
 'logfiles': {'logs/access.log',
  'logs/all.log',
  'logs/api.log',
  'logs/cinder-api.log',
  'logs/cinder-scheduler.log',
  'logs/cinder-volume.log',
  'logs/error.log',
  'logs/nova-api-metadata.log',
  'logs/nova-api.log',
  'logs/nova-cert.log',
  'logs/nova-compute.log',
  'logs/nova-conductor.log',
  'logs/nova-consoleauth.log',
  'logs/nova-dhcpbridge.log',
  'logs/nova-manage_csi.log',
  'logs/nova-manage_lbg.log',
  'logs/nova-network.log',
  'logs/nova-scheduler.log',
  'logs/registry.log'},
 'newest': datetime.datetime(2014, 4, 22, 12, 4, 38, 208000),
 'oldest': datetime.datetime(2014, 4, 16, 20, 40, 48, 613000),
 'total_entries': 51456,
 'total_lines': 63150}
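
Since the result is a plain dictionary, it can be post-processed programmatically. A small sketch that prints the duration of each hole, assuming each LogEntry exposes the timestamp attribute shown in its repr above:

stats = logs.stats(verbose=True, as_dict=True)
for start, end in stats['holes']:
    # each hole is a pair of LogEntry objects delimiting the gap
    print '%s ~ %s (length: %s)' % (start.timestamp, end.timestamp,
                                    end.timestamp - start.timestamp)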

Want to print the log entries containing the text "login"? Use:

In [3]: logs.grep('login')
Out[3]: <LogEntries (total_entries=298, total_lines=298, num_log_files=3)>

That's too many entries to print, so let's show only the last 3 containing "login":

In [4]: print logs.grep('login').tail(3)
192.168.0.4 - - [22/Apr/2014:11:53:48 +0100] "POST /horizon/auth/login/ HTTP/1.1" 302 1857 "http://192.168.0.10/horizon/auth/login/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0"
192.168.0.4 - - [22/Apr/2014:11:53:48 +0100] "GET /horizon HTTP/1.1" 302 1733 "http://192.168.0.10/horizon/auth/login/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0"
192.168.0.4 - - [22/Apr/2014:11:53:48 +0100] "GET /horizon/admin/ HTTP/1.1" 200 4536 "http://192.168.0.10/horizon/auth/login/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0"

Note that if you have the package grec installed, the grep output will be colorized.

The grep is case insensitive by default. Let's find the last three lines containing exactly "Login" (first letter uppercase):

In [5]: print logs.grep('Login', case_sensitive=True).tail(3)
[Tue Apr 22 10:52:58 2014] [error] Login successful for user "macos".
[Tue Apr 22 10:53:48 2014] [error] Login successful for user "admin".
[Tue Apr 22 10:53:48 2014] [error] Login successful for user "admin".

Want the first three instead? Use head:

In [6]: print logs.grep('Login', case_sensitive=True).head(3)
[Thu Apr 17 09:38:58 2014] [error] Login successful for user "admin".
[Thu Apr 17 09:38:58 2014] [error] Login successful for user "admin".
[Thu Apr 17 09:40:39 2014] [error] Login successful for user "admin".

What about printing the 10th, the 11th and the 12th?

In [7]: print logs.grep('Login', case_sensitive=True)[10:13]
[Thu Apr 17 13:03:10 2014] [error] Login successful for user "admin".
[Thu Apr 17 13:03:10 2014] [error] Login successful for user "admin".
[Thu Apr 17 16:20:58 2014] [error] Login successful for user "kjas kstan".

(yes, the second number is 13, not 12: Python slices exclude the end index)
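
Since arbitrary slicing works, head and tail can presumably be expressed as slices as well; open-ended slices are not shown above, so the following equivalence is an assumption:

print logs.grep('Login', case_sensitive=True)[:3]  # presumably the same as .head(3)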

Get the logs in the same time range as the examples above:

In [8]: logs.in_range('2014-04-21 12:00:00', '2014-04-21 23:59:00')
Out[8]: <LogEntries (total_entries=24877, total_lines=31703, num_log_files=17)>

Print the last five:

In [9]: print logs.in_range('2014-04-21 12:00:00', '2014-04-21 23:59:00').tail(5)
2014-04-21 23:58:23.277 1551 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 9799
2014-04-21 23:58:23.277 1551 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 76
2014-04-21 23:58:23.277 1551 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -3
2014-04-21 23:58:23.383 1551 INFO nova.compute.resource_tracker [-] Compute_service record updated for littlebigguy:littlebigguy
2014-04-21 23:58:35.380 1707 INFO cinder.volume.manager [-] Updating volume status

Want to search only in specific files? Use in_files. To search only in files whose name contains "nova", use a regular expression:

In [10]: logs.in_files('.*nova.*')
Out[10]: <LogEntries (total_entries=38353, total_lines=49911, num_log_files=11)>

Let's print the last three:

In [11]: print logs.in_files('.*nova.*').tail(3)
2014-04-22 12:04:37.094 1579 AUDIT nova.compute.resource_tracker [-] Free disk (GB): -118
2014-04-22 12:04:37.094 1579 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -7
2014-04-22 12:04:37.190 1579 INFO nova.compute.resource_tracker [-] Compute_service record updated for littlebigguy:littlebigguy

Let's combine it all together: we want the last 3 entries from the "nova" log files, between noon and midnight on 21 April 2014, whose text contains the string "admin":

In [12]: print logs.in_range('2014-04-21 12:00:00', '2014-04-21 23:59:00').in_files('.*nova.*').grep('admin').tail(3)
2014-04-21 18:28:19.364 960 ERROR nova.scheduler.filter_scheduler [req-ec842922-ff2e-4dbc-8e69-f94137dd673d e75374ede9834b41919d07e6a311401c e7dcfe9fe2f143d29e518e41a20553cf] [instance: 23c9e5e7-c994-442a-ab64-de9c1d7fa005] Error from last host: littlebigguy (node littlebigguy): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1043, in _build_instance\n    set_access_ip=set_access_ip)\n', u'  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1426, in _spawn\n    LOG.exception(_(\'Instance failed to spawn\'), instance=instance)\n', u'  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1423, in _spawn\n    block_device_info)\n', u'  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2084, in spawn\n    admin_pass=admin_password)\n', u'  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2389, in _create_image\n    project_id=instance[\'project_id\'])\n', u'  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 174, in cache\n    *args, **kwargs)\n', u'  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 309, in create_image\n    self.verify_base_size(base, size)\n', u'  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 224, in verify_base_size\n    raise exception.InstanceTypeDiskTooSmall()\n', u"InstanceTypeDiskTooSmall: Instance type's disk is too small for requested image.\n"]
2014-04-21 18:28:44.866 1551 TRACE nova.compute.manager [instance: 23c9e5e7-c994-442a-ab64-de9c1d7fa005]     admin_pass=admin_password)
2014-04-21 18:28:47.058 1551 TRACE nova.compute.manager [instance: 23c9e5e7-c994-442a-ab64-de9c1d7fa005]     admin_pass=admin_password)
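
If you want to persist a filtered view from the shell, here is a minimal sketch; it assumes that a LogEntries object is iterable and that each LogEntry has the message attribute shown in the repr earlier, neither of which is documented:

# hypothetical sketch: write the filtered entries to a file of our choosing
filtered = logs.in_range('2014-04-21 12:00:00', '2014-04-21 23:59:00').in_files('.*nova.*').grep('admin')
with open('nova-admin.log', 'w') as fd:  # 'nova-admin.log' is an example name
    for entry in filtered:
        fd.write(entry.message)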

What's missing?

  • support for timezones: at the moment only naive timestamps are handled
  • pluggable parser backends instead of a horrible try..except chain

Want a new feature? Just ask: see https://insomniac.slackware.it for contact details.

Troubleshooting

You may encounter various errors along the way. I'm collecting them here.

Invalid mode when opening the logs

You may get an error like the following when running python -m mergelogs logs/* or a similar command:

C:\Users\ismael\Desktop\UCD CLASSES\Year 2\Corporate investigations
-mergelogs-ffc53172a185>python -m mergelogs logs/*
Traceback (most recent call last):
 File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
   "__main__", fname, loader, pkg_name)
 File "C:\Python27\lib\runpy.py", line 72, in _run_code
   exec code in run_globals
 File "C:\Users\ismael\Desktop\UCD CLASSES\Year 2\Corporate invest
mniacslk-mergelogs-ffc53172a185\mergelogs\__main__.py", line 84, in
   main()
 File "C:\Users\ismael\Desktop\UCD CLASSES\Year 2\Corporate invest
mniacslk-mergelogs-ffc53172a185\mergelogs\__main__.py", line 48, in
   total_entries = merge(args.files, args.grep, args.regex)
 File "mergelogs\mergelogs.py", line 288, in merge
   case_sensitive)
 File "mergelogs\mergelogs.py", line 322, in merge_with_prefix
   with open(filename, 'rb') as fd:
IOError: [Errno 22] invalid mode ('rb') or filename: 'logs/*'

Explanation
You are probably running mergelogs under Windows, and you are not using PowerShell. The DOS shell on Windows does not expand wildcards - the star sign in "logs/*" - so the pattern is passed to the program literally. This can be worked around by using PowerShell, but if you prefer the DOS shell, mergelogs can also take a directory as the target.

Solution
Either use PowerShell, or replace "logs/*" with "logs" (or the name of the directory your logs are in). Note that the latter will pick up all the files in the "logs" directory.
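
For example, assuming your log files are in a directory named logs:

C:\Python27\python.exe -m mergelogs logs --stats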