Common Resource Grep

Version: 1.0.5, Jan 21st, 2016.

(C) Copyright 2013-2016 Craig Ryan. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License (LGPL) version 2.1 which accompanies this distribution, and is available at http://www.gnu.org/licenses/lgpl-2.1.html

CRGREP is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.


CRGREP is an open source COMMON RESOURCE GREP command-line utility that matches text by name and content in resource data that is difficult to access.

Search for pattern matches in database tables, ZIP and other archive files, MS Office formats, images and scanned documents, Maven dependencies, web resources, and combinations of supported resources nested within other resources.

Differences to plain old grep

CRGREP combines features of both find and grep for deep search and pattern matching. Users of plain old grep will notice some general differences. Not all standard grep options are provided, either because they are yet to be developed or because they do not make sense for CRGREP-style search. CRGREP displays search results with details that depend on the type of resource matched, such as page number, column name or sheet number in Excel.


README.md: (this file) discover CRGREP capabilities and usage.

INSTALL.txt: read this if you download a binary distribution ready to run. It is important to read this document, specifically details of configuration of some third party software.

BUILD.txt: read this if you download a source distribution to build CRGREP yourself.

CHANGELOG.txt: history of versions and changes in CRGREP.

This document is accompanied by additional documents:

docs/USAGE.txt           usage, platform and configuration details
docs/FILE_GREP.txt       file grep in detail
docs/DATABASE_GREP.txt   database grep in detail
docs/WEB_GREP.txt        web grep in detail

As this is a multi-platform download, a 'docs/unix/' directory also contains the additional documents in Unix/Mac friendly format.

CRGREP in action

Here are some examples showing what you can do with CRGREP.

1/ What files and data are nested anywhere under my 'target' directory matching 'key' including data buried inside archives?

$ crgrep -r key target

target/simple_file.txt: a key moment
target/monkey-pics.txt:1:A file about happy monkeys.
target/test-ear.ear[META-INF/MANIFEST.MF]:5:Created-By: Apache monkey

2/ Is there data in my database matching 'handle'? ('~/.crgrep' defines user/password)

$ crgrep -d -U "jdbc:sqlite:/databases/db.sqlite3" handle '*'
(relational database)

customers: [id,name,status,joined_date,handle]
customers: 3,Craig,active,2012-10-24 01:05:44,Craig's handle
tags: [id,tag]
tags: 4,handle

$ crgrep -d -U "http://localhost:7474/db/data/" handle '*'
(Neo4J graph database)

Node[3]:Customer {name:"Craig",handle:"Craig's handle"}
Node[4]:Tags {tag:"handle"}

3/ Does my scanned report document contain the word 'report'?

$ crgrep --ocr report report_scan.png

report_scan.png:10: abc report for management

4/ Which of my Microsoft Office files mention 'profit'?

$ crgrep profit msoffice/*

agm.doc:2:4:The annual profit has risen this year    
statement.xlsx:1:2:4:Annual profit:
board.ppt:3:Highlights contributing to profit figures

5/ Where is my favourite AC/DC track under an MP3 library?

$ crgrep "Back in Black" music/*

music/HellsBells.mp3: @{Album=Back In Black, TrackTitle=Hells Bells, Music By=Angus Young,..}

6/ Which of my photo images are tagged with comments of our holiday in Perth?

$ crgrep Perth pics/*.jpg

pics/pic1112.jpg: @{JpegComment=Lovely shot of Perth city, just arrived.} 
pics/pic1113.jpg: @{JpegComment=Breakfast in a Perth cafe}

7/ Does the Google web page contain a 'favicon' reference?

$ crgrep google_favicon http://www.google.com

http://www.google.com:<!doctype html><html itemscope="itemscope" itemtype="http://schema.org/WebPage"><head><meta itemprop="image" content="/images/google_favicon_128.png"><title>Google</title>...

8/ Do I have any Maven (POM) dependencies in my project with content matching 'RunWith'?

$ crgrep -m RunWith pom.xml


CRGREP will search for text matches within any combination of supported resources contained within, or referenced by, other supported resources.

Calling CRGREP

Ensure you have at least Java 8 installed and that your environment has JAVA_HOME and PATH set correctly, e.g.:

$ set JAVA_HOME=C:\path\to\java1.8
$ set PATH=...;C:\path\to\crgrep\bin

All CRGREP operations involve a similar set of command line arguments:

$ crgrep <pattern> <resource path(s)>

Simple wildcards '*' and '?' may be specified in 'pattern', while 'resource path' supports full glob pattern search including 'ant style' globs (such as 'a/**/*.txt').
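As a rough illustration of how simple '*' and '?' wildcards in a pattern can behave, here is a minimal Python sketch. It is not CRGREP's implementation. The function names `wildcard_to_regex` and `matches` are hypothetical, and the sketch assumes that '*' matches any run of characters, that '?' matches exactly one character, and that a pattern may match anywhere in a line (as 'key' matches 'monkeys' in the examples above). docs/USAGE.txt remains the reference for the actual rules.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simple wildcard pattern into a compiled regex.

    Assumption: '*' matches any run of characters (including none),
    '?' matches exactly one character, everything else is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern: str, text: str) -> bool:
    # Assumption: grep-style search, so the pattern may match anywhere
    # in the text instead of having to cover the whole string.
    return wildcard_to_regex(pattern).search(text) is not None


print(matches("key", "A file about happy monkeys."))  # True
print(matches("pro?it", "Annual profit:"))            # True
print(matches("h*le", "Craig's handle"))              # True
print(matches("pro?it", "Annual proit:"))             # False
```

Escaping every other character keeps regex metacharacters such as '.' literal, so a pattern like 'a.b' in this sketch only matches a real dot.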

To read from standard input (stdin) either specify no <resource path> or provide a hyphen '-':

$ cat file.txt | crgrep sometext [-]

See docs/USAGE.txt for further usage details.

Displaying Results

Results are displayed in a format that depends on the type of resource. They include the name of the resource, any nesting information, line and page numbers, and any matching text.

The basic format for displaying results for file based resources is


Some examples of CRGREP displayed results

 Output                                | Match
 ------------------------------------- | -----------------------------
 src/foo.java                          | File listing match
 src/bar.txt:25:some text              | File content match (+lineno)
 lib/all.zip[image.gif]                | Archive file listing match
 lib/app.war[WEB-INF/web.xml]:6:<d..>  | Archive file content match
 pom.xml->stuff.zip[doc.txt]           | File listing match
 mypic.jpg: @{Size=25,Com=Scene}       | File meta-data match
 TAB: [COL1,COL2,COL3]                 | Table column name match
 TAB: data1,data2,data3                | Table data match
 Node[1]:{name:"John"}                 | Graph database node match
 sample.pdf:1:1:Sample PDF Document    | Text extracted from a PDF (+pageno and +lineno)
 report.docx:2:Second paragraph        | MS Word text (+paragraph)

See docs/USAGE.txt for a detailed description of output formats.
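For scripts that post-process file based results, the example lines in the table can be split into their parts. The following Python sketch is inferred only from those examples, not from an official grammar. The names `Result` and `parse_result` are hypothetical. It assumes resource names contain no ':' (so URLs and Windows drive letters are out of scope), and it ignores the database formats.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# resource path, optional [nested entry], optional ':'-prefixed remainder
_LINE = re.compile(
    r"^(?P<resource>[^:\[]+)(?:\[(?P<entry>[^\]]*)\])?(?::(?P<rest>.*))?$"
)
_NUM = re.compile(r"^(\d+):")


@dataclass
class Result:
    resource: str
    entry: Optional[str] = None           # name nested inside an archive
    positions: List[int] = field(default_factory=list)  # page/line numbers
    text: Optional[str] = None            # None for a pure listing match
    is_metadata: bool = False             # True for '@{...}' meta-data


def parse_result(line: str) -> Result:
    m = _LINE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    result = Result(resource=m.group("resource"), entry=m.group("entry"))
    rest = m.group("rest")
    if rest is None:
        return result  # listing match: resource name only
    # Peel off leading numeric fields such as page and line numbers.
    while True:
        num = _NUM.match(rest)
        if num is None:
            break
        result.positions.append(int(num.group(1)))
        rest = rest[num.end():]
    result.text = rest.strip()
    result.is_metadata = result.text.startswith("@{")
    return result


print(parse_result("lib/app.war[WEB-INF/web.xml]:6:<d..>"))
print(parse_result("sample.pdf:1:1:Sample PDF Document"))
```

One known ambiguity in this approach: matched text that itself begins with digits followed by a colon would be read as an extra position field.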

File Grep

A file grep will search for textual matches in the following resource types:

  • Plain text files similar to normal grep
  • Resources within archives such as ZIP, TAR, WAR, EAR and JAR formats
  • Meta data in images (jpeg/gif etc) and MP3 audio files
  • Text in scanned documents (jpeg/gif/tiff/bmp/png), extracted using Optical Character Recognition (OCR) techniques.
  • Text extracted from PDF files and MS Office formats (doc[x], xls[x], ppt[x])
  • Maven POM files, following dependency trees of resource artifacts

The file docs/FILE_GREP.txt provides more details on file based grep.
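To make the idea of searching inside archives concrete, here is a small Python sketch that looks for a string in the entry names and text content of a ZIP archive and reports hits in the 'archive[entry]:lineno:text' style shown earlier. It is an illustration only and not how CRGREP itself is implemented. The function `grep_zip` is hypothetical, it uses plain substring matching instead of wildcards, and it does not descend into archives nested inside other archives.

```python
import io
import zipfile
from typing import Iterator


def grep_zip(data: bytes, name: str, needle: str) -> Iterator[str]:
    """Yield 'name[entry]' for entry-name matches and
    'name[entry]:lineno:line' for content matches."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for entry in archive.namelist():
            if entry.endswith("/"):
                continue  # skip directory entries
            if needle in entry:
                yield f"{name}[{entry}]"
            # Assumption: entries are treated as UTF-8 text.
            text = archive.read(entry).decode("utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    yield f"{name}[{entry}]:{lineno}:{line}"


# Build a small in-memory archive so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "META-INF/MANIFEST.MF",
        "Manifest-Version: 1.0\nCreated-By: Apache monkey\n",
    )
    archive.writestr("docs/keys.txt", "nothing to see here\n")

for hit in grep_zip(buffer.getvalue(), "test-ear.ear", "key"):
    print(hit)
# test-ear.ear[META-INF/MANIFEST.MF]:2:Created-By: Apache monkey
# test-ear.ear[docs/keys.txt]
```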

Database Grep

When a database grep is specified (-d option), CRGREP attempts to match the search pattern against persisted data in either a relational (SQL) database or a graph database identified by a URI. For example:

$ crgrep -d -U jdbc:vendor:db 'john' 'mytable.column*'

See docs/DATABASE_GREP.txt for detailed database grep behaviour. This document contains important configuration requirements for connecting to supported database servers.

Web Grep

A web page search is attempted when the <resource> begins with 'http://' and no -d (database) option is specified.
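The rule above, together with the -d option and the stdin convention described earlier, can be summarised as a small decision function. This Python sketch is a reading of this README only. The function `select_grep_mode` is hypothetical, and the precedence it applies when -d is combined with other inputs is an assumption.

```python
from typing import Optional


def select_grep_mode(resource: Optional[str], database: bool = False) -> str:
    """Return 'database', 'web', 'stdin' or 'file' for one resource argument.

    'database' stands for the -d option being present.
    """
    if database:
        # Assumption: -d takes precedence, so an http:// database URI
        # (e.g. a Neo4J endpoint) is never treated as a web page.
        return "database"
    if resource is None or resource == "-":
        return "stdin"
    if resource.startswith("http://"):
        return "web"
    return "file"


print(select_grep_mode("http://www.google.com"))   # web
print(select_grep_mode("*", database=True))        # database
print(select_grep_mode("-"))                       # stdin
print(select_grep_mode("msoffice/agm.doc"))        # file
```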

See docs/WEB_GREP.txt for usage and examples.