# Common Resource Grep

Version: 1.0.5, Jan 21st, 2016.

(C) Copyright 2013-2016 Craig Ryan. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License (LGPL) version 2.1 which accompanies this distribution, and is available at http://www.gnu.org/licenses/lgpl-2.1.html

CRGREP is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

## Description

**CRGREP** is an open source COMMON RESOURCE GREP command line utility that matches text in the names and content of resources whose data is difficult to access. Search for pattern matches in database tables, ZIP and other archive files, MS Office formats, images and scanned documents, Maven dependencies, web resources and combinations of supported resources nested within other resources.

### Differences to plain old grep

**CRGREP** combines various features of both **find** and **grep** for deep search and pattern matching. Users of plain old grep will note various general differences. Not all standard grep options are provided, either because they are yet to be developed or because they do not make sense for **CRGREP** style search.

**CRGREP** displays search results with details specific to the type of resource matched, which may include information such as page number, column name or sheet number in Excel.

## Documentation

**README.md**: (this file) discover CRGREP capabilities and usage.

**INSTALL.txt**: read this if you download a binary distribution ready to run. It is *important* to read this document, specifically the details on configuring some third party software.

**BUILD.txt**: read this if you download a source distribution to build CRGREP yourself.

**CHANGELOG.txt**: history of versions and changes in CRGREP.

This document is accompanied by additional documents:

    docs/USAGE.txt            usage, platform and configuration details
    docs/FILE_GREP.txt        file grep in detail
    docs/DATABASE_GREP.txt    database grep in detail
    docs/WEB_GREP.txt         web grep in detail

As this is a multi-platform download, a 'docs/unix/' directory also contains the additional documents in Unix/Mac friendly format.

## CRGREP in action

Here are some examples showing what you can do with CRGREP.

1/ What files and data are nested anywhere under my 'target' directory matching 'key' including data buried inside __archives__?

    $ crgrep -r key target
    target/simple_file.txt: a key moment
    target/misc.zip[misc/nested_monkey.txt]
    target/monkey-pics.txt:1:A file about happy monkeys.
    target/test-ear.ear[META-INF/MANIFEST.MF]:5:Created-By: Apache monkey

2/ Is there data in my __database__ matching 'handle'? ('~/.crgrep' defines user/password)

    $ crgrep -d -U "jdbc:sqlite:/databases/db.sqlite3" handle '*'     (relational database)
    customers: [id,name,status,joined_date,handle]
    customers: 3,Craig,active,2012-10-24 01:05:44,Craig's handle
    tags: [id,tag]
    tags: 4,handle

    $ crgrep -d -U "http://localhost:7474/db/data/" handle '*'     (Neo4J graph database)
    Node[3]:Customer {name:"Craig",handle:"Craig's handle"}
    Node[4]:Tags {tag:"handle"}

3/ Does my __scanned report document__ contain the word 'report'?

    $ crgrep --ocr report report_scan.png
    report_scan.png:10: abc report for management

4/ Which of my Microsoft __Office files__ mention 'profit'?

    $ crgrep profit msoffice/*
    agm.doc:2:4:The annual profit has risen this year
    statement.xlsx:1:2:4:Annual profit:
    board.ppt:3:Highlights contributing to profit figures

5/ Where is my favourite AC/DC track under an __MP3 library__?

    $ crgrep "Back in Black" music/*
    music/HellsBells.mp3: @{Album=Back In Black, TrackTitle=Hells Bells, Music By=Angus Young,..}

6/ Which of my photo __images__ are tagged with comments of our holiday in Perth?


    $ crgrep Perth pics/*.jpg
    pics/pic1112.jpg: @{JpegComment=Lovely shot of Perth city, just arrived.}
    pics/pic1113.jpg: @{JpegComment=Breakfast in a Perth cafe}

7/ Does the google __web__ page contain a 'favicon' reference?

    $ crgrep google_favicon http://www.google.com
    http://www.google.com:<!doctype html><html itemscope="itemscope" itemtype="http://schema.org/WebPage"><head><meta itemprop="image" content="/images/google_favicon_128.png"><title>Google</title>...

8/ Do I have any __maven__ (POM) dependencies in my project with content matching 'RunWith'?

    $ crgrep -m RunWith pom.xml
    C:/Users/Craig/.m2/repository/junit/junit/4.8.2/junit-4.8.2.jar[org/junit/runner/RunWith.class]

**CRGREP** will search for text matches within any combination of supported resources contained within, or referenced by, other supported resources.

## Calling CRGREP

Ensure you have at least **Java 8** installed and that your environment has JAVA_HOME and PATH set correctly, e.g.:

    $ set JAVA_HOME=C:\path\to\java1.8
    $ set PATH=...;C:\path\to\crgrep\bin

All CRGREP operations involve a similar set of command line arguments:

    $ crgrep <pattern> <resource path(s)>

Simple wildcards '*' and '?' may be specified in 'pattern' while 'resource path' supports full glob pattern search including 'ant style' glob (such as 'a/\*\*/*.txt').

To read from standard input (stdin) either specify no <resource path> or provide a hyphen '-':

    $ cat file.txt | crgrep sometext [-]

See __docs/USAGE.txt__ for further usage details.

## Displaying Results

Results are displayed in a format that depends on the type of resource and include the name of the resource, any nesting information, line and page numbers and any matching text.

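A script that post-processes CRGREP output could split each file-based result line into those parts. The Python sketch below is an illustration only, not part of CRGREP: the `Match` type and `parse_result` function are hypothetical names, and the line shape it assumes (`resource`, an optional `[nested entry]`, zero or more numeric fields, then the matching text) is inferred from the examples under "CRGREP in action". It does not handle resources whose names contain a colon, such as URLs or Windows drive paths, and it treats all-numeric matching text as one more numeric field.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Match(NamedTuple):
    """One parsed result line (hypothetical type, for illustration)."""
    resource: str                # e.g. "lib/app.war"
    nested: Optional[str]        # e.g. "WEB-INF/web.xml", or None
    positions: Tuple[int, ...]   # page/line style numbers, possibly empty
    content: Optional[str]       # matching text, or None for a listing match


# resource, optional [nested], any run of ":<number>" fields, optional ":<text>"
_LINE = re.compile(
    r"^(?P<resource>[^:\[]+)"
    r"(?:\[(?P<nested>[^\]]*)\])?"
    r"(?P<positions>(?::\d+)*)"
    r"(?::(?P<content>.*))?$"
)


def parse_result(line: str) -> Optional[Match]:
    """Split one result line into its parts, or return None if it does not fit."""
    m = _LINE.match(line)
    if m is None:
        return None
    positions = tuple(int(p) for p in m.group("positions").split(":") if p)
    return Match(m.group("resource"), m.group("nested"), positions, m.group("content"))
```

For instance, `parse_result("target/test-ear.ear[META-INF/MANIFEST.MF]:5:Created-By: Apache monkey")` yields the archive name, the nested entry, the single number 5 and the matched text as separate fields.
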
The basic format for displaying results for file based resources is

    <resource>[[:pagenum]:linenum:matching_content]

Some examples of CRGREP displayed results

Output                                | Match
------------------------------------- | -----------------------------
src/foo.java                          | File listing match
src/bar.txt:25:some text              | File content match (+lineno)
lib/all.zip[image.gif]                | Archive file listing match
lib/app.war[WEB-INF/web.xml]:6:<d..>  | Archive file content match
pom.xml->stuff.zip[doc.txt]           | File listing match
mypic.jpg: @{Size=25,Com=Scene}       | File meta-data match
TAB: [COL1,COL2,COL3]                 | Table column name match
TAB: data1,data2,data3                | Table data match
Node[1]:{name:"John"}                 | Graph database node match
sample.pdf:1:1:Sample PDF Document    | Text extracted from a PDF (+pageno and +linenum)
report.docx:2:Second paragraph        | MS Word text (+paragraph)

See __docs/USAGE.txt__ for a detailed description of output formats.

## File Grep

A file grep will search for textual matches in the following resource types:

* Plain text files similar to normal grep
* Resources within archives such as ZIP, TAR, WAR, EAR and JAR formats
* Meta data in images (jpeg/gif etc) and MP3 audio files
* Text in scanned documents (jpeg/gif/tiff/bmp/png), extracted using Optical Character Recognition (OCR) techniques.
* Text extracted from PDF files and MS Office formats (doc[x], xls[x], ppt[x])
* Maven POM files, following dependency trees of resource artifacts

The file __docs/FILE_GREP.txt__ provides more details on file based grep.

## Database Grep

By specifying a database grep (-d option), the search will attempt to match the search pattern against persisted data in either a relational (SQL) database or graph database identified by a URI. For example

    $ crgrep -d -U jdbc:vendor:db 'john' 'mytable.column*'

See __docs/DATABASE_GREP.txt__ for detailed database grep behaviour. This document contains *important* configuration requirements for connecting to supported database servers.

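To make the idea of matching a pattern against persisted relational data more concrete, here is a minimal Python sketch using the standard `sqlite3` module. It is not CRGREP's implementation: `grep_database` and `demo_connection` are hypothetical names, the second argument is simplified to a table-name glob rather than a 'table.column' glob, and printing the column list ahead of any matching rows is an assumption based on the example output earlier in this README. Python's `fnmatch` also accepts `[seq]` in addition to the '*' and '?' wildcards.

```python
import sqlite3
from fnmatch import fnmatchcase


def grep_database(conn, pattern, table_glob="*"):
    """Return 'TABLE: ...' lines for column names and row values containing pattern."""
    glob = f"*{pattern}*"  # substring match, honouring '*' and '?' in pattern
    results = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        if not fnmatchcase(table, table_glob):
            continue
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        columns = [d[0] for d in cursor.description]
        # Rows where at least one non-NULL value matches the pattern.
        rows = [r for r in cursor
                if any(v is not None and fnmatchcase(str(v), glob) for v in r)]
        column_match = any(fnmatchcase(c, glob) for c in columns)
        if column_match or rows:
            results.append(f"{table}: [{','.join(columns)}]")
        for r in rows:
            results.append(f"{table}: {','.join(str(v) for v in r)}")
    return results


def demo_connection():
    """Build a small in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers "
                 "(id INTEGER, name TEXT, status TEXT, joined_date TEXT, handle TEXT)")
    conn.execute("CREATE TABLE tags (id INTEGER, tag TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?,?,?,?,?)", [
        (1, "Ann", "active", "2012-01-01 00:00:00", "ann"),
        (3, "Craig", "active", "2012-10-24 01:05:44", "Craig's handle"),
    ])
    conn.executemany("INSERT INTO tags VALUES (?,?)", [(4, "handle"), (5, "other")])
    return conn
```

Calling `grep_database(demo_connection(), "handle")` returns one column-list line and one data line for each of the two sample tables, in the same shape as the 'TAB: [COL1,COL2,COL3]' and 'TAB: data1,data2,data3' rows of the results table.
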
## Web Grep

A web page search is attempted when the <resource> begins with 'http://' and no -d (database) option is specified.

See __docs/WEB_GREP.txt__ for usage and examples.