ACTION: SEARCH FOR SIMILAR BINARIES
To search for similar binaries, use:
vbclient.py -a matches [--threshold] [--fullmatrix] [--outdir] arg1 arg2 ...
The arguments arg1 arg2 ... are sha1 hashes of files that must be either
binary.unpacked. The command does not provide similarity for archive files; the binaries contained in an archive can be found using the
vbclient.py -a query command.
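When driving matches from a script, it can help to assemble and validate the command line first. The sketch below is a hypothetical helper (build_matches_argv is not part of vbclient.py). It assumes that --threshold and --outdir take their value as the following argument and that a sha1 is 40 hexadecimal characters, and it enforces the threshold range of 0 to 1 given in this section.

```python
import re
from typing import List, Optional, Sequence

# A sha1 digest is 40 hexadecimal characters.
SHA1_RE = re.compile(r"[0-9a-fA-F]{40}")


def build_matches_argv(
    sha1s: Sequence[str],
    threshold: Optional[float] = None,
    fullmatrix: bool = False,
    outdir: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `vbclient.py -a matches` (hypothetical helper).

    Assumption: --threshold and --outdir take their value as the next argument.
    """
    if not sha1s:
        raise ValueError("at least one sha1 is required")
    for h in sha1s:
        if not SHA1_RE.fullmatch(h):
            raise ValueError(f"not a sha1: {h!r}")
    argv = ["vbclient.py", "-a", "matches"]
    if threshold is not None:
        # The threshold is documented as a real number between 0 and 1.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        argv += ["--threshold", str(threshold)]
    if fullmatrix:
        argv.append("--fullmatrix")
    if outdir is not None:
        argv += ["--outdir", outdir]
    return argv + list(sha1s)
```

The resulting list can be passed to subprocess.run; if vbclient.py is not directly executable on your system, prefix the list with sys.executable.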
One may think of the
matches command as similar to a Google query: it returns a list of all the binaries similar to each
arg, up to a given similarity threshold. The
threshold is a real number between 0 and 1. If the
threshold is not provided, a default value is used.
By default, the
matches command returns only the sha1s that sort higher than the query sha1 (the arg). This default serves the use case where a user wishes to compute the similarity matrix of a large collection of binaries: since the similarity matrix is symmetric, it is sufficient to return just the upper triangle. It supports the power user who uploads many files and computes the similarity among all of them by searching for matches of one sha1 at a time.
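To see why returning only higher sha1s loses nothing, the sketch below simulates the default behaviour over a collection. For each query it keeps only partners whose sha1 compares greater than the query (plain string comparison of the hex digests is assumed), so each unordered pair appears exactly once; mirroring those results restores the full symmetric matrix. The similarity function is a hypothetical stand-in, not service output.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def upper_triangle(
    sha1s: List[str], similarity: Callable[[str, str], float]
) -> Dict[Pair, float]:
    """Mimic the default `matches` behaviour: one query per sha1, keeping only
    partners whose sha1 sorts higher than the query."""
    result: Dict[Pair, float] = {}
    for query in sha1s:
        for other in sha1s:
            if other > query:
                result[(query, other)] = similarity(query, other)
    return result


def full_matrix(upper: Dict[Pair, float]) -> Dict[Pair, float]:
    """Mirror the upper triangle; valid only because similarity is symmetric."""
    full = dict(upper)
    for (a, b), score in upper.items():
        full[(b, a)] = score
    return full
```

For n distinct sha1s, the upper triangle holds n(n-1)/2 pairs, half of what --fullmatrix would return across all queries.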
The --fullmatrix option may be used if a user wants to receive all of the matches, i.e., sha1s both lower and higher than the query sha1.
The output of the matches command is saved in
$outdir/similarity.json. CAUTION: existing output files are overwritten. If you want to search for similarity among many files, it is therefore suggested to use the
--lf option to give the list of sha1s to be searched.
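If matches is nevertheless run once per sha1, each run overwrites $outdir/similarity.json. A minimal sketch for keeping every result, assuming only that the file is written at that path (its JSON contents are not interpreted): after each run, move the file to a name derived from the query sha1. The helper name and the naming scheme are hypothetical; the --lf option remains the documented approach.

```python
import shutil
from pathlib import Path


def preserve_similarity_output(outdir: str, query_sha1: str) -> Path:
    """Move $outdir/similarity.json to $outdir/<query_sha1>.similarity.json so
    the next `matches` run does not overwrite it (hypothetical naming scheme)."""
    src = Path(outdir) / "similarity.json"
    if not src.is_file():
        raise FileNotFoundError(str(src))
    dst = Path(outdir) / f"{query_sha1}.similarity.json"
    shutil.move(str(src), str(dst))
    return dst
```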
ACTION: SEARCH FOR SIMILAR PROCEDURES
To search for similar procedures, use:
vbclient.py -a search [--noLibrary] [--limit] sha1/0xrva1 sha1/0xrva2 ...
The search command searches for procedures similar to a given one. A procedure is identified as sha1/0xrva, where
sha1 is the sha1 of the binary and
rva is the relative virtual address of the procedure in hex format.
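A small parser for the sha1/0xrva identifier can catch malformed arguments before they reach the service. This is a sketch under stated assumptions: the sha1 is 40 hex characters, the RVA uses a lowercase "0x" prefix, and formatting emits lowercase hex digits. ProcedureId, parse_procedure_id and format_procedure_id are hypothetical names.

```python
import re
from typing import NamedTuple

# sha1 of the binary, a slash, then the RVA as a 0x-prefixed hex number.
PROC_ID_RE = re.compile(r"([0-9a-fA-F]{40})/0x([0-9a-fA-F]+)")


class ProcedureId(NamedTuple):
    sha1: str
    rva: int


def parse_procedure_id(text: str) -> ProcedureId:
    """Split 'sha1/0xrva' into the binary's sha1 and the integer RVA."""
    m = PROC_ID_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"expected sha1/0xrva, got {text!r}")
    return ProcedureId(m.group(1), int(m.group(2), 16))


def format_procedure_id(pid: ProcedureId) -> str:
    """Render a ProcedureId back into 'sha1/0xrva' with lowercase hex."""
    return f"{pid.sha1}/0x{pid.rva:x}"
```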
The --noLibrary option removes library procedures from the search.
The --limit option can be set to one of two case-sensitive values: High or Low.
High limits the procedure search results to semantically equivalent procedures, that is, procedures with the same juice, or procedures with very high similarity only.
Low limits the procedure search results to semantically similar, but not equivalent, procedures. These are usually procedures that share some, but not all, blocks of juice.
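Because --limit is case-sensitive, a typo such as "high" is easy to make. The sketch below assembles a search invocation and rejects any value other than High or Low. It is a hypothetical helper (build_search_argv is not part of vbclient.py) and assumes --limit takes its value as the next argument.

```python
import re
from typing import List, Optional, Sequence

# Procedure identifier: 40-hex-character sha1, a slash, then a 0x-prefixed RVA.
PROC_ID_RE = re.compile(r"[0-9a-fA-F]{40}/0x[0-9a-fA-F]+")
# The only values --limit accepts; comparison is deliberately case-sensitive.
LIMIT_VALUES = ("High", "Low")


def build_search_argv(
    procedures: Sequence[str],
    no_library: bool = False,
    limit: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `vbclient.py -a search` (hypothetical helper)."""
    if not procedures:
        raise ValueError("at least one sha1/0xrva is required")
    for p in procedures:
        if not PROC_ID_RE.fullmatch(p):
            raise ValueError(f"expected sha1/0xrva, got {p!r}")
    argv = ["vbclient.py", "-a", "search"]
    if no_library:
        argv.append("--noLibrary")
    if limit is not None:
        if limit not in LIMIT_VALUES:
            raise ValueError("--limit must be 'High' or 'Low' (case-sensitive)")
        argv += ["--limit", limit]
    return argv + list(procedures)
```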