ACTION: SEARCH FOR SIMILAR BINARIES
To search for similar binaries, use:
vbclient.py -a matches [--threshold] [--fullmatrix] [--outdir] arg1 arg2 ...
where arg1 arg2 ... are sha1 hashes of binary files (for example, files of type binary.unpacked). The matches command does not provide similarity for archive files. The binaries included in an archive may be found using the vbclient.py -a query command.
One may consider the matches command similar to a Google query. It gives a list of all the binaries similar to the arg, up to a given similarity threshold. The threshold is a real number between 0 and 1. If the threshold is not provided, a default value is used.
By default the matches command returns only the sha1s that are higher than the arg. This default serves the use case where a user wishes to compute the similarity matrix for a large collection of binaries. Since the similarity matrix is symmetric, it is sufficient to return just the upper triangle. A power user may therefore upload many files and compute the similarity between all of them by searching for the matches of one sha1 at a time.
The --fullmatrix option may be used should a user desire to receive all of the matches, i.e., sha1s both lower and higher than the query sha1.
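The symmetry argument above can be sketched in Python. This is a minimal illustration, not part of vbclient.py: the function name symmetrize and the nested-dictionary layout (query sha1 mapped to a dictionary of matched sha1s and scores) are hypothetical, and the layout is not the actual format of the matches output. It also assumes that "higher" means the sha1 string compares greater.

```python
def symmetrize(upper):
    """Rebuild a full symmetric similarity table from upper-triangle results.

    `upper` is a hypothetical layout: {query_sha1: {matched_sha1: score}},
    where every matched_sha1 is assumed to compare greater than its query_sha1.
    """
    full = {}
    for query, matches in upper.items():
        for other, score in matches.items():
            # Similarity is symmetric, so record the score in both directions.
            full.setdefault(query, {})[other] = score
            full.setdefault(other, {})[query] = score
    return full


# Shortened placeholder hashes, for illustration only.
upper = {"aa": {"bb": 0.9, "cc": 0.5}, "bb": {"cc": 0.7}}
full = symmetrize(upper)
```

Here the entry for "cc" is recovered entirely from the results of the two lower sha1s, which is why querying each sha1 once with the default setting is enough to fill in the whole matrix.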
The output of the matches command is saved in $outdir/similarity.json. CAUTION: Existing output is overwritten on each run. So if you want to search for similarity between a lot of files, it is suggested to use the --lf option to give the list of sha1s to be searched.
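As an alternative to --lf, a script can sidestep the overwrite by giving every query its own output directory. The sketch below only builds the argument lists and does not contact the service. The helper names are hypothetical, and it assumes that --threshold and --outdir each take a value immediately after the flag, which the usage line above does not spell out.

```python
import os


def build_matches_command(sha1s, outdir, threshold=None, fullmatrix=False):
    """Build an argument list for 'vbclient.py -a matches' (hypothetical helper)."""
    cmd = ["vbclient.py", "-a", "matches"]
    if threshold is not None:
        # The threshold is documented as a real number between 0 and 1.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        # Assumption: the flag is followed by its value.
        cmd += ["--threshold", str(threshold)]
    if fullmatrix:
        cmd.append("--fullmatrix")
    # Assumption: --outdir is followed by a directory path.
    cmd += ["--outdir", outdir]
    cmd += list(sha1s)
    return cmd


def per_query_commands(sha1s, base_outdir):
    """One command per sha1, each writing to its own subdirectory of base_outdir."""
    return [
        build_matches_command([sha1], os.path.join(base_outdir, sha1))
        for sha1 in sha1s
    ]
```

Each resulting list can be passed to subprocess.run, so that every query writes its own similarity.json under a directory named after the sha1 instead of replacing the previous result.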