virusbattle-sdk / Similar Binaries

ACTION: SEARCH FOR SIMILAR BINARIES

To search for similar binaries, use:

  vbclient.py -a matches [--threshold THRESHOLD] [--fullmatrix] [--outdir OUTDIR] arg1 arg2 ...

As before, arg1, arg2, ... are sha1 hashes of files whose type is either binary.pe32 or binary.unpacked. The command does not compute similarity for archive files; to find the binaries contained in an archive, use the vbclient.py -a query command.
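Before calling matches, it can help to screen the arguments locally. The following Python sketch is illustrative only: `check_match_args` is a hypothetical helper, not part of vbclient.py, and it assumes you already have a `file_types` dict mapping each sha1 to its file type (how to obtain those types is not covered on this page).

```python
import re

# The two file types the matches command accepts, per this page.
ACCEPTED_TYPES = {"binary.pe32", "binary.unpacked"}

# A sha1 hex digest is exactly 40 hexadecimal characters.
SHA1_RE = re.compile(r"[0-9a-fA-F]{40}")


def check_match_args(sha1s, file_types):
    """Split sha1s into (accepted, rejected) before calling matches.

    file_types: hypothetical dict mapping sha1 -> file type string.
    A sha1 is rejected if it is malformed, has an unknown type,
    or has a type other than binary.pe32 / binary.unpacked (e.g. an archive).
    """
    accepted, rejected = [], []
    for h in sha1s:
        if SHA1_RE.fullmatch(h) and file_types.get(h) in ACCEPTED_TYPES:
            accepted.append(h)
        else:
            rejected.append(h)
    return accepted, rejected
```

Rejected archive sha1s can then be expanded into their contained binaries with the query command before searching for matches.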

One may think of the matches command as similar to a Google query: it returns a list of the binaries similar to each argument, cut off at a given similarity threshold, where the threshold is a real number between 0 and 1. If --threshold is not provided, a default value is used.
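A minimal Python sketch of a similarity cut-off, applied to a hypothetical dict of similarity scores. It assumes matches at or above the threshold are kept (this page does not say whether the comparison is inclusive), and it requires the threshold explicitly because the service's default value is not stated here. `filter_by_threshold` is an illustrative name, not a vbclient.py function.

```python
def filter_by_threshold(similarities, threshold):
    """Keep matches whose similarity is at least `threshold`, best first.

    similarities: hypothetical dict mapping sha1 -> similarity in [0, 1].
    Assumption: the cut-off is inclusive (>=).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    hits = [(sha, score) for sha, score in similarities.items() if score >= threshold]
    # Highest similarity first; ties broken by sha1 for a stable order.
    hits.sort(key=lambda item: (-item[1], item[0]))
    return hits
```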

By default, the matches command returns only the sha1s that are higher than the query sha1. This default serves the use case where a user wants to compute the similarity matrix for a large collection of binaries: since the similarity matrix is symmetric, it is sufficient to return just the upper triangle. It is aimed at the power user who uploads many files and computes the similarity between all of them by searching for the matches of one sha1 at a time.
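The following Python sketch illustrates why the default is enough to cover a whole collection: if each sha1 is queried in turn and only higher sha1s are kept, every unordered pair is seen exactly once. For simplicity it treats every binary in the collection as a match of every other one, and it assumes "higher" means plain string comparison of lowercase hex digests (equivalent to numeric order for fixed-length hex). Both helper names are hypothetical.

```python
from itertools import combinations


def default_matches(query, candidates):
    """Keep only candidate sha1s higher than the query, like the default.

    Assumes lowercase hex digests compared as strings.
    """
    return sorted(c for c in candidates if c > query)


def upper_triangle_pairs(sha1s):
    """Query every sha1 in turn with the default filter; collect the pairs."""
    pairs = set()
    for query in sha1s:
        for match in default_matches(query, sha1s):
            pairs.add((query, match))
    return pairs


if __name__ == "__main__":
    hashes = ["a" * 40, "b" * 40, "c" * 40]
    pairs = upper_triangle_pairs(hashes)
    # Every unordered pair appears exactly once: n(n-1)/2 = 3 pairs for n = 3.
    print(pairs == set(combinations(sorted(hashes), 2)), len(pairs))
```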

Use the --fullmatrix option to receive all of the matches, i.e., sha1s both lower and higher than the query sha1.
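If you have already collected default (upper-triangle) results, the lower half can be filled in on the client by symmetry. This Python sketch does that over a hypothetical nested dict of scores; it shows what the full set of matches contains, not how the service computes --fullmatrix. `to_full_matrix` is an illustrative name.

```python
from collections import defaultdict


def to_full_matrix(upper_results):
    """Expand default (upper-triangle) results into full match lists.

    upper_results: hypothetical dict mapping a query sha1 to a dict
    {matched_sha1: similarity}, where every matched sha1 > query sha1.
    Relies on the similarity matrix being symmetric.
    """
    full = defaultdict(dict)
    for query, matches in upper_results.items():
        for other, score in matches.items():
            full[query][other] = score  # the "higher" half, returned by default
            full[other][query] = score  # the "lower" half, filled in by symmetry
    return dict(full)
```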

The output of the matches command is saved in the files $outdir/similarity.csv and $outdir/similarity.json. CAUTION: these files are overwritten on every run, so running matches separately for each sha1 keeps only the last result. To search for similarity among many files, use the --lf option to give the list of sha1s to be searched in a single run.
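A minimal Python sketch that builds a single matches invocation over a list of sha1s, so that one run produces one set of output files. It assumes that --lf takes the path of a text file with one sha1 per line and that --threshold and --outdir each take a single value; this page names these options but does not document their formats. `build_matches_command` is a hypothetical helper.

```python
from pathlib import Path


def build_matches_command(sha1s, list_path, outdir, threshold=None, fullmatrix=False):
    """Write the sha1 list to list_path and return the vbclient.py argv.

    Assumption: --lf expects a text file with one sha1 per line.
    """
    Path(list_path).write_text("\n".join(sha1s) + "\n")

    argv = ["vbclient.py", "-a", "matches", "--lf", str(list_path), "--outdir", outdir]
    if threshold is not None:
        argv += ["--threshold", str(threshold)]  # otherwise the default value is used
    if fullmatrix:
        argv.append("--fullmatrix")  # also return sha1s lower than each query
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`.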
