
virusbattle-sdk / Semantic Matching


To search for similar binaries, use: -a matches [--threshold] [--fullmatrix] [--outdir] arg1 arg2 ...

As before, arg1, arg2, ... are sha1 hashes of files that must be either binary.pe32 or binary.unpacked. The command does not compute similarity for archive files. The binaries included in an archive may be found using the -a query command.

One may consider the matches command similar to a Google query. It returns a list of all the binaries similar to the arg, up to a given similarity threshold, where threshold is a real number between 0 and 1. If threshold is not provided, a default value is used.

By default, the matches command returns only the matching sha1s that are higher than the query sha1. This default serves the use case where a user wishes to compute the similarity matrix for a large collection of binaries. In that case, since the similarity matrix is symmetric, it is sufficient to return just the upper triangle. This serves the power user who uploads many files and searches for similarity between all of them by querying the matches of one sha1 at a time.

The --fullmatrix option may be used should a user desire to receive all of the matches, i.e., sha1s lower and higher than the query sha1.
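To illustrate why the default upper-triangle output is enough, here is a minimal Python sketch that mirrors each returned pair to rebuild the full symmetric matrix. The (query sha1, match sha1, score) tuple layout and the shortened sha1 strings are assumptions made for illustration; the actual layout of the SDK's output files is not described here.

```python
def symmetric_matrix(upper_pairs):
    """Expand upper-triangle (sha1_a, sha1_b, score) records into a
    nested dict that can be looked up in either direction."""
    matrix = {}
    for sha1_a, sha1_b, score in upper_pairs:
        matrix.setdefault(sha1_a, {})[sha1_b] = score
        # Similarity is symmetric, so store the mirrored entry as well.
        matrix.setdefault(sha1_b, {})[sha1_a] = score
    return matrix


# Hypothetical records: each pair appears once, with the higher sha1 second.
upper = [
    ("aa11", "bb22", 0.91),
    ("aa11", "cc33", 0.40),
    ("bb22", "cc33", 0.75),
]
full = symmetric_matrix(upper)
```

With three binaries, three pairs are enough to answer all six ordered lookups, for example full["cc33"]["aa11"].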

The output of the matches command is saved in the files $outdir/similarity.csv and $outdir/similarity.json. CAUTION: These files are overwritten. To search for similarity between many files, it is suggested to use the --lf option to give the list of sha1s to be searched.
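If queries are nevertheless run one sha1 at a time, one possible workaround is to copy the two result files aside after each run. The sketch below assumes only the file names given above; the helper name preserve_results and the sha1-prefixed naming scheme are hypothetical and not part of the SDK.

```python
import shutil
import tempfile
from pathlib import Path

RESULT_FILES = ("similarity.csv", "similarity.json")


def preserve_results(outdir, query_sha1):
    """Copy the result files in outdir to names prefixed with the query sha1,
    so that the next matches run does not overwrite them."""
    outdir = Path(outdir)
    saved = []
    for name in RESULT_FILES:
        src = outdir / name
        if src.exists():
            dst = outdir / f"{query_sha1}.{name}"
            shutil.copy2(src, dst)
            saved.append(dst)
    return saved


# Demonstration on a temporary directory with placeholder result files.
with tempfile.TemporaryDirectory() as tmp:
    for name in RESULT_FILES:
        (Path(tmp) / name).write_text("placeholder")
    sha1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"
    saved_names = [p.name for p in preserve_results(tmp, sha1)]
```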


To search for similar procedures, use: -a search [--noLibrary] [--limit] sha1/0xrva1 sha1/0xrva2 ...

The search command searches for procedures similar to a given one. A procedure is identified as sha1/0xrva, where sha1 is the sha1 of the binary and rva is the relative virtual address of the procedure in hex format.
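The sha1/0xrva identifier can be built and checked with a few lines of Python. This is a sketch, not SDK code: the helper names are hypothetical, and it assumes a 40-digit hex sha1 and lowercase hex digits for the rva, which the SDK may or may not require.

```python
import re

# 40 hex digits for the sha1, then "/0x", then the rva in hex.
_PROC_ID = re.compile(r"^([0-9a-fA-F]{40})/0x([0-9a-fA-F]+)$")


def parse_procedure_id(text):
    """Split 'sha1/0xrva' into (sha1, rva as an integer)."""
    m = _PROC_ID.match(text)
    if m is None:
        raise ValueError(f"not a sha1/0xrva identifier: {text!r}")
    return m.group(1), int(m.group(2), 16)


def format_procedure_id(sha1, rva):
    """Build 'sha1/0xrva' from a sha1 string and an integer rva."""
    return f"{sha1}/0x{rva:x}"


sha1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"
proc_id = format_procedure_id(sha1, 0x401000)
parsed = parse_procedure_id(proc_id)
```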

The --noLibrary option removes library procedures from the search.

The --limit option can be set to one of two case-sensitive values: High or Low.

High limits the procedure search results to semantically equivalent procedures, that is, procedures with the same juice, or procedures with very high similarity only.

Low limits the procedure search results to semantically similar, but not equivalent, procedures. These are usually procedures that share some blocks of juice but not all.
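Because the two values are case-sensitive, a script that wraps the search command may want to check the value before submitting a query. A minimal sketch, with a hypothetical helper name; it assumes nothing about the SDK beyond the two values listed above.

```python
VALID_LIMITS = ("High", "Low")


def check_limit(value):
    """Return value if it is exactly 'High' or 'Low'; otherwise raise,
    suggesting the correct spelling when only the case is wrong."""
    if value in VALID_LIMITS:
        return value
    hint = next((v for v in VALID_LIMITS if v.lower() == value.lower()), None)
    message = f"invalid --limit value {value!r}; expected High or Low"
    if hint is not None:
        message += f" (did you mean {hint!r}?)"
    raise ValueError(message)


limit = check_limit("High")
```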