Distance in Hash2Vec

Hash2Vec uses cosine similarity to find close (similar) vectors.
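A minimal sketch of cosine similarity, for illustration only. This is a generic C# implementation, not code taken from Hash2Vec, and the three vectors are made-up toy data. Vectors that point in nearly the same direction score close to 1, and unrelated vectors score close to 0.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Generic illustration, not the Hash2Vec implementation.
// Returns NaN if either vector is all zeros.
static double CosineSimilarity(double[] a, double[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Made-up toy vectors: a and b point in similar directions, c does not.
var a = new[] { 0.8, 0.3, 0.1 };
var b = new[] { 0.7, 0.4, 0.1 };
var c = new[] { 0.1, 0.2, 0.9 };

Console.WriteLine($"a-b: {CosineSimilarity(a, b):F4}");
Console.WriteLine($"a-c: {CosineSimilarity(a, c):F4}");
```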

Hash2Vec Documentation:

When computing distances with Hash2Vec, pay attention to how the words were vectorized into numerical vectors.

If the vectors have not been normalized (to unit or decimal vectors), the computed distances may be incorrect.

That is why the Distance method takes two parameters, count and accuracy (accuracy defaults to 1).

If the vectors are normalized (to unit or decimal vectors), the results are more accurate, and accuracy can then be set to 0.
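The effect of normalization can be seen in a small sketch, assuming that "single" vectors means vectors scaled to unit length (L2 norm of 1). It does not reproduce how Hash2Vec normalizes its vectors or what the accuracy parameter does internally; Normalize and Dot are illustrative helpers, and the vectors are made up. Without normalization, a plain dot product rewards long vectors; after normalization, the dot product equals the cosine similarity and ranks the vectors by direction only.

```csharp
using System;
using System.Linq;

// Scale a vector to unit length (L2 norm = 1). Illustrative helper.
static double[] Normalize(double[] v)
{
    double norm = Math.Sqrt(v.Sum(x => x * x));
    if (norm == 0) throw new ArgumentException("Cannot normalize a zero vector.");
    return v.Select(x => x / norm).ToArray();
}

// Plain dot product of two equal-length vectors.
static double Dot(double[] a, double[] b) => a.Zip(b, (x, y) => x * y).Sum();

var query = new[] { 1.0, 0.0 };
var longVec = new[] { 10.0, 10.0 };  // long vector, 45 degrees away from query
var closeVec = new[] { 0.9, 0.1 };   // short vector, almost parallel to query

// Raw dot products grow with vector length, so longVec wrongly looks closer.
Console.WriteLine($"raw:        {Dot(query, longVec):F4} vs {Dot(query, closeVec):F4}");

// After normalization the dot product is the cosine similarity,
// and closeVec comes out as the nearer vector.
var q = Normalize(query);
Console.WriteLine($"normalized: {Dot(q, Normalize(longVec)):F4} vs {Dot(q, Normalize(closeVec)):F4}");
```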

    #!c#
    using System;
    using System.Linq;

    var vocabulary = new Hash2VecBinaryReader().Read("InputFile"); // "InputFile": path to the file with the vectorized words
    var distanceList = vocabulary.Distance("test", count: 50, accuracy: 2).ToList();

    distanceList.ForEach(dis =>
        Console.WriteLine("{0}\t\t ||{1,10:F6}", dis.Representation.WordOrNull, dis.DistanceValue));
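To illustrate what a call like Distance("test", count: 50, ...) returns, here is a hypothetical nearest-neighbour sketch. NearestWords and the in-memory Dictionary vocabulary are assumptions for illustration, not part of the Hash2Vec API, and the accuracy parameter is not modelled because its internals are not documented here. The sketch ranks every other word by cosine similarity to the query word, keeps the top count results, and prints them in the same format as the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|).
static double Cosine(double[] a, double[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Hypothetical helper (not Hash2Vec API): returns the `count` words most
// similar to `word`, excluding the word itself, most similar first.
static List<(string Word, double Similarity)> NearestWords(
    Dictionary<string, double[]> vocab, string word, int count)
{
    var target = vocab[word];
    return vocab
        .Where(kv => kv.Key != word)
        .Select(kv => (Word: kv.Key, Similarity: Cosine(target, kv.Value)))
        .OrderByDescending(p => p.Similarity)
        .Take(count)
        .ToList();
}

// Toy vocabulary with made-up vectors.
var vocab = new Dictionary<string, double[]>
{
    ["test"]  = new[] { 0.9, 0.1, 0.0 },
    ["exam"]  = new[] { 0.8, 0.2, 0.1 },
    ["quiz"]  = new[] { 0.7, 0.3, 0.0 },
    ["apple"] = new[] { 0.0, 0.2, 0.9 },
};

NearestWords(vocab, "test", count: 2).ForEach(p =>
    Console.WriteLine("{0}\t\t ||{1,10:F6}", p.Word, p.Similarity));
```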