Fuzzy Search in Hash2Vec

Fuzzy search is implemented in the Hash2Vec library, in the namespace ServiceManager.FuzzySearch.

Hash2Vec Documentation:

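The classic implementation of fuzzy search in Hash2Vec scores a search line word by word. The input line is vectorized into a vocabulary, and for each word of the comparison line the method looks up the closest word in that vocabulary. A word whose vector is identical to a vocabulary vector gets a distance of 1.0, so higher values mean a closer match. Only the best distance above 0.75 is kept for each word, and the final result is the average over all words of the comparison line.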

Example: the classic fuzzy search method in Hash2Vec (Hash2VecDistance)

#!c#

 public static double Hash2VecDistance(this string str, string comparison)
 {
     var vocabulary = VectorizeData(str);

     // Split into words, skipping empty entries produced by adjacent separators
     var comparisonWords = comparison.Split(new[] { ' ', '.', ',' },
                                            StringSplitOptions.RemoveEmptyEntries);
     if (comparisonWords.Length == 0) return 0.0;

     var coefficient = 0.0;
     foreach (var word in comparisonWords)
     {
         // Get a collection of distances for the word
         var distanceList = vocabulary.Distance(word, count: 10, accuracy: 0).ToList();

         // If the vocabulary contains an identical vector, the distance is 1.0
         var wordVector = Hash2Vec.GetHashVector(word);
         var sameVector = vocabulary.Words.FirstOrDefault(w =>
                                     w.NumericVector.EqualsVectors(wordVector));
         if (sameVector != null) distanceList.Add(new DistanceTo(sameVector, 1.0));

         // Keep the best distance only if it is greater than 0.75
         var result = distanceList.OrderByDescending(dis => dis.DistanceValue)
                                  .FirstOrDefault(dis => dis.DistanceValue > 0.75);
         if (result != null) coefficient += result.DistanceValue;
     }

     // Average over all words; words without a close match contribute 0
     return coefficient / comparisonWords.Length;
 }

Hash2Vec contains two fuzzy search methods:

  • Hash2VecDistance
  • Hash2VecDistanceCorrect

You can also implement your own methods based on the distance.
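As a rough illustration of what a custom method could look like, here is a minimal, self-contained sketch that follows the same pattern as Hash2VecDistance: best match per word, a threshold, then an average. Everything in it is an assumption made for illustration and not part of Hash2Vec: the names `CustomFuzzySearch`, `FuzzyScore` and `BigramDice` are hypothetical, and the bigram Dice coefficient is only a stand-in for a real Hash2Vec distance. In real use, the `similarity` delegate would presumably wrap a call into the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CustomFuzzySearch
{
    private static readonly char[] Separators = { ' ', '.', ',' };

    // Hypothetical stand-in for a Hash2Vec distance: Dice coefficient over
    // character bigrams, in [0, 1], where 1.0 means identical words.
    public static double BigramDice(string a, string b)
    {
        if (a == b) return 1.0;
        var x = Bigrams(a);
        var y = Bigrams(b);
        if (x.Count == 0 || y.Count == 0) return 0.0;
        var shared = x.Intersect(y).Count();
        return 2.0 * shared / (x.Count + y.Count);
    }

    // Set of distinct two-character substrings of a word.
    private static HashSet<string> Bigrams(string s)
    {
        var set = new HashSet<string>();
        for (var i = 0; i + 1 < s.Length; i++) set.Add(s.Substring(i, 2));
        return set;
    }

    // For each query word, take the best similarity against any word of the text,
    // ignore it unless it exceeds the threshold, then average over all query words.
    public static double FuzzyScore(string text, string query,
        Func<string, string, double> similarity, double threshold = 0.75)
    {
        var textWords = text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        var queryWords = query.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        if (textWords.Length == 0 || queryWords.Length == 0) return 0.0;

        var total = 0.0;
        foreach (var word in queryWords)
        {
            var best = textWords.Max(t => similarity(t, word));
            if (best > threshold) total += best;
        }
        return total / queryWords.Length;
    }
}
```

With this sketch, `CustomFuzzySearch.FuzzyScore("milk house in the village", "milk in the village", CustomFuzzySearch.BigramDice)` returns 1.0, because every query word occurs verbatim in the text. Passing the similarity as a delegate keeps the aggregation logic independent of how the per-word distance is computed.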

Example: testing fuzzy search in Hash2Vec

#!c#
var input = "молоко домик в деревне"; // Russian: "milk little house in the village"
var name = "молоко в деревне";        // Russian: "milk in the village"
var distHash2Vec = input.Hash2VecDistance(name);
if (distHash2Vec > 0.55) // A result above 0.55 is considered a good match
    Console.WriteLine("\t{0:###,###.00000} against {1}", distHash2Vec, name);
