Quick start with Hash2Vec

Hash2Vec is a tool for converting text into numerical vectors.

The algorithm builds a vector from the morphological structure of a word: it takes the base of the word and encodes that base as a numerical vector.
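This page does not spell out how the word base is turned into numbers. As an illustration only, the sketch below uses feature hashing: a tiny hypothetical suffix-stripper stands in for real morphological analysis, the base is split into character trigrams, and each trigram is hashed into one of 75 slots (75 being the default length used in the examples below). `WordBase`, `Fnv1a` and `HashVector` are hypothetical helpers, not part of the Hash2Vec API, and this is not necessarily how Hash2Vec encodes words.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for morphological analysis: lower-case the word and
// strip one common English suffix, keeping a base of at least 3 letters.
static string WordBase(string word)
{
    var lower = word.ToLowerInvariant();
    foreach (var suffix in new[] { "ing", "ed", "es", "s" })
    {
        if (lower.EndsWith(suffix, StringComparison.Ordinal) && lower.Length - suffix.Length >= 3)
            return lower.Substring(0, lower.Length - suffix.Length);
    }
    return lower;
}

// Deterministic 32-bit FNV-1a hash (string.GetHashCode is randomized per process in .NET Core).
static uint Fnv1a(string s)
{
    uint hash = 2166136261;
    foreach (char c in s)
    {
        hash ^= c;
        hash *= 16777619;
    }
    return hash;
}

// Feature hashing: every character trigram of the padded base adds 1 to one of `size` slots.
static double[] HashVector(string word, int size = 75)
{
    var vector = new double[size];
    var padded = "<" + WordBase(word) + ">";
    for (int i = 0; i + 3 <= padded.Length; i++)
        vector[Fnv1a(padded.Substring(i, 3)) % (uint)size] += 1.0;
    return vector;
}

var milk = HashVector("milk");
var milks = HashVector("milks");
Console.WriteLine(milk.SequenceEqual(milks)); // True: both reduce to the base "milk"
```

Because "milk" and "milks" reduce to the same base, they get identical vectors. A real implementation would use a proper stemmer or lemmatizer for each supported language.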

Hash2Vec has two operating modes: fuzzy search, and machine learning, where the vectors are used to solve classification problems.

Hash2Vec supports several languages for vectorizing text:

  • Russian
  • English
  • French
  • Russian & English

Hash2Vec Documentation:

Example: vectorizing text

#!c#
var inputFile = "input.txt";    // text to vectorize (placeholder path)
var outputFile = "vectors.bin"; // file the vectors are written to (placeholder path)
var binary = true;              // value for the WithBinary option

var hash2VecBuild = new Hash2VecBuild(inputFile, outputFile) { WithBinary = binary };

// Each call below builds normalized vectors for one language; use the one that matches your text
hash2VecBuild.BuildNormalizationVector(new Hash2VecToRussian());     // Russian
hash2VecBuild.BuildNormalizationVector(new Hash2VecToEnglish());     // English
hash2VecBuild.BuildNormalizationVector(new Hash2VecToFrench());      // French
hash2VecBuild.BuildNormalizationVector(new Core.Hash2Vec(size: 75)); // Russian & English (default length 75)

// Build() vectorizes without normalizing the vectors
hash2VecBuild.Build(size: 75); // Russian & English (default length 75)

Example: normalizing vectors

#!c#
var normalizationVectors = new NormalizationVectors();
var vectors = normalizationVectors.LoadVectors("InputFile"); // Load the vectors to normalize
normalizationVectors.NormalizationVector(ref vectors);       // Normalize the vectors
normalizationVectors.CheckHashVector(ref vectors);           // Check the vector hashes
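The page does not say what `NormalizationVector` computes. Assuming it is the usual L2 (unit-length) normalization, each vector is divided by its Euclidean length, after which the dot product of two vectors equals their cosine similarity. A minimal sketch on plain `double[]` arrays; `Normalize` is a hypothetical helper, not the library's method.

```csharp
using System;
using System.Linq;

// L2 normalization in place: divide every component by the vector's Euclidean length.
// A zero vector is left unchanged to avoid dividing by zero.
static void Normalize(double[] vector)
{
    double norm = Math.Sqrt(vector.Sum(x => x * x));
    if (norm == 0) return;
    for (int i = 0; i < vector.Length; i++)
        vector[i] /= norm;
}

var v = new double[] { 3, 4 };
Normalize(v);
// v now has length 1: its components are 3/5 and 4/5
Console.WriteLine(string.Join(", ", v));
```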

Example: distance between word vectors

#!c#
using System;
using System.Linq;

var vocabulary = new Hash2VecBinaryReader().Read("inputFile"); // Load vectors built in binary format
var distanceList = vocabulary.Distance("milks", 50, 0).ToList(); // Words nearest to "milks"

// Print each word with its distance value
distanceList.ForEach(dis =>
    Console.WriteLine("{0}\t\t ||{1,10:F6}", dis.Representation.WordOrNull, dis.DistanceValue));
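The meaning of the arguments `50` and `0` is not documented here. To show what a nearest-word query generally involves, the sketch below ranks a small in-memory vocabulary by cosine similarity to a query word and keeps the top `count` entries, printing them in the same format as the example above. The vocabulary and its vectors are made up, `Cosine` and `Nearest` are hypothetical helpers, and whether Hash2Vec's `DistanceValue` is a cosine similarity is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|); 1 means the vectors point the same way.
static double Cosine(double[] a, double[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return (na == 0 || nb == 0) ? 0 : dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Rank every other word by similarity to `word`, most similar first, and keep `count` of them.
static List<(string Word, double Similarity)> Nearest(
    Dictionary<string, double[]> vectors, string word, int count)
{
    var query = vectors[word];
    return vectors
        .Where(kv => kv.Key != word)
        .Select(kv => (Word: kv.Key, Similarity: Cosine(query, kv.Value)))
        .OrderByDescending(p => p.Similarity)
        .Take(count)
        .ToList();
}

// Made-up 3-dimensional vectors, for illustration only
var vocabulary = new Dictionary<string, double[]>
{
    ["milks"] = new[] { 1.0, 0.9, 0.0 },
    ["milk"] = new[] { 1.0, 1.0, 0.0 },
    ["cream"] = new[] { 0.8, 0.6, 0.3 },
    ["house"] = new[] { 0.0, 0.1, 1.0 },
};

var nearest = Nearest(vocabulary, "milks", 2);
nearest.ForEach(p => Console.WriteLine("{0}\t\t ||{1,10:F6}", p.Word, p.Similarity));
```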

Example: fuzzy search with Hash2Vec

#!c#
using System;

var input = "молоко домик в деревне"; // Russian: "milk little house in the village"
var name = "молоко в деревне";        // Russian: "milk in the village"
var distHash2Vec = input.Hash2VecDistanceCorrect(name);
if (distHash2Vec > 0.55) // a score above 0.55 indicates a good match
    Console.WriteLine("\t{0:F5} against {1}", distHash2Vec, name);
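How `Hash2VecDistanceCorrect` scores two strings is not described on this page. The sketch below shows one generic way to obtain a 0 to 1 fuzzy-match score for two phrases: the cosine similarity of their character-trigram counts, checked against the same 0.55 threshold. Trigram cosine is a swapped-in technique, not Hash2Vec's algorithm, and `TrigramCounts` and `PhraseSimilarity` are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Count the character trigrams of a phrase (lower-cased, padded with spaces).
static Dictionary<string, int> TrigramCounts(string phrase)
{
    var text = " " + phrase.ToLowerInvariant() + " ";
    var counts = new Dictionary<string, int>();
    for (int i = 0; i + 3 <= text.Length; i++)
    {
        var gram = text.Substring(i, 3);
        counts[gram] = counts.TryGetValue(gram, out var n) ? n + 1 : 1;
    }
    return counts;
}

// Cosine similarity of two sparse trigram-count vectors, in [0, 1].
static double PhraseSimilarity(string a, string b)
{
    var ca = TrigramCounts(a);
    var cb = TrigramCounts(b);
    double dot = ca.Sum(kv => cb.TryGetValue(kv.Key, out var n) ? (double)kv.Value * n : 0.0);
    double na = Math.Sqrt(ca.Values.Sum(v => (double)v * v));
    double nb = Math.Sqrt(cb.Values.Sum(v => (double)v * v));
    return (na == 0 || nb == 0) ? 0 : dot / (na * nb);
}

var input = "молоко домик в деревне"; // Russian: "milk little house in the village"
var name = "молоко в деревне";        // Russian: "milk in the village"
var score = PhraseSimilarity(input, name);
if (score > 0.55) // same threshold as the Hash2Vec example
    Console.WriteLine("\t{0:F5} against {1}", score, name);
```

For this pair, 15 of the 22 trigrams of the first phrase also occur in the second, which gives a score of about 0.80, above the threshold.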
