Quick start in Hash2Vec
Hash2Vec is a tool for converting text into numerical vectors.
The algorithm derives each vector from the morphological structure of a word: it takes the base of the word and encodes that base as a numerical vector.
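The principle above can be illustrated with a minimal sketch. It is not Hash2Vec's actual implementation, which this page does not show. It assumes a swapped-in technique, feature hashing over character trigrams, so that words sharing a base land mostly in the same buckets. The names `Fnv1a` and `Vectorize` are hypothetical; only the vector length of 75 comes from the examples on this page.

```csharp
using System;

// FNV-1a: a simple deterministic 32-bit hash. string.GetHashCode is randomized
// per process on modern .NET, so it cannot give stable bucket indices.
static uint Fnv1a(string s)
{
    uint hash = 2166136261;
    foreach (char c in s)
    {
        hash ^= c;
        hash *= 16777619; // wraps on overflow (unchecked by default)
    }
    return hash;
}

// Hypothetical helper: counts each character trigram of the word in a hashed bucket.
static float[] Vectorize(string word, int size = 75)
{
    var vector = new float[size];
    string padded = "<" + word.ToLowerInvariant() + ">"; // mark the word boundaries
    for (int i = 0; i + 3 <= padded.Length; i++)
    {
        string gram = padded.Substring(i, 3);
        vector[Fnv1a(gram) % (uint)size] += 1f;
    }
    return vector;
}

float[] milk = Vectorize("milk");
float[] milks = Vectorize("milks");

// The dot product counts overlapping buckets; the two words share three trigrams.
float overlap = 0;
for (int i = 0; i < milk.Length; i++) overlap += milk[i] * milks[i];
Console.WriteLine($"overlap = {overlap}");
```

Because "milk" and "milks" share the trigrams `<mi`, `mil`, and `ilk`, their vectors overlap in those buckets, which is what makes such vectors usable for both fuzzy search and classification.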
Hash2Vec has two operating modes: fuzzy search, and machine learning for solving classification problems.
Hash2Vec supports several languages for vectorizing text:
- Russian
- English
- French
- Russian & English
Hash2Vec Documentation:
Example: vectorizing text

    #!c#
    // inputFile, outputFile, and binary are supplied by the caller.
    var hash2VecBuild = new Hash2VecBuild(inputFile, outputFile) { WithBinary = binary };

    hash2VecBuild.BuildNormalizationVector(new Hash2VecToRussian());      // Vectorization for Russian
    hash2VecBuild.BuildNormalizationVector(new Hash2VecToEnglish());      // Vectorization for English
    hash2VecBuild.BuildNormalizationVector(new Hash2VecToFrench());       // Vectorization for French
    hash2VecBuild.BuildNormalizationVector(new Core.Hash2Vec(size: 75));  // Vectorization for Russian & English (default length 75)

    // There is no default vector normalization when Build is initialized.
    hash2VecBuild.Build(size: 75);  // Vectorization for Russian & English (default length 75)

Example: normalizing vectors

    #!c#
    var normalizationVectors = new NormalizationVectors();
    var vectors = normalizationVectors.LoadVectors("InputFile");  // Load the vectors to normalize
    normalizationVectors.NormalizationVector(ref vectors);        // Normalize the vectors
    normalizationVectors.CheckHashVector(ref vectors);            // Check the vector hashes

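The page does not say which normalization `NormalizationVector` applies. As an illustration only, the sketch below assumes the common choice of scaling each vector to unit Euclidean (L2) length. `NormalizeL2` is a hypothetical helper, not part of the Hash2Vec API.

```csharp
using System;

// Scales a vector in place to unit Euclidean length.
// A zero vector is left unchanged to avoid dividing by zero.
static void NormalizeL2(float[] v)
{
    double sumSquares = 0;
    foreach (float x in v) sumSquares += x * x;
    if (sumSquares == 0) return;

    float inverseLength = (float)(1.0 / Math.Sqrt(sumSquares));
    for (int i = 0; i < v.Length; i++) v[i] *= inverseLength;
}

float[] vector = { 3f, 4f }; // length 5 before normalization
NormalizeL2(vector);
Console.WriteLine(string.Join(", ", vector));
```

After the call, the vector {3, 4} becomes approximately {0.6, 0.8}. Unit-length vectors make distance comparisons independent of how many features a word or phrase contributed.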
Example: testing vector distances

    #!c#
    using System;
    using System.Linq;

    var vocabulary = new Hash2VecBinaryReader().Read("inputFile");  // File of vectors produced by vectorization
    var distanceList = vocabulary.Distance("milks", 50, 0).ToList();
    distanceList.ForEach(dis =>
        Console.WriteLine("{0}\t\t ||{1,10:F6}", dis.Representation.WordOrNull, dis.DistanceValue));

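How `Distance` scores words is not documented on this page. A minimal sketch of one plausible approach, assuming cosine similarity over an in-memory vocabulary, is shown here. The `Cosine` helper and the three-dimensional vectors are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity of two equal-length vectors; 0 if either vector is all zeros.
static double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return (normA == 0 || normB == 0) ? 0 : dot / Math.Sqrt(normA * normB);
}

// Toy vocabulary with illustrative values only.
var vocabulary = new Dictionary<string, float[]>
{
    ["milk"]  = new[] { 1f, 0f, 0f },
    ["milks"] = new[] { 0.9f, 0.1f, 0f },
    ["house"] = new[] { 0f, 1f, 0f },
};

// Rank every other word by similarity to the query word and keep the top 2.
float[] query = vocabulary["milks"];
var nearest = vocabulary
    .Where(kv => kv.Key != "milks")
    .Select(kv => (Word: kv.Key, Score: Cosine(query, kv.Value)))
    .OrderByDescending(r => r.Score)
    .Take(2)
    .ToList();

nearest.ForEach(r => Console.WriteLine("{0}\t{1:F6}", r.Word, r.Score));
```

With these toy values, "milk" ranks above "house" because its vector points in nearly the same direction as the query's.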
Example: testing fuzzy search in Hash2Vec

    #!c#
    using System;

    var input = "молоко домик в деревне";  // Russian: "milk little house in the village"
    var name = "молоко в деревне";         // Russian: "milk in the village"
    var distHash2Vec = input.Hash2VecDistanceCorrect(name);
    if (distHash2Vec > 0.55)  // A score above 0.55 is considered a good match
        Console.WriteLine("\t{0:###,###.00000} against {1}", distHash2Vec, name);

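To show how a thresholded fuzzy match behaves without the library, the sketch below swaps in a different, well-known measure: the Dice coefficient over character trigrams. It does not reproduce `Hash2VecDistanceCorrect`, whose scoring is not described here, so the scores will differ. `Trigrams` and `Dice` are hypothetical helpers, and the 0.55 cut-off is borrowed from the example above.

```csharp
using System;
using System.Collections.Generic;

// Collects the distinct character trigrams of a lower-cased string.
static HashSet<string> Trigrams(string text)
{
    var grams = new HashSet<string>();
    string s = text.ToLowerInvariant();
    for (int i = 0; i + 3 <= s.Length; i++)
        grams.Add(s.Substring(i, 3));
    return grams;
}

// Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), in the range [0, 1].
static double Dice(string a, string b)
{
    HashSet<string> ga = Trigrams(a), gb = Trigrams(b);
    int total = ga.Count + gb.Count;
    if (total == 0) return 0;
    ga.IntersectWith(gb); // ga now holds only the shared trigrams
    return 2.0 * ga.Count / total;
}

const double Threshold = 0.55; // tune for your own data
double score = Dice("молоко домик в деревне", "молоко в деревне");
Console.WriteLine(score > Threshold ? "match" : "no match");
```

The two phrases share 13 of their 20 and 14 distinct trigrams, giving a score of 26/34 (about 0.76), which clears the threshold.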