
uchardet-enhanced / essais /


Embryonic attempt to compute the confidence as a full scalar product of
the digram frequencies from the reference and the current sample. In
practice, it sometimes works better than the stock method (proportion of
high-frequency digrams), sometimes worse, and is not worth the change. It
might work better without the 0,1,2,3 quantization of frequencies, but that
would require recomputing all the language tables. Also, the way the current
code computes the table for the sample is probably not right for smaller
samples, where there may not be 512 or 1024 most frequent pairs: those
thresholds would probably need to be reduced in this case.
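
The two confidence measures compared above can be sketched roughly as
follows. This is a minimal Python illustration, not the code in this
directory: the function names are hypothetical, the counts are left
unquantized (no 0,1,2,3 classes), and the scalar product is normalized to
a cosine so that samples of different lengths are comparable, which is an
assumption rather than something stated in the note.

```python
from collections import Counter
from math import sqrt


def digram_counts(text):
    """Count adjacent character pairs (digrams) in text."""
    return Counter(zip(text, text[1:]))


def scalar_product_confidence(reference, sample):
    """Normalized scalar product (cosine) of two digram count tables."""
    # Counter returns 0 for pairs missing from the reference.
    dot = sum(n * reference[pair] for pair, n in sample.items())
    norm_ref = sqrt(sum(n * n for n in reference.values()))
    norm_sample = sqrt(sum(n * n for n in sample.values()))
    if norm_ref == 0 or norm_sample == 0:
        return 0.0
    return dot / (norm_ref * norm_sample)


def high_frequency_proportion(reference, sample, top_n=512):
    """Share of sample digrams that fall in the reference's top_n pairs."""
    # Assumed fix for small tables: never ask for more pairs than exist.
    top_n = min(top_n, len(reference))
    frequent = {pair for pair, _ in reference.most_common(top_n)}
    total = sum(sample.values())
    if total == 0:
        return 0.0
    hits = sum(n for pair, n in sample.items() if pair in frequent)
    return hits / total
```

Both functions take tables built with digram_counts, so the same reference
and sample can be scored by each method side by side. Clamping top_n is
one possible way to reduce the threshold for small samples; whether it
helps in practice would need testing.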