Adjust the near-duplicate detection algorithm

Issue #2 (status: new)
Ali Hürriyetoglu (repo owner) created an issue

Take the size of the texts into account when calculating their similarity.

If the texts are below a certain size in tokens, use a different measure, for instance the Gestalt pattern-matching approach.
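The suggestion in this issue can be sketched in Python. This is only a sketch, and it rests on several assumptions. The issue does not describe the current near-duplicate measure, so token-set Jaccard similarity stands in for it. The 10-token threshold (`SHORT_TEXT_TOKENS`) and the whitespace tokenization are hypothetical. The Gestalt approach is implemented with the standard library's `difflib.SequenceMatcher`, which is based on Ratcliff/Obershelp gestalt pattern matching. All function names are illustrative.

```python
from difflib import SequenceMatcher

# Hypothetical threshold: texts with fewer tokens than this count as "short".
SHORT_TEXT_TOKENS = 10


def tokenize(text: str) -> list[str]:
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def jaccard_similarity(a: list[str], b: list[str]) -> float:
    # Stand-in for the existing (unspecified) measure: overlap of token sets.
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def gestalt_similarity(a: list[str], b: list[str]) -> float:
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    # matched tokens and T is the total number of tokens in both sequences.
    return SequenceMatcher(None, a, b).ratio()


def similarity(text_a: str, text_b: str, threshold: int = SHORT_TEXT_TOKENS) -> float:
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    # Choose the measure based on the size of the shorter text.
    if min(len(tokens_a), len(tokens_b)) < threshold:
        return gestalt_similarity(tokens_a, tokens_b)
    return jaccard_similarity(tokens_a, tokens_b)
```

For example, `similarity("the cat sat", "the dog sat")` falls below the threshold and uses the gestalt ratio: 2 of 3 tokens match on each side, giving 2*2/6 = 2/3. Basing the decision on the shorter text means that comparing a short text against a long one also uses the gestalt measure. Using the longer text, or the average size, would be equally reasonable choices.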
