Crashing when extracting non-English subtitles

Issue #404 closed
Egill Sigurður Friðbjarnarson created an issue

OCR on English subtitles works splendidly.

But when I attempt to extract non-English subtitles, Subler crashes without warning. The loading bar is visible for only a second before the program quits.

I have tried this with:

  • several different sources
  • several languages
  • different Tesseract training files

The documentation is not clear on whether the .traineddata files need to have a specific name, but I tried both full and abbreviated language names.
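
For anyone checking their own setup, a minimal sketch of a file-name check follows. It assumes Tesseract's usual convention of naming data files by language code (for example `eng.traineddata` or `dan.traineddata`). The `tessdata` directory and the `check_names` helper are hypothetical placeholders, and whether Subler requires exactly this naming is not confirmed in this thread.

```python
import re
from pathlib import Path

# Assumption: Tesseract's usual naming, a three-letter language code with
# optional variant suffixes (e.g. "chi_sim"), followed by ".traineddata".
NAME_PATTERN = re.compile(r"^[a-z]{3}(_[a-z]+)*\.traineddata$")


def check_names(directory):
    """Return (conforming, other) lists of *.traineddata file names in directory."""
    conforming, other = [], []
    for path in sorted(Path(directory).glob("*.traineddata")):
        target = conforming if NAME_PATTERN.match(path.name) else other
        target.append(path.name)
    return conforming, other


if __name__ == "__main__":
    # Hypothetical location; replace with wherever the data files were placed.
    ok, suspicious = check_names("tessdata")
    print("conforming:", ok)
    print("check these:", suspicious)
```

A file named with the full language name, such as `Danish.traineddata`, would land in the second list, which makes it easy to see which files might be worth renaming before retrying.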

I'm running Subler 1.4.5 on macOS High Sierra 10.13.1.

I thank you for your amazing program.

Comments (5)

  1. David Munch

    Just FYI, I do this regularly on media with Danish subtitles (Blu-ray and DVD sourced), which include the letters æøå. Tesseract catches those just fine, and I never see crashes.

  2. Christian Leberfinger

    I experienced the same problem as the original poster. I found that the linked Tesseract files didn't work for me. It seems the Tesseract data files have been reorganized.

    But in their wiki, the Tesseract developers maintain a list of links to old data files: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302

    I strongly suggest updating the link mentioned in https://bitbucket.org/galad87/subler/wiki/Subtitles%20Guide to the working one.

    BR Christian
