other language stops conversion

Issue #140 resolved
MI B created an issue

What I did:

Added the latest Swedish language 3.02 tessdata from the source at 'https://code.google.com/p/tesseract-ocr/downloads/detail?name=swe.traineddata.gz&can=2&q=' to the suggested folder '~/Library/Application Support/Subler/tessdata'

What I expected: I had verified that conversion worked fine before, except that characters with umlauts had their umlauts dropped. Based on that, I was hoping conversion would still work and the umlauts would be kept.

What happens: Subtitle conversion stops working completely if I have the Swedish language tessdata installed, even though it worked before (without umlauts).

My source has Swedish as well as English VobSub tracks. The file was made with Handbrake.

How can I make this work with another language? Is this a known bug or limitation? How can I assist supplying info on this? What can I test myself?

Comments (17)

  1. MI B reporter

    OK, I've found no way to export the sub. I have a 110 MB clip. Is it OK to attach that? Or I can share it via Dropbox. Or do you have a suggestion for a tool for exporting only the VobSub?

  2. Damiano Galassi repo owner

    You decompressed the tesseract-ocr-3.02.swe.tar.gz and copied only the swe.traineddata file, right?
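
The decompress-and-copy step asked about here can be sketched in Python. This is only an illustration, not part of Subler: the function name `install_traineddata` is hypothetical, and the code matches on the file name because the folder layout inside the archive is an assumption.

```python
import shutil
import tarfile
from pathlib import Path


def install_traineddata(archive: Path, dest_dir: Path, lang: str = "swe") -> Path:
    """Copy only <lang>.traineddata out of a .tar.gz archive into dest_dir.

    Hypothetical helper for illustration; nothing else from the
    archive is written to dest_dir.
    """
    wanted = f"{lang}.traineddata"
    # Create the tessdata folder if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            # Match on the file name only, so the folder layout inside
            # the archive does not matter.
            if member.isfile() and Path(member.name).name == wanted:
                target = dest_dir / wanted
                source = tar.extractfile(member)
                with source, open(target, "wb") as out:
                    shutil.copyfileobj(source, out)
                return target
    raise FileNotFoundError(f"{wanted} not found in {archive}")
```

For example, `install_traineddata(Path("tesseract-ocr-3.02.swe.tar.gz"), Path.home() / "Library/Application Support/Subler/tessdata")` would place only `swe.traineddata` in the folder discussed in this thread.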

  3. MI B reporter

    Yes, of course. I tried placing swe.traineddata at '~/Library/Application Support/Subler/' and the suggested '~/Library/Application Support/Subler/tessdata', and at one time even put it next to the eng.traineddata file inside Subler. Is there some way I can verify whether the file is picked up at all? A debug mode I can use?

  4. MI B reporter

    I'm investigating whether there's a local problem here, so I'm re-ripping my source DVD.

    Perhaps I should have mentioned I'm on OS X Mavericks 10.9.5. Is there perhaps some other dependency that could be at play here, such as a library in OS X that I need to update?

  5. MI B reporter

    I've re-ripped now, installed the 3.0 version (rather than 3.02) and later removed the tessdata file. I also removed and reinstalled Subler 1.0.6.

    Still there is no conversion taking place at all, not even without umlauts. If I open the resulting file, it doesn't contain a new subtitle text track, not even for the English VobSub.

    I tried to attach a 318 KB extracted DVD_TITLE subtitle here, but that doesn't seem to be possible.

  6. Damiano Galassi repo owner

    It works for me on 10.11; I've put the swe.traineddata in ~/Library/Application Support/Subler/tessdata. Can you describe exactly the procedure you use to convert the vobsub track in Subler?

  7. MI B reporter

    First, I'd like to point out an apparent error in the Wiki FAQ that says: "Subler will load the new .tessdata file automatically when needed.". I assume this is supposed to be ".traineddata"?

    This is how I do the whole process, including variations and installation of Subler: I download Subler-1.0.6.zip and install this in my "Applications". Any previous files from Subler have been removed with AppCleaner (see Deleted Subler files.png).

    As a later step (to first try once without): Downloaded https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.swe.tar.gz and install the resulting "swe.traineddata" to the suggested folder (See below).

    Processing in Subler:

    1. I drag the movie to the Subler icon in the dock. The tracks look like this: Source tracks window.png
    2. I use these settings in preferences: Subler preferences.png
    3. I use "Save As" to a new file to not overwrite the original.
    4. Optionally I add directly to the Queue instead.

    Already at this point, without any extra tessdata installed, no subtitles are displayed at all in iTunes 12.3.1.23 (see iTunes language.png).

    If I repeat the installation (again first removing everything with AppCleaner), install the "swe.traineddata" file to /Library/Application Support/Subler/tessdata (which means I have to create the folder "tessdata") and redo the processing, the results are the same. If I just add the movies to the queue, the result is again a file without a Tx3g text track, at least as far as I can see.

    As I've said, I managed to make subtitles that did work the first time I ran Subler. But since I installed swe.traineddata and removed it, no text subtitles are generated, seemingly no matter what I do.

  8. MI B reporter

    Could my issues be because Subler relies on something no longer in my system? Some library? What can I do to further analyse my machine in order to find the cause? Is there a debug mode? I have developer tools for Mavericks installed, but I'm not experienced with the language I assume you use, nor with Xcode 6.2, which I think is the latest I can run.

    Would it help at all if I attempt to build on my machine? Is it at all possible with 6.2?

  9. Damiano Galassi repo owner

    Wait, you open the mp4 directly? That won't work. Subler converts things only when you import a file: create a new document and click the plus button.

  10. MI B reporter

    I can add that lsof on the ".traineddata" file indicates the file is at least read once, at startup.

    "Path\x20F 23769 userName 116r REG 1,2 5949353 16511969 ~/Library/Application Support/Subler/tessdata/swe.traineddata"
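
Alongside lsof, a quick look at what is actually sitting in the tessdata folder can catch a misnamed or misplaced file. A minimal Python sketch, assuming the folder mentioned in this thread; `list_traineddata` is a hypothetical helper and only inspects the folder, so it does not show whether Subler loaded the file.

```python
from pathlib import Path


def list_traineddata(folder: Path) -> dict[str, int]:
    """Map language code -> file size in bytes for each *.traineddata file.

    Hypothetical helper for illustration. Returns an empty dict when the
    folder does not exist.
    """
    if not folder.is_dir():
        return {}
    return {
        path.stem: path.stat().st_size
        for path in sorted(folder.glob("*.traineddata"))
        if path.is_file()
    }
```

Calling `list_traineddata(Path.home() / "Library/Application Support/Subler/tessdata")` should include a `"swe"` entry with a non-zero size if the file is in place; a file with a misspelt extension would not be listed.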

  11. MI B reporter

    Problem solved! Thank you!

    Unfortunately I don't think this was obvious though. Also why can you add directly to the queue if that doesn't involve automatic importing? I'd prefer if I could make settings once and then drop all files on the queue, then run it. Is there something I can do there or do I have to always import each file from the menu option?

    Is this what sets are about perhaps?
