A Rails application that uses Tesseract-OCR to extract words and characters from images. It also includes a Japanese dictionary (EDICT) and example sentences (Tanaka Corpus).
- tesseract-ocr and Japanese language files
- libtesseract and libleptonica
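
Before going further, it can help to confirm that the Japanese language files are actually visible to Tesseract. The sketch below is not part of this application: `tesseract_lang_installed?` is a hypothetical helper, and it assumes the language list comes from `tesseract --list-langs`, which prints one language code per line after a header line (`jpn` is the code for Japanese).

```ruby
# Hypothetical helper, not part of this application.
# Takes the text printed by `tesseract --list-langs` and reports whether
# the given language code appears on a line of its own.
def tesseract_lang_installed?(list_langs_output, lang = "jpn")
  list_langs_output.each_line.map(&:strip).include?(lang)
end

# Example call (requires tesseract on the PATH; some versions print the
# list to stderr, hence the redirect):
#   puts tesseract_lang_installed?(`tesseract --list-langs 2>&1`)
```

The match is on whole lines, so a code such as `jpn_vert` is not mistaken for `jpn`.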
Deploying in development environment
Assuming you have Rails set up, run the following commands in order:

    bundle install
    rake db:migrate
    rake dictionary:import
    rake sentences:import
    rake kanji:import
    rails server
The dictionary and sentences are downloaded by their import rake tasks, so an Internet connection is required.
To run the tests run the following command
A coverage report is generated and placed in coverage/index.html
Follow this link.