Storage for scanned papers with OCR and thumbnails

The user scans a paper document (one or more pages), adds it to the organizer via the CLI or the web interface, and writes a description and tags. The images are automatically run through OCR so that full-text search is possible. Thumbnails are generated so that the user can view several pages at once.

  • Multi-page documents
  • OCR on save (cuneiform seems to give more accurate results than gocr and ocrad, especially if the language is specified by the user)
  • Allow the user to edit the OCRed text, but use it mainly for search.
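
The "OCR on save" step could look roughly like the sketch below. It is only an illustration, not the OrgTool code: the function names are hypothetical, and the `-l` (language) and `-o` (output file) options are what the Cuneiform for Linux command line is assumed to accept, so check `cuneiform --help` on the target system. The language is passed through only when the user has specified one.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ocr_command(image: Path, output: Path,
                      language: Optional[str] = None) -> List[str]:
    """Build the argument list for a cuneiform run (hypothetical helper).

    Assumption: cuneiform accepts `-l <language>` and `-o <output file>`
    followed by the image file name.
    """
    cmd = ["cuneiform"]
    if language:
        # A user-specified language tends to improve recognition accuracy.
        cmd += ["-l", language]
    cmd += ["-o", str(output), str(image)]
    return cmd


def ocr_page(image: Path, output: Path,
             language: Optional[str] = None) -> str:
    """Run OCR on one scanned page and return the recognized text."""
    subprocess.run(build_ocr_command(image, output, language), check=True)
    # The text stays editable by the user; it is mainly used for search.
    return output.read_text(encoding="utf-8", errors="replace")
```

Keeping the command construction separate from the actual call makes it easy to swap in gocr or ocrad for comparison without touching the save logic.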

  1. Andy Mikhailenko

    Mostly implemented in dd0e3fa389fe (adding a page via CLI; OCR on the fly or by request; manual or automatic summary; details extracted from the image via OCR; web interface with thumbnails for list and detail views).

    To do:

    • complete one-stage import process (scan, parse, save);
    • edit summary via web UI;
    • "papers" as ordered lists of "pages";
    • categorization (at least tags);
    • search (by summary and/or details);
    • link to other OrgTool documents (like projects, events, people, plans, messages, needs).
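
    Several of the to-do items ("papers" as ordered lists of "pages", tags, search by summary and/or details) can be sketched together as a small data model. This is an assumption about one possible shape, not the existing implementation; the class and function names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Page:
    image_path: str
    details: str = ""  # text extracted via OCR, editable by the user


@dataclass
class Paper:
    summary: str
    pages: List[Page] = field(default_factory=list)  # list order = page order
    tags: Set[str] = field(default_factory=set)

    def matches(self, query: str) -> bool:
        """Case-insensitive substring match on the summary or any page's details."""
        needle = query.lower()
        if needle in self.summary.lower():
            return True
        return any(needle in page.details.lower() for page in self.pages)


def search(papers: List[Paper], query: str,
           tag: Optional[str] = None) -> List[Paper]:
    """Return papers matching the query, optionally restricted to one tag."""
    return [
        paper for paper in papers
        if (tag is None or tag in paper.tags) and paper.matches(query)
    ]
```

    A plain substring match is enough to show the idea; a real full-text index would replace `matches` without changing the rest of the model.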
