calibre_utils / README.txt


Helper scripts for some Calibre_ tasks. 

Script list

The following scripts are available:


    Checks books without an ISBN (set in metadata) for an ISBN-like
    string present in the leading pages. If found, adds it to the
    metadata (which makes it possible to download full metadata, covers, etc).


    Convert any .doc to .rtf (unless already present) - using


    Checks the given directory tree for books not yet present in Calibre
    and adds them if found. Uses binary file comparison to check whether
    the file is identical (file name and metadata are not used, on


    Checks whether the Calibre database directory contains any unregistered
    files and reports them if found.


Queries Calibre for all books without an ISBN, then tries to locate
an ISBN inside each one (by scanning a few leading pages) and updates
the Calibre book metadata if an ISBN is found.
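
The scanning step described above can be sketched roughly as follows.
This is an illustrative sketch only, not the code the script actually
uses: the names ``ISBN_CANDIDATE``, ``is_valid_isbn`` and ``find_isbns``
are hypothetical, and filtering candidates by checksum is an assumption
about what "ISBN-like" could reasonably mean::

    import re

    # Candidate: 10 to 13 digits, optionally separated by single spaces
    # or hyphens; the last character may be X (ISBN-10 check digit).
    ISBN_CANDIDATE = re.compile(
        r"(?<![0-9])(?:[0-9][ -]?){9,12}[0-9Xx](?![0-9])")

    def is_valid_isbn(code):
        """Verify the ISBN-10 or ISBN-13 checksum of a bare code."""
        if len(code) == 10:
            if not re.match(r"^[0-9]{9}[0-9X]$", code):
                return False
            # Weights 10..1, X counts as 10, sum must divide by 11.
            total = sum((10 - i) * (10 if ch == "X" else int(ch))
                        for i, ch in enumerate(code))
            return total % 11 == 0
        if len(code) == 13:
            if not code.isdigit():
                return False
            # Alternating weights 1 and 3, sum must divide by 10.
            total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                        for i, ch in enumerate(code))
            return total % 10 == 0
        return False

    def find_isbns(text):
        """Return checksum-valid ISBNs found in text, in order, no repeats."""
        found = []
        for match in ISBN_CANDIDATE.finditer(text):
            code = re.sub(r"[ -]", "", match.group(0)).upper()
            if is_valid_isbn(code) and code not in found:
                found.append(code)
        return found

For instance, ``find_isbns("ISBN 0-306-40615-2")`` yields
``["0306406152"]``, while a ten-digit number with a wrong checksum is
ignored. In practice the text passed in would be only the leading pages
or lines extracted from the book.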

Run it without parameters::


Later on, the ISBN can be used to fetch the book metadata and/or book
cover inside the Calibre GUI. Just start Calibre and look for books with
an ISBN set and missing metadata, for example using a query like::

     isbn:~[0-9] not publisher:~[a-z]

(the above means: isbn contains some digit, publisher does not contain
any letter), then, depending on your workflow, either update them
automatically (mark them, right click, expand the Edit Metadata
Information submenu and pick Download Metadata) or review them
individually (right click/Edit Metadata/Edit Metadata Individually and
use the metadata download buttons, reviewing replies before applying them).


Queries Calibre for all books which have only .doc format, then uses
OpenOffice to convert them to .rtf and add this format as an

OpenOffice (and pyuno libraries provided by it) are used in the

Run it without parameters::


Note: the script tends to crash at the end of the job (while
finishing). I haven't diagnosed the cause (most likely the problem
is in the libraries I use), but the crash is harmless and does not
affect the actual conversion process.


Reports the files present inside the Calibre library directory but not
present in the database (= not visible in the interface).

The files are reported to standard output. To add them all to Calibre,
pipe the output to ``xargs``. For example::

    | xargs -d "\n" calibredb add

(but, better, review everything beforehand)

*The problematic scenario may happen, for example, if Calibre is used
from two or more machines over a synchronized or networked directory
and, by mistake, two copies are run simultaneously.*
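
The check itself amounts to a set difference between the files on disk
and the files the database knows about. A minimal sketch, assuming the
list of registered paths (relative to the library directory) has
already been obtained from Calibre by some other means; the function
name ``find_unregistered`` and the ``IGNORED_NAMES`` list are
hypothetical and not taken from the script (written for Python 3)::

    import os

    # Assumption: files Calibre maintains on its own, never reported.
    IGNORED_NAMES = {"metadata.db", "metadata.opf", "cover.jpg"}

    def find_unregistered(library_dir, registered_paths):
        """List files under library_dir absent from registered_paths.

        Both the input and the result use paths relative to library_dir.
        """
        known = {os.path.normpath(p) for p in registered_paths}
        missing = []
        for dirpath, _dirnames, filenames in os.walk(library_dir):
            for name in filenames:
                if name in IGNORED_NAMES:
                    continue
                full = os.path.join(dirpath, name)
                rel = os.path.normpath(os.path.relpath(full, library_dir))
                if rel not in known:
                    missing.append(rel)
        return sorted(missing)

Printing each returned path on its own line gives output in a form
suitable for piping to ``xargs``.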


Scans the given directory and adds to Calibre all books which are not
yet present there. Duplicate checking is based solely on file content
comparison: a file is skipped if an identical file is already present
in Calibre.
I wrote this script to handle the *I want to ensure everything is
already imported and can be deleted* scenario.
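
Content-based duplicate detection of this kind can be sketched as
below. This is only an illustration of the idea, not the script's
actual implementation: the helper names ``index_by_size`` and
``is_already_present`` are hypothetical, and grouping files by size
before comparing bytes is an assumed optimisation (written for
Python 3)::

    import filecmp
    import os
    from collections import defaultdict

    def index_by_size(library_dir):
        """Map file size to the list of library files of that size."""
        by_size = defaultdict(list)
        for dirpath, _dirnames, filenames in os.walk(library_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)
        return by_size

    def is_already_present(candidate, by_size):
        """True if some indexed file has exactly the same bytes."""
        size = os.path.getsize(candidate)
        # Only files of equal size can be identical; shallow=False
        # forces a real byte-by-byte comparison.
        return any(filecmp.cmp(candidate, existing, shallow=False)
                   for existing in by_size.get(size, ()))

A file for which ``is_already_present`` returns ``False`` would then be
handed to ``calibredb add``; file names and metadata play no role in
the decision.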


Example::

    calibre_add_if_missing /home/jan/OldBooks

(and later remove OldBooks if everything is OK).

Installation and configuration


Calibre must be installed, properly configured, and have an existing
database (otherwise it does not make sense to run those scripts).


command must be in PATH (or calibredb variable inside .ini file must
be properly set, see below).

Tools providing commands::


should be installed and present in PATH (or properly configured in
.ini, or disabled in .ini, see below). On Ubuntu Linux or Debian Linux
those can be installed from standard repositories, just install the
following packages::


Python 2.6 or 2.7 is required (the scripts use some features
introduced in 2.6 - in particular tempfile extensions, subprocess and
namedtuple). Also, the lxml library must be installed. On Debian or
Ubuntu just install the following packages::


For calibre_convert_docs_to_rtf to work, the ootools_ library must be
installed. The simplest way to install it::

    easy_install ootools

(on Ubuntu ``sudo easy_install ootools``).

I develop and use those scripts on Ubuntu Linux. They should work on
Windows or Mac if necessary tools are installed, but I've never tried

Actual installation


    easy_install mekk.calibre

should do.


The ``~/.calibre-utils`` file can be used to configure some program
settings. The file is created, if missing, whenever any of the
scripts is run, and can be customized.

Here is the default content::

    [commands]
    catdoc = catdoc
    archmage = archmage
    djvutxt = djvutxt
    calibredb = calibredb
    pdftotext = pdftotext

    [isbn-search]
    guess_lead_lines = 10000
    guess_lead_pages = 10

The commands section defines the location of the external tools being
used. If the commands are present in PATH, bare names can be used;
otherwise a full path can be specified. Finally, if some tool is
missing, it can be set to an empty string.

The isbn-search section specifies how many leading pages (in
page-based document formats like PDF or DJVU) or lines (in the free
formats like TXT or CHM) are scanned looking for ISBN-like strings.

For example, the file can be changed as follows::

    [commands]
    catdoc = /usr/local/bin/catdoc
    archmage = 
    djvutxt = 
    calibredb = /opt/calibre/calibredb
    pdftotext = pdftotext

    [isbn-search]
    guess_lead_lines = 12000
    guess_lead_pages = 15

In such a case catdoc will be used from /usr/local/bin, calibredb will
be expected in /opt/calibre, pdftotext will be sought in PATH, and
archmage and djvutxt will be treated as missing (so the ISBN guessing
script won't be able to scan CHM and DJVU files for ISBN and will
ignore them).
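
How such a file might be interpreted can be sketched with Python's
standard ``configparser`` module. This is a hedged illustration, not
the scripts' own loader: the ``[commands]`` and ``[isbn-search]``
headers are assumed from the section names mentioned above, the helper
names ``load_settings`` and ``resolve_tool`` are hypothetical, and the
code targets Python 3 rather than the Python 2 the scripts require::

    import configparser
    import shutil

    SAMPLE = """\
    [commands]
    catdoc = /usr/local/bin/catdoc
    archmage =
    djvutxt =
    calibredb = /opt/calibre/calibredb
    pdftotext = pdftotext

    [isbn-search]
    guess_lead_lines = 12000
    guess_lead_pages = 15
    """

    def load_settings(text):
        """Parse the settings; empty command values become None."""
        parser = configparser.ConfigParser()
        parser.read_string(text)
        commands = {name: (value.strip() or None)
                    for name, value in parser.items("commands")}
        limits = {name: parser.getint("isbn-search", name)
                  for name in parser.options("isbn-search")}
        return commands, limits

    def resolve_tool(commands, name):
        """Return an executable path for the tool, or None if unusable."""
        configured = commands.get(name)
        if not configured:
            return None  # tool disabled by an empty setting
        # shutil.which handles both bare names (PATH lookup) and full paths.
        return shutil.which(configured)

With the sample above, ``resolve_tool(commands, "djvutxt")`` returns
``None``, which a caller could take as the signal to skip DJVU files.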

Sources, bug reports

The project is `hosted here`_.

.. _hosted here:
.. _Calibre:
.. _ootools: