Major immediate tasks:
* bring in previous functionality (pageNormalizer, pathNormalizer, whitespaceChunker, text, etc.)
* identify text chunks
* process text chunks

Most of the code was written some months ago, but it will take time to refactor.

