Novel Reader: Offline novel reader in Python


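  • Automatically extracts the main text from web pages
  • Detects and removes text ads embedded in novels
  • Lets you style the generated HTML with your own style sheet
  • Adjusts the image width of chapters published as images so they are readable on a phone
  • Prints colored text on both Windows and Linux
  • Outputs either HTML or plain text
  • Retrieves some content that is written out by JavaScript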

Run python -h to see how to use this tool to download novels for offline reading.

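  • On the first run, use python -r YOUR_LOCAL_NOVEL_DIR -u the.novel.url to set the root directory for locally downloaded novels and to start downloading the novel at the.novel.url (this must be the novel's table-of-contents page)
  • After that, you only need to run python -u the.novel.url to update a novel you have already downloaded or to download a new one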

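Besides novel reading, this tool also provides modules for main-text extraction (extractor), colored printing (colorful), and image resizing (resizer). The colored printing module is easy to try out: just run python colorful.py.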


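from extractor import Extractor, DocBuilder, FsCache

url = 'some url'
# Fetch the page through a file-system cache and extract its main text.
text = Extractor(url, cache=FsCache(alwaysCache=True)).get_text()
# Build an HTML document from the extracted title, body, and links.
html = DocBuilder(url, text['title'], text['html'], text['links']).buildhtml()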
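
To illustrate what colored console printing involves, here is a minimal sketch based on ANSI escape sequences. It is not the API of the colorful module: the names ANSI_COLORS, colorize, and print_colored are hypothetical, and the sketch assumes a terminal that understands ANSI codes. Older Windows consoles generally need the Win32 console API instead, which this sketch does not cover.

```python
import sys

# Standard ANSI SGR foreground color codes.
ANSI_COLORS = {'red': 31, 'green': 32, 'yellow': 33, 'blue': 34}


def colorize(text, color):
    """Wrap text in the ANSI escape sequences for the named color."""
    code = ANSI_COLORS[color]
    # \033[<code>m switches the color on, \033[0m resets it.
    return '\033[%dm%s\033[0m' % (code, text)


def print_colored(text, color, stream=sys.stdout):
    """Write colored text, falling back to plain text when not on a terminal."""
    if hasattr(stream, 'isatty') and stream.isatty():
        text = colorize(text, color)
    stream.write(text + '\n')
```

Calling print_colored('download finished', 'green') would print green text on an ANSI-capable terminal and plain text when the output is redirected to a file.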

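This software runs on Python 2.6/2.7 and depends on lxml and PIL. chardet may be used when detecting a page's encoding. JavaScript support requires either the Windows platform or an installation of PyV8.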
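
Treating chardet as an optional dependency could look roughly like the sketch below. This is an assumption about how such a fallback might be written, not the tool's actual code: decode_bytes and its default parameter are hypothetical names, and only chardet.detect is a real library call.

```python
def decode_bytes(raw, default='utf-8'):
    """Decode raw bytes, asking chardet for the encoding if it is installed."""
    encoding = None
    try:
        import chardet  # optional dependency
        # detect() returns a dict whose 'encoding' entry may be None.
        encoding = chardet.detect(raw).get('encoding')
    except ImportError:
        pass  # chardet is missing: fall back to the default encoding
    try:
        return raw.decode(encoding or default)
    except (UnicodeDecodeError, LookupError):
        # Wrong guess or unknown codec name: decode leniently instead.
        return raw.decode(default, 'replace')
```

The lenient fallback keeps a single badly encoded page from aborting a whole download, at the cost of a few replacement characters in the output.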


Novel Reader features:

  • Extract text automatically from HTML pages
  • Detect and remove text ads in novels
  • Style the generated HTML with your own style sheets
  • Change the width of scanned pages so you can read them comfortably on a mobile phone
  • Print colored text in the console on Linux and Windows
  • Export as text or HTML
  • Extract some content that is generated by JavaScript

Run python -h to see how to use the tool as a novel reader.

  • The first time you run the tool, use python -r YOUR_LOCAL_NOVEL_DIR -u the.novel.url to set the root directory for offline novels and to download the novel at the.novel.url (this must be the novel's table-of-contents page)
  • Afterwards, use python -u the.novel.url to update a local novel or to download a new one
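
The two-step workflow above (set the root directory once, then pass only a URL) can be sketched as follows. This is a hypothetical illustration, not the tool's real implementation: only the -r and -u flags come from the text, while build_parser, resolve_root, and the JSON config file are assumptions made for the example.

```python
import argparse
import json
import os


def build_parser():
    """Build a parser with the two flags described above."""
    parser = argparse.ArgumentParser(description='Offline novel downloader (sketch)')
    parser.add_argument('-r', dest='root', help='root directory for downloaded novels')
    parser.add_argument('-u', dest='url', help="URL of the novel's table-of-contents page")
    return parser


def resolve_root(args, config_path):
    """Return the root directory, saving it when -r is given and reloading it otherwise."""
    if args.root:
        # First run (or a change of directory): remember the root for later runs.
        with open(config_path, 'w') as f:
            json.dump({'root': args.root}, f)
        return args.root
    if os.path.exists(config_path):
        with open(config_path) as f:
            return json.load(f)['root']
    raise SystemExit('no root directory configured; pass -r on the first run')
```

Persisting the root directory is what lets later invocations omit -r; where the real tool stores this setting is not stated in the text.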

Recent activity

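Commits were pushed to raphaelzhang/novel-reader

54d1fdf - 1. Add handling for dynamically loaded images on Qyer (穷游) 2. Handle the case where a novel's title is a single character 3. Still needed: offline download of a complete single web page, and bulk download without filtering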
