Novel Reader: Offline novel reader in Python

Novel Reader features:

  • Extract the main text from HTML pages automatically
  • Detect and remove text ads embedded in novels
  • Render the generated HTML with your own style sheets
  • Resize the images of image-based (scanned) chapters so they read comfortably on a mobile phone
  • Print colored text in the console on both Windows and Linux
  • Export as HTML or plain text
  • Extract some content that is generated by JavaScript

Run python offline_explorer.py -h to see how to use the tool to download novels for offline reading.

  • On the first run, use python offline_explorer.py -r YOUR_LOCAL_NOVEL_DIR -u the.novel.url to set the root directory for downloaded novels and start downloading the novel at the.novel.url (which must be the novel's table-of-contents page)
  • Afterwards, python offline_explorer.py -u the.novel.url is enough to update a novel you have already downloaded or to download a new one

Besides the novel reader, the tool provides modules for main-text extraction (extractor), colored printing (colorful) and image resizing (resizer). The colorful module is easy to test: just run python colorful.py.

The extractor module can be used like this:

    from extractor import Extractor, DocBuilder, FsCache

    url = 'some url'  # address of the page to extract
    # Extract the page's main text, going through an FsCache that always caches
    text = Extractor(url, cache=FsCache(alwaysCache=True)).get_text()
    # get_text() returns a dict; its 'title', 'html' and 'links' entries feed DocBuilder
    html = DocBuilder(url, text['title'], text['html'], text['links']).buildhtml()

This software runs on Python 2.6/2.7 and depends on lxml and PIL; chardet may be used when detecting encodings. JavaScript support requires either the Windows platform or an installed PyV8.

A brief introduction to the main-text extraction is available on my blog.
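
The two invocations described above (a first run with -r and -u, later runs with -u only) could be scripted roughly as follows. This is only a sketch: build_command and its parameter names are hypothetical helpers, not part of the project, and the only flags it relies on are the -r and -u options mentioned in this README.

```python
def build_command(url, root_dir=None, python="python", script="offline_explorer.py"):
    """Build the argument list for one offline_explorer.py run.

    build_command is a hypothetical helper for illustration only.
    Pass root_dir on the first run to set the local novel directory;
    omit it on later runs to update or download with the saved setting.
    """
    cmd = [python, script]
    if root_dir is not None:
        cmd += ["-r", root_dir]  # first run: set the download root
    cmd += ["-u", url]           # the novel's table-of-contents URL
    return cmd


first_run = build_command("the.novel.url", root_dir="YOUR_LOCAL_NOVEL_DIR")
later_run = build_command("the.novel.url")
```

Either list can then be handed to subprocess.call from the directory that contains offline_explorer.py, using a Python 2.6/2.7 interpreter for the python argument.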

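chardet is only an optional dependency, used when the encoding of a page has to be guessed. One common way to handle such an optional dependency is sketched below. It is not the project's actual code: decode_page and its list of candidate encodings are assumptions made for illustration, and the only chardet call used is chardet.detect.

```python
try:
    import chardet  # optional: only needed when the candidates below fail
except ImportError:
    chardet = None


def decode_page(raw, candidates=("utf-8", "gb18030")):
    """Decode raw page bytes to text (hypothetical helper).

    Tries each candidate encoding strictly, then falls back to
    chardet's guess if chardet is installed, and finally decodes
    with the first candidate while replacing undecodable bytes.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    if chardet is not None:
        guess = chardet.detect(raw).get("encoding")
        if guess:
            return raw.decode(guess, "replace")
    return raw.decode(candidates[0], "replace")
```

The order of the candidates is a heuristic: some GBK byte sequences are also valid UTF-8, so a strict attempt can succeed with the wrong encoding, which is the kind of case where a statistical detector such as chardet helps.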