

Pomp is fun to use and incredibly easy to set up for basic applications.

A Minimal Application
---------------------

For a minimal application, all you need to do is define your crawler by inheriting from :class:`BaseCrawler`::

    import re

    from pomp.core.base import BaseCrawler, BasePipeline
    from pomp.contrib import SimpleDownloader

    # Raw string, so that \w and \s reach the regex engine unchanged.
    python_sentence_re = re.compile(r'[\w\s]*python[\s\w]*', re.I | re.M)


    class MyCrawler(BaseCrawler):
        """Extract all sentences containing the word `python`."""
        ENTRY_URL = ''  # entry point: the URL the crawl starts from

        def extract_items(self, response):
            # Decode the raw response body and yield every matching sentence.
            for i in python_sentence_re.findall(response.body.decode('utf-8')):
                yield i.strip()

        def next_url(self, response):
            return None  # one-page crawler: returning None stops the crawl


    class PrintPipeline(BasePipeline):
        def process(self, item):
            # Print each sentence extracted by the crawler.
            print('Sentence:', item)


    if __name__ == '__main__':
        from pomp.core.engine import Pomp

        pomp = Pomp(
            downloader=SimpleDownloader(),
            pipelines=[PrintPipeline()],
        )

        pomp.pump(MyCrawler())

Item pipelines
--------------

Custom downloader
-----------------

Downloader middleware
---------------------
