Pynav 0.6

Pynav 0.6 is no longer maintained. Use the new version instead: Pynav-0.7 branch

Introduction

Pynav is a programmatic web browser for Python, used to fetch data and test web sites.
Bug reports and feature requests are welcome.
Pynav on PyPI: http://pypi.python.org/pypi/pynav/.

Features

  • Post authentication
  • User agent support
  • Automatic cookie handling
  • HTTP Basic Authentication support
  • HTTPS support
  • Proxy support
  • Timeout support
  • Regular expression searching
  • Link fetching with a regular expression filter
  • History (pages, posts and responses)
  • Save and load history from a file and replay navigation
  • Random sleep time between pages
  • Error handling
  • Document type, server header information and real URL (in case of redirection)

TODO

  • Better file handling: read the headers of the HTTP server response to get the file type and the real file name.
  • File upload support in post values.
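
The first TODO item can be pictured with the sketch below. It is not Pynav code, only an illustration under stated assumptions: it relies on the standard library's `email.message.Message` (HTTP headers share the MIME header syntax), and `describe_download` and `sample_headers` are hypothetical names with made-up values.

```python
from email.message import Message

def describe_download(headers):
    """Return (content_type, filename) derived from HTTP response headers.

    `headers` is a plain dict of header names to values (hypothetical input).
    """
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    # get_content_type() falls back to 'text/plain' when no Content-Type is set.
    # get_filename() reads the filename parameter of Content-Disposition
    # and returns None when it is absent.
    return msg.get_content_type(), msg.get_filename()

# Made-up headers, as a server might send them for a zip download.
sample_headers = {
    'Content-Type': 'application/zip',
    'Content-Disposition': 'attachment; filename="download_part1.zip"',
}
content_type, filename = describe_download(sample_headers)
```

When the server sends no `Content-Disposition` header, the file name would still have to be taken from the URL.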

Licence

GNU General Public License (GPL)

Installation

Requirements

Minimum Python version: 2.5

Works on Python 2.6

Works on Android with ASE

Latest stable version with pip

    $ pip install pynav

Latest stable version with easy_install

    $ easy_install pynav

or a specific version:

    $ easy_install http://bitbucket.org/sloft/pynav/downloads/pynav-0.6-py2.6.egg

Latest stable version from tar.gz archive

Download pynav-0.6.5.tar.gz and extract it:

    $ wget http://bitbucket.org/sloft/pynav/downloads/pynav-0.6.5.tar.gz
    $ tar xzf pynav-0.6.5.tar.gz

Go into the extracted directory and run setup.py:

    $ cd pynav-0.6.5/
    $ python setup.py install

Dev version from hg source

    $ hg clone https://bitbucket.org/sloft/pynav/

Examples

Post authentication, then image and file downloads with a simple filter or a regular expression

    from pynav import Pynav

    def test1():
        p = Pynav()
        # Log in by posting the form values
        p.go('http://www.example.com/connexion', {'login': 'toto', 'pass': 'toto'})

        if p.find('My profile'):
            print 'connected into profile area'

        p.go('http://www.example.com/photos/')

        # Download the images selected by the simple '.png' filter
        for image in p.get_all_images('.png'):
            p.download(image, '/tmp/images/')

        # Download the links selected by the regular expression
        for link in p.get_all_links(r'download_part.*?\.zip'):
            p.download(link)
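
To see which links a pattern such as `download_part.*?\.zip` would select, here is a minimal sketch that uses only the standard `re` module. It is not Pynav's implementation, and how Pynav extracts links internally is not shown in this page; `HREF_RE`, `filter_links` and `sample_html` are hypothetical names used for illustration.

```python
import re

# Naive href extractor, good enough for a demonstration.
# Real-world HTML is better handled by a proper parser.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)

def filter_links(html, pattern=None):
    """Return the href values found in html, keeping those that match pattern."""
    links = HREF_RE.findall(html)
    if pattern is None:
        return links
    matcher = re.compile(pattern)
    return [link for link in links if matcher.search(link)]

# Made-up page content.
sample_html = (
    '<a href="/news/1.html">News</a>'
    '<a href="/files/download_part1.zip">Part 1</a>'
    '<a href="/files/readme.txt">Readme</a>'
)
zip_links = filter_links(sample_html, r'download_part.*?\.zip')
```

With this sample, only the `.zip` link is kept. Calling `filter_links(sample_html)` without a pattern returns all three links.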


Using HTTP Basic authentication, post authentication and cookie check

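    from pynav import Pynav

    def test2():
        p = Pynav(timeout=5)
        p.auto_referer = True
        # HTTP Basic authentication for the private area
        p.set_http_auth('http://example.com', 'login', 'pass')
        p.go('http://example.com/private/')

        # Post authentication, then check that the session cookie was set
        p.go('http://www.example.com/private/connexion', {'login': 'toto', 'pass': 'toto'})
        if p.cookie_exists('id'):
            print 'Connected'

        # Sleep time between pages
        p.set_page_delay(2, 4)

        for link in p.get_all_links('news'):
            print link
            p.go(link)

        # Review the pages visited and the values posted
        for page in p.history:
            print page['url'], ':', page['post']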
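
The "random sleep time between pages" feature can be illustrated with the standalone sketch below. It assumes that the two arguments of `set_page_delay(2, 4)` are a minimum and a maximum number of seconds, which this page does not state explicitly. `random_page_delay` is a hypothetical helper, not part of Pynav.

```python
import random
import time

def random_page_delay(min_seconds, max_seconds, sleep=time.sleep):
    """Sleep for a random duration within the given bounds and return it.

    The `sleep` parameter exists so the pause can be replaced in tests.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `random_page_delay(2, 4)` before each request pauses for two to four seconds, which spaces out requests and avoids a fixed, easily recognisable rhythm.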


Using proxy

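    from pynav import Pynav

    def test3():
        # Send every request through the proxy, with a 6 second timeout
        p = Pynav(timeout=6, proxy='http://www.example.com:3128/')
        p.verbose = True
        p.referer = 'http://www.example.com'
        page = p.go('http://www.example.com/tracks')
        # Print the page content without its HTML tags
        print p.strip_tags(page)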
