:mod:urllib2 --- extensible library for opening URLs

Note

The :mod:urllib2 module has been split across several modules in Python 3.0 named :mod:urllib.request and :mod:urllib.error. The :term:2to3 tool will automatically adapt imports when converting your sources to 3.0.

The :mod:urllib2 module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world --- basic and digest authentication, redirections, cookies and more.

The :mod:urllib2 module defines the following functions:

The following exceptions are raised as appropriate:
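A minimal sketch of handling :exc:URLError and :exc:HTTPError, the two exceptions this page refers to elsewhere. The helper name fetch_status is hypothetical. The except ImportError fallback to :mod:urllib.request is an assumption added only so the sketch also runs under Python 3. Because :exc:HTTPError is a subclass of :exc:URLError, it has to be caught first.

```python
import os
import tempfile

try:
    import urllib2                          # Python 2
    from urllib import pathname2url
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2
    from urllib.request import pathname2url


def fetch_status(url):
    """Hypothetical helper: open url and classify the outcome."""
    try:
        f = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        # HTTPError subclasses URLError, so it is caught first;
        # e.code is the HTTP status code (e.g. 404).
        return ('http error', e.code)
    except urllib2.URLError as e:
        # Any other failure to open the URL (bad host, missing file, ...).
        return ('url error', e.reason)
    try:
        return ('ok', f.read())
    finally:
        f.close()


# Usage: a file: URL for a path that does not exist (POSIX path assumed)
# fails with URLError rather than HTTPError.
missing = os.path.join(tempfile.mkdtemp(), 'no-such-file.txt')
status = fetch_status('file://' + pathname2url(missing))
```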

The following classes are provided:

This class is an abstraction of a URL request.

url should be a string containing a valid URL.

data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard :mimetype:application/x-www-form-urlencoded format. The :func:urllib.urlencode function takes a mapping or sequence of 2-tuples and returns a string in this format.
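A short sketch of building a POST request from form fields with urlencode. The URL and the helper name build_form_request are hypothetical, and the Python 3 import fallback is an assumption for portability. Passing a sequence of 2-tuples keeps the field order predictable, and encoding to bytes is harmless on Python 2 and required on Python 3.

```python
try:
    import urllib2                          # Python 2
    from urllib import urlencode
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2
    from urllib.parse import urlencode


def build_form_request(url, fields):
    """Hypothetical helper: Request whose body is fields encoded as
    application/x-www-form-urlencoded."""
    # A list of 2-tuples preserves field order; a dict would also work,
    # but its order is arbitrary.
    body = urlencode(fields).encode('ascii')
    return urllib2.Request(url, body)


# Nothing is sent until the request is passed to urlopen().
req = build_form_request('http://www.example.com/cgi-bin/form',
                         [('name', 'Guido'), ('lang', 'Python')])
```

Because data is not None, the request reports POST as its method; a Request built without data reports GET.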

headers should be a dictionary, and will be treated as if :meth:add_header was called with each key and value as arguments. This is often used to "spoof" the User-Agent header, which a browser uses to identify itself; some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", while :mod:urllib2's default user agent string is "Python-urllib/2.6" (on Python 2.6).
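A sketch of supplying a browser-style User-Agent through the headers argument, reusing the Firefox string quoted above. It reads the value back with the Request methods get_header and has_header, which this excerpt does not describe; the Python 3 import fallback is an assumption. Header names pass through str.capitalize(), so the stored key is 'User-agent'.

```python
try:
    import urllib2                          # Python 2
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2

# Browser-style identification string taken from the paragraph above.
FIREFOX_UA = 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'

req = urllib2.Request('http://www.example.com/',
                      headers={'User-Agent': FIREFOX_UA})

# add_header() stores names via str.capitalize(): 'User-Agent' -> 'User-agent'.
stored_ua = req.get_header('User-agent')
```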

The final two arguments are only of interest for correct handling of third-party HTTP cookies:

origin_req_host should be the request-host of the origin transaction, as defined by RFC 2965. It defaults to cookielib.request_host(self). This is the host name or IP address of the original request that was initiated by the user. For example, if the request is for an image in an HTML document, this should be the request-host of the request for the page containing the image.

unverifiable should indicate whether the request is unverifiable, as defined by RFC 2965. It defaults to False. An unverifiable request is one whose URL the user did not have the option to approve. For example, if the request is for an image in an HTML document, and the user had no option to approve the automatic fetching of the image, this should be true.
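The image-in-a-page scenario above, sketched with two Request objects. Both URLs are hypothetical, and the Python 3 import fallback is an assumption. The page request takes the defaults: its origin request-host is derived from its own URL and it is verifiable. The embedded image is fetched on the page's behalf, so it names the page's host and is marked unverifiable.

```python
try:
    import urllib2                          # Python 2
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2

# The page the user actually asked for: defaults apply.
page = urllib2.Request('http://www.example.com/gallery.html')

# An image embedded in that page, fetched automatically without the user
# approving its URL (hypothetical third-party host).
image = urllib2.Request('http://images.example.net/cat.png',
                        origin_req_host='www.example.com',
                        unverifiable=True)
```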

The :class:OpenerDirector class opens URLs via :class:BaseHandlers chained together. It manages the chaining of handlers, and recovery from errors.

This is the base class for all registered handlers --- and handles only the simple mechanics of registration.

A class which defines a default handler for HTTP error responses; all responses are turned into :exc:HTTPError exceptions.

A class to handle redirections.

A class to handle HTTP Cookies.

Cause requests to go through a proxy. If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies. The default is to read the list of proxies from the environment variables :envvar:<protocol>_proxy. If no proxy environment variables are set, proxy settings are obtained from the registry's Internet Settings section on Windows, and from the OS X System Configuration Framework on Mac OS X.

To disable autodetected proxies, pass an empty dictionary.
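A sketch contrasting an explicit proxy mapping with an empty one. The proxy URL is hypothetical, and the Python 3 import fallback is an assumption. The check on http_open relies on the CPython implementation detail that ProxyHandler creates one <protocol>_open method per key in the mapping.

```python
try:
    import urllib2                          # Python 2
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2

# Only http: URLs are routed through this (hypothetical) proxy.
explicit = urllib2.ProxyHandler({'http': 'http://proxy.example.com:3128/'})

# Empty mapping: environment and registry proxy settings are ignored.
no_proxy = urllib2.ProxyHandler({})

# Passing a ProxyHandler instance makes build_opener() omit its default
# ProxyHandler, which would otherwise read proxy settings automatically.
opener = urllib2.build_opener(no_proxy)
```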

Keep a database of (realm, uri) -> (user, password) mappings.

Keep a database of (realm, uri) -> (user, password) mappings. A realm of None is considered a catch-all realm, which is searched if no other realm fits.
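A sketch of the difference between the two password managers, using their add_password(realm, uri, user, passwd) and find_user_password(realm, authuri) methods. The realm names and URI are hypothetical, and the Python 3 import fallback is an assumption. A lookup for a URI below a registered one succeeds; only the default-realm manager falls back to the None realm when the realm does not match.

```python
try:
    import urllib2                          # Python 2
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2

URI = 'http://www.example.com/private/'    # hypothetical protected area

plain = urllib2.HTTPPasswordMgr()
plain.add_password('Members Only', URI, 'klem', 'kadidd!ehopper')

catch_all = urllib2.HTTPPasswordMgrWithDefaultRealm()
catch_all.add_password(None, URI, 'klem', 'kadidd!ehopper')

# Look up credentials for a page below URI under two different realms.
plain_hit = plain.find_user_password('Members Only', URI + 'index.html')
plain_miss = plain.find_user_password('Other Realm', URI + 'index.html')
default_hit = catch_all.find_user_password('Other Realm', URI + 'index.html')
```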

This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. password_mgr, if given, should be something that is compatible with :class:HTTPPasswordMgr; refer to section :ref:http-password-mgr for information on the interface that must be supported.

Handle authentication with the remote host. password_mgr, if given, should be something that is compatible with :class:HTTPPasswordMgr; refer to section :ref:http-password-mgr for information on the interface that must be supported.

Handle authentication with the proxy. password_mgr, if given, should be something that is compatible with :class:HTTPPasswordMgr; refer to section :ref:http-password-mgr for information on the interface that must be supported.

This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. password_mgr, if given, should be something that is compatible with :class:HTTPPasswordMgr; refer to section :ref:http-password-mgr for information on the interface that must be supported.

Handle authentication with the remote host. password_mgr, if given, should be something that is compatible with :class:HTTPPasswordMgr; refer to section :ref:http-password-mgr for information on the interface that must be supported.

Handle authentication with the proxy. password_mgr, if given, should be something that is compatible with :class:HTTPPasswordMgr; refer to section :ref:http-password-mgr for information on the interface that must be supported.
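Since every authentication handler above accepts a password_mgr, one manager can serve both the basic and the digest handler. A minimal sketch; the credentials and URI are hypothetical, and the Python 3 import fallback is an assumption. The passwd attribute checked in the test is a CPython implementation detail.

```python
try:
    import urllib2                          # Python 2
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2

# One password manager shared by both authentication schemes.
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://www.example.com/private/',
                          'klem', 'kadidd!ehopper')   # hypothetical credentials

basic_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
digest_handler = urllib2.HTTPDigestAuthHandler(password_mgr)

# Both handlers can be installed in the same opener.
opener = urllib2.build_opener(basic_handler, digest_handler)
```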

A class to handle opening of HTTP URLs.

A class to handle opening of HTTPS URLs.

Open local files.

Open FTP URLs.

Open FTP URLs, keeping a cache of open FTP connections to minimize delays.

A catch-all class to handle unknown URLs.

Request Objects

The following methods describe all of :class:Request's public interface, and so all must be overridden in subclasses.

OpenerDirector Objects

:class:OpenerDirector instances have the following methods:

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by sorting the handler instances.

1. Every handler with a method named like :samp:{protocol}_request has that method called to pre-process the request.

2. Handlers with a method named like :samp:{protocol}_open are called to handle the request. This stage ends when a handler either returns a non-:const:None value (i.e. a response), or raises an exception (usually :exc:URLError). Exceptions are allowed to propagate.

In fact, the above algorithm is first tried for methods named :meth:default_open. If all such methods return :const:None, the algorithm is repeated for methods named like :samp:{protocol}_open. If all such methods return :const:None, the algorithm is repeated for methods named :meth:unknown_open.

Note that the implementation of these methods may involve calls to the parent :class:OpenerDirector instance's :meth:~OpenerDirector.open and :meth:~OpenerDirector.error methods.

3. Every handler with a method named like :samp:{protocol}_response has that method called to post-process the response.
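The three stages can be observed without a network by opening a local file: URL. This sketch defines a hypothetical processor, StageRecorder, whose file_request and file_response methods are picked up purely by their names; stage 2 is served by the default FileHandler. The temporary file, the POSIX path handling, and the Python 3 import fallback are assumptions.

```python
import os
import tempfile

try:
    import urllib2                          # Python 2
    from urllib import pathname2url
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2
    from urllib.request import pathname2url


class StageRecorder(urllib2.BaseHandler):
    """Hypothetical processor recording the stages it takes part in."""

    def __init__(self):
        self.stages = []

    def file_request(self, req):            # stage 1: pre-process
        self.stages.append('request')
        return req                          # must return the request

    def file_response(self, req, response):  # stage 3: post-process
        self.stages.append('response')
        return response                     # must return the response


# Stage 2 is handled by the default FileHandler's file_open method.
fd, path = tempfile.mkstemp()
os.write(fd, b'hello, opener')
os.close(fd)

recorder = StageRecorder()
opener = urllib2.build_opener(recorder)
f = opener.open('file://' + pathname2url(path))
try:
    body = f.read()
finally:
    f.close()
    os.remove(path)
```

Following the naming convention described below, a class like this, which only defines request and response methods, would be called a processor rather than a handler.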

BaseHandler Objects

:class:BaseHandler objects provide a couple of methods that are directly useful, and others that are meant to be used by derived classes. These are intended for direct use:

The following members and methods should only be used by classes derived from :class:BaseHandler.

Note

The convention has been adopted that subclasses defining :meth:protocol_request or :meth:protocol_response methods are named :class:\*Processor; all others are named :class:\*Handler.

HTTPRedirectHandler Objects

Note

Some HTTP redirections require action from this module's client code. If this is the case, :exc:HTTPError is raised. See RFC 2616 for details of the precise meanings of the various redirection codes.

HTTPCookieProcessor Objects

:class:HTTPCookieProcessor instances have one attribute:
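A sketch of wiring an explicit cookie jar into an opener so the stored cookies stay reachable through the processor's cookiejar attribute. The import of cookielib (http.cookiejar on Python 3) and the urllib.request fallback are assumptions for portability.

```python
try:
    import urllib2                          # Python 2
    import cookielib
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2
    import http.cookiejar as cookielib

jar = cookielib.CookieJar()
cookie_processor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(cookie_processor)
# Cookies set by responses fetched through opener accumulate in jar
# (also reachable as cookie_processor.cookiejar) and are sent back
# on later matching requests.
```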

ProxyHandler Objects

HTTPPasswordMgr Objects

These methods are available on :class:HTTPPasswordMgr and :class:HTTPPasswordMgrWithDefaultRealm objects.

CacheFTPHandler Objects

:class:CacheFTPHandler objects are :class:FTPHandler objects with the following additional methods:
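A sketch of configuring the connection cache with the CacheFTPHandler methods setTimeout (timeout of cached connections, in seconds) and setMaxConns (maximum number of cached connections). The values are arbitrary, and the Python 3 import fallback is an assumption. Because CacheFTPHandler subclasses FTPHandler, build_opener() leaves out its plain FTPHandler.

```python
try:
    import urllib2                          # Python 2
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2

ftp_cache = urllib2.CacheFTPHandler()
ftp_cache.setTimeout(30)     # timeout of cached connections, in seconds
ftp_cache.setMaxConns(4)     # maximum number of cached connections

# build_opener() skips its default FTPHandler, since ftp_cache is one.
opener = urllib2.build_opener(ftp_cache)
```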

Examples

This example gets the python.org main page and displays the first 100 bytes of it:

>>> import urllib2
>>> f = urllib2.urlopen('http://www.python.org/')
>>> print f.read(100)
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<?xml-stylesheet href="./css/ht2html


Here we are sending a data-stream to the stdin of a CGI and reading the data it returns to us. Note that this example will only work when the Python installation supports SSL.

>>> import urllib2
>>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
...                       data='This data is passed to stdin of the CGI')
>>> f = urllib2.urlopen(req)
>>> print f.read()
Got Data: "This data is passed to stdin of the CGI"


The code for the sample CGI used in the above example is:

#!/usr/bin/env python
import sys
# Read the request body that the web server passes on stdin.
data = sys.stdin.read()
print 'Content-type: text/plain\n\nGot Data: "%s"' % data


Use of Basic HTTP Authentication:

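import urllib2
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)
urllib2.urlopen('http://www.example.com/login.html')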


:func:build_opener provides many handlers by default, including a :class:ProxyHandler. By default, :class:ProxyHandler uses the environment variables named <scheme>_proxy, where <scheme> is the URL scheme involved. For example, the :envvar:http_proxy environment variable is read to obtain the HTTP proxy's URL.

This example replaces the default :class:ProxyHandler with one that uses programmatically-supplied proxy URLs, and adds proxy authorization support with :class:ProxyBasicAuthHandler.

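import urllib2
proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
# This time, rather than install the OpenerDirector, we use it directly:
opener.open('http://www.example.com/login.html')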


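To add HTTP headers, use the headers argument to the :class:Request constructor, or call :meth:add_header on the request:

import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)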

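:class:OpenerDirector automatically adds a :mailheader:User-Agent header to every :class:Request. To change this:

import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')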
Also, remember that a few standard headers (:mailheader:Content-Length, :mailheader:Content-Type and :mailheader:Host) are added when the :class:Request is passed to :func:urlopen (or :meth:OpenerDirector.open).
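To see those standard headers appear without making a connection, this sketch calls HTTPHandler's http_request pre-processing method directly, which is normally done for you by :meth:OpenerDirector.open. Calling it by hand is for illustration only, and relies on CPython internals such as the handler needing a parent opener. The URL is hypothetical, and the Python 3 import fallback is an assumption.

```python
try:
    import urllib2                          # Python 2
except ImportError:                         # assumption: Python 3 fallback
    import urllib.request as urllib2

req = urllib2.Request('http://www.example.com/search', b'q=urllib2')
had_host_before = req.has_header('Host')

# The handler needs a parent opener, whose addheaders supplies User-agent.
opener = urllib2.OpenerDirector()
http_handler = urllib2.HTTPHandler()
opener.add_handler(http_handler)

# Stage 1 only: pre-process the request; nothing is sent.
req = http_handler.http_request(req)
```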