
:mod:`xml.parsers.expat` --- Fast XML parsing using Expat


The :mod:`pyexpat` module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data, see :ref:`xml-vulnerabilities`.

The :mod:`xml.parsers.expat` module is a Python interface to the Expat non-validating XML parser. The module provides a single extension type, :class:`xmlparser`, that represents the current state of an XML parser. After an :class:`xmlparser` object has been created, various attributes of the object can be set to handler functions. When an XML document is then fed to the parser, the handler functions are called for the character data and markup in the XML document.

This module uses the :mod:`pyexpat` module to provide access to the Expat parser. Direct use of the :mod:`pyexpat` module is deprecated.

This module provides one exception and one type object:

The :mod:`xml.parsers.expat` module contains two functions:
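As a minimal sketch, assuming the two functions meant here are :func:`ParserCreate` and :func:`ErrorString` (as in current CPython releases), the following creates a parser and looks up the message for a numeric error code. The variable names are illustrative only.

```python
from xml.parsers import expat
from xml.parsers.expat import errors

# ParserCreate() returns a new xmlparser object.
parser = expat.ParserCreate()

# errors.codes maps each message constant to its numeric error code;
# ErrorString() maps a numeric error code back to its message.
syntax_code = errors.codes[errors.XML_ERROR_SYNTAX]
message = expat.ErrorString(syntax_code)
print(type(parser).__name__, repr(message))
```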

XMLParser Objects

:class:`xmlparser` objects have the following methods:
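To illustrate two of these methods, the sketch below feeds the same document to one parser in pieces with :meth:`Parse` and to another through a file-like object with :meth:`ParseFile`. The helper names (``collect_names``, ``feed_chunks``, ``feed_file``) are hypothetical and exist only for this example.

```python
import io
import xml.parsers.expat

def collect_names(feed):
    """Create a parser, let *feed* drive it, and return the element names seen."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    feed(parser)
    return names

def feed_chunks(parser):
    # Parse() may be called repeatedly; the last call passes isfinal=True.
    for chunk in ("<root><it", "em/><ite", "m/></root>"):
        parser.Parse(chunk, False)
    parser.Parse("", True)

def feed_file(parser):
    # ParseFile() reads from an object whose read(nbytes) method returns bytes.
    parser.ParseFile(io.BytesIO(b"<root><item/><item/></root>"))

chunked_names = collect_names(feed_chunks)
file_names = collect_names(feed_file)
print(chunked_names, file_names)
```

A parser object handles a single document, which is why each helper call above creates a fresh one.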

:class:`xmlparser` objects have the following attributes:
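As a small sketch of two such attributes, assuming ``buffer_text`` and ``ordered_attributes`` are among those listed: with ``buffer_text`` enabled, contiguous character data is delivered in one call even though it contains an entity reference, and with ``ordered_attributes`` enabled, attributes arrive as a flat list instead of a dictionary.

```python
import xml.parsers.expat

events = []
parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True          # coalesce contiguous character data
parser.ordered_attributes = True   # attributes arrive as [name, value, ...]

parser.StartElementHandler = lambda name, attrs: events.append(("start", name, attrs))
parser.CharacterDataHandler = lambda data: events.append(("text", data))

parser.Parse('<e a="1" b="2">fish &amp; chips</e>', True)
print(events)
```

Without ``buffer_text``, the same text may be reported in several pieces, so handlers that need whole strings would have to join the pieces themselves.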

The following attributes contain values relating to the most recent error encountered by an :class:`xmlparser` object, and will only have correct values once a call to :meth:`Parse` or :meth:`ParseFile` has raised a :exc:`xml.parsers.expat.ExpatError` exception.
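The following sketch reads the error attributes from the parser itself after a failed parse, assuming they are named ``ErrorCode``, ``ErrorLineNumber`` and ``ErrorColumnNumber``. The malformed input is made up for the example.

```python
import xml.parsers.expat
from xml.parsers.expat import errors

parser = xml.parsers.expat.ParserCreate()
try:
    parser.Parse("<a>\n<b></a>", True)   # </a> closes the wrong element
except xml.parsers.expat.ExpatError:
    # These attributes are meaningful only once the exception has been raised.
    error_code = parser.ErrorCode
    error_line = parser.ErrorLineNumber
    error_column = parser.ErrorColumnNumber
    print(errors.messages[error_code], error_line, error_column)
```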

The following attributes contain values relating to the current parse location in an :class:`xmlparser` object. During a callback reporting a parse event, they indicate the location of the first of the sequence of characters that generated the event. When accessed outside of a callback, the position indicated will be just past the last parse event (regardless of whether there was an associated callback).
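As an illustration, assuming the location attributes include ``CurrentLineNumber`` and ``CurrentColumnNumber``, the handler below records where each start tag begins. Line numbers start at 1 and column numbers at 0.

```python
import xml.parsers.expat

positions = []
parser = xml.parsers.expat.ParserCreate()

def start_element(name, attrs):
    # Inside a callback, the location refers to the first character of the event.
    positions.append((name, parser.CurrentLineNumber, parser.CurrentColumnNumber))

parser.StartElementHandler = start_element
parser.Parse("<root>\n  <item/>\n</root>", True)
print(positions)
```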

Here is the list of handlers that can be set. To set a handler on an :class:`xmlparser` object *o*, use ``o.handlername = func``. *handlername* must be taken from the following list, and *func* must be a callable object accepting the correct number of arguments. The arguments are all strings, unless otherwise stated.

ExpatError Exceptions

:exc:`ExpatError` exceptions have a number of interesting attributes:
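A brief sketch of using these attributes, assuming they are ``code``, ``lineno`` and ``offset``: the exception is caught and turned into a one-line summary. The names ``caught`` and ``summary`` are illustrative.

```python
import xml.parsers.expat

parser = xml.parsers.expat.ParserCreate()
try:
    parser.Parse("<a>", True)   # the document ends before <a> is closed
except xml.parsers.expat.ExpatError as exc:
    caught = exc                # keep a reference; 'exc' is cleared after the block
    summary = "{} (line {}, column {})".format(
        xml.parsers.expat.ErrorString(exc.code), exc.lineno, exc.offset)
    print(summary)
```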


The following program defines three handlers that just print out their arguments.

import xml.parsers.expat

# 3 handler functions
def start_element(name, attrs):
    print('Start element:', name, attrs)
def end_element(name):
    print('End element:', name)
def char_data(data):
    print('Character data:', repr(data))

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""", True)

The output from this program is:

Start element: parent {'id': 'top'}
Start element: child1 {'name': 'paul'}
Character data: 'Text goes here'
End element: child1
Character data: '\n'
Start element: child2 {'name': 'fred'}
Character data: 'More text'
End element: child2
Character data: '\n'
End element: parent

Content Model Descriptions

Content models are described using nested tuples. Each tuple contains four values: the type, the quantifier, the name, and a tuple of children. Children are simply additional content model descriptions.

The values of the first two fields are constants defined in the :mod:`xml.parsers.expat.model` module. These constants can be collected in two groups: the model type group and the quantifier group.
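The nested-tuple layout can be explored with a short sketch. It assumes that content models are delivered through an ``ElementDeclHandler`` callback and that ``XML_CTYPE_NAME`` and ``XML_CTYPE_MIXED`` are among the model type constants. The DTD and the ``child_names`` helper are made up for the example.

```python
import xml.parsers.expat
from xml.parsers.expat import model

DOC = """<!DOCTYPE root [
<!ELEMENT root (item*)>
<!ELEMENT item (#PCDATA)>
]>
<root><item>x</item></root>"""

models = {}

def element_decl(name, content_model):
    # content_model is the nested (type, quantifier, name, children) tuple.
    models[name] = content_model

def child_names(content_model):
    """Recursively collect the element names mentioned in a content model."""
    ctype, quant, name, children = content_model
    found = [name] if ctype == model.XML_CTYPE_NAME else []
    for child in children:
        found.extend(child_names(child))
    return found

parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = element_decl
parser.Parse(DOC, True)

for element, content_model in models.items():
    print(element, content_model, child_names(content_model))
```

Because children use the same four-field layout as their parent, a single recursive function is enough to walk a model of any depth.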

The constants in the model type group are:

The constants in the quantifier group are:

Expat error constants

The following constants are provided in the :mod:`xml.parsers.expat.errors` module. These constants are useful in interpreting some of the attributes of the :exc:`ExpatError` exception objects raised when an error has occurred. Because, for backwards compatibility reasons, each constant's value is the error *message* and not the numeric error *code*, you do this by comparing the exception's :attr:`code` attribute with :samp:`errors.codes[errors.XML_ERROR_{CONSTANT_NAME}]`.
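The comparison described above might look like the following sketch. It relies on the ``errors.codes`` dictionary (message to numeric code) and its counterpart ``errors.messages`` (numeric code to message). The malformed input is made up for the example.

```python
import xml.parsers.expat
from xml.parsers.expat import errors

parser = xml.parsers.expat.ParserCreate()
try:
    parser.Parse("<a/><b/>", True)   # a second root element is not allowed
except xml.parsers.expat.ExpatError as exc:
    code = exc.code
    # The constant holds the message, so translate it to a code before comparing.
    is_junk = code == errors.codes[errors.XML_ERROR_JUNK_AFTER_DOC_ELEMENT]
    message = errors.messages[code]
    print(is_junk, message)
```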

The errors module has the following attributes:


[1] The encoding string included in XML output should conform to the appropriate standards. For example, "UTF-8" is valid, but "UTF8" is not.