Commits

Nick Coghlan committed fb3ef3e

Finish off binary data section. First draft of the update complete

Files changed (1)

Doc/library/stdtypes.rst

 
 
 To clarify the above rules, here's some example Python code,
-equivalent to the builtin hash, for computing the hash of a rational
+equivalent to the built-in hash, for computing the hash of a rational
 number, :class:`float`, or :class:`complex`::
 
 
 
 The only operation that immutable sequence types generally implement that is
 not also implemented by mutable sequence types is support for the :func:`hash`
-builtin.
+built-in.
 
 This support allows immutable sequences, such as :class:`tuple` instances, to
 be used as :class:`dict` keys and stored in :class:`set` and :class:`frozenset`
 * Using a pair of square brackets to denote the empty list: ``[]``
 * Using square brackets, separating items with commas: ``[a]``, ``[a, b, c]``
 * Using a list comprehension: ``[x for x in iterable]``
-* Using the :func:`list` builtin: ``list()`` or ``list(iterable)``
-
-Many other operations also produce lists, including the :func:`sorted` builtin.
+* Using the :func:`list` built-in: ``list()`` or ``list(iterable)``
+
+Many other operations also produce lists, including the :func:`sorted` built-in.
 
 Lists implement all of the :ref:`common <typesseq-common>` and
 :ref:`mutable <typesseq-mutable>` sequence operations. Lists also provide the
 
 Tuples are immutable sequences, typically used to store collections of
 heterogeneous data (such as the 2-tuples produced by the :func:`enumerate`
-builtin). Tuples are also used for cases where an immutable sequence of
+built-in). Tuples are also used for cases where an immutable sequence of
 homogeneous data is needed (such as allowing storage in a :class:`set` or
 :class:`dict` instance).
 
 * Using a pair of parentheses to denote the empty tuple: ``()``
 * Using a trailing comma for a singleton tuple: ``a,`` or ``(a,)``
 * Separating items with commas: ``a, b, c`` or ``(a, b, c)``
-* Using the :func:`tuple` builtin: ``tuple()`` or ``tuple(iterable)``
+* Using the :func:`tuple` built-in: ``tuple()`` or ``tuple(iterable)``
 
 Note that the parentheses are optional (except in the empty tuple case, or
 when needed to avoid syntactic ambiguity). It is actually the comma which
 
 The :class:`range` type represents an immutable sequence of numbers and is
 commonly used for looping a specific number of times. Instances are created
-using the :func:`range` builtin.
+using the :func:`range` built-in.
 
 For positive indices with results between the defined ``start`` and ``stop``
 values, integers within the range are determined by the formula:
 Text Sequence Type --- :class:`str`
 ===================================
 
-.. TODO: clean up this section based on the restructure
-
 .. index::
    object: string
    object: bytes
 including supported escape sequences, and the ``r`` ("raw") prefix that
 disables most escape sequence processing.
 
-There is no mutable string type, but :class:`io.StringIO` can be used to
-efficiently construct strings from multiple fragments.
+Strings may also be created from other objects with the :func:`str` built-in.
+
+Since there is no separate "character" type, indexing a string produces
+strings of length 1. That is, for a non-empty string *s*, ``s[0] == s[0:1]``.
+
+There is also no mutable string type, but :meth:`str.join` or
+:class:`io.StringIO` can be used to efficiently construct strings from
+multiple fragments.
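
Illustrative sketch (not part of the committed text) of the string behaviour described in the added paragraphs. The variable names are made up for the example:

```python
import io

s = "spam"

# Indexing and slicing both yield a str of length 1; there is no
# separate character type.
first_by_index = s[0]
first_by_slice = s[0:1]

# Build a string from fragments with str.join ...
fragments = ["binary", " ", "data"]
joined = "".join(fragments)

# ... or by writing the fragments to an in-memory text buffer.
buffer = io.StringIO()
for fragment in fragments:
    buffer.write(fragment)
buffered = buffer.getvalue()
```

Both approaches avoid repeatedly concatenating immutable strings in a loop.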
 
 
 .. _string-methods:
 Binary Sequence Types --- :class:`bytes`, :class:`bytearray`, :class:`memoryview`
 =================================================================================
 
-.. TODO: clean up this section based on the restructure
-
-
-Bytes and bytearray objects contain single bytes -- the former is immutable
-while the latter is a mutable sequence.  Bytes objects can be constructed the
-constructor, :func:`bytes`, and from literals; use a ``b`` prefix with normal
-string syntax: ``b'xyzzy'``.  To construct byte arrays, use the
-:func:`bytearray` function.
-
-While string objects are sequences of characters (represented by strings of
-length 1), bytes and bytearray objects are sequences of *integers* (between 0
-and 255), representing the ASCII value of single bytes.  That means that for
-a bytes or bytearray object *b*, ``b[0]`` will be an integer, while
-``b[0:1]`` will be a bytes or bytearray object of length 1.  The
-representation of bytes objects uses the literal format (``b'...'``) since it
-is generally more useful than e.g. ``bytes([50, 19, 100])``.  You can always
-convert a bytes object into a list of integers using ``list(b)``.
-
-Also, while in previous Python versions, byte strings and Unicode strings
-could be exchanged for each other rather freely (barring encoding issues),
-strings and bytes are now completely separate concepts.  There's no implicit
-en-/decoding if you pass an object of the wrong type.  A string always
-compares unequal to a bytes or bytearray object.
-
+.. index::
+   object: bytes
+   object: bytearray
+   object: memoryview
+   module: array
+
+The core built-in types for manipulating binary data are :class:`bytes` and
+:class:`bytearray`. They are supported by :class:`memoryview` which uses
+the buffer protocol to access the memory of other binary objects without
+needing to make a copy.
+
+The :mod:`array` module supports efficient storage of basic data types like
+32-bit integers and IEEE 754 double-precision floating point values.
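
Illustrative sketch (not part of the committed text) of how a :class:`memoryview` can expose the memory of a bytearray without copying it. The names ``data``, ``view`` and ``tail`` are hypothetical:

```python
data = bytearray(b"hello")
view = memoryview(data)

# Slicing a memoryview yields another view onto the same memory,
# not a copy of the underlying bytes.
tail = view[1:]

# Writing through the view therefore changes the original bytearray.
tail[0] = ord("a")
result = bytes(data)
```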
+
+.. _typebytes:
+
+Bytes
+-----
+
+.. index:: object: bytes
+
+Bytes objects are immutable sequences of single bytes. Since many major
+binary protocols are based on the ASCII text encoding, bytes objects offer
+several methods that are only valid when working with ASCII compatible
+data and are closely related to string objects in a variety of other ways.
+
+Firstly, the syntax for bytes literals is largely the same as that for string
+literals, except that a ``b`` prefix is added:
+
+* Single quotes: ``b'still allows embedded "double" quotes'``
+* Double quotes: ``b"still allows embedded 'single' quotes"``
+* Triple quoted: ``b'''3 single quotes'''``, ``b"""3 double quotes"""``
+
+Only ASCII characters are permitted in bytes literals (regardless of the
+declared source code encoding). Any binary values over 127 must be entered
+into bytes literals using the appropriate escape sequence.
+
+As with string literals, bytes literals may also use an ``r`` prefix to disable
+processing of escape sequences. See :ref:`strings` for more about the various
+forms of bytes literal, including supported escape sequences.
+
+While bytes literals and representations are based on ASCII text, bytes
+objects actually behave like immutable sequences of integers, with each
+value in the sequence restricted such that ``0 <= x < 256`` (attempts to
+violate this restriction will trigger :exc:`ValueError`). This is done
+deliberately to emphasise that while many binary formats include ASCII based
+elements and can be usefully manipulated with some text-oriented algorithms,
+this is not generally the case for arbitrary binary data (blindly applying
+text processing algorithms to binary data formats that are not ASCII
+compatible will usually lead to data corruption).
+
+In addition to the literal forms, bytes objects can be created in a number of
+other ways:
+
+* A zero-filled bytes object of a specified length: ``bytes(10)``
+* From an iterable of integers: ``bytes(range(20))``
+* Copying existing binary data via the buffer protocol:  ``bytes(obj)``
+
+Since bytes objects are sequences of integers, for a bytes object *b*,
+``b[0]`` will be an integer, while ``b[0:1]`` will be a bytes object of
+length 1.  (This contrasts with text strings, where both indexing and
+slicing will produce a string of length 1.)
+
+The representation of bytes objects uses the literal format (``b'...'``)
+since it is often more useful than e.g. ``bytes([46, 46, 46])``.  You can
+always convert a bytes object into a list of integers using ``list(b)``.
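
Illustrative sketch (not part of the committed text) of the integer-sequence behaviour and the construction forms listed above. All variable names are made up for the example:

```python
b = b"..."

# Indexing yields an int; slicing yields a bytes object.
index_result = b[0]
slice_result = b[0:1]
as_list = list(b)

# Other ways of creating bytes objects.
zero_filled = bytes(3)
from_iterable = bytes(range(3))
copied = bytes(bytearray(b"abc"))  # copy via the buffer protocol

# Values outside 0 <= x < 256 are rejected.
try:
    bytes([256])
except ValueError:
    out_of_range_rejected = True
else:
    out_of_range_rejected = False
```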
+
+Note for Python 2.x users: In the Python 2.x series, a variety of implicit
+conversions between 8-bit strings (the closest thing 2.x offers to a built-in
+binary data type) and Unicode strings were permitted. This was a backwards
+compatibility workaround to account for the fact that Python originally only
+supported 8-bit text, and Unicode text was a later addition. In Python 3.x,
+those implicit conversions are gone - conversions between 8-bit binary data
+and Unicode text must be explicit, and bytes and string objects will always
+compare unequal.
+
+
+.. _typebytearray:
+
+Bytearray Objects
+-----------------
 
 .. index:: object: bytearray
 
-List and bytearray objects support additional operations that allow in-place
-modification of the object.  Other mutable sequence types (when added to the
-language) should also support these operations.  Strings and tuples are
-immutable sequence types: such objects cannot be modified once created. The
-following operations are defined on mutable sequence types (where *x* is an
-arbitrary object).
-
-Note that while lists allow their items to be of any type, bytearray object
-"items" are all integers in the range 0 <= x < 256.
+:class:`bytearray` objects are a mutable counterpart to :class:`bytes`
+objects. There is no dedicated literal syntax for bytearray objects; instead,
+they are always created by calling the constructor:
+
+* Creating an empty instance: ``bytearray()``
+* Creating a zero-filled instance with a given length: ``bytearray(10)``
+* From an iterable of integers: ``bytearray(range(20))``
+* Copying existing binary data via the buffer protocol:  ``bytearray(b'Hi!')``
+
+As bytearray objects are mutable, they support the
+:ref:`mutable <typesseq-mutable>` sequence operations in addition to the
+common bytes and bytearray operations described in :ref:`bytes-methods`.
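
Illustrative sketch (not part of the committed text) of a bytearray being modified in place with the mutable sequence operations. The name ``buf`` is hypothetical:

```python
buf = bytearray(b"Hi!")

buf[0] = ord("h")      # item assignment takes an int in range(256)
buf.append(ord("?"))   # append a single byte value
buf.extend(b" ok")     # extend from any iterable of ints
del buf[2]             # remove the "!" at index 2

zero_filled = bytearray(3)
```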
 
 
 .. _bytes-methods:
 
-Bytes and Byte Array Methods
-----------------------------
+Bytes and Bytearray Operations
+------------------------------
 
 .. index:: pair: bytes; methods
            pair: bytearray; methods
 
-Bytes and bytearray objects, being "strings of bytes", have all methods found on
-strings, with the exception of :func:`encode`, :func:`format` and
-:func:`isidentifier`, which do not make sense with these types.  For converting
-the objects to strings, they have a :func:`decode` method.
-
-Wherever one of these methods needs to interpret the bytes as characters
-(e.g. the :func:`is...` methods), the ASCII character set is assumed.
-
-.. versionadded:: 3.3
-   The functions :func:`count`, :func:`find`, :func:`index`,
-   :func:`rfind` and :func:`rindex` have additional semantics compared to
-   the corresponding string functions: They also accept an integer in
-   range 0 to 255 (a byte) as their first argument.
+Both bytes and bytearray objects support the :ref:`common <typesseq-common>`
+sequence operations. They interoperate not just with operands of the same
+type, but with any object that supports the
+:ref:`buffer protocol <bufferobjects>`. Due to this flexibility, they can be
+freely mixed in operations without causing errors. However, the return type
+of the result may depend on the order of operands.
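
Illustrative sketch (not part of the committed text) of mixing operand types. In CPython, concatenation returns the type of the left-hand operand; treat that detail as an observation about the reference implementation rather than something the paragraph above states:

```python
left_bytes = b"ab" + bytearray(b"cd")
left_bytearray = bytearray(b"ab") + b"cd"

# Any object supporting the buffer protocol can be the other operand.
with_view = b"ab" + memoryview(b"cd")

# The contents compare equal even though the types differ.
same_contents = left_bytes == left_bytearray
```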
+
+Due to the common use of ASCII text as the basis for binary protocols, bytes
+and bytearray objects provide almost all methods found on text strings, with
+the exceptions of:
+
+* :meth:`str.encode` (which converts text strings to bytes objects)
+* :meth:`str.format` and :meth:`str.format_map` (which are used to format
+  text for display to users)
+* :meth:`str.isidentifier`, :meth:`str.isnumeric`, :meth:`str.isdecimal`,
+  :meth:`str.isprintable` (which are used to check various properties of
+  text strings which are not typically applicable to binary protocols).
+
+All other string methods are supported, although sometimes with slight
+differences in functionality and semantics (as described below).
 
 .. note::
 
    The methods on bytes and bytearray objects don't accept strings as their
    arguments, just as the methods on strings don't accept bytes as their
-   arguments.  For example, you have to write ::
+   arguments.  For example, you have to write::
 
       a = "abc"
       b = a.replace("a", "f")
 
-   and ::
+   and::
 
       a = b"abc"
       b = a.replace(b"a", b"f")
 
+Whenever a bytes or bytearray method needs to interpret the bytes as
+characters (e.g. the :meth:`is...` methods, :meth:`split`, :meth:`strip`),
+the ASCII character set is assumed (text strings use Unicode semantics).
+
+.. note::
+   Using these ASCII based methods to manipulate binary data that is not
+   stored in an ASCII based format may lead to data corruption.
+
+The search operations (:keyword:`in`, :meth:`count`, :meth:`find`,
+:meth:`index`, :meth:`rfind` and :meth:`rindex`) all accept both integers
+in the range 0 to 255 as well as bytes and bytearray sequences.
+
+.. versionchanged:: 3.3
+   All of the search methods accept an integer in range 0 to 255 (a byte) as
+   their first argument, not just containment testing.
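
Illustrative sketch (not part of the committed text) of the search operations taking either an integer or a binary sequence, assuming Python 3.3 or later. The variable names are made up for the example:

```python
data = b"abcabc"

contains_int = 97 in data        # 97 is the byte value of "a"
contains_seq = b"bc" in data
count_int = data.count(97)
find_int = data.find(99)         # 99 is the byte value of "c"
rindex_seq = data.rindex(b"a")

# Text strings are not accepted as arguments.
try:
    data.find("a")
except TypeError:
    str_rejected = True
else:
    str_rejected = False
```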
+
+
+Each bytes and bytearray instance provides a :meth:`decode` convenience
+method that is the inverse of :meth:`str.encode`:
 
 .. method:: bytes.decode(encoding="utf-8", errors="strict")
             bytearray.decode(encoding="utf-8", errors="strict")
    .. versionchanged:: 3.1
       Added support for keyword arguments.
 
-
-The bytes and bytearray types have an additional class method:
+Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal
+numbers are a commonly used format for describing binary data. Accordingly,
+the bytes and bytearray types have an additional class method to read data in
+that format:
 
 .. classmethod:: bytes.fromhex(string)
                  bytearray.fromhex(string)
    decoding the given string object.  The string must contain two hexadecimal
    digits per byte, spaces are ignored.
 
-   >>> bytes.fromhex('f0 f1f2  ')
-   b'\xf0\xf1\xf2'
+   >>> bytes.fromhex('2Ef0 F1f2  ')
+   b'.\xf0\xf1\xf2'
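
Illustrative sketch (not part of the committed text) combining ``fromhex`` with ``decode``. The hex string ``"48 69 21"`` and the variable names are made up for the example:

```python
# Same input as the doctest above: mixed case and spaces between bytes.
raw = bytes.fromhex("2Ef0 F1f2  ")

# The bytearray class method works the same way.
mutable = bytearray.fromhex("48 69 21")

# decode() turns ASCII-compatible binary data into text, and
# str.encode() goes back the other way.
text = mutable.decode("ascii")
round_trip = text.encode("ascii")
```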
 
 
 The maketrans and translate methods differ in semantics from the versions
 
 .. _typememoryview:
 
-memoryview type
----------------
+Memory Views
+------------
 
 :class:`memoryview` objects allow Python code to access the internal data
 of an object that supports the :ref:`buffer protocol <bufferobjects>` without