Commits

Kristján Valur Jónsson committed 9a46c58 Merge

Merge documentation changes


Files changed (9)

pypy/doc/architecture.rst

  * a common translation and support framework for producing
    implementations of dynamic languages, emphasizing a clean
    separation between language specification and implementation
-   aspects.
+   aspects.  We call this the `RPython toolchain`_.
 
  * a compliant, flexible and fast implementation of the Python_ Language 
-   using the above framework to enable new advanced features without having
-   to encode low level details into it.
+   which uses the above toolchain to enable new advanced high-level features 
+   without having to encode the low-level details.
 
-By separating concerns in this way, we intend for our implementation
-of Python - and other dynamic languages - to become robust against almost 
-all implementation decisions, including target platform, memory and 
-threading models, optimizations applied, up to to the point of being able to
-automatically *generate* Just-in-Time compilers for dynamic languages.
-
-Conversely, our implementation techniques, including the JIT compiler 
-generator, should become robust against changes in the languages 
-implemented. 
-
+By separating concerns in this way, our implementation
+of Python - and other dynamic languages - is able to automatically
+generate a Just-in-Time compiler for any dynamic language.  It also
+allows a mix-and-match approach to implementation decisions, including
+many that have historically been outside of a user's control, such as
+target platform, memory and 
+threading models, garbage collection strategies, and optimizations applied, 
+including whether or not to have a JIT in the first place.
 
 High Level Goals
 =============================
 -----------------------------------------------
 
 Traditionally, language interpreters are written in a target platform language
-like C/Posix, Java or C#.  Each such implementation fundamentally provides 
-a mapping from application source code to the target environment.  One of 
-the goals of the "all-encompassing" environments, like the .NET framework
+such as C/Posix, Java or C#.  Each implementation provides 
+a fundamental mapping between application source code and the target 
+environment.  One of 
+the goals of the "all-encompassing" environments, such as the .NET framework
 and to some extent the Java virtual machine, is to provide standardized
 and higher level functionalities in order to support language implementers
 for writing language implementations. 
 PyPy is experimenting with a more ambitious approach.  We are using a
 subset of the high-level language Python, called RPython_, in which we
 write languages as simple interpreters with few references to and
-dependencies on lower level details.  Our translation framework then
+dependencies on lower level details.  The `RPython toolchain`_
 produces a concrete virtual machine for the platform of our choice by
 inserting appropriate lower level aspects.  The result can be customized
 by selecting other feature and platform configurations.
 Our goal is to provide a possible solution to the problem of language
 implementers: having to write ``l * o * p`` interpreters for ``l``
 dynamic languages and ``p`` platforms with ``o`` crucial design
-decisions.  PyPy aims at having any one of these parameters changeable
-independently from each other:
+decisions.  PyPy aims at making it possible to change each of these
+variables independently such that:
 
 * ``l``: the language that we analyze can be evolved or entirely replaced;
 
 The Translation Framework
 -------------------------
 
-The job of the translation tool chain is to translate RPython_ programs
-into an efficient version of that program for one of various target
+The job of the RPython toolchain is to translate RPython_ programs
+into an efficient version of that program for one of the various target
 platforms, generally one that is considerably lower-level than Python.
 
 The approach we have taken is to reduce the level of abstraction of the
 assume an object-oriented model with classes, instances and methods (as,
 for example, the Java and .NET virtual machines do).
 
-The translation tool chain never sees the RPython source code or syntax
+The RPython toolchain never sees the RPython source code or syntax
 trees, but rather starts with the *code objects* that define the
 behaviour of the function objects one gives it as input.  It can be
 considered as "freezing" a pre-imported RPython program into an
   and compiled into an executable.
 
 This process is described in much more detail in the `document about
-the translation process`_ and in the paper `Compiling dynamic language
+the RPython toolchain`_ and in the paper `Compiling dynamic language
 implementations`_.
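
To make this concrete, here is a minimal sketch of handing a plain function
object to the toolchain interactively.  It assumes the ``Translation`` helper
from ``pypy.translator.interactive``; the exact method names may differ
between versions, so treat it as illustrative rather than authoritative::

    from pypy.translator.interactive import Translation

    def add(x, y):
        return x + y

    # The toolchain starts from the function object (its code object),
    # not from source text or a syntax tree.
    t = Translation(add)
    t.annotate([int, int])   # type inference over the flow graphs
    t.rtype()                # lower the annotated graphs to low-level types
    f = t.compile_c()        # generate C code and compile it
    assert f(2, 3) == 5

Note that ``Translation`` receives the already-imported function itself,
which is exactly the "freezing" of a pre-imported program described above.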
 
 .. _`control flow graph`: translation.html#the-flow-model
 .. _Annotator: translation.html#the-annotation-pass
 .. _RTyper: rtyper.html#overview
 .. _`various transformations`: translation.html#the-optional-transformations
-.. _`document about the translation process`: translation.html
+.. _`document about the RPython toolchain`: translation.html
 .. _`garbage collector`: garbage_collection.html
-
-
+.. _`RPython toolchain`: translation.html
 .. _`standard interpreter`: 
 .. _`python interpreter`: 
 

pypy/doc/cli-backend.rst

     int_add
     STORE v2
 
-The code produced works correctly but has some inefficiency issue that
+The code produced works correctly but has some inefficiency issues that
 can be addressed during the optimization phase.
 
 The CLI Virtual Machine is fairly expressive, so the conversion
 between PyPy's low level operations and CLI instruction is relatively
-simple: many operations maps directly to the correspondent
+simple: many operations map directly to the corresponding
 instruction, e.g int_add and sub.
 
 By contrast some instructions do not have a direct correspondent and
 Mapping exceptions
 ------------------
 
-Both RPython and CLI have its own set of exception classes: some of
+Both RPython and CLI have their own set of exception classes: some of
 these are pretty similar; e.g., we have OverflowError,
 ZeroDivisionError and IndexError on the first side and
 OverflowException, DivideByZeroException and IndexOutOfRangeException
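
For illustration, the correspondence listed above amounts to a small table.
The sketch below merely restates it in Python; it is not how the CLI backend
actually represents or implements the mapping::

    # Sketch only: RPython exception classes and the CLI exceptions they
    # correspond to, as listed in the text above.
    RPYTHON_TO_CLI_EXCEPTIONS = {
        "OverflowError":     "System.OverflowException",
        "ZeroDivisionError": "System.DivideByZeroException",
        "IndexError":        "System.IndexOutOfRangeException",
    }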
 To do so, you can install `Python for .NET`_. Unfortunately, it does
 not work out of the box under Linux.
 
-To make it working, download and unpack the source package of Python
+To make it work, download and unpack the source package of Python
 for .NET; the only version tested with PyPy is the 1.0-rc2, but it
 might work also with others. Then, you need to create a file named
 Python.Runtime.dll.config at the root of the unpacked archive; put the

pypy/doc/coding-guide.rst

-=====================================
+====================================
 Coding Guide
-=====================================
+====================================
 
 .. contents::
 

pypy/doc/configuration.rst

-=============================
+=============================
 PyPy's Configuration Handling
 =============================
 
 Due to more and more available configuration options it became quite annoying to
 hand the necessary options to where they are actually used and even more
-annoying to add new options. To circumvent these problems the configuration
-management was introduced. There all the necessary options are stored into an
-configuration object, which is available nearly everywhere in the translation
-toolchain and in the standard interpreter so that adding new options becomes
+annoying to add new options. To circumvent these problems, configuration
+management was introduced.  All the necessary options are stored in a
+configuration object, which is available nearly everywhere in the `RPython 
+toolchain`_ and in the standard interpreter so that adding new options becomes
 trivial. Options are organized into a tree. Configuration objects can be
 created in different ways, there is support for creating an optparse command
 line parser automatically.
 
+.. _`RPython toolchain`: translation.html
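
For illustration, building such an option tree and deriving an optparse parser
from it looks roughly like the sketch below.  The class and function names
follow the style of ``pypy/config/config.py``, but the exact signatures are
assumptions, not a verbatim copy of the real API::

    # Illustrative only -- see pypy/config/config.py for the real API.
    from pypy.config.config import (Config, OptionDescription,
                                    BoolOption, IntOption, to_optparse)

    # A small, made-up option tree: one group with two options.
    gc_options = OptionDescription("gc", "garbage collector options", [
        BoolOption("debugprint", "print debugging output", default=False),
        IntOption("nursery_size", "nursery size in KB", default=4096),
    ])

    config = Config(gc_options)    # the object handed down from the entry point
    config.nursery_size = 8192     # options are plain attributes on the tree
    parser = to_optparse(config)   # auto-generated optparse command line parser

The point is that new options are added by extending the tree in one place,
while the code that consumes them simply reads attributes off the
configuration object it was handed.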
 
 Main Assumption
 ===============
 
 Configuration objects are produced at the entry points  and handed down to
 where they are actually used. This keeps configuration local but available
-everywhere and consistent. The configuration values can be created using the
-command line (already implemented) or a file (still to be done).
+everywhere and consistent. The configuration values are created using the
+command line.
 
 
 API Details
 The usage of config objects in PyPy
 ===================================
 
-The two large parts of PyPy, the standard interpreter and the translation
-toolchain, have two separate sets of options. The translation toolchain options
+The two large parts of PyPy, the Python interpreter_ and the `RPython
+toolchain`_, have two separate sets of options. The translation toolchain options
 can be found on the ``config`` attribute of all ``TranslationContext``
 instances and are described in `pypy/config/translationoption.py`_. The interpreter options
 are attached to the object space, also under the name ``config`` and are
 described in `pypy/config/pypyoption.py`_.
-
+
+.. _interpreter: interpreter.html
 .. include:: _ref.txt

pypy/doc/ctypes-implementation.rst

 Running application examples
 ==============================
 
-`pyglet`_ is known to run. We had some success also with pygame-ctypes which is not maintained anymore and with a snapshot of the experimental pysqlite-ctypes. We will only describe how to run the pyglet examples.
+`pyglet`_ is known to run. We also had some success with pygame-ctypes (which is no longer maintained) and with a snapshot of the experimental pysqlite-ctypes. We will only describe how to run the pyglet examples.
 
 pyglet
 -------

pypy/doc/garbage_collection.rst

 Introduction
 ============
 
-**Warning**: The overview and description of our garbage collection
-strategy and framework is not here but in the `EU-report on this
-topic`_.  The present document describes the specific garbage collectors
-that we wrote in our framework.
+The overview and description of our garbage collection strategy and
+framework can be found in the `EU-report on this topic`_.  Please refer
+to that report for an old, but still more or less accurate, description.
+The present document describes the specific garbage collectors that we
+wrote in our framework.
 
 .. _`EU-report on this topic`: http://codespeak.net/pypy/extradoc/eu-report/D07.1_Massive_Parallelism_and_Translation_Aspects-2007-02-28.pdf
 
 For more details, see the `overview of command line options for
 translation`_.
 
+The following overview is written in chronological order, so the "best"
+GC (which is the default when translating) is the last one below.
+
 .. _`overview of command line options for translation`: config/commandline.html#translation
 
 Mark and Sweep
 More details are available as comments at the start of the source
 in `pypy/rpython/memory/gc/markcompact.py`_.
 
+Minimark GC
+-----------
+
+This is a simplification and rewrite of the ideas from the Hybrid GC.
+It uses a nursery for the young objects, and mark-and-sweep for the old
+objects.  This is a moving GC, but objects may only move once (from
+the nursery to the old stage).
+
+The main difference with the Hybrid GC is that the mark-and-sweep
+objects (the "old stage") are directly handled by the GC's custom
+allocator, instead of being handled by malloc() calls.  The gain is that
+it is then possible, during a major collection, to walk through all old
+generation objects without needing to store a list of pointers to them.
+So as a first approximation, when compared to the Hybrid GC, the
+Minimark GC saves one word of memory per old object.
+
+There are a number of environment variables that can be tweaked to
+influence the GC.  (Their default values should be ok for most usages.)
+You can read more about them at the start of
+`rpython/memory/gc/minimark.py`_.
+
+In more detail:
+
+- The small newly malloced objects are allocated in the nursery (case 1).
+  All objects living in the nursery are "young".
+
+- The big objects are always handled directly by the system malloc().
+  But the big newly malloced objects are still "young" when they are
+  allocated (case 2), even though they don't live in the nursery.
+
+- When the nursery is full, we do a minor collection, i.e. we find
+  which "young" objects are still alive (from cases 1 and 2).  The
+  "young" flag is then removed.  The surviving case 1 objects are moved
+  to the old stage. The dying case 2 objects are immediately freed.
+
+- The old stage is an area of memory containing old (small) objects.  It
+  is handled by `rpython/memory/gc/minimarkpage.py`_.  It is organized
+  as "arenas" of 256KB or 512KB, subdivided into "pages" of 4KB or 8KB.
+  Each page can either be free, or contain small objects all of the same
+  size.  Furthermore, at any point in time each object location can be
+  either allocated or freed.  The basic design comes from ``obmalloc.c``
+  from CPython (which itself comes from the same source as the Linux
+  system malloc()).
+
+- New objects are added to the old stage at every minor collection.
+  Immediately after a minor collection, when we reach some threshold, we
+  trigger a major collection.  This is the mark-and-sweep step.  It walks
+  over *all* objects (mark), and then frees some fraction of them (sweep).
+  This means that the only time when we want to free objects is while
+  walking over all of them; we never ask to free an object given just its
+  address.  This allows some simplifications and memory savings when
+  compared to ``obmalloc.c``.
+
+- As with all generational collectors, this GC needs a write barrier to
+  record which old objects have a reference to young objects.
+
+- Additionally, we found out that it is useful to handle the case of
+  big arrays specially: when we allocate a big array (with the system
+  malloc()), we reserve a small number of bytes before.  When the array
+  grows old, we use the extra bytes as a set of bits.  Each bit
+  represents 128 entries in the array.  Whenever the write barrier is
+  called to record a reference from the Nth entry of the array to some
+  young object, we set the bit number ``(N/128)`` to 1.  This can
+  considerably speed up minor collections, because we then only have to
+  scan 128 entries of the array instead of all of them (see the toy
+  sketch after this list).
+
+- As usual, we need special care about weak references, and objects with
+  finalizers.  Weak references are allocated in the nursery, and if they
+  survive they move to the old stage, as usual for all objects; the
+  difference is that the reference they contain must either follow the
+  object, or be set to NULL if the object dies.  And the objects with
+  finalizers, considered rare enough, are immediately allocated old to
+  simplify the design.  In particular their ``__del__`` method can only
+  be called just after a major collection.
+
+- The objects move once only, so we can use a trick to implement id()
+  and hash().  If the object is not in the nursery, it won't move any
+  more, so its id() and hash() are the object's address, cast to an
+  integer.  If the object is in the nursery, and we ask for its id()
+  or its hash(), then we pre-reserve a location in the old stage, and
+  return the address of that location.  If the object survives the
+  next minor collection, we move it there, and so its id() and hash()
+  are preserved.  If the object dies then the pre-reserved location
+  becomes free garbage, to be collected at the next major collection.
+
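
To make the big-array write barrier described in the list above more
concrete, here is a toy sketch of the card-marking bookkeeping in plain
Python (one bit per 128 array entries); it is purely illustrative and not
the real RPython implementation::

    CARD_SIZE = 128          # one card bit covers 128 array entries

    class BigArrayCards(object):
        """Toy model of the extra bytes reserved in front of a big array."""

        def __init__(self, length):
            ncards = (length + CARD_SIZE - 1) // CARD_SIZE
            self.bits = bytearray((ncards + 7) // 8)   # one bit per card

        def write_barrier(self, index):
            # Called when entry `index` is set to point to a young object:
            # mark card number index // 128 instead of remembering the array.
            card = index // CARD_SIZE
            self.bits[card // 8] |= 1 << (card % 8)

        def cards_to_scan(self):
            # At minor collection time, only the marked ranges are scanned.
            for card in range(len(self.bits) * 8):
                if self.bits[card // 8] & (1 << (card % 8)):
                    yield card * CARD_SIZE, (card + 1) * CARD_SIZE

At minor collection time only the ranges yielded by ``cards_to_scan()`` need
to be examined, instead of every entry of the array.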
+
 .. include:: _ref.txt

pypy/doc/getting-started-python.rst

 `CPythons core language regression tests`_ and comes with many of the extension
 modules included in the standard library including ``ctypes``. It can run large
 libraries such as Django_ and Twisted_. There are some small behavioral
-differences to CPython and some missing extensions, for details see `CPython
+differences with CPython and some missing extensions, for details see `CPython
 differences`_.
 
 .. _Django: http://djangoproject.com
    * ``libexpat1-dev`` (for the optional ``pyexpat`` module)
    * ``libssl-dev`` (for the optional ``_ssl`` module)
    * ``libgc-dev`` (for the Boehm garbage collector: only needed when translating with `--opt=0, 1` or `size`)
-   * ``python-sphinx`` (for the optional documentation build)
+   * ``python-sphinx`` (for the optional documentation build.  You need version 1.0.7 or later)
    * ``python-greenlet`` (for the optional stackless support in interpreted mode/testing)
 
 2. Translation is somewhat time-consuming (30 min to
    * ``--stackless``: this produces a pypy-c that includes features
      inspired by `Stackless Python <http://www.stackless.com>`__.
 
-   * ``--gc=boehm|ref|marknsweep|semispace|generation|hybrid``:
+   * ``--gc=boehm|ref|marknsweep|semispace|generation|hybrid|minimark``:
      choose between using
      the `Boehm-Demers-Weiser garbage collector`_, our reference
-     counting implementation or four of own collector implementations
-     (the default depends on the optimization level).
+     counting implementation or one of our own collector implementations
+     (the default depends on the optimization level but is usually
+     ``minimark``).
 
 Find a more detailed description of the various options in our `configuration
 sections`_.

pypy/doc/getting-started.rst

 Just the facts 
 ============== 
 
+Download a pre-built PyPy
+-------------------------
+
+The quickest way to start using PyPy is to download a prebuilt binary for your
+OS and architecture.  You can either use the `most recent release`_ or a
+`development nightly build`_.  Please note that the nightly builds are not
+guaranteed to be as stable as official releases, so use them at your own risk.
+
+.. _`most recent release`: http://pypy.org/download.html
+.. _`development nightly build`: http://buildbot.pypy.org/nightly/trunk/
+
+Installing PyPy
+---------------
+
+PyPy is ready to be executed as soon as you unpack the tarball or the zip
+file, with no need to install it in any specific location::
+
+    $ tar xf pypy-1.5-linux.tar.bz2
+
+    $ ./pypy-1.5-linux/bin/pypy
+    Python 2.7.1 (?, Apr 27 2011, 12:44:21)
+    [PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2
+    Type "help", "copyright", "credits" or "license" for more information.
+    And now for something completely different: ``implementing LOGO in LOGO:
+    "turtles all the way down"''
+    >>>>
+
+If you want to make PyPy available system-wide, you can put a symlink to the
+``pypy`` executable in ``/usr/local/bin``.  It is important to put a symlink
+and not move the binary there; otherwise PyPy would not be able to find its
+library.
+
+If you want to install 3rd party libraries, the most convenient way is to
+install setuptools_, which will give you ``easy_install``::
+
+    $ wget http://peak.telecommunity.com/dist/ez_setup.py
+
+    $ ./pypy-1.5-linux/bin/pypy ez_setup.py
+
+    $ ls ./pypy-1.5-linux/bin/
+    easy_install  easy_install-2.7  pypy
+
+3rd party libraries will be installed in ``pypy-1.5-linux/site-packages``, and
+the scripts in ``pypy-1.5-linux/bin``.
+
+Installing using virtualenv
+---------------------------
+
+It is often convenient to run pypy inside a virtualenv.  To do this
+you need a recent version of virtualenv -- 1.6.1 or greater.  You can
+then install PyPy either from a precompiled tarball or from a mercurial
+checkout::
+
+	# from a tarball
+	$ virtualenv -p /opt/pypy-c-jit-41718-3fb486695f20-linux/bin/pypy my-pypy-env
+
+	# from the mercurial checkout
+	$ virtualenv -p /path/to/pypy/pypy/translator/goal/pypy-c my-pypy-env
+
+Note that bin/python is now a symlink to bin/pypy.
+
+
 Clone the repository
 --------------------
 
-Before you can play with PyPy, you will need to obtain a copy
-of the sources.  This can be done either by `downloading them
-from the download page`_ or by checking them out from the
-repository using mercurial.  We suggest using mercurial if one
-wants to access the current development.
+If you prefer to `compile PyPy by yourself`_, or if you want to modify it, you
+will need to obtain a copy of the sources.  This can be done either by
+`downloading them from the download page`_ or by checking them out from the
+repository using mercurial.  We suggest using mercurial if you want to follow
+the current development.
 
 .. _`downloading them from the download page`: http://pypy.org/download.html
 
 
 where XXXXX is the revision id.
 
+
+.. _`compile PyPy by yourself`: getting-started-python.html
 .. _`our nightly tests:`: http://buildbot.pypy.org/summary?branch=<trunk>
 
-If you want to commit to our repository on bitbucket, you will have to
-install subversion in addition to mercurial.
-
-Installing using virtualenv
----------------------------
-
-It is often convenient to run pypy inside a virtualenv.  To do this
-you need a recent version of virtualenv -- 1.5 or greater.  You can
-then install PyPy both from a precompiled tarball or from a mercurial
-checkout::
-
-	# from a tarball
-	$ virtualenv -p /opt/pypy-c-jit-41718-3fb486695f20-linux/bin/pypy my-pypy-env
-
-	# from the mercurial checkout
-	$ virtualenv -p /path/to/pypy/pypy/translator/goal/pypy-c my-pypy-env
-
-Note that bin/python is now a symlink to bin/pypy.
-
-
 Where to go from here
 ----------------------
 

pypy/doc/translation.rst

-=====================
- PyPy - Translation
-=====================
+=============================
+ PyPy - The RPython Toolchain
+=============================
 
 .. contents::