Commits

Maciej Fijalkowski  committed 2059ff1

Kill the distribution thoughts (they are a bit random, admittedly)

  • Participants
  • Parent commits 48eeb9b
  • Branches documentation-cleanup

Files changed (5)

File pypy/doc/discussion/distribution-implementation.rst

-=====================================================
-Random implementation details of distribution attempt
-=====================================================
-
-.. contents::
-.. sectnum::
-
-This document attempts to broaden this `dist thoughts`_.
-
-.. _`dist thoughts`: distribution-newattempt.html
-
-Basic implementation:
----------------------
-
-First we split objects into value-only primitives (like int) and everything
-else. Immutable builtin types which cannot contain user-level objects
-(int, float, long, str, None, etc.) are always transferred as value-only
-objects (carrying no state etc.). Every other object (user-created classes,
-instances, modules, lists, tuples, etc. etc.) is always accessed by reference.
-(Of course if somebody wants to e.g. copy an instance, he can marshal/pickle
-it to a string and send that, but that's outside the scope of this attempt.)
-A special case might be an immutable data structure (tuple, frozenset)
-containing only simple types (this becomes a simple type itself).
-
-XXX: What to do with code types? Marshalling them and sending makes no
-sense. Remote execution? Local execution with remote f_locals and f_globals?
-
-Every remote object gets a special class W_RemoteXXX, where XXX is the
-interp-level class implementing this object. W_RemoteXXX implements all the
-operations by using special app-level code that sends the method name and
-arguments over the wire (arguments are either simple objects, which are sent
-directly by the app-level code, or references to local objects).
-
-So the basic scheme would look like::
-
-    remote_ref = remote("Object reference")
-    remote_ref.any_method()
-
-``remote_ref`` in the above example looks like a normal Python object to the
-user, but is implemented differently (W_RemoteXXX), and uses an app-level
-proxy to forward each interp-level method call.
-
-Abstraction layers:
--------------------
-
-In this section we define the remote side as the side on which calls are
-actually executed, and the local side as the one from which calls are issued.
-
-* Looking from the local side, the first thing we see is an object
-  which looks like a normal object (it has the same interp-level typedef)
-  but has a different implementation. Basically this is a shallow copy
-  of the remote object (however you define shallow, it's up to the code which
-  makes the copy - basically a copy which can be marshalled, sent over
-  the wire, or saved for future use). This is W_RemoteXXX, where XXX is the
-  real object name. Some operations on that object require accessing the
-  remote side of the object, some might not (for example a remote int
-  is exactly the same int as a local one; it could not even be implemented
-  differently).
-
-* For every interp-level operation which accesses internals that are not
-  available on the local side (basically all attribute accesses which
-  reach things that are subclasses of W_Object), we provide a special
-  W_Remote version which downloads the necessary object when needed
-  (i.e. when accessed). This is the same as a normal W_RemoteXXX (we know
-  the type!), just not fetched yet.
-
-* From the remote point of view, every exported object which needs it
-  has an appropriate local storage W_LocalXXX, where XXX is the type
-  by which it can be accessed over the wire.
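
The last bullet can be illustrated with a toy registry (all names are
hypothetical; the real W_LocalXXX storage would live at interp-level):

```python
# Toy sketch of the remote-side storage: every exported object is kept
# under a handle (here a plain integer) by which the wire can refer
# back to it on later calls.

class LocalRegistry:
    """Remote-side table of exported objects (the W_LocalXXX role)."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def export(self, obj):
        # hand out a wire-level handle for a local object
        handle = self._next_id
        self._next_id += 1
        self._objects[handle] = obj
        return handle

    def resolve(self, handle):
        # look up the real object when a call arrives over the wire
        return self._objects[handle]

registry = LocalRegistry()
h = registry.export([1, 2, 3])
print(registry.resolve(h))  # -> [1, 2, 3]
```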
-
-The real pain:
---------------
-
-For every attribute access on a W_RemoteXXX, we need to check
-the download flag - which sucks a bit. (And we have to support it somehow
-in the annotator, which sucks a lot.) One idea is to wrap all the methods
-with additional checks, but that's both unclear and probably not necessary.
-
-XXX If we can easily change the underlying implementation of an object, then
-this might become way easier. Right now I'll try to have it working and
-think about RPython later.
-
-App-level remote tool:
-----------------------
-
-For the purpose of an app-level tool which can transfer the data (well, a
-socket might be enough, but suppose I want to be more flexible), I would use
-`py.execnet`_, probably using some of Armin's hacks to rewrite it using
-greenlets instead of threads.
-
-.. _`py.execnet`: http://codespeak.net/execnet/

File pypy/doc/discussion/distribution-newattempt.rst

-Distribution:
-=============
-
-This is the outcome of Armin's and Samuele's ideas and our discussion,
-kept together by fijal.
-
-The communication layer:
-========================
-
-The communication layer is the layer which takes care of explicit
-communication. Suppose we have two (or more) interpreters running
-on different machines or in different processes. Let's call them the *local
-side* (the one on which we're operating) and the *remote side*.
-
-What we want to achieve is a layer on the local side transparent enough
-that the user cannot tell local and remote objects apart
-(apart from __pypy__.internal_repr, which I would consider cheating).
-
-Because in PyPy we have the possibility of providing different
-implementations for types (even builtin ones), we can use that mechanism
-to implement our simple RMI.
-
-The idea is to provide a thin layer for accessing a remote object, which
-acts as a different implementation for any possible object. So if you
-perform any operation locally on an object which is really a remote object,
-you perform all the method lookup and the call on it. Then the proxy object
-redirects the call to app-level code (socket, execnet, whatever) which
-calls the remote interpreter with the given parameters. It's important that
-we can always perform such a call, even if the types are not marshallable,
-because in that case we can provide remote proxies of local objects to the
-remote side.
-
-XXX: Need to explain in a bit more informative way.
-
-Example:
---------
-
-Suppose we have ``class A`` and an instance ``a = A()`` on the remote side,
-and we want to access it from the local side. We make an object of type
-``object`` and copy the ``__dict__`` keys with values, which correspond to
-the objects on the remote side (they have the same type from the user's
-point of view) but have a different implementation
-(i.e. method calls will work quite differently).
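
A toy in-process sketch of this example (``A``, ``make_facade`` and the
copied attributes are hypothetical illustrations, not PyPy code; a real
implementation would replace values and methods by wire proxies):

```python
# Toy sketch: build a local facade object whose __dict__ mirrors the
# remote instance's keys, with method calls executed "remotely".

class A:  # lives on the "remote" side
    def __init__(self):
        self.x = 42
    def hello(self):
        return "hello from remote"

a = A()  # the remote instance

def make_facade(remote_instance):
    # a bare object of a fresh type, playing the W_RemoteXXX role
    facade = type("RemoteA", (object,), {})()
    for key, value in vars(remote_instance).items():
        # simple values are copied; in the real scheme they would be
        # differently-implemented proxy objects
        facade.__dict__[key] = value
    # methods become closures that execute on the remote instance
    facade.hello = lambda: remote_instance.hello()
    return facade

local_a = make_facade(a)
print(local_a.x)        # -> 42
print(local_a.hello())  # -> hello from remote
```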
-
-Even cooler example:
---------------------
-
-Recalling hpk's example of a 5-line remote file server. With this we write::
-
-  f = remote_side.import(open)
-  f("file_name").read()
-
-Implementation plans:
----------------------
-
-We need:
-
-* app-level primitives for making a 'remote proxy' accessible
-
-* some "serialiser" which is not truly serialising stuff, but making
-  sure the communication goes through.
-
-* an interp-level proxy object which emulates every possible object and
-  delegates operations to the app-level primitive proxy.
-
-* to make it work....
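
The "serialiser" bullet might look roughly like this toy sketch: values
that pickle go over the wire as values, anything else is replaced by a
reference into a local table (``to_wire``, ``from_wire`` and ``_exports``
are hypothetical names, not part of any planned API):

```python
import pickle

# Toy sketch: not truly serialising everything - unpicklable objects are
# swapped for a ("ref", handle) marker pointing into a local table.

_exports = {}

def to_wire(value):
    try:
        return ("value", pickle.dumps(value))
    except Exception:
        handle = id(value)
        _exports[handle] = value   # keep the real object on this side
        return ("ref", handle)

def from_wire(tagged):
    tag, payload = tagged
    if tag == "value":
        return pickle.loads(payload)
    return _exports[payload]       # resolve the reference locally

print(from_wire(to_wire([1, 2])))            # -> [1, 2]
unpicklable = lambda n: n + 1                # lambdas cannot be pickled
restored = from_wire(to_wire(unpicklable))
print(restored is unpicklable)               # -> True
```

In the two-process case the reference would of course resolve to a proxy
rather than to the object itself.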

File pypy/doc/discussion/distribution-roadmap.rst

-Distribution:
-=============
-
-Some random thoughts about an automatic (or not) distribution layer.
-
-What I want to achieve is a clean approach to implementing a
-distribution mechanism with virtually any distribution heuristic.
-
-First step - RPython level:
----------------------------
-
-The first (simplest) step is to allow the user to write RPython programs with
-some kind of remote control over program execution. For a start I would
-suggest using RMI (Remote Method Invocation) and remote object access
-(in the low-level case this would be struct access). For simplicity
-it makes sense to target a high-level platform at the beginning
-(the CLI platform seems like the obvious choice), which provides more
-primitives for performing such operations. To make the attempt easier, I'll
-provide a subset of the type system which is serializable and whose members
-can be passed as parameters to such a call.
-
-I take advantage of several assumptions:
-
-* globals are constants - this allows us to just run multiple instances
-  of the same program on multiple machines and perform RMI.
-
-* I/O is explicit - this makes the GIL problem less important. XXX: I've got
-  to read more about the GIL to check whether this is true.
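
The first assumption can be illustrated with a toy sketch: because every
machine runs the same program, an RMI request is just a function name plus
arguments (here the second "machine" is faked with a thread and queues;
``square``, ``TABLE`` and ``serve`` are all hypothetical stand-ins):

```python
import threading
import queue

def square(n):       # identical definition exists on every instance
    return n * n

TABLE = {"square": square}   # "globals are constants" on every machine

requests = queue.Queue()     # stand-in for the wire, one direction
replies = queue.Queue()      # ... and the other

def serve():
    # the "remote machine": receive (name, args), execute, send result
    name, args = requests.get()
    replies.put(TABLE[name](*args))

t = threading.Thread(target=serve)
t.start()
requests.put(("square", (7,)))   # the RMI call: just a name and arguments
print(replies.get())             # -> 49
t.join()
```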
-
-Second step - doing it a little bit more automatically:
--------------------------------------------------------
-
-The second step is to allow some heuristic to rewrite ordinary
-calls into RMI calls. This should follow some assumptions (which may vary,
-depending on the implementation):
-
-* Do not move I/O to a different machine (we can track I/O and side effects
-  in RPython code).
-
-* Make sure all C calls are safe to transfer, if we want to transfer them
-  (this probably depends on a static API declaration from the programmer,
-  "I'm sure this C call has no side effects" - we don't want to check it
-  in C), or do not transfer them at all.
-
-* Perform it all statically, at the time of program compilation.
-
-* We have to generate serialization methods for the classes which
-  we want to transfer (the same engine might be used to allow JSON calls
-  in the JS backend to transfer arbitrary Python objects).
-
-Third step - Just-in-time distribution:
----------------------------------------
-
-The biggest step here is to integrate the JIT into the distribution
-system. This should make it really useful (compile-time distribution will
-probably not work, for example, for the whole Python interpreter, because
-the granularity is too coarse). It is quite unclear to me how to do this
-(the JIT is not complete and I don't know too much about it). Probably we
-take the JIT's information about graphs and try to feed it to the heuristic
-in some way, to change the calls into RMI.
-
-Problems to fight with:
------------------------
-
-Most of the problems are about making the mechanism work efficiently, so:
-
-* Avoid too fine a granularity (copying a lot of objects in both directions
-  all the time).
-
-* Make the heuristic not eat too much CPU time/memory, and so on.
-
-* ...

File pypy/doc/discussion/distribution.rst

-.. XXX fijal, can this be killed?
-
-===================================================
-(Semi)-transparent distribution of RPython programs
-===================================================
-
-Some (rough) ideas about how I see distribution
---------------------------------------------------
-
-The main point is to behave very much like the JIT - not
-to perform distribution on the Python source code level, but instead to
-perform distribution of the RPython source, and eventually perform
-distribution of the interpreter itself.
-
-This approach gives the same advantages as an off-line JIT (any
-RPython-based interpreter, etc.) and provides a nice field to play with
-different distribution heuristics. It also eventually opens up the nice
-possibility of integrating the JIT with distribution, thus allowing the
-distribution heuristics to have more information than they might have
-otherwise, as well as specializing different nodes to perform different
-tasks.
-
-Flow graph level
-----------------
-
-Probably the best place to attempt distribution is to insert special
-graph-distributing operations into the low-level graphs (either lltype
-or ootype based), which will allow the distribution heuristic to decide,
-at the entry point to a block/graph/some other structure???, which
-variables/functions are accessed inside some part, and whether it's worth
-transferring it over the wire.
-
-Backend level
--------------
-
-Backends will need explicit support for distribution of any kind. Basically
-it should be possible for a backend to remotely call a block/graph/structure
-in any manner (this strongly depends on the backend's possibilities).

File pypy/doc/discussions.rst

 
 .. toctree::
 	
-	discussion/distribution-implementation.rst
-	discussion/distribution-newattempt.rst
-	discussion/distribution-roadmap.rst
-	discussion/distribution.rst
 	discussion/finalizer-order.rst
 	discussion/howtoimplementpickling.rst
 	discussion/improve-rpython.rst