Commits

Britton Smith  committed 3922752 Merge

Merged.

  • Parent commits 00b9c02, 9a2bae2

Files changed (9)

File source/advanced/creating_frontend.rst

+.. _creating_frontend:
+
+Creating A New Code Frontend
+============================
+
+``yt`` is designed to support analysis and visualization of data from multiple
+different simulation codes, although it has so far been most successfully
+applied to Adaptive Mesh Refinement (AMR) data. For a list of codes and the
+level of support they enjoy, we've created a handy [[CodeSupportLevels|table]].
+We'd like to support a broad range of codes, both AMR-based and otherwise. To
+add support for a new code, a few things need to be put into place. These
+necessary structures fall into three categories:
+
+ * Data meaning: This is the set of parameters that convert the data into
+   physically relevant units; things like spatial and mass conversions, time
+   units, and so on.
+ * Data localization: These are structures that help make a "first pass" at data
+   loading. Essentially, we need to be able to make a first pass at guessing
+   where data in a given physical region would be located on disk. With AMR
+   data, this is typically quite easy: the grid patches are the "first pass" at
+   localization.
+ * Data reading: This is the set of routines that actually perform a read of
+   either all data in a region or a subset of that data.
+
+Data Meaning Structures
+-----------------------
+
+If you are interested in adding a new code, be sure to drop us a line on
+`yt-dev <http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org>`_!
+
+To get started, make a new directory in ``yt/frontends`` with the name of your
+code -- you can start by copying into it the contents of the ``stream``
+directory, which is a pretty empty format. You'll then have to create a subclass
+of ``StaticOutput``. This subclass will need to handle conversion between the
+different physical units and the code units; for the most part, the examples of
+``OrionStaticOutput`` and ``EnzoStaticOutput`` should be followed, but
+``ChomboStaticOutput``, as a slightly newer addition, can also be used as an
+instructive example -- be sure to add an ``_is_valid`` classmethod that will
+verify if a filename is valid for that output type, as that is how "load" works.
+
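As a rough illustration of why ``_is_valid`` matters, here is a minimal, self-contained sketch of how a loader could pick a frontend by asking each output class whether it recognizes a file. Everything here is hypothetical (``SketchStaticOutput``, ``MyCodeStaticOutput``, ``OUTPUT_TYPES``, ``sketch_load`` and the magic first line are invented for the example); it is not yt's actual ``load`` implementation.

```python
import os
import tempfile


class SketchStaticOutput(object):
    """Illustrative stand-in for a StaticOutput-like base class (not yt's)."""

    def __init__(self, filename):
        self.parameter_filename = filename

    @classmethod
    def _is_valid(cls, filename):
        # The base class recognizes nothing; subclasses override this.
        return False


class MyCodeStaticOutput(SketchStaticOutput):
    # Assumption: outputs of the hypothetical "MyCode" begin with this line.
    _magic = "# MyCode output"

    @classmethod
    def _is_valid(cls, filename):
        try:
            with open(filename) as handle:
                return handle.readline().startswith(cls._magic)
        except (IOError, OSError, UnicodeDecodeError):
            # Unreadable or binary files are simply "not ours".
            return False


# Hypothetical registry of candidate output types.
OUTPUT_TYPES = [MyCodeStaticOutput]


def sketch_load(filename):
    """Return an instance of the first output type that accepts filename."""
    for output_type in OUTPUT_TYPES:
        if output_type._is_valid(filename):
            return output_type(filename)
    raise RuntimeError("No frontend recognizes %r" % (filename,))


def write_example(directory, name, text):
    """Write a small text file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write(text)
    return path


example_dir = tempfile.mkdtemp()
good_path = write_example(example_dir, "good.out", "# MyCode output\n1 2 3\n")
bad_path = write_example(example_dir, "bad.out", "something else\n")
loaded = sketch_load(good_path)
```

The point of the sketch is that the check should be cheap and should return ``False`` rather than raise when the file belongs to some other code.
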
+A new set of fields must be added in the file ``fields.py`` in that directory.
+For the most part this means subclassing ``CodeFieldInfoContainer`` and adding
+the necessary fields specific to that code. Here is the Chombo field container:
+
+.. code-block:: python
+
+    from UniversalFields import *
+    class ChomboFieldContainer(CodeFieldInfoContainer):
+        _shared_state = {}
+        _field_list = {}
+    ChomboFieldInfo = ChomboFieldContainer()
+    add_chombo_field = ChomboFieldInfo.add_field
+
+The field container is a shared state object, which is why we explicitly set
+``_shared_state`` equal to a mutable.
+
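The following sketch shows the general shared-state pattern in isolation, assuming the base container rebinds each instance's ``__dict__`` to a class-level dictionary. The class names are invented for the example and the code is not taken from yt; it only illustrates why a subclass that does not define its own ``_shared_state`` ends up sharing fields with its parent.

```python
class SharedStateContainer(object):
    """Every instance of a class shares one attribute dictionary."""

    _shared_state = {}

    def __new__(cls, *args, **kwargs):
        self = object.__new__(cls)
        # All instances look up attributes in the same class-level dict.
        self.__dict__ = cls._shared_state
        return self


class CodeAContainer(SharedStateContainer):
    # Own dictionary: fields added here stay private to this code.
    _shared_state = {}


class CodeBContainer(SharedStateContainer):
    # No override: this class inherits, and writes into, the parent's dict.
    pass


first = CodeAContainer()
second = CodeAContainer()
first.fields = ["Density"]

base = SharedStateContainer()
leaky = CodeBContainer()
leaky.extra_fields = ["Temperature"]
```

After this runs, ``second.fields`` is ``["Density"]`` because both ``CodeAContainer`` instances share state, ``base`` has no ``fields`` attribute, and ``base.extra_fields`` exists because ``CodeBContainer`` wrote into the parent's dictionary.
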
+Data Localization Structures
+----------------------------
+
+For now, the "grid patch" mechanism will remain in yt, although that may change
+in the future. As a result, some other output formats -- like Gadget -- may
+have to be shoe-horned in slightly.
+
+Hierarchy
+^^^^^^^^^
+
+To set up data localization, an ``AMRHierarchy`` subclass must be added in the
+file ``data_structures.py``. The hierarchy object must override the following
+methods:
+
+ * ``_detect_fields``: ``self.field_list`` must be populated as a list of
+   strings corresponding to "native" fields in the data files.
+ * ``_setup_classes``: it's probably safe to crib this from one of the other
+   ``AMRHierarchy`` subclasses.
+ * ``_count_grids``: this must set ``self.num_grids`` to the total number of
+   grids in the simulation.
+ * ``_parse_hierarchy``: this must fill in ``grid_left_edge``,
+   ``grid_right_edge``, ``grid_particle_count``, ``grid_dimensions`` and
+   ``grid_levels`` with the appropriate information. Additionally, ``grids``
+   must be an array of grid objects that already know their IDs.
+ * ``_populate_grid_objects``: this initializes the grids by calling
+   ``_prepare_grid`` and ``_setup_dx`` on all of them.  Additionally, it should
+   set up ``Children`` and ``Parent`` lists on each grid object.
+ * ``_setup_unknown_fields``: If a field is in the data file that yt doesn't
+   already know, this is where you make a guess at it.
+ * ``_setup_derived_fields``: ``self.derived_field_list`` needs to be made a
+   list of strings that correspond to all derived fields valid for this
+   hierarchy.
+
+For the most part, the ``ChomboHierarchy`` should be the first place to look for
+hints on how to do this; ``EnzoHierarchy`` is also instructive.
+
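To make the ``Children`` and ``Parent`` bookkeeping concrete, here is a small sketch that links grids by level and spatial containment. ``SketchGrid``, ``contains`` and ``link_grids`` are hypothetical helpers written for this example; a real frontend would normally take the relationships from whatever the data files record, and the brute-force search below is only reasonable for a handful of grids.

```python
class SketchGrid(object):
    """Minimal stand-in for a grid patch (not yt's AMRGridPatch)."""

    def __init__(self, id, left_edge, right_edge, level):
        self.id = id
        self.LeftEdge = left_edge
        self.RightEdge = right_edge
        self.Level = level
        self.Parent = []
        self.Children = []


def contains(outer, inner):
    """True if inner lies entirely within outer along every axis."""
    for axis in range(len(outer.LeftEdge)):
        if inner.LeftEdge[axis] < outer.LeftEdge[axis]:
            return False
        if inner.RightEdge[axis] > outer.RightEdge[axis]:
            return False
    return True


def link_grids(grids):
    """Fill in Parent and Children using level and containment."""
    for child in grids:
        for candidate in grids:
            # A parent sits exactly one level above and encloses the child.
            if candidate.Level == child.Level - 1 and contains(candidate, child):
                child.Parent.append(candidate)
                candidate.Children.append(child)


root = SketchGrid(0, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0)
low = SketchGrid(1, (0.0, 0.0, 0.0), (0.5, 0.5, 0.5), 1)
high = SketchGrid(2, (0.5, 0.5, 0.5), (1.0, 1.0, 1.0), 1)
fine = SketchGrid(3, (0.5, 0.5, 0.5), (0.75, 0.75, 0.75), 2)
link_grids([root, low, high, fine])
```
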
+Grids
+^^^^^
+
+A new grid object, subclassing ``AMRGridPatch``, will also have to be added.
+This should go in ``data_structures.py``. For the most part, this may be all
+that is needed:
+
+.. code-block:: python
+
+    class ChomboGrid(AMRGridPatch):
+        _id_offset = 0
+        __slots__ = ["_level_id"]
+        def __init__(self, id, hierarchy, level = -1):
+            AMRGridPatch.__init__(self, id, filename = hierarchy.hierarchy_filename,
+                                  hierarchy = hierarchy)
+            self.Parent = []
+            self.Children = []
+            self.Level = level
+
+
+Even the most complex grid object, ``OrionGrid``, is still relatively simple.
+
+Data Reading Functions
+----------------------
+
+In ``io.py``, there are a number of IO handlers that handle the mechanisms by
+which data is read off disk.  To implement a new data reader, you must subclass
+``BaseIOHandler`` and override the following methods:
+
+ * ``_read_field_names``: this routine accepts a grid object and must return all
+   the fields in the data file affiliated with that grid. It is used at the
+   initialization of the ``AMRHierarchy`` but likely not later.
+ * ``modify``: This accepts a field from a data file and returns it ready to be
+   used by yt. This is used in Enzo data for preloading.
+ * ``_read_data_set``: This accepts a grid object and a field name and must
+   return that field, ready to be used by yt as a NumPy array. Note that this
+   presupposes that any actions done in ``modify`` (above) have been executed.
+ * ``_read_data_slice``: This accepts a grid object, a field name, an axis and
+   an (integer) coordinate, and it must return a slice through the array at that
+   value.
+ * ``preload``: (optional) This accepts a list of grids and a list of datasets
+   and it populates ``self.queue`` (a dict keyed by grid id) with dicts of
+   datasets.
+ * ``_read_exception``: (property) This is a tuple of exceptions that can be
+   raised by the data reading to indicate a field does not exist in the file.
+
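The interplay between ``preload``, ``self.queue`` and ``_read_exception`` can be sketched without any real file format. In the example below an in-memory dictionary plays the role of the files on disk, the methods take grid ids rather than grid objects, and ``SketchIOHandler`` and ``read_or_none`` are invented names; it is a toy model of the interface described above, not a subclass of yt's ``BaseIOHandler``.

```python
class SketchIOHandler(object):
    """Toy IO handler backed by a dictionary instead of files."""

    def __init__(self, fake_disk):
        # Assumption: fake_disk maps grid id -> {field name: data}.
        self._fake_disk = fake_disk
        self.queue = {}

    @property
    def _read_exception(self):
        # Exceptions that mean "this field is not in the file".
        return (KeyError,)

    def _read_field_names(self, grid_id):
        return sorted(self._fake_disk[grid_id])

    def _read_data_set(self, grid_id, field):
        # Serve preloaded data first, then fall back to "disk".
        if field in self.queue.get(grid_id, {}):
            return self.queue[grid_id][field]
        return self._fake_disk[grid_id][field]

    def preload(self, grid_ids, fields):
        # Populate self.queue: a dict keyed by grid id holding dicts of data.
        for grid_id in grid_ids:
            datasets = self.queue.setdefault(grid_id, {})
            for field in fields:
                try:
                    datasets[field] = self._fake_disk[grid_id][field]
                except self._read_exception:
                    pass  # field is absent for this grid; skip it


def read_or_none(handler, grid_id, field):
    """Read a field, returning None when the handler reports it missing."""
    try:
        return handler._read_data_set(grid_id, field)
    except handler._read_exception:
        return None


disk = {
    0: {"Density": [1.0, 2.0], "Temperature": [10.0, 20.0]},
    1: {"Density": [3.0]},
}
handler = SketchIOHandler(disk)
handler.preload([0, 1], ["Density", "Temperature"])
```

Because ``_read_exception`` is a tuple, calling code can use it directly in an ``except`` clause without knowing which library the frontend reads with.
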
+
+And that just about covers it. Please feel free to email
+`yt-users <http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org>`_ or
+`yt-dev <http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org>`_ with
+any questions, or to let us know you're thinking about adding a new code to yt.

File source/advanced/developing.rst

    ``yt/utilities/command_line.py`` in the function ``do_bootstrap``.
 
 Here is the list of items that the script will attempt to accomplish, along
-with a brief motivation of each.  
+with a brief motivation of each.
 
  #. **Ensure that the yt-supplemental repository is checked out into
     ``YT_DEST``**.  To make sure that the extensions we're going to use to
 start developing yt efficiently.
 
 .. _included-hg-extensions:
- 
+
 Included hg Extensions
 ^^^^^^^^^^^^^^^^^^^^^^
 
 
 If you just want to *look* at the source code, you already have it on your
 computer.  Go to the directory where you ran the install_script.sh, then
-go to ``$YT_DEST/src/yt-hg`` .  In this directory are a number of 
+go to ``$YT_DEST/src/yt-hg`` .  In this directory are a number of
 subdirectories with different components of the code, although most of them
 are in the yt subdirectory.  Feel free to explore here.  If you're looking
-for a specific file or function in the yt source code, use the unix find 
+for a specific file or function in the yt source code, use the unix find
 command:
 
 .. code-block:: bash
 
    $ find <DIRECTORY_TREE_TO_SEARCH> -name '<FILENAME>'
 
-The above command will find the FILENAME in any subdirectory in the 
-DIRECTORY_TREE_TO_SEARCH.  Alternatively, if you're looking for a function 
+The above command will find the FILENAME in any subdirectory in the
+DIRECTORY_TREE_TO_SEARCH.  Alternatively, if you're looking for a function
 call or a keyword in an unknown file in a directory tree, try:
 
 .. code-block:: bash
 
 This can be very useful for tracking down functions in the yt source.
 
-While you can edit this source code and execute it on your local machine, 
-you will be unable to share your work with others in the community (or 
+While you can edit this source code and execute it on your local machine,
+you will be unable to share your work with others in the community (or
 get feedback on your work).  If you want to submit your modifications to the
 yt project, follow the directions below.
 
 yt is hosted on BitBucket, and you can see all of the yt repositories at
 http://hg.yt-project.org/ .  With the yt installation script you should have a
 copy of Mercurial for checking out pieces of code.  Make sure you have followed
-the steps above for bootstrapping your development (to assure you have a 
+the steps above for bootstrapping your development (to assure you have a
 bitbucket account, etc.)
 
 In order to access the source code for yt, we ask that you make a "fork" of
 the main yt repository on bitbucket.  A fork is simply an exact copy of the
 main repository (along with its history) that you will now own and can make
-modifications as you please.  You can create a personal fork by visiting the yt 
+modifications as you please.  You can create a personal fork by visiting the yt
 bitbucket webpage at https://bitbucket.org/yt_analysis/yt/wiki/Home .  After
-logging in, you should see an option near the top right labeled "fork".  
+logging in, you should see an option near the top right labeled "fork".
 Click this option, and then click the fork repository button on the subsequent
 page.  You now have a forked copy of the yt repository for your own personal
 use.
 
 This forked copy exists on the bitbucket repository, so in order to access
-it locally, follow the instructions at the top of that webpage for that 
+it locally, follow the instructions at the top of that webpage for that
 forked repository, namely run at a local command line:
 
 .. code-block:: bash
 
    $ hg clone http://bitbucket.org/<USER>/<REPOSITORY_NAME>
 
-This downloads that new forked repository to your local machine, so that you 
+This downloads that new forked repository to your local machine, so that you
 can access it, read it, make modifications, etc.  It will put the repository in
-a local directory of the same name as the repository in the current working 
+a local directory of the same name as the repository in the current working
 directory.  You can see any past state of the code by using the hg log command.
-For example, the following command would show you the last 5 changesets 
+For example, the following command would show you the last 5 changesets
 (modifications to the code) that were submitted to that repository.
 
 .. code-block:: bash
 
    $ cd <REPOSITORY_NAME>
    $ hg log -l 5
 
-Using the revision specifier (the number or hash identifier next to each 
-changeset), you can update the local repository to any past state of the 
+Using the revision specifier (the number or hash identifier next to each
+changeset), you can update the local repository to any past state of the
 code (a previous changeset or version) by executing the command:
 
 .. code-block:: bash
 
 Lastly, if you want to use this new downloaded version of your yt repository
 as the *active* version of yt on your computer (i.e. the one which is executed
-when you run yt from the command line or ``from yt.mods import *``), 
-then you must "activate" it using the following commands from within the 
-repository directory.  
+when you run yt from the command line or ``from yt.mods import *``),
+then you must "activate" it using the following commands from within the
+repository directory.
 
 In order to do this for the first time with a new repository, you have to
 copy some config files over from your yt installation directory (where yt
 to the repository, but they need to be reviewed/tested by other users of
 the code before they're pulled into the main repository.
 
-When you're ready to submit them to the main repository, simply go to the 
-bitbucket page for your personal fork of the yt-analysis yt repository, 
+When you're ready to submit them to the main repository, simply go to the
+bitbucket page for your personal fork of the yt-analysis yt repository,
 and click the button to issue a pull request (at top right):
 
 Make sure you notify ``yt_analysis`` and put in a little description.  That'll
 
 .. code-block:: bash
 
-   $ ``cp yt-old/src/yt-hg/*.cfg yt-testing``    
+   $ cp yt-old/src/yt-hg/*.cfg yt-testing
    $ cd yt-testing
    $ python setup.py develop
 
    $ cd yt-old/src/yt-hg
    $ python setup.py develop
 
-If you want to accept the changeset or reject it (if you have sufficient 
+If you want to accept the changeset or reject it (if you have sufficient
 privileges) or comment on it, you can do so from its pull request webpage.
 
 How To Read The Source Code
 
 ``yt`` strives to be a general-purpose analysis tool for astrophysical data.
 To that end, we'd like to shore up our support for codes besides Enzo, as well
-as ensure that the other codes we support -- Orion, Tiger, etc -- are
+as ensure that the other codes we support -- Orion, Tiger, etc. -- are
 well-supported.
 
-`A page has been set up <http://yt-project.org/wiki/AddingSupportForANewCode>`_
-on the Trac site to describe the method of adding support for a new code to
-``yt``.  Please feel free to use it as a reference, but if you would like some
-assistance, drop a line to one of the mailing lists (see :ref:`mailing-list`)
-for more help.
+The :ref:`creating_frontend` page describes the process of adding support for a
+new code to ``yt``.  Please feel free to use it as a reference, but if you would
+like some assistance, drop a line to one of the mailing lists (see
+:ref:`mailing-list`) for more help.
 
 GUIs and Interactive Exploration
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File source/advanced/parallel_computation.rst

 This example above can be modified to loop over anything that can be saved to
 a Python list: halos, data files, arrays, and more.
 
+Parallel Time Series Analysis
+-----------------------------
+
+The same :func:`parallel_objects` machinery discussed above is turned on by
+default when using a ``TimeSeries`` object (see :ref:`time-series-analysis`) to
+iterate over simulation outputs.  The syntax for this is very simple.  As an
+example, we can use the following script to find the angular momentum vector in
+a 1 pc sphere centered on the maximum density cell in a large number of
+simulation outputs:
+
+.. code-block:: python
+
+   from yt.mods import *
+   all_files = glob.glob("DD*/output_*")
+   all_files.sort()
+   ts = TimeSeries.from_filenames(all_files, Parallel = True)
+   sphere = ts.sphere("max", (1.0, "pc"))
+   L_vecs = sphere.quantities["AngularMomentumVector"]()
+
+Note that this script can be run in serial or parallel with an arbitrary number
+of processors.  When running in parallel, each output is given to a different
+processor.  By default, Parallel is set to ``True``, so you do not have to
+explicitly set ``Parallel = True`` as in the above example. 
+
+One could get the same effect by iterating over the individual parameter files
+in the TimeSeries object:
+
+.. code-block:: python
+
+   from yt.mods import *
+   all_files = glob.glob("DD*/output_*")
+   all_files.sort()
+   ts = TimeSeries.from_filenames(all_files, Parallel = True)
+   my_storage = {}
+   for sto, pf in ts.piter(storage=my_storage):
+       sphere = pf.h.sphere("max", (1.0, "pc"))
+       L_vec = sphere.quantities["AngularMomentumVector"]()
+       sto.result_id = pf.parameter_filename
+       sto.result = L_vec
+
+   L_vecs = []
+   for fn, L_vec in sorted(my_storage.items()):
+       L_vecs.append(L_vec)
+
+
+You can also request a fixed number of processors to calculate each
+angular momentum vector.  For example, this script will calculate each angular
+momentum vector using a workgroup of four processors.
+
+.. code-block:: python
+
+   from yt.mods import *
+   all_files = glob.glob("DD*/output_*")
+   all_files.sort()
+   ts = TimeSeries.from_filenames(all_files, Parallel = 4)
+   sphere = ts.sphere("max", (1.0, "pc"))
+   L_vecs = sphere.quantities["AngularMomentumVector"]()
+
+If you do not want to use ``parallel_objects`` parallelism when using a
+TimeSeries object, set ``Parallel = False``.  When running Python in parallel,
+this setting will use all of the available processors to evaluate the requested
+operation on each simulation output.  Some care and possibly trial and error
+might be necessary to estimate the correct settings for your simulation
+outputs.
+
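One way to picture the workgroup setting is as a mapping from groups of processor ranks to lists of outputs. The sketch below assumes a simple round-robin assignment and ignores any processors left over after forming full groups; both choices are assumptions made for illustration, and ``assign_outputs`` is a hypothetical helper, not part of yt.

```python
def assign_outputs(outputs, n_procs, group_size):
    """Map each workgroup (a tuple of ranks) to the outputs it would handle."""
    if group_size < 1 or n_procs < group_size:
        raise ValueError("need at least one full workgroup")
    n_groups = n_procs // group_size
    groups = [
        tuple(range(g * group_size, (g + 1) * group_size))
        for g in range(n_groups)
    ]
    assignment = dict((group, []) for group in groups)
    # Assumed policy: hand outputs to the workgroups in round-robin order.
    for index, output in enumerate(outputs):
        assignment[groups[index % n_groups]].append(output)
    return assignment


outputs = ["DD0000", "DD0001", "DD0002", "DD0003", "DD0004"]
plan = assign_outputs(outputs, n_procs=8, group_size=4)
```

With eight processors and workgroups of four, two outputs are processed at a time, each by four processors. Larger workgroups trade throughput across outputs for more memory and compute per output.
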
 Parallel Performance, Resources, and Tuning
 -------------------------------------------
 
-Optimizing parallel jobs in YT is difficult; there are many parameters
-that affect how well and quickly the job runs.
-In many cases, the only way to find out what the minimum (or optimal)
-number of processors is, or amount of memory needed, is through trial and error.
-However, this section will attempt to provide some insight into what are good
-starting values for a given parallel task.
+Optimizing parallel jobs in yt is difficult; there are many parameters that
+affect how well and quickly the job runs.  In many cases, the only way to find
+out what the minimum (or optimal) number of processors is, or amount of memory
+needed, is through trial and error.  However, this section will attempt to
+provide some insight into what are good starting values for a given parallel
+task.
 
 Grid Decomposition
 ++++++++++++++++++

File source/analysis_modules/halo_profiling.rst

 .. code-block:: python
 
   import yt.analysis_modules.halo_profiler.api as HP
-  hp = HP.halo_profiler("DD0242/DD0242")
+  hp = HP.HaloProfiler("DD0242/DD0242")
 
 Most of the halo profiler's options are configured with keyword arguments given at 
 instantiation.  These options are:

File source/analysis_modules/merger_tree.rst

 Clearly, another requirement is that Python has the
 `sqlite3 library <http://docs.python.org/library/sqlite3.html>`_
 installed.
+This should be built along with everything else yt needs
+if the ``install_script.sh`` was used.
 
 The merger tree can be calculated in parallel, and if necessary, it will run
 the halo finding in parallel as well. Please see the note below about the
 at the same time (`see more here <http://www.sqlite.org/lockingv3.html#how_to_corrupt>`_).
 NFS disks can store files on multiple physical hard drives, and it can take time
 for changes made by one task to appear to all the parallel tasks.
+Only one task of the merger tree ever interacts with the database,
+so these dangers are minimal,
+but in general it's a good idea to know something about the disk used to
+store the database.
 
-The Merger Tree takes extra caution to ensure that every task sees the exact
-same version of the database before writing to it, and only one task
-ever writes to the database at a time.
-This is accomplished by using MPI Barriers and md5 hashing of the database
-between writes.
 In general, it is recommended to keep the database on a 'real disk' 
-(/tmp for example, if all the tasks are on the same SMP node) if possible,
+(/tmp for example, if all the tasks are on the same SMP node,
+or RAM disk for extra speed) if possible,
 but it should work on a NFS disk as well.
-If the database must be stored on a NFS disk, the documentation for the NFS protocol
-should be consulted to see what settings are available that can minimize the potential for
-file replication problems of the database.
+If a temporary disk is used to store the database while it's being built,
+remember to copy the file to a permanent disk after the merger tree script
+is finished.
+
 
 Running and Using the Halo Merger Tree
 --------------------------------------
 If the halos are to be found during the course of building the merger tree,
 run with an appropriate number of tasks to the size of the dataset and the
 halo finder used.
-The merger tree itself, which compares halo membership in parallel very effectively,
-is almost completely constrained by the
-read/write times of the SQLite file.
+The speed of the merger tree itself,
+which compares halo membership in parallel very effectively,
+is almost completely constrained by the read/write times of the SQLite file.
 In tests with the halos pre-located, there is not much speedup beyond two MPI tasks.
 There is no negative effect with running the merger tree with more tasks (which is
 why if halos are to be found by the merger tree, the merger tree should be
-run with as many tasks as that step requires), but there is no benefit.
+run with as many tasks as that step requires), and indeed if the simulation
+is a large one, running in parallel does provide memory parallelism,
+which is important.
 
-How The Database Is Handled
----------------------------
+How The Database Is Handled In Analysis Restarts
+------------------------------------------------
 
 The Merger Tree is designed to allow the merger tree database to be built
 incrementally.
 referencing the same database as before.
 By referencing the same database as before, work does not need to be repeated.
 
+If the merger tree process is interrupted before completion (say, if the
+job's walltime is exceeded and the scheduler kills it), just run the exact
+same job again.
+The merger tree will check to see what work has already been completed, and
+resume where it left off.
+
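The general idea of resuming from a database can be sketched with the standard ``sqlite3`` module. The table name ``done``, the helper names and the ``stop_after`` argument (which simulates the job being killed) are all invented for this example; the merger tree's real schema and bookkeeping are not described here and may differ.

```python
import sqlite3


def open_progress_db(path):
    """Open (or create) a database that records finished steps."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS done (step TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def run_steps(conn, steps, work, stop_after=None):
    """Run work(step) for each unfinished step; return the steps run now."""
    finished = set(row[0] for row in conn.execute("SELECT step FROM done"))
    ran = []
    for step in steps:
        if step in finished:
            continue  # completed by an earlier, possibly interrupted, run
        if stop_after is not None and len(ran) >= stop_after:
            break  # simulate the scheduler killing the job
        work(step)
        conn.execute("INSERT INTO done (step) VALUES (?)", (step,))
        conn.commit()  # commit per step so an interruption loses little work
        ran.append(step)
    return ran


steps = ["DD0000-DD0001", "DD0001-DD0002", "DD0002-DD0003"]
log = []
conn = open_progress_db(":memory:")
first_run = run_steps(conn, steps, log.append, stop_after=2)
second_run = run_steps(conn, steps, log.append)
```

The second call does only the work the first one did not reach, which is the behavior described above for rerunning the same job.
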
 Additional Parameters
 ~~~~~~~~~~~~~~~~~~~~~
 
     rebuild the database regardless of whether or not the halo files or
     database exist on disk already.
     Default: False.
-  * ``sleep`` (float) - The amount of time in seconds tasks waits between
-    checks to make sure the SQLite database file is globally-identical.
-    This time is used to allow a parallel file system to synch up globally.
-    The value may not be negative or zero. Default: 1.
   * ``index`` (bool) - Whether to add an index to the SQLite file. True makes
     SQL searches faster at the cost of additional disk space. Default=True.
 

File source/analyzing/objects.rst

    from yt.mods import *
    import shelve
 
+   pf = load("my_data") # not necessary if storeparameterfiles is on
+
    obj_file = shelve.open("my_storage_file.cpkl")
    pf, obj = obj_file["my_sphere"]
 
-Note here that this behaves slightly differently than above -- we do not need
-to load the parameter file ourselves, as the load process actually does that
-for us!  Additionally, we can store multiple objects in a single shelve file,
-so we have to call the sphere by name.
+If you have turned on ``storeparameterfiles`` in your configuration,
+you won't need to load the parameter file again, as the load process
+will actually do that for you in that case.  Additionally, we can
+store multiple objects in a single shelve file, so we have to call the
+sphere by name.
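
As a self-contained illustration of storing several named objects in one shelve file, the sketch below uses plain dictionaries in place of yt data objects. The helper names ``store_objects`` and ``load_object`` and the stored values are made up for the example.

```python
import os
import shelve
import tempfile


def store_objects(path, **named_objects):
    """Save each keyword argument under its own name in one shelve file."""
    db = shelve.open(path)
    try:
        for name, obj in named_objects.items():
            db[name] = obj
    finally:
        db.close()


def load_object(path, name):
    """Fetch a single object from the shelve file by name."""
    db = shelve.open(path)
    try:
        return db[name]
    finally:
        db.close()


storage_path = os.path.join(tempfile.mkdtemp(), "my_storage_file.cpkl")
store_objects(
    storage_path,
    # Plain dicts stand in for pickled yt objects in this sketch.
    my_sphere={"center": [0.5, 0.5, 0.5], "radius": 0.1},
    my_region={"left_edge": [0.0, 0.0, 0.0], "right_edge": [0.5, 0.5, 0.5]},
)
sphere_description = load_object(storage_path, "my_sphere")
```

Asking for a name that was never stored raises ``KeyError``, so it helps to keep the names descriptive.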
 
 .. note:: It's also possible to use the standard :mod:`cPickle` module for
           loading and storing objects -- so in theory you could even save a

File source/analyzing/time_series_analysis.rst

 But this is not really very nice.  This ends up requiring a lot of maintenance.
 The :class:`~yt.data_objects.time_series.TimeSeriesData` object has been
 designed to remove some of this clunkiness and present an easier, more unified
-approach to analyzing sets of data.  Furthermore, future versions of yt will
-automatically parallelize operations conducted on time series of data.
+approach to analyzing sets of data.  Even better,
+:class:`~yt.data_objects.time_series.TimeSeriesData` works in parallel by
+default (see :ref:`parallel-computation`), so you can use a ``TimeSeriesData``
+object to quickly and easily parallelize your analysis.  Since doing the same
+analysis task on many simulation outputs is 'embarrassingly' parallel, this
+naturally allows for almost arbitrary speedup, limited only by the number of
+available processors and the number of simulation outputs.
 
 The idea behind the current implementation of time series analysis is that
 the underlying data and the operators that act on that data can and should be
    print ms
 
 This allows you to create your own analysis tasks that will be then available
-to time series data objects.  In the future, this will allow for transparent
-parallelization.
+to time series data objects.  Since ``TimeSeriesData`` objects iterate over
+filenames in parallel by default, this allows for transparent parallelization. 

File source/reference/api/extension_types.rst

    ~yt.visualization.image_writer.map_to_colors
    ~yt.visualization.image_writer.strip_colormap_data
    ~yt.visualization.image_writer.splat_points
+   ~yt.visualization.image_writer.annotate_image
 
 We also provide a module that is very good for generating EPS figures,
 particularly with complicated layouts.
    :toctree: generated/
 
    ~yt.visualization.eps_writer.DualEPS
+   ~yt.visualization.eps_writer.single_plot
+   ~yt.visualization.eps_writer.multiplot
+   ~yt.visualization.eps_writer.multiplot_yt
+   ~yt.visualization.eps_writer.return_cmap
 
 .. _image-panner-api:
 

File source/reference/field_list.rst

File contents unchanged.