Module containing a number of general helper classes for the specification of HDF5-based data formats. This module also contains specific file format implementations, e.g., the brainformat.
Modules
brain.dataformat.base | Module with the base classes used for specification of HDF5 file formats and HDF5 object modules. |
brain.dataformat.brainformat | Module for specification of the BRAIN file format API. |
brain.dataformat.annotation | Module with classes to help with the specification and management of, and interaction with, data annotations. |
File Format Base Classes
brain.dataformat.base.ManagedObject(hdf_object) | Abstract base class defining the base API for classes responsible for managing a specific HDF5 h5py.Group or h5py.Dataset object. |
brain.dataformat.base.ManagedGroup(hdf_object) | Base class for specification of managed h5py.Group objects. |
brain.dataformat.base.ManagedDataset(hdf_object) | Base class for specification of managed h5py.Dataset objects. |
brain.dataformat.base.ManagedFile(hdf_object) | Base class for specification of managed h5py.File objects. |
BRAIN File Format Classes
brain.dataformat.brainformat.BrainDataFile(...) | Class for management of HDF5 brain files. |
brain.dataformat.brainformat.BrainDataData(...) | Class for management of the data group for storage of brain recordings data. |
brain.dataformat.brainformat.BrainDataInternalData(...) | Class for management of the ‘internal’ group for storage of brain recordings data. |
brain.dataformat.brainformat.BrainDataExternalData(...) | Class for management of the ‘external’ group for storage of recordings external to the brain. |
brain.dataformat.brainformat.BrainDataDescriptors(...) | Class for management of the descriptors group for storage of data descriptions and metadata. |
brain.dataformat.brainformat.BrainDataStaticDescriptors(...) | Class for management of the ‘static’ group for storage of static data descriptors. |
brain.dataformat.brainformat.BrainDataDynamicDescriptors(...) | Class for management of the ‘dynamic’ group for storage of dynamic data descriptors. |
brain.dataformat.brainformat.BrainDataECoG(...) | Class for management of managed h5py.Group objects structured to store ECoG brain recording data. |
brain.dataformat.brainformat.BrainDataECoGProcessed(...) | Class for management of h5py.Group objects structured to store processed ECoG brain recording data. |
Annotation Classes
The basic concepts for using annotations are as follows. We have DataSelections to describe a particular subset of a given data object (e.g., h5py.Dataset, numpy array, or any other kind of data that supports .shape and h5py.Dataset slicing). An Annotation consists of a type, description, and selection describing a particular data subset. An AnnotationCollection then describes a collection of annotations and is used to query and manage many annotations. Finally, the AnnotationDataGroup describes the interface for storing and retrieving AnnotationCollections from/to HDF5. The brain.dataformat.annotation module provides the following classes for definition, management, interaction, and storage of data annotations, which implement these concepts:
brain.dataformat.annotation.DataSelection(...) | A single data_selection for a given dataset. |
brain.dataformat.annotation.Annotation(...) | Annotate a particular dataset or subset of data. |
brain.dataformat.annotation.AnnotationCollection(...) | A collection of annotations |
brain.dataformat.annotation.AnnotationDataGroup(...) | Managed group for storage of annotations. |
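For orientation, the following is a minimal sketch of how these concepts fit together. The DataSelection construction and slicing follow the DataSelection documentation below; the keyword names used for the Annotation constructor (annotation_type, description, data_selection) are illustrative assumptions:
>>> import numpy as np
>>> from brain.dataformat.annotation import DataSelection, Annotation
>>> data = np.arange(100).reshape((10, 10))
>>> selection = DataSelection(data_object=data)  # empty selection dict, implicitly selects all
>>> selection[0, 1:3] = True  # restrict axis 0 to elements 1 and 2
>>> event = Annotation(annotation_type='event', description='example event', data_selection=selection)  # assumed keywords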
Module with the base classes used for specification of HDF5 file formats and HDF5 object modules.
Bases: brain.dataformat.base.ManagedObject
Base class for specification of managed h5py.Dataset objects.
Variables: | hdf_object – See ManagedObject |
---|
Functions to be implemented by derived class:
- get_format_specification(..) : Overwrite to specify the format. See ManagedObject.
- populate(..) : Overwrite to implement the creation of the object. See ManagedObject.
Numpy-style slicing to read data.
Parameters: | hdf_object – The h5py.Group or h5py.Dataset object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
NumPy-style slicing to write data.
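As a sketch, slicing a ManagedDataset forwards NumPy-style selections to the underlying h5py.Dataset; here mds is assumed to be a previously obtained ManagedDataset instance:
>>> subset = mds[0:10]  # read: forwarded to the managed h5py.Dataset
>>> mds[0:10] = subset * 2  # write: numpy-style assignment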
Bases: brain.dataformat.base.ManagedObject
Base class for specification of managed h5py.File objects.
Variables: | hdf_object – h5py.File or h5py.Group indicating the HDF5 file, or string indicating the HDF5 file to be opened. See also ManagedObject.hdf_object |
---|
Functions to be implemented by derived class:
- get_format_specification(..) : Overwrite to specify the format. See ManagedObject.
- populate(..) : Overwrite to implement the creation of the object. See ManagedObject.
Parameters: |
|
---|
Flush the HDF5 file.
Bases: brain.dataformat.base.ManagedObject
Base class for specification of managed h5py.Group objects.
Variables: | hdf_object – See ManagedObject |
---|
Implicit attributes available via the h5py.Group object
Variables: |
|
---|
Functions to be implemented by derived class:
- get_format_specification(..) : Overwrite to specify the format. See ManagedObject.
- populate(..) : Overwrite to implement the creation of the object. See ManagedObject.
Dict-like containership testing. name may be a relative or absolute path.
Enable slicing into the group.
Parameters: | hdf_object – The h5py.Group or h5py.Dataset object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
Get (name, value) pairs for objects directly attached to this group. Values for broken soft or external links show up as None.
Get the names of directly attached group members. See h5py.Group
Get the objects contained in the group (Group and Dataset instances). Broken soft or external links show up as None.
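For illustration, the dict-like interface of a ManagedGroup mirrors h5py.Group; mg is assumed to be a previously obtained ManagedGroup instance:
>>> 'raw_data' in mg  # dict-like containership testing; relative or absolute path
>>> names = mg.keys()  # names of directly attached group members
>>> pairs = mg.items()  # (name, value) pairs; broken links show up as None
>>> members = mg.values()  # contained Group and Dataset instances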
Bases: object
Abstract base class defining the base API for classes responsible for managing a specific HDF5 h5py.Group or h5py.Dataset object.
Functions to be implemented by derived class:
- get_managed_object_type : Overwrite in case that the derived class manages a dataset
- get_format_specification : Overwrite to specify the format
- populate: Overwrite to implement the creation of the object
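A schematic sketch of a derived class implementing these functions. The structure of the returned specification dict and the method signatures are illustrative assumptions; the real schema is defined by this module:
>>> from brain.dataformat.base import ManagedGroup
>>> class MyManagedGroup(ManagedGroup):
...     def get_format_specification(self):
...         # Illustrative placeholder spec; see get_format_specification below
...         return {'description': 'My managed group', 'groups': [], 'datasets': [], 'attributes': []}
...     def populate(self, **kwargs):
...         # Create required datasets and attributes in self.hdf_object here
...         pass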
Variables: |
|
---|
Initialize the management object.
Parameters: | hdf_object – The h5py.Group, h5py.File, or h5py.Dataset object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
list of weak references to the object (if defined)
Internal helper function used to check all attributes of an HDF5 object against a given attribute specification. Warnings are raised for every non-format compliance found.
Parameters: |
|
---|---|
Returns: | Boolean indicating compliance. |
Internal helper function used to check all managed objects in a parent object against a given managed object specification. Warnings are raised for every non-format compliance found.
Parameters: |
|
---|---|
Returns: | Boolean indicating compliance. |
Internal helper function used to check all datasets in a given list against a given dataset specification. Warnings are raised for every non-format compliance found.
Parameters: |
|
---|---|
Returns: | Boolean indicating compliance. |
Internal helper function used to check all relevant datasets in a parent object against a given dataset specification. Warnings are raised for every non-format compliance found.
Parameters: |
|
---|---|
Returns: | Boolean indicating compliance. |
Internal helper function used to check a given hdf dataset against a given dataset specification. Warnings are raised for every non-format compliance found.
Parameters: |
|
---|---|
Returns: | Boolean indicating compliance. |
Check if the HDF5 object assigned to the given instance of a subclass of ManagedObject is compliant with the corresponding format specification. Warnings are raised for every non-format compliance found. Use, e.g., "with warnings.catch_warnings(record=True) as w:" when calling the function to record all warnings.
Parameters: | current_only – If current_only is set, then only the current object (self.hdf_object) is validated. Set to False in order to force validation of all objects of the current type contained in the parent group of the current object. |
---|---|
Returns: | Boolean indicating whether the hdf object is compliant with the format. |
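A sketch of the warning-recording pattern mentioned above; mobj is assumed to be a previously obtained ManagedObject instance:
>>> import warnings
>>> with warnings.catch_warnings(record=True) as w:
...     warnings.simplefilter('always')
...     compliant = mobj.check_format_compliance(current_only=True)
>>> messages = [str(warning.message) for warning in w]  # one entry per non-compliance found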
Internal helper function used to check all groups in a parent object against a given group specification. Warnings are raised for every non-format compliance found.
Parameters: |
|
---|---|
Returns: | Boolean indicating compliance. |
Internal helper function used to check all relevant groups in a parent object against a given group specification. Warnings are raised for every non-format compliance found. This function uses the check_format_compliance(..) function.
Parameters: |
|
---|---|
Returns: | Boolean indicating compliance. |
Internal helper function used to check a given hdf_group against a given group specification. Warnings are raised for every non-format compliance found. This function uses the check_format_compliance(..) function.
Parameters: |
|
---|---|
Returns: | Boolean indicating compliance. |
Close the HDF5 file associated with the managed object.
Create a new managed object in the given parent group. The type of object created is decided by the get_managed_object_type(...) function implemented by derived classes (default is 'group'). The function creates the new object and assigns common attributes. It then creates the manager object and calls the corresponding populate(...) function to initialize the new object.
NOTE: See the populate method of the corresponding derived class for details on the additional keyword arguments.
Parameters: |
|
---|---|
Returns: | Instance of the derived ManagedObject responsible for management of the newly created h5py.Group or h5py.Dataset object. |
Raises: | ValueError is raised in case a conflicting object already exists, an illegal type is encountered, or the creation of the object is not explicitly specified as permitted in the spec. |
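A sketch of the create pattern, assuming the derived class MyManagedGroup from above and that create(...) takes the parent h5py.Group as its first argument:
>>> import h5py
>>> f = h5py.File('example.h5', 'a')  # hypothetical file
>>> manager = MyManagedGroup.create(f['/'])  # additional kwargs would be forwarded to populate(...)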
Check whether the format specification as given in the file is the same as the format specification given by the current instance of the managed object.
Get all objects that are managed by the current class contained in the parent group.
Parameters: |
|
---|---|
Returns: | List of managed h5py.Group or h5py.Dataset objects managed by the current class, or list of ManagedObject instances responsible for managing the found groups/datasets. |
Get the name of the file containing the managed object. Alternatively, we may also use self.file.filename.
Parameters: | absolute_path (bool) – Set to True to retrieve an absolute path |
---|
Return dictionary describing the specification of the format.
Get the format specification as given in the file.
Returns: | Python dict with the format specification as given in the file or None if not available. |
---|
Recursively construct the format specification for the current class, including the specification of all included managed objects. The specifications of managed objects are inserted in the groups and datasets dicts, respectively, and are accordingly removed from the managed_objects lists (i.e., the managed_objects list is empty after all replacements have been completed). The result is a full specification for the current class. The function also adds the specification for the format_type, description, format_specification, and object_id attributes, as those are implicitly defined by the ManagedObject class and are implicitly part of the spec of all managed objects (even though the specific spec of the different objects usually does not explicitly declare them).
Returns: | Python dict with the full format specification for the current managed object class. The spec may be converted to JSON using json.dumps. |
---|
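For example, the full specification may be serialized as stated above; the method name get_format_specification_recursive is assumed here from the description:
>>> import json
>>> from brain.dataformat.brainformat import BrainDataECoG
>>> spec = BrainDataECoG.get_format_specification_recursive()  # assumed name
>>> json_spec = json.dumps(spec, indent=4)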
Get the managed h5py object. This is a convenience function that simply returns self.hdf_object.
Get an instance of the ManagedObject class for managing the given hdf_object. None is returned in case the given hdf_object is not a managed object.
Parameters: | hdf_object – The h5py.Dataset, h5py.File, or h5py.Group for which the corresponding instance of the relevant derived class of ManagedObject should be generated. If a ManagedObject instance is provided as input, then the hdf_object will be returned. NOTE: The behavior of the function is undefined for objects other than the mentioned h5py data objects. NOTE: A string pointing to a valid HDF5 data file is allowed for convenience, in which case the file will be opened in append mode 'a' if possible and, if that fails, in read-only mode 'r'. A warning will be issued if the file could only be opened in read-only mode. |
---|---|
Returns: | Instance of ManagedObject or None in case the object is not managed or the manager object cannot be constructed. |
Raises: |
|
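A sketch of the factory behavior described above; the method name get_managed_object is an assumption, while the string-path convenience is documented above:
>>> from brain.dataformat.base import ManagedObject
>>> mobj = ManagedObject.get_managed_object('testfile_real.h5')  # tries mode 'a', falls back to 'r'
>>> mobj is None  # True if the object is not managed or the manager cannot be constructed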
Get whether a group or dataset is managed by the class. The default implementation assumes that a group is managed. The method must be overwritten in derived classes that manage datasets.
Returns: | String 'group', 'dataset', or 'file' indicating whether a group, dataset, or file is managed by the class. |
---|
Get the number of dimensions for a dataset based on the specification of dimension scales.
Returns: | None in case no dimension scales are given and the dimensionality of the dataset is not fixed. Otherwise, returns a tuple of two integers indicating the minimum and maximum number of dimensions. |
---|
Get the optional id of the object.
Returns: | The id of the object or None if no id exists. |
---|
Check whether the object has an id assigned to it.
Check whether the given hdf object is managed by a brain file API class.
Parameters: | hdf_object – The hdf5 object to be checked (dataset or group) |
---|---|
Returns: | Boolean indicating whether the given hdf5 object is managed or not. |
Check whether the given hdf_object is managed by the current class.
Parameters: | hdf_object – The hdf5 object to be checked |
---|---|
Returns: | Boolean indicating whether the object is managed by the current class. |
The populate method is called by the create method after the basic common setup is complete. The function should be used to populate the managed object (e.g., add dimensions to a dataset or add required datasets to a group). The populate method is passed the kwargs handed to the create method.
Parameters: | kwargs – Any additional keyword arguments supported by the specific implementation of the populate method. |
---|
Define the id of the object.
Parameters: | object_id (str, unicode, or None) – The object id to be used. If None is given, then the object id will be deleted. |
---|
Bases: brain.dataformat.base.ManagedFile
Container file used to store a single managed object in a separate external file that is then linked to other parent files. This container is used as part of the external storage feature available for all ManagedObject implementations.
Enable slicing into the root group of the file
Get dictionary describing the format.
Populate the managed object file.
Module for specification of the BRAIN file format API.
Bases: brain.dataformat.base.ManagedGroup
Class for management of the data group for storage of brain recordings data.
Variables: | hdf_object – See ManagedGroup |
---|
Implicit instance variables (i.e., these are mapped names but not stored explicitly)
Variables: |
|
---|
Parameters: | hdf_object – The h5py.Group object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
Get the external managed object with the external data. In principle, the current format defines that there should always be exactly one external group. However, multiple could be supported, so this function is prepared to handle this case should the format change in the future.
Parameters: | index – Optional input parameter (should always be 0 for now) to define which external group should be retrieved. Currently there should always be exactly one. |
---|---|
Returns: | BrainDataExternalData object or None |
Get dictionary describing the format.
Get the internal managed object with the internal data. In principle, the current format defines that there should always be exactly one internal group. However, multiple could be supported, so this function is prepared to handle this case should the format change in the future.
Parameters: | index – Optional input parameter (should always be 0 for now) to define which internal group should be retrieved. Currently there should always be exactly one. |
---|---|
Returns: | BrainDataInternalData object or None |
Populate the Brain file with the Data and Descriptors group.
Bases: brain.dataformat.base.ManagedGroup
Class for management of the descriptors group for storage of data descriptions and metadata.
Variables: | hdf_object – See ManagedGroup |
---|
Implicit instance variables (i.e., these are mapped names but not stored explicitly)
Variables: |
|
---|
Parameters: | hdf_object – The h5py.Group object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
Get the dynamic descriptors managed object with the dynamic descriptor data. In principle, the current format defines that there should always be exactly one dynamic descriptor group. However, multiple could be supported, so this function is prepared to handle this case should the format change in the future.
Parameters: | index – Optional input parameter to define the index of dynamic descriptors. Default is index=0. |
---|---|
Returns: | BrainDataDynamicDescriptors object or None |
Get dictionary describing the format of the group.
Populate the ‘descriptors’ group with all required elements.
Get the static descriptors managed object with the static descriptor data. In principle, the current format defines that there should always be exactly one static descriptor group. However, multiple could be supported, so this function is prepared to handle this case should the format change in the future.
Parameters: | index – Optional input parameter to define the index of static descriptors. Default is index=0. |
---|---|
Returns: | BrainDataStaticDescriptors object or None |
Bases: brain.dataformat.base.ManagedGroup
Class for management of the ‘dynamic’ group for storage of dynamic data descriptors.
Parameters: | hdf_object – The h5py.Group object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
Get dictionary describing the format of the group.
Populate the ‘data’ group with all required elements.
Bases: brain.dataformat.base.ManagedGroup
Class for management of managed h5py.Group objects structured to store ECoG brain recording data.
Variables: | hdf_object – See ManagedObject |
---|
Read data.
Parameters: | hdf_object – The h5py.Group or h5py.Dataset object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
Write data.
Get the anatomy information associated with the dataset.
Parameters: |
|
---|---|
Raises: | KeyError – A KeyError is generated if a new anatomy_names array or anatomy_ids array is given but a prior description already exists. |
Returns: | 1D numpy array indicating for each recording the name of the region. The function also returns a second 1D numpy array indicating for each recording the index of the region the electrode was located in. Note, if any of the anatomy arrays are missing, then None will be returned instead of the corresponding anatomy array. |
Get the requested annotation object.
Parameters: | index – The index of the annotation object to be retrieved. |
---|---|
Returns: | If index is specified to a value >=0, then a single AnnotationDataGroup object is returned (or None if the index is invalid). Otherwise a list of all AnnotationDataGroup objects is returned. |
Get the h5py Dimension scales associated with the ecog dataset
Parameters: | get_hdf – Get the h5py.DimensionScale associated with the ecog dataset if set to True (Default). If set to False a list of lists of dicts with a summary of the dimensions is returned instead which is formatted as follows: [axis_index][scale_index]. Each dict then contains the name, unit, dataset, and axis information for the dimensions scale. |
---|
Get the h5py.Dataset object of the ecog data.
For get_hdf==True this is the same as self['raw_data'] and self['ecog_data'].
For get_hdf==False this is the same as self['raw_data'][:] and self['ecog_data'][:].
Parameters: | get_hdf – Get the value in numpy format (set to False) or the h5py dataset for the sampling rate (set to True). Default value is True. |
---|---|
Returns: | h5py.Dataset if get_hdf is True. Numpy array of the full data if get_hdf is False. None in case the ecog data is missing. |
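For illustration, with a BrainDataECoG instance d (e.g., obtained via f.data().internal().ecog_data(0) as shown later in this document); the equivalence to self['raw_data'] is stated above, while the accessor name ecog_data on BrainDataECoG is an assumption:
>>> dset = d.ecog_data()  # h5py.Dataset; get_hdf=True is the default
>>> arr = d.ecog_data(get_hdf=False)  # full numpy array, same as d['raw_data'][:]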
Get dictionary describing the format.
Check whether an anatomy description is available for the ECoG dataset.
Returns: | Boolean indicating whether an anatomy description by name and/or id is available. |
---|
Check whether annotations exist for the ECoG dataset.
Check whether a layout description is available for the ECoG dataset.
Returns: | Boolean indicating whether a layout dataset is available. |
---|
Get the h5py.Dataset of the layout of the ecog_data. Same as self['layout'].
Parameters: |
|
---|---|
Returns: | h5py.Dataset if get_hdf is True. Numpy array if get_hdf is False. None in case the layout data does not exist. |
Get the number of annotation objects associated with the ECoG data.
The populate method is called by the create method after the basic common setup is complete. The function should be used to populate the managed object (e.g., add dimensions to a dataset or add required datasets to a group). The populate method is passed the kwargs handed to the create method.
Parameters: |
|
---|---|
Raises : | ValueError is raised in case that the object cannot be populated. |
Get the h5py.Dataset object of the sampling rate data. Same as self['sampling_rate'].
Parameters: | get_hdf – Get the value in numpy format (set to False) or the h5py dataset for the sampling rate (set to True). Default value is True. |
---|---|
Returns: | h5py.Dataset if get_hdf is True. Float if get_hdf is False. None in case the sampling_rate data is missing. |
Bases: brain.dataformat.brainformat.BrainDataECoG
Class for management of h5py.Group objects structured to store processed ECoG brain recording data.
Variables: | hdf_object – See ManagedObject and BrainDataECoG. |
---|
Parameters: | hdf_object – The h5py.Group or h5py.Dataset object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
Get dictionary describing the format.
This function adapts the specification of the BrainDataECoG class.
Returns: | Dictionary with the format specification. |
---|
Get the original name specified by the user for the dataset.
Returns: | String indicating the original name of the dataset specified by the user or None in case no user-defined name was specified. |
---|
The populate method is called by the create method after the basic common setup is complete. The function should be used to populate the managed object (e.g., add dimensions to a dataset or add required datasets to a group). The populate method is passed the kwargs handed to the create method.
Parameters: |
|
---|---|
Raises : | ValueError is raised in case that the object cannot be populated. |
Bases: brain.dataformat.base.ManagedGroup
Class for management of the ‘external’ group for storage of recordings external to the brain.
Parameters: | hdf_object – The h5py.Group object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
Get dictionary describing the format of the group.
Populate the ‘data’ group with all required elements.
Bases: brain.dataformat.base.ManagedFile
Class for management of HDF5 brain files.
Variables: | hdf_object – See ManagedFile |
---|
Implicit instance variables (i.e., these are mapped names but not stored explicitly)
Variables: |
|
---|
Enable slicing into the file
Parameters: |
|
---|
Get the data managed object with the data. In principle, the current format defines that there should always be exactly one data group. However, multiple could be supported, so this function is prepared to handle this case should the format change in the future.
Parameters: | index – Optional input parameter (should always be 0 for now) to define which data group should be retrieved. Currently there should always be exactly one. |
---|---|
Returns: | BrainDataData object or None |
Get the descriptors managed object with the descriptors. In principle, the current format defines that there should always be exactly one descriptors group. However, multiple could be supported, so this function is prepared to handle this case should the format change in the future.
Parameters: | index – Optional input parameter (should always be 0 for now) to define which descriptors group should be retrieved. Currently there should always be exactly one. |
---|---|
Returns: | BrainDataDescriptors object or None |
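A minimal navigation sketch following the accessor chain also used in the examples later in this module; the descriptors() accessor name is assumed by analogy with data():
>>> from brain.dataformat.brainformat import BrainDataFile
>>> f = BrainDataFile('testfile_real.h5')
>>> d = f.data()  # BrainDataData; index 0 by default
>>> ecog = d.internal().ecog_data(0)  # BrainDataECoG
>>> desc = f.descriptors()  # BrainDataDescriptors; assumed name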
Get dictionary describing the format.
Populate the Brain file with the Data and Descriptors group.
Bases: brain.dataformat.base.ManagedGroup
Class for management of the ‘internal’ group for storage of brain recordings data.
Variables: | hdf_object – See ManagedGroup |
---|
Implicit instance variables (i.e., these are mapped names but not stored explicitly)
Variables: |
|
---|
Parameters: | hdf_object – The h5py.Group object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
Get the ecog managed object with the ecog data.
Parameters: | index – Optional input parameter to define the index of the ecog dataset. |
---|---|
Returns: | BrainDataECoG object or None |
Get the ecog_processed managed object with the processed ecog data.
Parameters: | index – Optional input parameter to define the index of the processed ecog dataset. |
---|---|
Returns: | BrainDataECoGProcessed object or None |
Get dictionary describing the format of the group.
Get the number of ecog datasets available.
Get the number of processed ecog datasets available.
Populate the ‘data’ group with all required elements.
Bases: brain.dataformat.base.ManagedObjectFile
Container file used to store a collection of BrainDataFile files of neural data. This container is used to organize multiple experiments, sessions, etc. stored in separate files into a single collection, allowing a user to open a single file and interact with the collection of files as if they were all stored in the same file.
Add the given BrainDataFile object to the collection.
Parameters: |
|
---|---|
Raises: |
|
Get dictionary describing the format.
Bases: brain.dataformat.base.ManagedGroup
Class for management of the ‘static’ group for storage of static data descriptors.
Parameters: | hdf_object – The h5py.Group object managed by the current instance. |
---|---|
Raises: | ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function. |
Get dictionary describing the format of the group.
Populate the ‘data’ group with all required elements.
Module with classes to help with the specification and management of, and interaction with, data annotations. Annotations are used to describe specific subsets of data in further detail. This is used, e.g., to describe the location of data channels or to mark events in time.
Bases: object
Annotate a particular dataset or subset of data.
Variables: |
|
---|
Check whether a data_selection is a subset of the other data_selection.
Equals operator (==)
Greater than or equals operator (>=)
Greater than operator (>)
Less than or equals operator (<=)
Get number of selected elements.
Check whether the data selection of the annotation precedes the selection of the other annotation. Same as self.data_selection << other.data_selection. See DataSelection.__lshift__() for details.
Less than operator (<)
Not equals operator (!=)
Check whether the data selection of the annotation follows the selection of the other annotation. Same as self.data_selection >> other.data_selection. See DataSelection.__rshift__() for details.
list of weak references to the object (if defined)
Load the data associated with this DataSelection.
Bases: object
A collection of annotations
Instance Variables
Variables: |
|
---|
Filtering
Filtering provides means to locate annotations of interest. Filtering is performed via the provided set of filter functions, including:
- index_filter(..) : Select all annotations with the given index.
- axis_filter(..) : Find all annotations that select by a given set of axes
- type_filter(..) : Find all annotations of the given type
- description_filter(..) : Find all annotations with the given description
- description_contains_filter(..) : Find all annotations where the description contains the given string
Filter functions generate a 1D bool-type numpy array, indicating which annotations are selected by the filter. As bool arrays, results of filters may be combined using bitwise logical operations, e.g.,
- a & b : AND : Select all annotations selected by both filters a and b
- a | b : OR : Select all annotations that are selected by either filter a or b
- a ^ b : XOR : Exclusive or, select all annotations where the filters a and b differ
- ~a : NOT : Invert the selection of the filter, selecting all annotations not selected by a
Selecting:
Once we have identified a set of relevant annotations via filtering, one can select the annotations of interest directly using standard array slicing, e.g.:
>>> a = AnnotationCollection(..)
>>> f = a.type_filter('event') & a.axis_filter([0])
>>> v = a[f]
The result of selecting annotations is a reduced AnnotationCollection object. In addition to filters, one can also select annotations directly using standard data selection/slicing, e.g., a[0:10] to select the first 10 annotations.
NOTE: When sub-selection is performed, all data relevant to the selected annotations will be loaded into memory, whereas when an AnnotationCollection is constructed initially it may be initialized using h5py.Dataset objects where the data resides in file and is loaded by the filters as needed.
Getting the Annotations
From the AnnotationCollection we can retrieve a list of all selected Annotation objects via the get_annotations() function. This will convert the annotations from the collective data structures used for filtering to individual Annotation objects. This is typically done after the filtering is complete.
Merging Annotations
Once we have selected and retrieved the Annotation objects of interest, the individual Annotations may be combined using standard bitwise logical operators and compared using standard comparison operators (see the documentation of the Annotation class for further details). As convenience functions, the AnnotationCollection class provides a set of merge functions which will generate a single combined Annotation by merging all the selected annotations using a given bitwise logical operation, e.g., merge_and(..), merge_or(..), and merge_xor(..).
Other Operations
- len : Get the number of annotations in the collection using the standard Python len(a).
Example
>>> a = AnnotationDataGroup(...) # Load annotation collection from file
>>> f = a.type_filter('event') & a.axis_filter([0]) # Find all annotations that define an event on the axis 0
>>> s = a[f] # Select all relevant annotations
>>> s_all = s.get_annotations() # Get all annotations
>>> s1 = a.merge_or() # Define a single combined annotation
Get a new AnnotationCollection for the subset of annotations selected.
NOTE: While most other variables are sub-selects (and loaded into memory) the self.selections collection of selections remains unmodified. This strategy i) allows us to keep the selections out-of-core in the HDF5 file, and ii) avoids complex updates of references to the selections.
Get the number of annotations in the collection.
list of weak references to the object (if defined)
Add a new annotation to the collection.
NOTE: if the AnnotationCollection object was populated with h5py.Dataset objects, e.g., as is the case when using an AnnotationDataGroup instance, then the annotation will be written to file. If the AnnotationCollection was initialized as a pure in-memory collection using numpy arrays, then the arrays will only be updated in memory.
NOTE: The Annotation is assumed to refer to the same data object as this AnnotationCollection. If this is not the case, then the annotation will either be reassigned to the data object of the collection (as long as the data objects' shapes match) or a ValueError is raised.
Parameters: | annotation (Annotation) – The annotation to be added |
---|---|
Raises: | ValueError in case the annotation cannot be added to the collection |
Get all annotations that filter by a given set of axes.
Parameters: | axis_list – List of all axes that relevant selections must filter by. Only selections that select by all specified axes will be retrieved. Axes may be specified by their integer index (recommended; -1, 0, 1, ..., n) or by the name of the dimension in the data_object. |
---|
Examples:
>>> from brain.dataformat.brainformat.braindata import *
>>> f = BrainDataFile('testfile_real.h5')
>>> d = f.data().internal().ecog_data(0)
>>> d.annotations()
>>> a = d.annotations(0)
>>> a.axis_filter(0) # Filter based on a single axis index
>>> a.axis_filter('space') # Filter based on a single axis index using the name of the axis
>>> a.axis_filter(['space', 1]) # Filter based on multiple axes; axis names and indexes may be mixed
NOTE: Global selections are treated independently; as such, if the filter asks for axis=1, selections with axis=-1 (global) will not be included.
Returns: | Bool array indicating for each annotation whether it has been selected (True) or not (False) |
---|
Get all annotations for which the description contains the given text
Parameters: | description – String of the partial description to be found. |
---|---|
Returns: | Bool array indicating for each annotation whether it has been selected (True) or not (False) |
Get all annotations with the given description.
Parameters: | description – String of the description to be located. |
---|---|
Returns: | Bool array indicating for each annotation whether it has been selected (True) or not (False) |
Get a list of all annotations.
Returns: | List of Annotation objects |
---|
Filter annotations based on their index.
Parameters: | selection – Any valid selection supported by numpy, e.g., a slice object or an integer index. E.g., to select annotations 200 to 300 we could do index_filter(slice(200, 301)). NOTE: Following the standard numpy selection scheme, the upper bound is not included in this example. |
---|---|
Returns: | Bool array indicating for each annotation whether it has been selected (True) or not (False) |
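For example, index filters combine with the other filters via the bitwise operations described above:
>>> first = a.index_filter(slice(0, 100))  # annotations 0 to 99
>>> events = a.type_filter('event')
>>> selected = a[first & events]  # reduced AnnotationCollection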
Get a single Annotation object (or None) that combines all annotations in this collection via logical AND.
This is a convenience function and is equivalent to calling get_annotations(...) and combining all annotations in the returned list via logical & (AND).
Get a single Annotation object (or None) that combines all annotations in this collection via logical OR.
This is a convenience function and is equivalent to calling get_annotations(...) and combining all annotations in the returned list via logical | (OR).
Get a single Annotation object (or None) that combines all annotations in this collection via logical XOR.
This is a convenience function and is equivalent to calling get_annotations(...) and combining all annotations in the returned list via logical ^ (XOR).
Get the number of annotations in the collection.
Get all annotations where the annotation type contains the given string.
Parameters: | annotation_type – String with annotation type substring to look for |
---|---|
Returns: | Bool array indicating for each annotation whether it has been selected (True) or not (False) |
Get all annotations with the given type.
Parameters: | annotation_type – Either the string of the annotation type or integer with the index of the annotation type. |
---|---|
Returns: | Bool array indicating for each annotation whether it has been selected (True) or not (False) |
Bases: brain.dataformat.base.ManagedGroup, brain.dataformat.annotation.AnnotationCollection
Managed group for storage of annotations.
Parameters: | hdf_object – The h5py.Group object managed by this class. |
---|
HDF5 file structure:
- 'data_object' : Link to the HDF5 dataset/group we select from
- 'annotation_types' : 1D dataset with the list of all available types (vlen str)
- 'annotation_type_indexes' : 1D dataset with the index into the annotation_types array
- 'descriptions' : 1D dataset with the description of the annotations
- 'selection_indexes' : (#selections, #axis+1) dataset, indicating for each axis the selection that applies or -1 if none. The axes are ordered as [-1, 0, 1, 2, ...]
- 'selections_axis_#' : 2D dataset per axis with all 1D selections, plus a (#axis+1) dataset for global selections
- 'axis_index' : 1D dataset for the dimension-scale for selection_indexes
- 'collection_description' : Attribute with a string describing the collection
If item is a string, then retrieve the corresponding object from the HDF5 file using the ManagedGroup.__getitem__(..) method, otherwise retrieve the corresponding AnnotationCollection using the implementation of AnnotationCollection.__getitem__(..)
Return dictionary describing the specification of the format.
The populate method is called by the create method after the basic common setup is complete. The function should be used to populate the managed object (e.g., add dimensions to a dataset or add required datasets to a group). The populate method is passed the kwargs handed to the create method.
Parameters: |
|
---|---|
Raises : | ValueError is raised in case that the object cannot be populated. |
Bases: object
A single data_selection for a given dataset.
Basic Structure
The data_selection object defines a set of selections (one per axis). Axes for which no data_selection is specified are interpreted as an ALL data_selection, i.e., [1,1,1,...,1] is assumed. The individual selections are assumed to be combined via a binary AND operation, i.e., only objects selected by all selections are seen as being selected. A data_selection may also be applied to the -1 axis, indicating that the data_selection is applied to all axes. In the case of such a global data_selection, the data_selection is described via a bool matrix of the same shape as the data (rather than a vector). As such, specifying a global (-1) data_selection is generally something that should be avoided if possible, because this may result in very large data_selection arrays, as a bool value must be created for each data element.
Creating and Updating Data Selections
A default DataSelection may be created simply by specifying only the data object the selection applies to, without specifying any explicit selection:
>>> data = np.arange(100).reshape((10, 10))
>>> d1 = DataSelection(data_object=data)
This will create a default DataSelection with an empty selection dict, i.e., d1.selections = {}. As missing selections are treated as ALL, this results in a selection that implicitly selects all objects.
We may now change the selection, simply using array slicing. In the slicing we specify the axis index first, and then the elements to be changed:
>>> d1[0, 1:3] = True # axis=0, selection=1:3
>>> print d1[0]
[ True True True False False False False False False False]
NOTE: While missing selections are treated as ALL in general, if a selection is missing during assignment, we will initialize the missing selection as all False. This divergence from the standard scheme ensures that all values outside of the given first assignment are not selected. This divergence is made to: i) allow the more intuitive usage of selecting elements of interest via assignment, rather than having to define all elements that are not of interest, and ii) avoid the problem that elements are automatically selected for the user.
NOTE: Instead of specifying the index of the axis, we may also use the label of the dimension if the data object the DataSelection applies to is an h5py.Dataset with dimension scales, e.g., d1['time', 1:3] = True.
Instead of creating a default selection and then updating it, we can also create a custom selection directly by initializing the selections dictionary during creation of the DataSelection object:
>>> t = np.zeros(10, 'bool')
>>> t[0:3] = True
>>> d1 = DataSelection(data_object=data, selections={0: t})
Accessing Axis Selection
To access the selections associated with the different axes we can do one of the following two things:
>>> print d1[0] # Get selection for axis 0 using array slicing
[ True True True False False False False False False False]
>>> print d1.selections[0] # Get selection for axis 0 by accessing the selection dict directly
[ True True True False False False False False False False]
As shown above, we can retrieve the selection associated with an axis using standard slicing against the DataSelection directly or by slicing into the selections dict. For axes with an explicit selection, both methods result in the same behavior. However, for axes without a selection, the first approach will yield None, while the latter approach will result in a KeyError:
>>> print d1[1]
>>> print d1.selections[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 1
Logical Operations / Combining Data Selection
DataSelection objects may be combined via bitwise binary operators:
- & : AND operator defining the intersection of two DataSelections s1 & s2
- | : OR operator defining the merge of two DataSelections s1 | s2
- ^ : XOR of two data DataSelections s1 ^ s2
- ~ : NOT operator inverting a DataSelection ~s1
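Continuing the running example from above (d1 defined over the 10x10 data array), a short sketch of combining selections:
>>> d2 = DataSelection(data_object=data)
>>> d2[0, 2:5] = True
>>> both = d1 & d2  # per-axis AND of the two selections
>>> either = d1 | d2  # OR; may produce a global (-1) selection
>>> inverse = ~d1  # NOT; may also produce a global selection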
Comparing Data Selections
- in : s1 in s2 : Check if the DataSelection s1 is a subset of s2
- >= : s1 >= s2 : Check if the DataSelection s1 contains all elements of s2
- > : s1 > s2 : Check if the DataSelection s1 contains all elements of s2 and has additional elements
- == : s1 == s2 : Check if the DataSelection s1 selects exactly the same elements as s2
- != : s1 != s2 : Check if the DataSelection s1 selects at least in part different elements than s2
- < : s1 < s2 : Check if the DataSelection s1 is contained in s2 but s2 selects additional elements as well
- <= : s1 <= s2 : Check if the DataSelection s1 is a subset of s2
- >> : s1 >> s2 : Check if s1 follows s2, i.e., all elements selected by s1 appear after all elements selected by s2
- << : s1 << s2 : Check if s1 precedes s2, i.e., all elements selected by s1 appear before all elements selected by s2
Other Selection Operations
- validate() : Validate that a data_selection is well-defined/valid
- len : The Python len operator is equivalent to calling count() (see next).
- count() : Count the total number of elements selected. This is equivalent to len(a)
- simplify_selection() : Remove unnecessary components of the data_selection.
- collapse_selection() : Collapse the data_selection to a single global data_selection.
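A short sketch of these operations on the running example; the boolean return of validate() is assumed from its description:
>>> n = len(d1)  # equivalent to d1.count()
>>> d1.validate()  # check that the selection is well-defined
>>> d1.simplify_selection()  # remove per-axis selections that select all elements
>>> d1.collapse_selection(global_only=True)  # collapse to a single global selection if one is present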
Variables: |
|
---|
Return a new DataSelection that defines a logical AND of the two selections. Selections are combined on a per-axis basis.
NOTE: If only one of the two selections defines a data_selection for a given axis, then the single data_selection that is defined will be used, implicitly interpreting the missing data_selection as an ALL [1,1,1,...,1] data_selection.
Parameters: | other (DataSelection) – The right-hand DataSelection object B of the A & B operation. |
---|
Compute the smallest and largest selected index along all axes.
Returns: | This function returns a tuple of two lists, one with the lower axis bounds and one with the upper axis bounds. |
---|
Internal helper function used to collapse a set of selections. A data_selection may be collapsed to a single global data_selection if a global data_selection is already present.
Parameters: |
|
---|
Internal helper function used to implement a series of comparison operators.
Parameters: |
|
---|---|
Type : | str or unicode |
Check whether a data_selection is a subset of the other data_selection.
Equals operator (==)
Greater than or equals operator (>=)
Get the selection for the given axis. If no selection is present for the specified axis, then None is returned instead.
Greater than operator (>)
Initialize a new DataSelection object.
Parameters: |
|
---|
Return a new DataSelection object that inverts the current data_selection.
NOTE: This may result in a complex, global data_selection. Only in the case where we have no selection, or only a single data_selection applied to one axis, can we maintain a simple per-axis data_selection. Global selections may require large memory as we need to store one bool for each element in the dataset.
NOTE: This data_selection will be simplified first in an attempt to reduce the number of axis selections used, with the goal of keeping a simple per-axis data_selection object if possible.
Less than or equals operator (<=)
Get number of selected elements.
a << b : Identify if a precedes b.
This checks, for all selections applied, whether all elements selected by a appear before the elements selected by b. This means both a << b and b << a can be False at the same time in case a and b overlap in their selection ranges.
Less than operator (<)
Not equals operator (!=)
Return a new DataSelection that defines the logical OR of the two selections.
NOTE: Due to the complexity of the resulting data_selection, the returned DataSelection may define a complex, global data_selection for axis -1. Only in the case where both selections define just a single data_selection on the same axis do we keep a simple data_selection. Global selections may require large memory as we need to store one bool for each element in the dataset.
Parameters: | other (DataSelection) – The right-hand DataSelection object B of the A | B operation. |
---|
a >> b : Identify if a follows b.
This checks, for all selections applied, whether all elements selected by a appear after the elements selected by b. This means both a >> b and b >> a can be False at the same time in case a and b overlap in their selection ranges.
Define the selection for a given axis using the following syntax.
s[axis_index, value_selection] = True
The right-hand assignment value must be a bool or bool-array. The axis_index must be an integer indicating a valid axis. The selection may be any valid data selection along the given axis.
Note, if no selection exists yet for the given axis, then all values outside of the given assignment will be set to False. This is in contrast to the fact that non-existent selections are generally treated as ALL. This divergence is made to: i) allow the more intuitive usage of selecting elements of interest via assignment, rather than having to define all elements that are not of interest, and ii) avoid the problem that elements are automatically selected for the user.
list of weak references to the object (if defined)
Return a new DataSelection that defines an XOR between the two selections. NOTE: If a DataSelection does not define a data_selection for an axis, then it is interpreted as having an ALL data_selection, i.e., [1,1,1,...,1]. As such, for axes for which only one side defines an explicit data_selection, the operation is equivalent to an invert/negation of the data_selection that exists. NOTE: Due to the complexity of the XOR, this operation typically results in a global -1 axis data_selection in order to represent the result, even if the inputs are all just per-axis selections. Only in the case where both selections define just a single data_selection on the same axis do we keep a simple data_selection. Global selections may require large memory as we need to store one bool for each element in the dataset.
Get the list of axes that are sub-selected by this selection. This may be an empty list.
Parameters: |
|
---|---|
Returns: | List of integers with the axes indices. |
Collapse the data_selection to a single global data_selection.
Parameters: | global_only (bool) – Boolean indicating whether the data_selection should always be collapsed to a single global data_selection (False) or only in case that a global data_selection is already present (True). Default is True. |
---|
Get the number of elements selected by the data_selection. Dimensions without an explicit data_selection are counted as having an ALL data_selection. This is equivalent to calling len() on the object.
Get the number of elements selected by the individual parts of the selections defined in the self.selections object. Dimensions without an explicit data_selection are interpreted as having an ALL data_selection (i.e., the length of that axis is returned).
Returns: | Dictionary of {axis_index: count}. Axis -1 indicates the presence of a global data_selection. |
---|
Load the data associated with this DataSelection.
Remove unnecessary entries from the data_selection dict, i.e., entries that select all elements for a given axis.
Check whether the data_selection is valid.