dataformat Package

Module containing a number of general helper classes for the specification of HDF5-based data formats. This module also contains specific file format implementations, e.g., the brainformat.

Modules

brain.dataformat.base Module with the base classes used for specification of HDF5 file formats and HDF5 object modules.
brain.dataformat.brainformat Module for specification of the BRAIN file format API.
brain.dataformat.annotation Module with classes to help with the specification, management of, and interaction with data annotations.

File Format Base Classes

brain.dataformat.base.ManagedObject(hdf_object) Abstract base class defining the base API for classes responsible for managing a specific hdf5 h5py.Group or h5py.Dataset object.
brain.dataformat.base.ManagedGroup(hdf_object) Base class for specification of managed h5py.Group objects.
brain.dataformat.base.ManagedDataset(hdf_object) Base class for specification of managed h5py.Dataset objects.
brain.dataformat.base.ManagedFile(hdf_object) Base class for specification of managed h5py.File objects.

BRAIN File Format Classes

brain.dataformat.brainformat.BrainDataFile(...) Class for management of HDF5 brain files.
brain.dataformat.brainformat.BrainDataData(...) Class for management of the data group for storage of brain recordings data.
brain.dataformat.brainformat.BrainDataInternalData(...) Class for management of the ‘internal’ group for storage of brain recordings data.
brain.dataformat.brainformat.BrainDataExternalData(...) Class for management of the ‘external’ group for storage of recordings external to the brain.
brain.dataformat.brainformat.BrainDataDescriptors(...) Class for management of the descriptors group for storage of data descriptions and metadata.
brain.dataformat.brainformat.BrainDataStaticDescriptors(...) Class for management of the ‘static’ group for storage of static data descriptors.
brain.dataformat.brainformat.BrainDataDynamicDescriptors(...) Class for management of the ‘dynamic’ group for storage of dynamic data descriptors.
brain.dataformat.brainformat.BrainDataECoG(...) Class for management of h5py.Group objects structured to store ECoG brain recording data.
brain.dataformat.brainformat.BrainDataECoGProcessed(...) Class for management of h5py.Group objects structured to store processed ECoG brain recording data.

Annotation Classes

The basic concept for using annotations is as follows. We have DataSelections to describe a particular subset of a given data object (e.g., an h5py.Dataset, a numpy array, or any other kind of data that supports .shape and h5py.Dataset-style slicing). An Annotation consists of a type, a description, and a selection describing a particular data subset. An AnnotationCollection then describes a collection of annotations and is used to query and manage many annotations. Finally, the AnnotationDataGroup describes the interface for storing and retrieving AnnotationCollections to/from HDF5. The brain.dataformat.annotation module provides the following classes for definition, management, interaction, and storage of data annotations, which implement these concepts (a short usage sketch follows the class list below):

brain.dataformat.annotation.DataSelection(...) A single data selection for a given dataset.
brain.dataformat.annotation.Annotation(...) Annotate a particular dataset or subset of data.
brain.dataformat.annotation.AnnotationCollection(...) A collection of annotations
brain.dataformat.annotation.AnnotationDataGroup(...) Managed group for storage of annotations.
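
The following sketch illustrates how these pieces fit together. The Annotation arguments match the constructor signature documented below; the DataSelection arguments shown here (a data object plus a per-axis selection) are assumptions made for illustration and should be checked against the DataSelection documentation:

>>> import numpy as np
>>> from brain.dataformat.annotation import DataSelection, Annotation
>>> data = np.random.rand(64, 10000)  # e.g., 64 channels with 10000 samples each
>>> selection = DataSelection(data_object=data, selection={0: slice(0, 10)})  # hypothetical arguments
>>> annotation = Annotation(annotation_type='anatomy', description='electrodes over the precentral gyrus', data_selection=selection)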

base Module

Module with the base classes used for specification of HDF5 file formats and HDF5 object modules.

class brain.dataformat.base.ManagedDataset(hdf_object)

Bases: brain.dataformat.base.ManagedObject

Base class for specification of managed h5py.Dataset objects.

Variables:hdf_object – See ManagedObject

Functions to be implemented by derived class:

  • get_format_specification(..) : Overwrite to specify the format. See ManagedObject.
  • populate(..) : Overwrite to implement the creation of the object. See ManagedObject.
__getitem__(item)

Numpy-style slicing to read data.

__init__(hdf_object)
Parameters:hdf_object – The h5py.Group or h5py.Dataset object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.base'
__setitem__(key, value)

NumPy-style slicing to write data

classmethod get_managed_object_type()
class brain.dataformat.base.ManagedFile(hdf_object, mode='r')

Bases: brain.dataformat.base.ManagedObject

Base class for specification of managed h5py.File objects.

Variables:hdf_object – h5py.File or h5py.Group indicating the HDF5 file, or string indicating the HDF5 file to be opened. See also ManagedObject.hdf_object

Functions to be implemented by derived class:

  • get_format_specification(..) : Overwrite to specify the format. See ManagedObject.
  • populate(..) : Overwrite to implement the creation of the object. See ManagedObject.
__init__(hdf_object, mode='r')
Parameters:
  • hdf_object – This can be either the h5py.File object, an h5py.Group or h5py.Dataset contained in the file of interest, or a string indicating the name of the file to be opened.
  • mode – Used only if hdf_object is a string and a file is opened anew. Indicates the mode in which the file should be opened, e.g., ‘r’, ‘w’, ‘a’. See the h5py.File documentation for details.
__module__ = 'brain.dataformat.base'
flush()

Flush the HDF5 file.

classmethod get_managed_object_type()
class brain.dataformat.base.ManagedGroup(hdf_object)

Bases: brain.dataformat.base.ManagedObject

Base class for specification of managed h5py.Group objects.

Variables:hdf_object – See ManagedObject

Implicit attributes available via the h5py.Group object

Variables:
  • attrs – HDF5 Attributes for this group. Same as self.hdf_object.attrs.
  • id – The group’s low-level identifier; an instance of h5py.GroupID. Same as self.hdf_object.id
  • ref – An HDF5 object reference pointing to this group. See using object references as part of the h5py docs. Same as self.hdf_object.ref.
  • regionref – A proxy object allowing you to interrogate region references. See using region references as part of the h5py docs. Same as self.hdf_object.regionref.
  • name – String giving the full path to this group.
  • file – The BrainDataFile object for the file instance in which the group resides. If the file is not a managed BrainDataFile then file is set to the h5py.File object instead.
  • parent – The ManagedGroup object for the group instance containing this group. If the parent group is not a managed group then parent will be set to the corresponding h5py.Group object instead.

Functions to be implemented by derived class:

  • get_format_specification(..) : Overwrite to specify the format. See ManagedObject.
  • populate(..) : Overwrite to implement the creation of the object. See ManagedObject.
__contains__(item)

Dict-like containership testing. name may be a relative or absolute path.

__getitem__(item)

Enable slicing into the group.

__init__(hdf_object)
Parameters:hdf_object – The h5py.Group or h5py.Dataset object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.base'
classmethod get_managed_object_type()
items()

Get (name, value) pairs for objects directly attached to this group. Values for broken soft or external links show up as None.

keys()

Get the names of directly attached group members. See h5py.Group

values()

Get the objects contained in the group (Group and Dataset instances). Broken soft or external links show up as None.

class brain.dataformat.base.ManagedObject(hdf_object)

Bases: object

Abstract base class defining the base API for classes responsible for managing a specific hdf5 h5py.Group or h5py.Dataset object.

Functions to be implemented by derived class:

  • get_managed_object_type : Overwrite in case that the derived class manages a dataset
  • get_format_specification : Overwrite to specify the format
  • populate: Overwrite to implement the creation of the object
Variables:
  • type_attribute_name – Name of the attribute used to store the information about the management class.
  • description_attribute_name – Name of the attribute used to store the human-readable descriptions of the purpose and content of the class.
  • object_id_attribute_name – Name of the attribute used to store an optional object id which may be used to reference the data in a persistent fashion (e.g. DOI associated with the given data object)
  • hdf_object – The HDF5 object managed by the object
  • ... – All attributes of the hdf_object are also exposed via this class through the __getattr__ function and can be used as usual.
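
As a minimal sketch, a hypothetical derived class implementing these functions might look as follows. The structure of the specification dict is simplified here (see the get_format_specification implementations of the concrete classes for the actual layout); get_managed_object_type is not overridden since the default is ‘group’:

>>> from brain.dataformat.base import ManagedGroup
>>> class MyManagedGroup(ManagedGroup):
...     @classmethod
...     def get_format_specification(cls):
...         # Simplified, illustrative specification dict
...         return {'format_type': 'MyManagedGroup', 'description': 'Example managed group'}
...     def populate(self, **kwargs):
...         # Called by create(...) once the group and its common attributes exist
...         self.hdf_object.create_dataset('values', shape=(10,), dtype='float32')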
__dict__ = <dictproxy object at 0x110d7dfa0>
__getattr__(item)
__init__(hdf_object)

Initialize the management object.

Parameters:hdf_object – The h5py.Group, h5py.File, or h5py.Dataset object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.base'
__weakref__

list of weak references to the object (if defined)

classmethod check_attribute_format_compliance(attribute_spec, hdf_object)

Internal helper function used to check all attributes of an hdf5 object against a given attribute specification. Warnings are raised for every format violation found.

Parameters:
  • attribute_spec – The attribute specification
  • hdf_object – The object with which the attributes described by the given attribute specification are associated.
Returns:

Boolean indicating compliance.

classmethod check_contained_managed_objects_format_compliance(managed_spec, parent_group)

Internal helper function used to check all managed objects in a parent object against a given managed object specification. Warnings are raised for every format violation found.

Parameters:
  • managed_spec – The managed specification
  • parent_group – The parent object that should contain one or more objects with the given managed object specification.
Returns:

Boolean indicating compliance.

classmethod check_dataset_format_compliance(dataset_spec, dataset_list)

Internal helper function used to check all datasets in a given list against a given dataset specification. Warnings are raised for every format violation found.

Parameters:
  • dataset_spec – The dataset specification
  • dataset_list – The list of datasets to be validated against the spec.
Returns:

Boolean indicating compliance.

classmethod check_dataset_format_compliance_contained(dataset_spec, parent_group)

Internal helper function used to check all relevant datasets in a parent object against a given dataset specification. Warnings are raised for every format violation found.

Parameters:
  • dataset_spec – The dataset specification
  • parent_group – The parent object that should contain one or more objects with the given dataset specification.
Returns:

Boolean indicating compliance.

classmethod check_dataset_format_compliance_single(dataset_spec, hdf_dataset)

Internal helper function used to check a given hdf dataset against a given dataset specification. Warnings are raised for every format violation found.

Parameters:
  • dataset_spec – The dataset specification
  • hdf_dataset – The hdf dataset to be validated against the spec.
Returns:

Boolean indicating compliance.

check_format_compliance(current_only=True)

Check if the HDF5 object assigned to the given instance of a subclass of ManagedObject is compliant with the corresponding format specification. Warnings are raised for every format violation found. Use, e.g., “with warnings.catch_warnings(record=True) as w:” when calling the function to record all warnings.

Parameters:current_only – If current_only is set then only the current object is validated. Set to False to force validation of all objects of the current type contained in the parent group of the current object. The current object is self.hdf_object.
Returns:Boolean indicating whether the hdf object is compliant with the format.
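
For example, the compliance warnings can be recorded as follows (the file name is a placeholder; get_managed_object is documented further below):

>>> import warnings
>>> from brain.dataformat.base import ManagedObject
>>> manager = ManagedObject.get_managed_object('example_brain.h5')  # hypothetical file
>>> with warnings.catch_warnings(record=True) as w:
...     compliant = manager.check_format_compliance(current_only=True)
>>> for warning in w:
...     print(warning.message)  # one warning per format violation found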
classmethod check_group_format_compliance(group_spec, group_list)

Internal helper function used to check all groups in a parent object against a given group specification. Warnings are raised for every format violation found.

Parameters:
  • group_spec – The group specification
  • group_list – List of hdf group objects to be validated against the given spec.
Returns:

Boolean indicating compliance.

classmethod check_group_format_compliance_contained(group_spec, parent_group)

Internal helper function used to check all relevant groups in a parent object against a given group specification. Warnings are raised for every format violation found. This function uses the check_format_compliance(..) function.

Parameters:
  • group_spec – The group specification
  • parent_group (h5py.Group or h5py.File) – The parent object that should contain one or more objects with the given group specification.
Returns:

Boolean indicating compliance.

classmethod check_group_format_compliance_single(group_spec, hdf_group)

Internal helper function used to check a given hdf_group against a given group specification. Warnings are raised for every format violation found. This function uses the check_format_compliance(..) function.

Parameters:
  • group_spec – The group specification
  • hdf_group – The group object to be validated against the spec.
Returns:

Boolean indicating compliance.

close()

Close the HDF5 file associated with the managed object.

classmethod create(parent_object, object_id=None, dataset_args=None, force_creation=False, external=False, **kwargs)

Create a new managed object in the given parent group. The type of object created is decided by the get_managed_object_type(...) function implemented by derived classes (default is ‘group’). The function creates the new object and assigns common attributes. It then creates the manager object and calls the corresponding populate(...) function to initialize the new object.

NOTE See the populate method of the corresponding derived class for details on the additional keyword arguments.

Parameters:
  • parent_object (h5py.Group or String) – The h5py.Group or h5py.File parent object in which the managed object should be created, or the filename in case the managed object is a file that needs to be created. This may also be a ManagedObject, in which case the corresponding h5py object will be used to construct the object.
  • object_id – The object id to be used. This should be a unique identifier to allow users to understand the origin of the data and to relate data with each other.
  • dataset_args – In case the managed object is a ‘dataset’ (see get_managed_object_type), then this dict argument can be used to specify additional arguments for the h5py.Group.require_dataset function.
  • force_creation (bool) – Force the creation of the managed object as part of the parent object, even if the parent object does not specify that the managed object to be created is part of its specification. (Default value: False)
  • external (bool) – Boolean indicating whether the ManagedObject should be created in an external file and linked to from the parent object (True) or whether the object should be created within the parent directly (False). Default value is False, i.e., the object is stored as part of the parent directly without creating a new file. NOTE: The external option has no effect when the object to be created is a file, i.e., get_managed_object_type(...) returns ‘file’. The behavior in this case is as follows: 1) If the object to be created is a file and the parent_object is a string, then the file will be created as usual and the value of the external option has no effect. 2) If the object to be created is a file and the parent_object is an h5py.Group, then the file will be created externally using a filename automatically determined based on the name of the parent and the format specification, and an external link to the root group of the new file will be created within the given parent, i.e., the behavior is as if external were True.
  • kwargs – Additional keyword arguments for the populate function implemented in the derived classes (see in derived classes for details).
Returns:

Instance of the derived ManagedObject responsible for management of the newly created h5py.Group or h5py.Dataset object.

Raises :

ValueError is raised in case a conflicting object already exists, an illegal type is encountered, or the creation of the object is not explicitly specified as permitted in the spec.
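
A sketch of the create workflow for a managed file (the file name is a placeholder; additional keyword arguments would be forwarded to the populate method of the derived class):

>>> from brain.dataformat.brainformat import BrainDataFile
>>> brain_file = BrainDataFile.create(parent_object='example_brain.h5')  # a filename, since a file is created
>>> brain_file.get_h5py()  # the underlying h5py.File object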

description_attribute_name = 'format_description'
format_spec_attribute_name = 'format_specification'
format_specification_changed()

Check whether the format specification as given in the file is the same as the format specification given by the current instance of the managed object.

classmethod get_all(parent_group, get_h5py=True)

Get all objects that are managed by the current class contained in the parent group.

Parameters:
  • parent_group (h5py.Group, ManagedGroup, or ManagedFile) – The h5py.Group, ManagedGroup, or ManagedFile object for which all objects managed by the current class should be retrieved.
  • get_h5py (bool) – Boolean indicating whether we want to get the h5py objects (True) or the instances of the manager class for the objects (False). Default is True.
Returns:

List of managed h5py.Group, h5py.Dataset objects managed by the current class or list of ManagedObject instances responsible for managing the found groups/datasets.
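
For example, to locate all ECoG datasets contained in a group (a sketch; internal_group is assumed to be a BrainDataInternalData or h5py.Group instance):

>>> from brain.dataformat.brainformat import BrainDataECoG
>>> ecog_managers = BrainDataECoG.get_all(parent_group=internal_group, get_h5py=False)  # list of BrainDataECoG
>>> ecog_groups = BrainDataECoG.get_all(parent_group=internal_group)  # list of h5py objects (get_h5py=True)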

get_filename(absolute_path=False)

Get the name of the file containing the managed object. Alternatively we may also use self.file.filename

Parameters:absolute_path (bool) – Set to True to retrieve an absolute path
classmethod get_format_specification()

Return dictionary describing the specification of the format.

get_format_specification_from_file()

Get the format specification as given in the file.

Returns:Python dict with the format specification as given in the file or None if not available.
classmethod get_format_specification_recursive()

Recursively construct the format specification for the current class, including the specification of all included managed objects. The specifications of managed objects are inserted in the groups and datasets dicts respectively and are accordingly removed from the managed_objects lists (i.e., the managed_objects list is empty after all replacements have been completed). The result is a full specification for the current class. The function also adds the specification for the format_type, description, format_specification, and object_id attributes, as those are implicitly defined by the ManagedObject class and are implicitly part of the spec of all managed objects (even though the specific spec of the different objects usually does not explicitly declare them).

Returns:Python dict with the full format specification for the current managed object class. The spec may be converted to JSON using json.dumps.
get_h5py()

Get the h5py object managed. This is a convenience function that simply returns self.hdf_object.

classmethod get_managed_object(hdf_object)

Get an instance of the ManagedObject class for managing the given hdf_object. None is returned in case that the given hdf_object is not a managed object.

Parameters:

hdf_object – The h5py.Dataset, h5py.File, or h5py.Group for which the corresponding instance of the relevant derived class of ManagedObject should be generated. If a ManagedObject instance is provided as input, then the hdf_object will be returned. NOTE: The behavior of the function is undefined for objects other than the mentioned h5py data objects. NOTE: A string pointing to a valid HDF5 data file is allowed for convenience, in which case the file will be opened in append mode ‘a’ if possible and, if that fails, in read-only mode ‘r’. A warning will be issued if the file could only be opened in read-only mode.

Returns:

Instance of ManagedObject or None in case the object is not managed or the manager object cannot be constructed.

Raises:
  • NameError – In case that the indicated format_type class was not found.
  • ValueError – In case that an invalid hdf_object is given
classmethod get_managed_object_type()

Get whether a group or dataset is managed by the class. The default implementation assumes that a group is managed. The method must be overwritten in derived classes that manage datasets.

Returns:String ‘group’ or ‘dataset’ or ‘file’ indicating whether a group, file, or dataset is managed by the class.
classmethod get_num_dimensions_from_dataset_specification(dataset_spec)

Get the number of dimensions for a dataset based on the specification of dimension scales.

Returns:None in case no dimension scales are given and the dimensionality of the dataset is not fixed. Otherwise returns a tuple of two integers indicating the minimum and maximum number of dimensions.
get_object_id()

Get the optional id of the object.

Returns:The id of the object or None if no id exists.
has_object_id()

Check whether the object has an id assigned to it.

classmethod is_managed(hdf_object)

Check whether the given hdf object is managed by a brain file API class.

Parameters:hdf_object – The hdf5 object to be checked (dataset or group)
Returns:Boolean indicating whether the given hdf5 object is managed or not.
classmethod is_managed_by(hdf_object)

Check whether the given hdf_object is managed by the current class.

Parameters:hdf_object – The hdf5 object to be checked
Returns:Boolean indicating whether the object is managed by the current class.
object_id_attribute_name = 'object_id'
populate(**kwargs)

The populate method is called by the create method after the basic common setup is complete. The function should be used to populate the managed object (e.g., add dimensions to a dataset or add required datasets to a group). The populate method is passed the kwargs handed to the create method.

Parameters:kwargs – Any additional keyword arguments supported by the specific implementation of the populate method.
set_object_id(object_id)

Define the id of the object.

Parameters:object_id (str, unicode, or None) – The object id to be used. If None is given, then the object id will be deleted.
type_attribute_name = 'format_type'
class brain.dataformat.base.ManagedObjectFile(hdf_object, mode='r')

Bases: brain.dataformat.base.ManagedFile

Container file used to store a single managed object in a separate external file that is then linked to other parent files. This container is used as part of the external storage feature available for all ManagedObject implementations.

NOTE: Parent files will link to the managed object contained in the file and NOT to the
root group of the ManagedObjectFile container.
__getitem__(item)

Enable slicing into the root group of the file

__module__ = 'brain.dataformat.base'
classmethod get_format_specification()

Get dictionary describing the format.

populate(**kwargs)

Populate the managed object file.

brainformat Module

Module for specification of the BRAIN file format API.

class brain.dataformat.brainformat.BrainDataData(hdf_object)

Bases: brain.dataformat.base.ManagedGroup

Class for management of the data group for storage of brain recordings data.

Variables:hdf_object – See ManagedGroup

Implicit instance variables (i.e., these are mapped names but not stored explicitly)

Variables:
  • internal_# – Same as self.internal(#). Usually use internal_0
  • external_# – Same as self.external(#). Usually use external_0
__getattr__(item)
__init__(hdf_object)
Parameters:hdf_object – The h5py.Group object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.brainformat'
external(index=0)

Get the external managed object with the external data. In principle, the current format defines that there should always be exactly one external group. However, multiple could be supported, so this function is prepared to handle this in case the format should change in the future.

Parameters:index – Optional input parameter (should always be 0 for now) to define which external group should be retrieved. Currently there should always be exactly one.
Returns:BrainDataExternalData object or None
classmethod get_format_specification()

Get dictionary describing the format.

internal(index=0)

Get the internal managed object with the internal data. In principle, the current format defines that there should always be exactly one internal group. However, multiple could be supported, so this function is prepared to handle this in case the format should change in the future.

Parameters:index – Optional input parameter (should always be 0 for now) to define which internal group should be retrieved. Currently there should always be exactly one.
Returns:BrainDataInternalData object or None
populate(**kwargs)

Populate the Brain file with the Data and Descriptors group.

class brain.dataformat.brainformat.BrainDataDescriptors(hdf_object)

Bases: brain.dataformat.base.ManagedGroup

Class for management of the descriptors group for storage of data descriptions and metadata.

Variables:hdf_object – See ManagedGroup

Implicit instance variables (i.e., these are mapped names but not stored explicitly)

Variables:
  • static_# – Same as self.static(#) where # is the index of the static descriptor group. The current format assumes a single static group, i.e., only static_0 is typically valid.
  • dynamic_# – Same as self.dynamic(#) where # is the index of the dynamic descriptor group. The current format assumes a single dynamic group, i.e., only dynamic_0 is typically valid.
__getattr__(item)
__init__(hdf_object)
Parameters:hdf_object – The h5py.Group object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.brainformat'
dynamic(index)

Get the dynamic descriptors managed object with the dynamic descriptor data. In principle, the current format defines that there should always be exactly one dynamic descriptor group. However, multiple could be supported, so this function is prepared to handle this in case the format should change in the future.

Parameters:index – Optional input parameter to define the index of dynamic descriptors. Default is index=0.
Returns:BrainDataDynamicDescriptors object or None
classmethod get_format_specification()

Get dictionary describing the format of the group.

populate(**kwargs)

Populate the ‘descriptors’ group with all required elements.

static(index)

Get the static descriptors managed object with the static descriptor data. In principle, the current format defines that there should always be exactly one static descriptor group. However, multiple could be supported, so this function is prepared to handle this in case the format should change in the future.

Parameters:index – Optional input parameter to define the index of static descriptors. Default is index=0.
Returns:BrainDataStaticDescriptors object or None
class brain.dataformat.brainformat.BrainDataDynamicDescriptors(hdf_object)

Bases: brain.dataformat.base.ManagedGroup

Class for management of the ‘dynamic’ group for storage of dynamic data descriptors.

__init__(hdf_object)
Parameters:hdf_object – The h5py.Group object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.brainformat'
classmethod get_format_specification()

Get dictionary describing the format of the group.

populate(**kwargs)

Populate the ‘data’ group with all required elements.

class brain.dataformat.brainformat.BrainDataECoG(hdf_object)

Bases: brain.dataformat.base.ManagedGroup

Class for management of h5py.Group objects structured to store ECoG brain recording data.

Variables:hdf_object – See ManagedObject
__getitem__(item)

Read data.

__init__(hdf_object)
Parameters:hdf_object – The h5py.Group or h5py.Dataset object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.brainformat'
__setitem__(key, value)

Write data.

anatomy(anatomy_names=None, anatomy_ids=None, format_spec=None, annotation_data_group=None, get_hdf=True)

Get the anatomy information associated with the dataset.

Parameters:
  • anatomy_names – 1D array of strings with the name of the region an electrode is located in. By setting this parameter we can add an anatomical description to a dataset if it does not have one.
  • anatomy_ids – 1D array of ints with the integer ids of the regions the electrodes are located in. Note, if anatomy_names is given and anatomy_ids is empty, then the array will be autogenerated by creating an index for the unique values in the anatomy_names array. By setting this parameter we can add an anatomical description to a dataset if it does not have one.
  • format_spec – Format specification. Usually this is None, indicating that the specification defined by self.get_format_specification should be used. In some cases, derived classes will expand the base class specification or use the same specification, simply with different names and values. To allow the base classes to reuse the populate function, the function allows a different format specification to be provided as input. CAUTION! The format_spec must be compliant with the structure described by the spec for this class in order for the populate function to work correctly.
  • annotation_data_group (AnnotationDataGroup) – Optional AnnotationDataGroup object used to store anatomy description as annotations. This is only relevant when either the anatomy_names and/or anatomy_ids parameter is specified. If the annotation_data_group is given then the anatomy will automatically be transformed to annotations and added to the given annotations object.
  • get_hdf (bool) – Get the value in numpy format (set to False) or the h5py dataset for the layout data (set to True). Default value is True.
Raises KeyError:
 

A KeyError is generated if a new anatomy_names array or anatomy_ids array is given but a prior description already exists.

Returns:

1D Numpy array indicating for each recording the name of the region. The function also returns a second 1D numpy array indicating for each recording the index of the region the electrode was located in. Note, if any of the anatomy arrays are missing, then None will be returned instead of the anatomy array.

annotations(index=None)

Get the requested annotation object.

Parameters:index – The index of the annotation object to be retrieved.
Returns:If index is specified to a value >=0 then a single AnnotationDataGroup object is returned (or None if the index is invalid). Otherwise a list of all AnnotationDataGroup objects is returned.
dims(get_hdf=True, axis=None)

Get the h5py Dimension scales associated with the ecog dataset

Parameters:get_hdf – Get the h5py.DimensionScale associated with the ecog dataset if set to True (default). If set to False, a list of lists of dicts with a summary of the dimensions is returned instead, formatted as [axis_index][scale_index]. Each dict then contains the name, unit, dataset, and axis information for the dimension scale.
ecog_data(get_hdf=True)

Get the h5py.Dataset object of the ecog data.

For get_hdf==True this is the same as self[‘raw_data’] and self[‘ecog_data’].

For get_hdf==False this is the same as self[‘raw_data’][:] and self[‘ecog_data’][:]

Parameters:get_hdf – Get the value in numpy format (set to False) or the h5py dataset for the sampling rate (set to True). Default value is True.
Returns:h5py.Dataset if get_hdf is True. numpy array of the full data if get_hdf is False. None in case the ecog data is missing.
classmethod get_format_specification()

Get dictionary describing the format.

has_anatomy()

Check whether an anatomy description is available for the ECoG dataset.

Returns:Boolean indicating whether an anatomy description by name and or id is available.
has_annotations()

Check whether annotations exists for the ECoG dataset.

has_layout()

Check whether a layout description is available for the ECoG dataset.

Returns:Boolean indicating whether a layout dataset is available.
layout(get_hdf=True, new_layout=None)

Get the h5py.Dataset of the layout of the ecog_data. Same as self[‘layout’]

Parameters:
  • get_hdf – Get the value in numpy format (set to False) or the h5py dataset for the layout data (set to True). Default value is True.
  • new_layout – If a layout dataset has not been defined previously then we can generate a new layout by providing an according numpy data array (or h5py.Dataset) to this parameter.
Returns:

h5py.Dataset if get_hdf is True. Numpy array if get_hdf is False. None in case the layout data does not exist.

num_annotations()

Get the number of annotation objects associated with the ECoG data.

populate(ecog_data=None, ecog_data_shape=None, ecog_data_type=None, sampling_rate=16, start_time=0, layout=None, anatomy_names=None, anatomy_ids=None, format_spec=None)

The populate method is called by the create method after the basic common setup is complete. The function should be used to populate the managed object (e.g., add dimensions to a dataset or add required datasets to a group). The populate method is passed the kwargs handed to the create method.

Parameters:
  • ecog_data (2D numpy array of [#channels, #samplesPerChannel]) – The neurological recordings data to be written. If present then ecog_data_shape and ecog_data_type are ignored.
  • ecog_data_shape (2D tuple indicating the number of recordings and samples per recording.) – Shape of the neural dataset. Required if ecog_data is not present; ignored if ecog_data is given.
  • ecog_data_type (numpy.dtype or string indicating the dtype) – Datatype of the neural dataset. Required if ecog_data is not present; ignored if ecog_data is given.
  • sampling_rate (int or float) – Sampling rate in KHz
  • start_time (int64) – Start time of the recording in time since epoch (default behavior by time.time()) The Unix epoch (or Unix time or POSIX time or Unix timestamp) is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT), not counting leap seconds ( in ISO 8601: 1970-01-01T00:00:00Z). Literally speaking the epoch is Unix time 0 (midnight 1/1/1970), but ‘epoch’ is often used as a synonym for ‘Unix time’.
  • layout (numpy array of ints describing for each channel index its location in the layout matrix. Use -1 to indicate unoccupied channels. This strategy allows complex layouts to be described via a rectangular matrix.) – The spatial layout of the ecog_data.
  • anatomy_names – 1D array of strings with the name of the region an electrode is located in.
  • anatomy_ids – 1D array of ints with the integer ids of the regions the electrodes are located in. Note, if anatomy_names is given and anatomy_ids is empty, then the array will be autogenerated by creating an index for the unique values in the anatomy_names array.
  • format_spec – Format specification. Usually this is None, indicating that the specification defined by self.get_format_specification should be used. In some cases, derived classes will expand the base class specification or use the same specification, simply with different names and values. To allow the base classes to reuse the populate function, the function allows a different format specification to be provided as input. CAUTION! The format_spec must be compliant with the structure described by the spec for this class in order for the populate function to work correctly.
Raises :

ValueError is raised in case that the object cannot be populated.
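
A sketch of creating a new ECoG dataset via the create(...) classmethod, which forwards these keyword arguments to populate (the file name and data values are placeholders):

>>> import numpy as np
>>> from brain.dataformat.brainformat import BrainDataFile, BrainDataECoG
>>> f = BrainDataFile.create(parent_object='example_ecog.h5')  # hypothetical new file
>>> internal = f.data().internal()  # the 'internal' data group of the new file
>>> ecog = BrainDataECoG.create(parent_object=internal, ecog_data=np.zeros((4, 1000), dtype='float32'), sampling_rate=16)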

sampling_rate(get_hdf=True)

Get the h5py.Dataset object of the sampling rate data. Same as self[‘sampling_rate’]

Parameters:get_hdf – Get the value in numpy format (set to False) or the h5py dataset for the sampling rate (set to True). Default value is True.
Returns:h5py.Dataset if get_hdf is True. Float if get_hdf is False. None in case the sampling_rate data is missing.
class brain.dataformat.brainformat.BrainDataECoGProcessed(hdf_object)

Bases: brain.dataformat.brainformat.BrainDataECoG

Class for management of h5py.Group objects structured to store processed ECoG brain recording data.

Variables:hdf_object – See ManagedObject and BrainDataECoG.
__init__(hdf_object)
Parameters:hdf_object – The h5py.Group or h5py.Dataset object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.brainformat'
classmethod get_format_specification()

Get dictionary describing the format.

This function adapts the specification of the BrainDataECoG class.

Returns:Dictionary with the format specification.
original_name()

Get the original name specified by the user for the dataset.

Returns:String indicating the original name of the dataset specified by the user or None in case no user-defined name was specified.
populate(ecog_data=None, ecog_data_shape=None, ecog_data_type=None, ecog_data_units=None, sampling_rate=16, start_time=0, layout=None, anatomy_names=None, anatomy_ids=None, original_name=None, frequency_bands=None)

The populate method is called by the create method after the basic common setup is complete. The function should be used to populate the managed object (e.g., add dimensions to a datadset or add required datasets to a group. The populate method is passed the kwargs handed to the create method.

Parameters:
  • ecog_data (2D numpy array of [#channels, #samplesPerChannel]) – The neurological recordings data to be written. If present then ecog_data_shape and ecog_data_type are ignored.
  • ecog_data_shape (2D tuple indicating the number of recordings and samples per recording.) – Shape of the neural dataset. Required if ecog_data is not present; ignored if ecog_data is given.
  • ecog_data_type (numpy.dtype or string indicating the dtype) – Datatype of the neural dataset. Required if ecog_data is not present; ignored if ecog_data is given.
  • ecog_data_units (String) – String indicating the units used for the processed ECoG recordings.
  • sampling_rate (int or float) – Sampling rate in KHz
  • start_time (int64) – Start time of the recording in time since epoch (default behavior by time.time()) The Unix epoch (or Unix time or POSIX time or Unix timestamp) is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT), not counting leap seconds ( in ISO 8601: 1970-01-01T00:00:00Z). Literally speaking the epoch is Unix time 0 (midnight 1/1/1970), but ‘epoch’ is often used as a synonym for ‘Unix time’.
  • layout (numpy array of ints describing for each channel index its location in the layout matrix.) – The spatial layout of the ecog_data
  • anatomy_names – 1D array of strings with the name of the region an electrode is located in.
  • anatomy_ids – 1D array of ints with the integer ids of the regions the electrodes are located in. Note, if anatomy_names is given and anatomy_ids is empty, then the array will be autogenerated by creating an index for the unique values in the anatomy_names array.
  • original_name (String) – Original name of the dataset as specified by the user
  • frequency_bands (1D Numpy array of length ecog_data.shape[2] (or ecog_data_shape[2], whichever applies). Default value is None, in which case the dimension scale for the third dimension is omitted (as it is optional in the spec).) – Numpy array indicating the center of the frequency bands stored in the third dimension of the processed data array. May be None if the bands are not known.
Raises :

ValueError is raised in case that the object cannot be populated.

class brain.dataformat.brainformat.BrainDataExternalData(hdf_object)

Bases: brain.dataformat.base.ManagedGroup

Class for management of the ‘external’ group for storage of recordings external to the brain.

__init__(hdf_object)
Parameters:hdf_object – The h5py.Group object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.brainformat'
classmethod get_format_specification()

Get dictionary describing the format of the group.

populate(**kwargs)

Populate the ‘data’ group with all required elements.

class brain.dataformat.brainformat.BrainDataFile(hdf_object, mode='r')

Bases: brain.dataformat.base.ManagedFile

Class for management of HDF5 brain files.

Variables:hdf_object – See ManagedFile

Implicit instance variables (i.e., these are mapped names but not stored explicitly)

Variables:
  • data_# – Same as self.data(#). Usually use data_0
  • descriptors_# – Same as self.descriptors(#). Usually use descriptors_0
__getattr__(item)
__getitem__(item)

Enable slicing into the file

__init__(hdf_object, mode='r')
Parameters:
  • hdf_object – This can be either the h5py.File object, an h5py.Group or h5py.Dataset contained in the file of interest, or a string indicating the name of the file to be opened.
  • mode – Used only if hdf_object is a string and a file is opened anew. Indicates the mode in which the file should be opened, e.g., ‘r’, ‘w’, ‘a’. See the h5py.File documentation for details.
__module__ = 'brain.dataformat.brainformat'
data(index=0)

Get the data managed object with the data. In principle, the current format defines that there should always be exactly one data group. However, multiple could be supported, so this function is prepared to handle this in case the format should change in the future.

Parameters:index – Optional input parameter (should always be 0 for now) to define which data group should be retrieved. Currently there should always be exactly one.
Returns:BrainDataData object or None
descriptors(index=0)

Get the descriptors managed object with the descriptors. In principle, the current format defines that there should always be exactly one descriptors group. However, multiple could be supported, so this function is prepared to handle this in case the format should change in the future.

Parameters:index – Optional input parameter (should always be 0 for now) to define which descriptors group should be retrieved. Currently there should always be exactly one.
Returns:BrainDataDescriptors object or None
classmethod get_format_specification()

Get dictionary describing the format.

populate(**kwargs)

Populate the Brain file with the Data and Descriptors group.
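
A sketch of reading from an existing brain file ('example_brain.h5' is a placeholder):

>>> from brain.dataformat.brainformat import BrainDataFile
>>> f = BrainDataFile('example_brain.h5', mode='r')  # open an existing file read-only
>>> ecog = f.data(0).internal().ecog_data(0)  # BrainDataECoG object or None
>>> raw = ecog.ecog_data(get_hdf=False)  # full numpy array of the recordings
>>> rate = ecog.sampling_rate(get_hdf=False)  # sampling rate in KHz
>>> f.close()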

class brain.dataformat.brainformat.BrainDataInternalData(hdf_object)

Bases: brain.dataformat.base.ManagedGroup

Class for management of the ‘internal’ group for storage of brain recordings data.

Variables:hdf_object – See ManagedGroup

Implicit instance variables (i.e., these are mapped names but not stored explicitly)

Variables:
  • ecog_data_# – Same as self.ecog_data(#) where # is the index of the raw ecog dataset.
  • ecog_data_processed_# – Same as self.ecog_data_processed(#) where # is the index of the processed ecog dataset.
__getattr__(item)
__init__(hdf_object)
Parameters:hdf_object – The h5py.Group object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.brainformat'
ecog_data(index=0)

Get the ecog managed object with the ecog data.

Parameters:index – Optional input parameter to define the index of the ecog dataset.
Returns:BrainDataEcoG object or None
ecog_data_processed(index=0)

Get the ecog_data_processed managed object with the processed ecog data.

Parameters:index – Optional input parameter to define the index of the processed ecog dataset.
Returns:BrainDataEcoGProcessed object or None
classmethod get_format_specification()

Get dictionary describing the format of the group.

num_ecog_data()

Get the number of ecog datasets available.

num_ecog_data_processed()

Get the number of processed ecog datasets available.

populate(**kwargs)

Populate the ‘data’ group with all required elements.

class brain.dataformat.brainformat.BrainDataMultiFile(hdf_object, mode='r')

Bases: brain.dataformat.base.ManagedObjectFile

Container file used to store a collection of BrainDataFile collections of neural data. This container is used to organize multiple experiments, sessions, etc. stored in separate files into a single collection, allowing a user to open a single file and interact with the collection of files as if they were all stored in the same file.

NOTE: This container typically stores external links to other BrainDataFile objects. However, the
container may also be self-contained.
__module__ = 'brain.dataformat.brainformat'
add_object(hdf_object, relative_link=True, force_creation=False)

Add the given BrainDataFile object to the collection.

Parameters:
  • hdf_object – The object to be added to the collection. This may either be: i) an instance of BrainDataFile, ii) an instance of h5py.Group pointing to a BrainDataFile object, iii) a tuple of two strings (filename, path), or iv) a dict {‘filename’:..., ‘path’:...} describing the location of the file and the path within the hdf5 file.
  • relative_link (bool) – Should we use relative links (i.e., relative to the BrainDataMultiFile where we are adding the link) (True) or absolute paths to external files (False). Default value is True.
  • force_creation (bool) – Force the creation of the link, even if it is not explicitly permitted by the format specification and/or the link cannot be verified. E.g., we may not be able to always open the linked object to verify that the link actually points to a BrainDataFile managed object or other supported object.
Raises:
  • ValueError – A ValueError is raised in case that the description of the link is invalid.
  • IOError – In case that the link cannot be established
classmethod get_format_specification()

Get dictionary describing the format.

class brain.dataformat.brainformat.BrainDataStaticDescriptors(hdf_object)

Bases: brain.dataformat.base.ManagedGroup

Class for management of the ‘static’ group for storage of static data descriptors.

__init__(hdf_object)
Parameters:hdf_object – The h5py.Group object managed by the current instance.
Raises: ValueError in case the given hdf_object does not match the type expected by the class, as indicated by the get_managed_object_type() function.
__module__ = 'brain.dataformat.brainformat'
classmethod get_format_specification()

Get dictionary describing the format of the group.

populate(**kwargs)

Populate the ‘data’ group with all required elements.

annotation Module

Module with classes to help with the specification, management of, and interaction with data annotations. Annotations are used to describe specific subsets of data in further detail. This is used, e.g., to describe the location of data channels or to mark events in time.

class brain.dataformat.annotation.Annotation(annotation_type, description, data_selection)

Bases: object

Annotate a particular dataset or subset of data.

Variables:
  • annotation_type – The type of annotation, e.g., anatomy, event, or general feature. See AnnotationCollection.annotation_type_indexes. Other user-defined types are permitted but it is recommended to choose from the predefined annotation_type_indexes if possible.
  • description – The description of the annotation
  • data_selection – The data subset the annotation refers to.
__and__(other)
__contains__(other)

Check whether a data_selection is a subset of the other data_selection.

__dict__ = <dictproxy object at 0x111582670>
__eq__(other)

Equals operator (==)

__ge__(other)

Greater than or equals operator (>=)

__gt__(other)

Greater than operator (>)

__init__(annotation_type, description, data_selection)
__invert__()
__le__(other)

Less than or equals operator (<=)

__len__()

Get number of selected elements.

__logical_operations__(other, operation)
__lshift__(other)

Check whether the data selection of the annotation precedes the selection of the other annotation. Same as self.data_selection << other.data_selection. See DataSelection.__lshift__() for details.

__lt__(other)

Less than operator (<)

__module__ = 'brain.dataformat.annotation'
__ne__(other)

Not equals operator (!=)

__or__(other)
__rshift__(other)

Check whether the data selection of the annotation follows the selection of the other annotation. Same as self.data_selection >> other.data_selection. See DataSelection.__rshift__() for details.

__weakref__

list of weak references to the object (if defined)

__xor__(other)
data()

Load the data associated with this DataSelection.

class brain.dataformat.annotation.AnnotationCollection(data, annotation_type_indexes, annotation_types, descriptions, selection_refs, selections, collection_description='')

Bases: object

A collection of annotations

Instance Variables

Variables:
  • data_object – h5py.Dataset, managed object, or numpy array with the data
  • annotation_types – 1D (vlen str) Dataset with the list of all available types (h5py.Dataset or numpy array)
  • annotation_type_indexes – 1D h5py.Dataset or numpy array with the index into the annotation_types array indicating for each annotation the corresponding annotation type.
  • descriptions – h5py.Dataset or numpy array with the annotation descriptions
  • selection_indexes – h5py.Dataset or numpy array with the indexes of the selections that apply. This is a 2D dataset of shape (#annotations, #axes+1). The axes dimension is ordered as [-1,0,1,...].
  • selections – dict of h5py.Datasets or numpy arrays with all possible selections
  • collection_description – String with a description of the annotation collection

Filtering

Filtering provides means to locate annotations of interest. Filtering is performed via the provided set of filter functions, including:

  • index_filter(..) : Select all annotations with the given index.
  • axis_filter(..) : Find all annotations that select by a given set of axes
  • type_filter(..) : Find all annotations of the given type
  • description_filter(..) : Find all annotations with the given description
  • description_contains_filter(..) : Find all annotations where the description contains the given string

Filter functions generate a 1D bool-type numpy array, indicating which annotations are selected by the filter. As bool arrays, results of filters may be combined using bitwise logical operations, e.g.,

  • a & b : AND : Select all annotations selected by both filters a and b
  • a | b : OR : Select all annotations that are selected by either filter a or b
  • a ^ b : XOR : Exclusive or, select all annotations where the filters a and b differ
  • ~a : NOT : Invert the selection of the filter, selecting all annotations not selected by a

Selecting:

Once we have identified a set of relevant annotations via filtering, one can select the annotations of interest directly using standard array slicing. E.g.:

>>> a = AnnotationCollection(..)
>>> f = a.type_filter('event') & a.axis_filter([0])
>>> v = a[f]

The result of selecting annotations is a reduced AnnotationCollection object. In addition to filtering, one can also select annotations directly using standard data selection/slicing, e.g., a[0:10] to select the first 10 annotations.

NOTE: When sub-selection is performed, all data relevant to the selected annotations will be loaded into memory, whereas when an AnnotationCollection is constructed initially it may be initialized using h5py.Dataset objects where the data resides in file and is loaded by the filters as needed.

Getting the Annotations

From the AnnotationCollection we can retrieve a list of all selected Annotation objects via the get_annotations() functions. This will convert the annotations from the collective data structures used for filtering to individual Annotation objects. This is typically done after the filtering is complete.

Merging Annotations

Once we have selected and retrieved the Annotation objects of interest, the individual Annotations may be combined using standard bitwise logical operators and compared using standard comparison operators (see the documentation of the Annotation class for further details). As convenience functions, the AnnotationCollection class provides a set of merge functions which will generate a single combined Annotation by merging all the selected annotations using a given bitwise logical operation, e.g., merge_and(..), merge_or(..), and merge_xor(..).

Other Operations

  • len : Get the number of annotations in the collection using the standard Python len(a).

Example

>>> a = AnnotationDataGroup(...)  # Load annotation collection from file
>>> f = a.type_filter('event') & a.axis_filter([0])  # Find all annotations that define an event on the axis 0
>>> s = a[f]  # Select all relevant annotations
>>> s_all = s.get_annotations()  # Get all annotations
>>> s1 = s.merge_or()  # Define a single combined annotation
__dict__ = <dictproxy object at 0x1110d7c20>
__getitem__(selection)

Get a new AnnotationCollection for the subset of annotations selected.

NOTE: While most other variables are sub-selects (and loaded into memory) the self.selections collection of selections remains unmodified. This strategy i) allows us to keep the selections out-of-core in the HDF5 file, and ii) avoids complex updates of references to the selections.

__init__(data, annotation_type_indexes, annotation_types, descriptions, selection_refs, selections, collection_description='')
__len__()

Get the number of annotations in the collection.

__module__ = 'brain.dataformat.annotation'
__weakref__

list of weak references to the object (if defined)

add_annotation(annotation)

Add a new annotation to the collection.

NOTE: if the AnnotationCollection object was populated with h5py.Dataset objects, e.g., as is the case when using an AnnotationDataGroup instance, then the annotation will be written to file. If the AnnotationCollection was initialized as a pure in-memory collection using numpy arrays, then the arrays will only be updated in memory.

NOTE: The Annotation is assumed to refer to the same data object as this AnnotationCollection. If this is not the case, then the annotation will either be reassigned to the data object of the collection (as long as the data objects’ shapes match) or a ValueError is raised.

Parameters:annotation (Annotation) – The annotation to be added
Raises: ValueError in case the annotation cannot be added to the collection
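
For example, to add a new event annotation to a collection a (a sketch; the DataSelection arguments are assumptions, as noted for the Annotation example above):

>>> from brain.dataformat.annotation import DataSelection, Annotation
>>> selection = DataSelection(data_object=a.data_object, selection={1: slice(5000, 6000)})  # hypothetical arguments
>>> event = Annotation(annotation_type='event', description='stimulus onset', data_selection=selection)
>>> a.add_annotation(event)  # written to file if the collection is backed by h5py datasets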
axis_filter(axis_list)

Get all annotations that filter by a given set of axes.

Parameters:axis_list – List of all axes that relevant selections must filter by. Only selections that select by all specified axes will be retrieved. The axes may be specified by their integer index (recommended: -1, 0, 1, ... n) or by the name of the dimension in the data_object.

Examples:

>>> from brain.dataformat.brainformat.braindata import *
>>> f = BrainDataFile('testfile_real.h5')
>>> d = f.data().internal().ecog_data(0)
>>> d.annotations()
>>> a = d.annotations(0)
>>> a.axis_filter(0)  # Filter based on a single axis index
>>> a.axis_filter('space')  # Filter based on a single axis using the name of the axis
>>> a.axis_filter(['space', 1])  # Filter based on multiple axes; axis names and indexes can be mixed

NOTE: Global selections are treated independently; as such, if the filter asks for axis=1, selections with axis=-1 (global) will not be included.

Returns:Bool array indicating for each annotation whether it has been selected (True) or not (False)
description_contains_filter(description)

Get all annotations for which the description contains the given text

Parameters:description – String of the partial description to be found.
Returns:Bool array indicating for each annotation whether it has been selected (True) or not (False)
description_filter(description)

Get all annotations with the given description.

Parameters:description – String of the description to be located.
Returns:Bool array indicating for each annotation whether it has been selected (True) or not (False)
get_annotations()

Get a list of all annotations.

Returns:List of Annotation objects
index_filter(selection)

Filter annotations based on their index.

Parameters:selection – Any valid selection supported by numpy, e.g., a slice object or an integer index. E.g., to select annotations 200 to 300 we could do index_filter(slice(200,301)). NOTE: Following the standard numpy selection schema, the upper bound is not included in this example.
Returns:Bool array indicating for each annotation whether it has been selected (True) or not (False)
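
For instance (continuing the axis_filter example above; an illustrative sketch):

>>> mask = a.index_filter(slice(200, 301))   # select annotations 200..300 (upper bound excluded)
>>> subset = a[mask]                         # new AnnotationCollection with only those annotations
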
merge_and()

Get a single Annotation object (or None) that combines all annotations in this collection via logical AND.

This is a convenience function and is equivalent to calling get_annotations(...) and combining all annotations in the returned list via logical & (AND).

merge_or()

Get a single Annotation object (or None) that combines all annotations in this collection via logical OR.

This is a convenience function and is equivalent to calling get_annotations(...) and combining all annotations in the returned list via logical | (OR).

merge_xor()

Get a single Annotation object (or None) that combines all annotations in this collection via logical XOR.

This is a convenience function and is equivalent to calling get_annotations(...) and combining all annotations in the returned list via logical ^ (XOR).
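
An illustrative sketch (assuming the AnnotationCollection a from the axis_filter example above; the type string 'event' is hypothetical). The merge functions are typically applied to an already filtered sub-collection:

>>> events = a[a.type_filter('event')]    # restrict to one annotation type first
>>> combined = events.merge_and()         # single Annotation selecting the intersection, or None if empty
>>> merged = events.merge_or()            # single Annotation selecting the union, or None if empty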

num_annotations()

Get the number of annotations in the collection.

type_contains_filter(annotation_type)

Get all annotations where the annotation type contains the given string.

Parameters:annotation_type – String with annotation type substring to look for
Returns:Bool array indicating for each annotation whether it has been selected (True) or not (False)
type_filter(annotation_type)

Get all annotations with the given type.

Parameters:annotation_type – Either the string of the annotation type or integer with the index of the annotation type.
Returns:Bool array indicating for each annotation whether it has been selected (True) or not (False)
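
Since the filter methods return bool arrays of the same length (one entry per annotation), they can be combined element-wise before sub-selecting. An illustrative sketch (the type string is hypothetical; the returned arrays are assumed to be numpy bool arrays):

>>> mask = a.type_filter('stimulus') & a.axis_filter(0)   # annotations of type 'stimulus' that filter by axis 0
>>> subset = a[mask]                                      # sub-collection containing only the matching annotations
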
class brain.dataformat.annotation.AnnotationDataGroup(hdf_object)

Bases: brain.dataformat.base.ManagedGroup, brain.dataformat.annotation.AnnotationCollection

Managed group for storage of annotations.

Parameters:hdf_object – The h5py.Group object managed by this class.

HDF5 file structure:

  • ‘data_object’ –> Link to the HDF5 dataset/group we select from
  • ‘annotation_types’ –> 1D dataset with the list of all available types (vlen str)
  • ‘annotation_type_indexes’ –> 1D dataset with the index into the annotation_types array
  • ‘descriptions’ –> 1D dataset with the descriptions of the annotations
  • ‘selection_indexes’ –> (#selections, #axis+1) dataset, indicating for each axis the selection that applies, or -1 if none. The axes are ordered as [-1, 0, 1, 2, ...]
  • ‘selections_axis_#’ –> 2D dataset per axis with all 1D selections, plus one dataset for global selections (i.e., #axis+1 datasets in total)
  • ‘axis_index’ –> 1D dataset serving as the dimension scale for selection_indexes
  • ‘collection_description’ –> Attribute with a string describing the collection

__getitem__(item)

If item is a string, then retrieve the corresponding object from the HDF5 file using the ManagedGroup.__getitem__(..) method, otherwise retrieve the corresponding AnnotationCollection using the implementation of AnnotationCollection.__getitem__(..)
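
Illustrative sketch (assuming f and d as in the axis_filter example above, that d.annotations(0) returns the managed annotation group, and using the dataset name listed in the HDF5 structure above; the type string is hypothetical):

>>> ag = d.annotations(0)
>>> ag['annotation_types']          # string key: the underlying h5py object is returned
>>> ag[ag.type_filter('event')]     # bool array: a sub-selected AnnotationCollection is returned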

__init__(hdf_object)
__module__ = 'brain.dataformat.annotation'
classmethod get_format_specification()

Return dictionary describing the specification of the format.

populate(data_object, annotation_types=None, annotation_type_indexes=None, descriptions=None, selection_indexes=None, selections=None, collection_description='undefined', annotation_collection=None)

The populate method is called by the create method after the basic common setup is complete. The function should be used to populate the managed object (e.g., add dimensions to a dataset or add required datasets to a group). The populate method is passed the kwargs handed to the create method.

Parameters:
  • data_object – The h5py.Dataset (or managed object with support for shape and slicing) the Annotations refer to. (Mandatory)
  • annotation_types – 1D numpy array of strings (or python list of strings, or h5py dataset of strings) with the different possible types of annotations in this collection. May be None in case the AnnotationCollection is initialized as empty.
  • annotation_type_indexes – 1D h5py.Dataset or numpy array indicating for each annotation the index of the annotation_type that applies.
  • descriptions – h5py.Dataset or numpy array with the annotation descriptions
  • selection_indexes – h5py.Dataset or numpy array with the indexes of the selections that apply
  • collection_description – String with a description of the annotation collection
  • selections – Python dict of numpy arrays or h5py.Dataset objects. The keys are ints indicating the axis, which must be in the range of [-1,0,1, ... #axis]. Missing values are interpreted as empty (i.e., no selections are available for those axes).
  • annotation_collection (AnnotationCollection) – This parameter may be given instead of the other AnnotationDataGroup-specific parameters, i.e., data_object, annotation_types, annotation_type_indexes, descriptions, selection_indexes, selections, collection_description. If given, then those parameters will be initialized from the annotation collection instead.
Raises: ValueError is raised in case the object cannot be populated.
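
A hypothetical creation sketch: the create(..) classmethod is provided by the ManagedObject/ManagedGroup base classes and its exact signature is not documented here, so the parent argument and the placeholder names below are assumptions; the keyword arguments are forwarded to populate as described above:

>>> annotations_group = AnnotationDataGroup.create(parent_group,                         # hypothetical HDF5 parent group
...                                                data_object=ecog_dataset,             # dataset the annotations refer to
...                                                annotation_collection=my_collection)  # reuse an in-memory AnnotationCollection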

class brain.dataformat.annotation.DataSelection(data_object, selections=None, simplify=False, collapse=False)

Bases: object

A single data selection for a given dataset.

Basic Structure

The data_selection object defines a set of selections (one per axis). Axes for which no data_selection is specified are interpreted as an ALL data_selection, i.e., [1,1,1,...,1] is assumed. The individual selections are assumed to be combined via a binary AND operation, i.e., only objects selected by all selections are seen as being selected. A data_selection may also be applied to the -1 axis, indicating that the data_selection is applied to all axes. In the case of such a global data_selection, the data_selection is described via a bool matrix of the same shape as the data (rather than a vector). As such, specifying a global (-1) data_selection is generally something that should be avoided if possible, because this may result in very large data_selection arrays, as a bool value must be created for each data element.
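
To illustrate the cost difference (a back-of-the-envelope sketch): for a 1000 x 1000 dataset, per-axis selections need two 1000-element bool vectors, while a global (-1) selection needs a full 1000 x 1000 bool matrix:

>>> import numpy as np
>>> data = np.zeros((1000, 1000))
>>> per_axis = {0: np.ones(1000, dtype=bool), 1: np.ones(1000, dtype=bool)}   # ~2,000 bools in total
>>> global_sel = {-1: np.ones(data.shape, dtype=bool)}                        # ~1,000,000 bools in total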

Creating and Updating Data Selections

A default DataSelection may be created simply by specifying only the data object the selection applies to, without specifying any explicit selection:

>>> data = np.arange(100).reshape((10, 10))
>>> d1 = DataSelection(data_object=data)

This will create a default DataSelection with an empty selection dict, i.e., d1.selections = {}. As missing selections are treated as ALL, this results in a selection that implicitly selects all objects.

We may now change the selection, simply using array slicing. In the slicing we specify the axis index first, and then the elements to be changed:

>>> d1[0, 1:3] = True  # axis=0, selection=1:3
>>> print d1[0]
[ True  True  True False False False False False False False]

NOTE: While missing selections are treated as ALL in general, if a selection is missing during assignment, we will initialize the missing selection as all False. This divergence from the standard scheme ensures that all values outside of the given first assignment are not selected. This divergence is made to: i) allow the more intuitive usage for users of selecting elements of interest via assignment rather than having to define all elements that are not of interest and ii) avoid the problem that elements are automatically selected for the user.

NOTE: Instead of specifying the index of the axis, we may also use the label of the dimension if the data object the DataSelection applies to is an h5py.Dataset with dimension scales, e.g., d1['time', 1:3] = True.

Instead of creating a default selection and then updating it, we can also directly create a custom selection, by directly initializing the selections dictionary during creation of the DataSelection object:

>>> t = np.zeros(10, 'bool')
>>> t[0:3] = True
>>> d1 = DataSelection(data_object=data, selections={0: t})

Accessing Axis Selection

To access the selections associated with the different axes we can do one of the following two things:

>>> print d1[0]  # Get selection for axis 0 using array slicing
[ True  True  True False False False False False False False]
>>> print d1.selections[0]  # Get selection for axis 0 by accessing the selection dict directly
[ True  True  True False False False False False False False]

As shown above, we can retrieve the selection associated with an axis using standard slicing against the DataSelection directly or by slicing into the selections dict. For axes with an explicit selection, both methods result in the same behavior. However, for axes without a selection, the first approach will yield None, while the latter approach will result in a KeyError:

>>> print d1[1]
>>> print d1.selections[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 1

Logical Operations / Combining Data Selection

DataSelection objects may be combined via bitwise binary operators:

  • & : AND operator defining the intersection of two DataSelections s1 & s2
  • | : OR operator defining the merge of two DataSelections s1 | s2
  • ^ : XOR of two data DataSelections s1 ^ s2
  • ~ : NOT operator inverting a DataSelection ~s1
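
For example (continuing the 10 x 10 example above; an illustrative sketch):

>>> d2 = DataSelection(data_object=data)
>>> d2[0, 5:8] = True                 # a second selection along axis 0
>>> intersection = d1 & d2            # elements selected by both d1 and d2
>>> union = d1 | d2                   # elements selected by either selection
>>> inverted = ~d1                    # everything d1 does not select (may become a global selection)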

Comparing Data Selections

  • in : s1 in s2 : Check if DataSelection s1 is a subset of s2
  • >= : s1 >= s2 : Check if the DataSelection s1 contains all elements of s2
  • > : s1 > s2 : Check if the DataSelection s1 contains all elements of s2 and has additional elements
  • == : s1 == s2 : Check if the DataSelection s1 selects exactly the same elements as s2
  • != : s1 != s2 : Check if the DataSelection s1 selects at least partly different elements than s2
  • < : s1 < s2 : Check if the DataSelection s1 is contained in s2 but s2 selects additional elements as well
  • <= : s1 <= s2 : Check if the DataSelection s1 is a subset of s2
  • >> : s1 >> s2 : Check if s1 follows s2, i.e., all elements selected by s1 appear after all elements selected by s2
  • << : s1 << s2 : Check if s1 precedes s2, i.e., all elements selected by s1 appear before all elements selected by s2
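
An illustrative sketch of the containment checks (using the 10 x 10 example data from above):

>>> s1 = DataSelection(data_object=data)
>>> s1[0, 1:3] = True
>>> s2 = DataSelection(data_object=data)
>>> s2[0, 1:5] = True
>>> s1 in s2       # s1 only selects rows that s2 also selects
>>> s1 < s2        # s2 additionally selects rows that s1 does not
>>> s1 == s2       # False: the two selections differ
>>> s1 << s2       # ordering check: do all elements of s1 precede those of s2?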

Other Selection Operations

  • validate() : Validate that a data_selection is well-defined/valid
  • len : The Python len operator is equivalent to calling count() (see next).
  • count() : Count the total number of elements selected. This is equivalent to len(a)
  • simplify_selection() : Remove unnecessary components of the data_selection.
  • collapse_selection() : Collapse the data_selection to a single global data_selection.
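
A brief sketch of these helpers (continuing the d1 example above):

>>> d1.validate()             # check that the selection is well-defined for its data_object
>>> len(d1)                   # total number of selected elements (same as d1.count())
>>> d1.simplify_selection()   # drop per-axis entries that select all elements of an axis
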
Variables:
  • selections – Dict of numpy bool vectors (True/False) indicating which values are selected for a given axis. The axis is the key and the data_selection vectors are the values. Use -1 for a complex data_selection that is applied to all axes. As soon as a global -1 data_selection is present, the selections may also be collapsed to a single global data_selection.
  • data_object – The object to which the data_selection is applied. The object must support the .shape attribute and h5py/numpy array slicing.
__and__(other)

Return a new DataSelection that defines a logical AND of the two selections. Selections are combined on a per-axis basis.

NOTE: The new data_selection will always be defined on the data_object of the left-hand DataSelection A of (A and B).

NOTE: If only one of the two selections defines a data_selection for a given axis, then the single data_selection that is defined will be used, implicitly interpreting the missing data_selection as an ALL [1,1,1,...,1] data_selection.

Parameters:other (DataSelection) – The right-hand DataSelection object B of the A & B operation.
__axis_bounds__()

Compute the smallest and largest selected index along all axes.

Returns:This function returns a tuple of two lists, one with the lower axis bounds and one with the upper axis bounds.
static __collapse_selection__(selection_dict, data_shape, global_only=True)

Internal helper function used to collapse a set of selections. A data_selection may be collapsed to a single global data_selection if a global data_selection is already present.

Parameters:
  • selection_dict (dict) – DataSelection.selections dictionary describing for all axes (keys) the selected objects (values).
  • data_shape (tuple) – Tuple with the shape of the data to be selected
  • global_only (bool) – Boolean indicating whether the data_selection should always be collapsed to a single global data_selection (False) or only in case that a global data_selection is already present (True). Default is True.
__comparison_operator__(other, operator)

Internal helper function used to implement a series of comparison operators.

NOTE: These operations are not just comparisons of one selection selecting more or fewer items than the other, but are true overlap/containment checks. E.g., a < b means that b must select all items that a selects and b must select additional items that a does not select. This also means that as soon as a selects a record that is not in b, a < b will be False.
Parameters:
  • other (DataSelection) – The other DataSelection object
  • operator – String indicating the comparison operator to be used, which includes: 'in', 'ge' (>=), 'gt' (>), 'eq' (==), 'ne' (!=), 'le' (<=), 'lt' (<).
Type: str or unicode

__contains__(other)

Check whether a data_selection is a subset of the other data_selection.

__eq__(other)

Equals operator (==)

__ge__(other)

Greater than or equals operator (>=)

__getitem__(item)

Get the selection for the given axis. If no selection is present for the specified axis, then None is returned instead

__gt__(other)

Greater than operator (>)

__init__(data_object, selections=None, simplify=False, collapse=False)

Initialize a new DataSelection object.

Parameters:
  • selections – Dict of numpy bool vectors (True/False) indicating which values are selected for a given axis. The axis is the key and the data_selection vectors are the values. Use -1 for a complex data_selection that is applied to all axes. As soon as a global -1 data_selection is present, the selections may also be collapsed to a single global data_selection. Use None or an empty dict {} to create a basic selection of the whole data.
  • data_object – The object to which the data_selection is applied. The object must support the .shape attribute and h5py/numpy array slicing.
  • simplify – Simplify the incoming data_selection if possible (see also simplify_selection(..))
  • collapse – Collapse the data_selection if possible (see also collapse_selection(..) with global_only=True)
__invert__()

Return a new DataSelection object that inverts the current data_selection.

NOTE: This may result in a complex, global data_selection. Only in the case where we have no selection, or only a single data_selection applied to one axis, can we maintain a simple per-axis data_selection. Global selections may require large memory as we need to store one bool for each element in the dataset.

NOTE: The data_selection will be simplified first in an attempt to reduce the number of axis selections used, with the goal of keeping a simple per-axis data_selection object if possible.

__le__(other)

Less than or equals operator (<=)

__len__()

Get number of selected elements.

__lshift__(other)

a << b : Identify if a precedes b.

This checks for all selections applied whether all elements selected by a appear before the elements selected by b. This means both a << b and b << a can be False at the same time in the case that a and b overlap in their selection ranges.

__lt__(other)

Less than operator (<)

__module__ = 'brain.dataformat.annotation'
__ne__(other)

Not equals operator (!=)

__or__(other)

Return a new DataSelection that defines the logical OR of the two selections.

NOTE: The new data_selection will always be defined on the data_object of the left-hand DataSelection A of (A or B).

NOTE: Due to the complexity of the resulting data_selection, the returned DataSelection may define a complex, global data_selection for axis -1. Only in the case where both selections define just a single data_selection on the same axis do we keep a simple data_selection. Global selections may require large memory as we need to store one bool for each element in the dataset.

Parameters:other (DataSelection) – The right-hand DataSelection object B of the A | B operation.
__rshift__(other)

a >> b : Identify if a follows b.

This checks for all selections applied whether all elements selected by a appear after the elements selected by b. This means both a >> b and b >> a can be False at the same time in the case that a and b overlap in their selection ranges.

__setitem__(key, value)

Define the selection for a given axis using the following syntax.

s[axis_index, value_selection] = True

The right-hand assignment value must be a bool or bool-array. The axis_index must be an integer indicating a valid axis. The selection may be any valid data selection along the given axis.

Note: if no selection exists yet for the given axis, then all values outside of the given assignment will be set to False. This is in contrast to the fact that non-existent selections are generally treated as all 1. This divergence is made to: i) allow the more intuitive usage for users of selecting elements of interest via assignment rather than having to define all elements that are not of interest and ii) avoid the problem that elements are automatically selected for the user.

__weakref__

list of weak references to the object (if defined)

__xor__(other)

Return a new DataSelection that defines an XOR between the two selections.

NOTE: If a data_selection does not define a selection for an axis, then it is interpreted as having an ALL data_selection, i.e., [1,1,1,...,1]. As such, for axes for which only one side defines an explicit data_selection, the operation is equivalent to an invert/negation of the data_selection that exists.

NOTE: Due to the complexity of the XOR, this operation typically results in a global -1 axis data_selection in order to be able to represent the resulting data_selection, even if the inputs are all just per-axis selections. Only in the case where both selections define just a single data_selection on the same axis do we keep a simple data_selection. Global selections may require large memory as we need to store one bool for each element in the dataset.

axes(restricted_only=False, bounds=None)

Get the list of axes that are sub-selected by this selection. This may be an empty list.

Parameters:
  • restricted_only – If set to True, then only axes that are actually restricted by the selection are returned, i.e., axes for which all values along the whole axis are selected are not included. This also means that -1 is removed from the axes and resolved to identify the actually restricted axes.
  • bounds (Tuple of two lists indicating the lower and upper bounds. Same as output of self.__axis_bounds__) – Optional input used to indicate the lower and upper bounds selected along a given axis. If set to None, then the bounds will be computed using self.__axis_bounds__.
Returns:

List of integers with the axes indices.

collapse_selection(global_only=True)

Collapse the data_selection to a single global data_selection.

Parameters:global_only (bool) – Boolean indicating whether the data_selection should always be collapsed to a single global data_selection (False) or only in case that a global data_selection is already present (True). Default is True.
count()

Get the number of elements selected by the data_selection. Dimensions without an explicit data_selection are counted as having an ALL data_selection. This is equivalent to calling len() on the object.

counts()

Get the number of elements selected by the individual parts of the selections defined in the self.selections object. Dimensions without an explicit data_selection are interpreted as having an ALL data_selection (i.e., the length of that axis is returned).

Returns:Dictionary of {axis_index: count}. Axis -1 indicates the presence of a global data_selection.
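
For the 10 x 10 example above with a selection on axis 0 only, one would, e.g., expect something like the following (the dict values are illustrative and depend on the selection):

>>> d1.counts()   # per-axis counts, e.g. {0: 3, 1: 10}, with axis 1 counted as ALL
>>> d1.count()    # total number of selected elements, equivalent to len(d1)
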
data()

Load the data associated with this DataSelection.

simplify_selection()

Remove unnecessary entries from the data_selection dict, i.e., entries that select all elements for a given axis.

validate()

Check whether the data_selection is valid.
