spech5: h5py-like API to SpecFile
This module provides a h5py-like API to access SpecFile data.
API description
Specfile data structure exposed by this API:
/
    1.1/
        title = "…"
        start_time = "…"
        instrument/
            specfile/
                file_header = "…"
                scan_header = "…"
            positioners/
                motor_name = value
                …
            mca_0/
                data = …
                calibration = …
                channels = …
                preset_time = …
                elapsed_time = …
                live_time = …
            mca_1/
                …
            …
        measurement/
            colname0 = …
            colname1 = …
            …
            mca_0/
                data -> /1.1/instrument/mca_0/data
                info -> /1.1/instrument/mca_0/
            …
        sample/
            ub_matrix = …
            unit_cell = …
            unit_cell_abc = …
            unit_cell_alphabetagamma = …
    2.1/
        …
file_header and scan_header are the raw headers as they appear in the original file, as a string of lines separated by newline (\n) characters.
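For example, the raw header can be split back into individual header lines. This is a minimal sketch: the file name is a placeholder, and it assumes the dataset reads back as a single newline-separated string, as described above.
from silx.io.spech5 import SpecH5

sfh5 = SpecH5("test.dat")   # placeholder file name
raw_header = sfh5["/1.1/instrument/specfile/scan_header"][()]
# the raw header is one newline-separated string; recover the individual "#..." lines
header_lines = raw_header.split("\n")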
The title is the content of the #S scan header line without the leading #S and without the scan number (e.g. "ascan ss1vo -4.55687 -0.556875 40 0.2").
The start time is converted to ISO8601 format ("2016-02-23T22:49:05Z"), if the original date format is standard.
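As a short illustration (the file name is a placeholder), both items are read as text datasets:
from silx.io.spech5 import SpecH5

sfh5 = SpecH5("test.dat")             # placeholder file name
title = sfh5["/1.1/title"]            # e.g. "ascan ss1vo -4.55687 -0.556875 40 0.2"
start_time = sfh5["/1.1/start_time"]  # e.g. "2016-02-23T22:49:05Z"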
Numeric datasets are stored in float32 format, except for scalar integers which are stored as int64.
Motor positions (e.g. /1.1/instrument/positioners/motor_name) can be 1D numpy arrays if they are measured as scan data, or else scalars as defined on #P scan header lines. A simple test is done to check if the motor name is also a data column header defined in the #L scan header line.
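For instance, with a hypothetical motor named delta (the motor and file names are placeholders), the positioner dataset is either a scalar or a 1D array depending on whether the motor was also recorded as a data column:
import numpy
from silx.io.spech5 import SpecH5

sfh5 = SpecH5("test.dat")                              # placeholder file name
delta = sfh5["/1.1/instrument/positioners/delta"][()]  # hypothetical motor name
# 0 if the position comes from a #P line only, 1 if it is also a scan data column
print(numpy.ndim(delta))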
Scan data (e.g. /1.1/measurement/colname0) is accessed by column, the dataset name colname0 being the column label as defined in the #L scan header line.
If a / character is present in a column label or in a motor name in the original SPEC file, it will be substituted with a % character in the corresponding dataset name.
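A minimal sketch, assuming a hypothetical scan whose #L line defines columns labelled Epoch and mon/det (the latter is exposed as mon%det); the file and column names are placeholders:
from silx.io.spech5 import SpecH5

sfh5 = SpecH5("test.dat")                 # placeholder file name
epoch = sfh5["/1.1/measurement/Epoch"]    # hypothetical column label
ratio = sfh5["/1.1/measurement/mon%det"]  # "/" in the label is replaced by "%"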
MCA data is exposed as a 2D numpy array containing all spectra for a given analyser. The number of analysers is calculated as the number of MCA spectra per scan data line. Demultiplexing is then performed to assign the correct spectra to a given analyser.
MCA calibration is an array of 3 scalars, from the #@CALIB header line. It is identical for all MCA analysers, as there can be only one #@CALIB line per scan.
MCA channels is an array containing all channel numbers. This information is read from the #@CHANN scan header line if present, or else computed from the shape of the first spectrum in a scan ([0, …, len(first_spectrum) - 1]).
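The sketch below reads the datasets of the first analyser. The file name is a placeholder, and the quadratic interpretation of the three calibration coefficients (energy = a + b*ch + c*ch**2) is an assumption about the SPEC convention, not something this module enforces:
from silx.io.spech5 import SpecH5

sfh5 = SpecH5("test.dat")              # placeholder file name
mca_0 = sfh5["/1.1/instrument/mca_0"]
spectra = mca_0["data"][()]            # 2D array, one row per spectrum
a, b, c = mca_0["calibration"][()]     # 3 scalars from the #@CALIB line
channels = mca_0["channels"][()]       # channel numbers
# assumed calibration convention: energy = a + b * ch + c * ch ** 2
energy = a + b * channels + c * channels ** 2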
Accessing data
Data and groups are accessed in h5py fashion:
from silx.io.spech5 import SpecH5
# Open a SpecFile
sfh5 = SpecH5("test.dat")
# using SpecH5 as a regular group to access scans
scan1group = sfh5["1.1"]
instrument_group = scan1group["instrument"]
# alternative: full path access
measurement_group = sfh5["/1.1/measurement"]
# accessing a scan data column by name as a 1D numpy array
data_array = measurement_group["Pslit HGap"]
# accessing all mca-spectra for one MCA device
mca_0_spectra = measurement_group["mca_0/data"]
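SpecH5 also supports the context manager protocol (see __enter__ and __exit__ in the class documentation below), so the file can be used in a with block:
from silx.io.spech5 import SpecH5

with SpecH5("test.dat") as sfh5:
    data_array = sfh5["/1.1/measurement/Pslit HGap"]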
SpecH5 files and groups provide a keys() method:
>>> sfh5.keys()
['96.1', '97.1', '98.1']
>>> sfh5['96.1'].keys()
['title', 'start_time', 'instrument', 'measurement']
They can also be treated as iterators:
from silx.io import is_dataset
from silx.io.spech5 import SpecH5

sfh5 = SpecH5("test.dat")
# iterating a file or group yields the member names
for scan_key in sfh5:
    scan_group = sfh5[scan_key]
    dataset_names = [item.name for item in scan_group["measurement"].values()
                     if is_dataset(item)]
    print("Found data columns in scan " + scan_group.name)
    print(", ".join(dataset_names))
You can test for existence of data or groups:
>>> "/1.1/measurement/Pslit HGap" in sfh5
True
>>> "positioners" in sfh5["/2.1/instrument"]
True
>>> "spam" in sfh5["1.1"]
False
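Alternatively, get() (documented in the Classes section below) returns a default value instead of raising for a missing item; passing a full path, as in h5py, is assumed here:
value = sfh5.get("/1.1/measurement/spam")              # None if "spam" is absent
value = sfh5.get("/1.1/measurement/spam", default=-1)  # -1 if "spam" is absent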
Note
Text used to be stored with dtype numpy.string_ in silx versions prior to 0.7.0. The numpy.string_ type is a byte-string format, so in Python 3 you had to decode strings before using them:
>>> from silx.io.spech5 import SpecH5
>>> sfh5 = SpecH5("31oct98.dat")
>>> sfh5["/68.1/title"]
b'68 ascan tx3 -28.5 -24.5 20 0.5'
>>> sfh5["/68.1/title"].decode()
'68 ascan tx3 -28.5 -24.5 20 0.5'
From silx version 0.7.0 onwards, text is stored as unicode. This corresponds to the default text type in Python 3, and to the unicode type in Python 2.
To be on the safe side, you can test for the presence of a decode attribute to ensure that you always work with unicode text:
>>> title = sfh5["/68.1/title"]
>>> if hasattr(title, "decode"):
... title = title.decode()
Classes
class SpecH5(filename)
    Bases: silx.io.commonh5.File, silx.io.spech5.SpecH5Group

    This class opens a SPEC file and exposes it as a h5py.File.

    It inherits silx.io.commonh5.Group (via commonh5.File), which implements most of its API.
    __contains__(name)
        Returns true if name is an existing child of this group.
        Return type: bool

    __enter__()

    __exit__(exc_type, exc_val, exc_tb)

    __getitem__(name)
        Return a child from its name.
        Parameters: name (str) – name of a member, or a path through members using the '/' separator. A leading '/' accesses the root item of the tree.
        Return type: Node

    __iter__()
        Iterate over member names.

    __len__()
        Returns the number of children contained in this group.
        Return type: int

    attrs
        Returns HDF5 attributes of this node.
        Return type: dict

    basename
        Returns the HDF5 basename of this node.

    create_dataset(name, shape=None, dtype=None, data=None, **kwds)
        Create and return a sub dataset.
        Parameters:
            - name (str) – Name of the dataset.
            - shape – Dataset shape. Use "()" for scalar datasets. Required if "data" isn't provided.
            - dtype – Numpy dtype or string. If omitted, dtype('f') will be used. Required if "data" isn't provided; otherwise, overrides the data array's dtype.
            - data (numpy.ndarray) – Data used to initialize the dataset. If provided, the shape and dtype arguments can be omitted.
            - kwds – Extra arguments. Nothing yet supported.

    create_group(name)
        Create and return a new subgroup. The name may be absolute or relative. Fails if the target name already exists.
        Parameters: name (str) – Name of the new group

    file
        Returns the file node of this node.
        Return type: Node

    filename

    get(name, default=None, getclass=False, getlink=False)
        Retrieve an item or other information.
        If only getlink is true, the returned value is always h5py.HardLink, because this implementation does not use links, like the original implementation.
        Parameters:
            - name (str) – name of the item
            - default (object) – default value returned if the name is not found
            - getclass (bool) – if true, the returned object is the class of the object found
            - getlink (bool) – if true, link objects are returned instead of the target
        Returns: An object, else None
        Return type: object

    h5_class
        Returns the h5py.File class.

    h5py_class
        Returns the h5py class which is mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset.
        This should not be used anymore. Prefer using h5_class.
        Return type: Class

    items()
        Returns an items iterator containing name-node mappings.
        Return type: iterator

    keys()
        Returns an iterator over the children's names in a group.

    mode

    name
        Returns the HDF5 name of this node.

    parent
        Returns the parent of the node.
        Return type: Node

    values()
        Returns an iterator over the children nodes (groups and datasets) in a group.
        New in version 0.6.

    visit(func, visit_links=False)
        Recursively visit all names in this group and subgroups. See the documentation for h5py.Group.visit for more help.
        Parameters: func (callable) – Callable (function, method or callable object)

    visititems(func, visit_links=False)
        Recursively visit names and objects in this group. See the documentation for h5py.Group.visititems for more help.
        Parameters:
            - func (callable) – Callable (function, method or callable object)
            - visit_links (bool) – If False, ignore links. If True, call func(name) for links and recurse into target groups.
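A sketch of SpecH5.visititems(), collecting the paths of every dataset in a file. The file name is a placeholder, and the (name, obj) callback signature follows h5py.Group.visititems:
from silx.io import is_dataset
from silx.io.spech5 import SpecH5

dataset_paths = []

def collect(name, obj):
    # called for every member; keep only datasets
    if is_dataset(obj):
        dataset_paths.append(name)

with SpecH5("test.dat") as sfh5:   # placeholder file name
    sfh5.visititems(collect)
print(dataset_paths)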
class SpecH5Group
    Bases: object

    This convenience class is to be inherited by all groups, for compatibility purposes with code that tests for isinstance(obj, SpecH5Group).

    This legacy behavior is deprecated. The correct way to test if an object is a group is to use silx.io.utils.is_group().

    Groups must also inherit silx.io.commonh5.Group, which actually implements all the methods and attributes.
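A short sketch of the recommended test (the file name is a placeholder):
from silx.io.spech5 import SpecH5
from silx.io.utils import is_group

sfh5 = SpecH5("test.dat")     # placeholder file name
obj = sfh5["/1.1/instrument"]
if is_group(obj):             # preferred over isinstance(obj, SpecH5Group)
    print(list(obj.keys()))   # list the instrument sub-groups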
class Group(name, parent=None, attrs=None)
    Bases: silx.io.commonh5.Node

    This class mimics a h5py.Group.

    get(name, default=None, getclass=False, getlink=False)
        Retrieve an item or other information.
        If only getlink is true, the returned value is always h5py.HardLink, because this implementation does not use links, like the original implementation.
        Parameters:
            - name (str) – name of the item
            - default (object) – default value returned if the name is not found
            - getclass (bool) – if true, the returned object is the class of the object found
            - getlink (bool) – if true, link objects are returned instead of the target
        Returns: An object, else None
        Return type: object

    __getitem__(name)
        Return a child from its name.
        Parameters: name (str) – name of a member, or a path through members using the '/' separator. A leading '/' accesses the root item of the tree.
        Return type: Node

    __contains__(name)
        Returns true if name is an existing child of this group.
        Return type: bool

    values()
        Returns an iterator over the children nodes (groups and datasets) in a group.
        New in version 0.6.

    visit(func, visit_links=False)
        Recursively visit all names in this group and subgroups. See the documentation for h5py.Group.visit for more help.
        Parameters: func (callable) – Callable (function, method or callable object)

    visititems(func, visit_links=False)
        Recursively visit names and objects in this group. See the documentation for h5py.Group.visititems for more help.
        Parameters:
            - func (callable) – Callable (function, method or callable object)
            - visit_links (bool) – If False, ignore links. If True, call func(name) for links and recurse into target groups.

    attrs
        Returns HDF5 attributes of this node.
        Return type: dict

    basename
        Returns the HDF5 basename of this node.

    file
        Returns the file node of this node.
        Return type: Node

    h5py_class
        Returns the h5py class which is mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset.
        This should not be used anymore. Prefer using h5_class.
        Return type: Class

    name
        Returns the HDF5 name of this node.

    parent
        Returns the parent of the node.
        Return type: Node
class SpecH5Dataset
    Bases: object

    This convenience class is to be inherited by all datasets, for compatibility purposes with code that tests for isinstance(obj, SpecH5Dataset).

    This legacy behavior is deprecated. The correct way to test if an object is a dataset is to use silx.io.utils.is_dataset().

    Datasets must also inherit SpecH5NodeDataset or SpecH5LazyNodeDataset, which actually implement all the API.
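Similarly, a sketch of the recommended dataset test (the file name is a placeholder, the column label is taken from the example above):
from silx.io.spech5 import SpecH5
from silx.io.utils import is_dataset

sfh5 = SpecH5("test.dat")                  # placeholder file name
obj = sfh5["/1.1/measurement/Pslit HGap"]
if is_dataset(obj):                        # preferred over isinstance(obj, SpecH5Dataset)
    print(obj.shape, obj.dtype)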
class SpecH5NodeDataset(name, data, parent=None, attrs=None)
    Bases: silx.io.commonh5.Dataset, silx.io.spech5.SpecH5Dataset

    This class inherits commonh5.Dataset, to which it adds little extra functionality. The main additional functionality is the proxy behavior that allows it to mimic the numpy array it stores.

    __getitem__(item)
        Returns the slice of the data exposed by this dataset.
        Return type: numpy.ndarray

    __iter__()
        Iterate over the first axis. TypeError if scalar.

    __len__()
        Returns the size of the data exposed by this dataset.
        Return type: int

    attrs
        Returns HDF5 attributes of this node.
        Return type: dict

    basename
        Returns the HDF5 basename of this node.

    chunks
        Returns chunks as provided by h5py.Dataset.
        There are no chunks.

    compression
        Returns compression as provided by h5py.Dataset.
        There is no compression.

    compression_opts
        Returns compression options as provided by h5py.Dataset.
        There is no compression.

    dtype
        Returns the numpy data type exposed by this dataset.
        Return type: numpy.dtype

    file
        Returns the file node of this node.
        Return type: Node

    h5py_class
        Returns the h5py class which is mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset.
        This should not be used anymore. Prefer using h5_class.
        Return type: Class

    name
        Returns the HDF5 name of this node.

    parent
        Returns the parent of the node.
        Return type: Node

    shape
        Returns the shape of the data exposed by this dataset.
        Return type: tuple

    size
        Returns the size of the data exposed by this dataset.
        Return type: int

    value
        Returns the data exposed by this dataset.
        Deprecated by h5py. It is preferred to use indexing with [()].
        Return type: numpy.ndarray
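A sketch of this array-like behaviour (the file name is a placeholder): the dataset can be sliced like a numpy array, and the full data is read with [()], which replaces the deprecated value property:
import numpy
from silx.io.spech5 import SpecH5

with SpecH5("test.dat") as sfh5:                 # placeholder file name
    spectra = sfh5["/1.1/instrument/mca_0/data"]
    print(spectra.shape, spectra.dtype, spectra.size)
    first_spectrum = spectra[0]                  # slice along the first axis
    all_spectra = spectra[()]                    # full numpy array
    total_counts = numpy.sum(all_spectra)
    print(total_counts)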