Contribute
This project follows the standard open-source GitHub workflow, which is described in other projects such as scikit-image.
Testing
To run self-contained tests, from Python:
import hdf5plugin.test
hdf5plugin.test.run_tests()
Or, from the command line:
python -m hdf5plugin.test
To also run tests relying on actual HDF5 files, run from the source directory:
python test/test.py
This tests the installed version of hdf5plugin.
Building documentation
Documentation relies on Sphinx.
To build documentation, run from the project root directory:
python setup.py build
PYTHONPATH=build/lib.<os>-<machine>-<pyver>/ sphinx-build -b html doc/ build/html
Guidelines to add a compression filter
This briefly describes the steps to add an HDF5 compression filter to the zoo.
Add the source of the HDF5 filter and compression algorithm code in a subdirectory in src/[filter]. Best is to use git subtree, else copy the files there (including the license file). A released version of the filter + compression library should be used.
git subtree command:
git subtree add --prefix=src/[filter] [git repository] [release tag] --squash
Update setup.py to build the filter dynamic library by adding an extension using the HDF5PluginExtension class (a subclass of setuptools.Extension), which adds extra files and compile options to enable dynamic loading of the filter. The name of the extension should be hdf5plugin.plugins.libh5<filter_name>.
In case of import errors related to HDF5-related undefined symbols, add any missing functions to src/hdf5_dl.c.
Add a “CONSTANT” in src/hdf5plugin/_filters.py named FILTER_NAME_ID, whose value is the HDF5 filter ID (see HDF5 registered filters).
Add a compression options helper class named FilterName in src/hdf5plugin/_filters.py, which should inherit from _FilterRefClass. This is intended to ease the usage of the compression_opts argument of h5py.Group.create_dataset. It must have a filter_name class attribute with the same name as in the extension defined in setup.py (without the libh5 prefix). This name is used to find the filter library.
Add FilterName to hdf5plugin._filters.FILTER_CLASSES.
Add to hdf5plugin/__init__.py the import of the filter ID and helper class:
from ._filters import FILTER_NAME_ID, FilterName # noqa
Add tests:
In test/test.py, for testing the reading of a compressed file that was produced with other software.
In src/hdf5plugin/test.py, for tests that write data using the compression filter and the compression options helper class, then read the data back.
Update the doc/information.rst file to document:
The version of the HDF5 filter that is embedded in hdf5plugin.
The license of the filter (by adding a link to the license file).
Update the doc/usage.rst file to document:
The hdf5plugin.<FilterName> compression argument helper class.
Update doc/contribute.rst to document the format of compression_opts expected by the filter (see h5py custom compression filters).
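To make the helper class step more concrete, here is a minimal, self-contained sketch of the shape such a class might take. It does not import hdf5plugin: _StandInFilterRefClass is a hypothetical stand-in for _FilterRefClass, and MyFilter, MYFILTER_ID, the level option and the Mapping-based design are all assumptions made for illustration, not the project's actual code.

```python
from collections.abc import Mapping

# Placeholder filter ID used only for this sketch; a real filter uses its
# registered HDF5 filter ID.
MYFILTER_ID = 256


class _StandInFilterRefClass(Mapping):
    """Hypothetical stand-in for hdf5plugin's _FilterRefClass (an assumption)."""

    filter_id = None
    filter_name = None
    filter_options = ()

    @property
    def _kwargs(self):
        # Keyword arguments in the form accepted by create_dataset.
        return {
            "compression": self.filter_id,
            "compression_opts": self.filter_options,
        }

    def __getitem__(self, key):
        return self._kwargs[key]

    def __iter__(self):
        return iter(self._kwargs)

    def __len__(self):
        return len(self._kwargs)


class MyFilter(_StandInFilterRefClass):
    """Hypothetical helper exposing a single integer option named level."""

    filter_name = "myfilter"  # extension name without the libh5 prefix
    filter_id = MYFILTER_ID

    def __init__(self, level=3):
        self.filter_options = (int(level),)


# Because the helper behaves like a mapping, it can be converted to a dict
# or unpacked as keyword arguments.
print(dict(MyFilter(level=5)))
```

With this design, a caller would not need to remember the layout of the compression_opts tuple: the helper's constructor takes named arguments and builds the tuple itself.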
Low-level compression filter arguments
Compression filters can be configured with the compression_opts argument of the h5py.Group.create_dataset method by providing a tuple of integers.
The meaning of those integers is filter dependent and is described below.
bitshuffle
compression_opts: (block_size, compression, level)
block_size: Number of elements (not bytes) per block. It MUST be a multiple of 8. Default: 0 for a block size of about 8 kB.
compression:
0: No compression
2: LZ4
3: Zstd
level: Compression level, only used with Zstd compression.
By default the filter uses bitshuffle, but does NOT compress with LZ4.
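As an illustration, the tuple described above could be assembled by a small function such as the following sketch. The function name, the constant names and the default level of 0 are assumptions made for this example; only the tuple layout and the compression IDs come from the description above.

```python
# Compression IDs as listed above for the bitshuffle filter.
BSHUF_NO_COMPRESSION = 0
BSHUF_LZ4 = 2
BSHUF_ZSTD = 3


def bitshuffle_opts(block_size=0, compression=BSHUF_NO_COMPRESSION, level=0):
    """Return (block_size, compression, level) after basic validation."""
    if block_size < 0 or block_size % 8 != 0:
        raise ValueError("block_size must be 0 or a positive multiple of 8")
    if compression not in (BSHUF_NO_COMPRESSION, BSHUF_LZ4, BSHUF_ZSTD):
        raise ValueError("compression must be 0, 2 or 3")
    return (block_size, compression, level)


print(bitshuffle_opts())                       # bitshuffle only
print(bitshuffle_opts(compression=BSHUF_LZ4))  # bitshuffle + LZ4
print(bitshuffle_opts(1024, BSHUF_ZSTD, 5))    # bitshuffle + Zstd, level 5
```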
blosc
compression_opts: (0, 0, 0, 0, compression level, shuffle, compression)
First 4 values are reserved.
compression level: From 0 (no compression) to 9 (maximum compression). Default: 5.
shuffle: Shuffle filter:
0: no shuffle
1: byte shuffle
2: bit shuffle
compression: The compressor blosc ID:
0: blosclz (default)
1: lz4
2: lz4hc
3: snappy
4: zlib
5: zstd
By default the filter uses byte shuffle and blosclz.
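The seven-value layout above is easy to get wrong by hand, so a name-based builder can help. The sketch below is only an illustration: blosc_opts and the two lookup tables are hypothetical names, and the numeric IDs are copied from the lists above.

```python
# IDs copied from the blosc description above.
BLOSC_SHUFFLE = {"none": 0, "byte": 1, "bit": 2}
BLOSC_COMPRESSORS = {
    "blosclz": 0,
    "lz4": 1,
    "lz4hc": 2,
    "snappy": 3,
    "zlib": 4,
    "zstd": 5,
}


def blosc_opts(cname="blosclz", clevel=5, shuffle="byte"):
    """Return the 7-value compression_opts tuple for the blosc filter."""
    if not 0 <= clevel <= 9:
        raise ValueError("clevel must be in the range [0, 9]")
    # The first four values are reserved and set to 0.
    return (0, 0, 0, 0, clevel, BLOSC_SHUFFLE[shuffle], BLOSC_COMPRESSORS[cname])


print(blosc_opts())                   # defaults: level 5, byte shuffle, blosclz
print(blosc_opts("zstd", 9, "bit"))   # zstd, level 9, bit shuffle
```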
blosc2
compression_opts: (0, 0, 0, 0, compression level, filter, compression)
First 4 values are reserved.
compression level: From 0 (no compression) to 9 (maximum compression). Default: 5.
filter: Pre-compression filter:
0: no shuffle
1: byte shuffle
2: bit shuffle
3: delta: diff current block with first one
4: truncate precision: Truncate mantissa for floating point types
compression: The compressor blosc ID:
0: blosclz (default)
1: lz4
2: lz4hc
3: unused
4: zlib
5: zstd
By default the filter uses byte shuffle and blosclz.
bzip2
compression_opts: (block_size,)
block_size: Size of the blocks as a multiple of 100k. It must be in the range [1, 9].
lz4
compression_opts: (block_size,)
block_size: Number of bytes per block. Default: 0 for a block size of 1 GB. It MUST be < 1.9 GB.
sperr
compression_opts: (mode_quality_swap,)
mode_quality_swap: Mode, quality and swap stored as a 32-bit unsigned integer. For details, see the implementation of the C function H5Z_SPERR_make_cd_values.
sz
compression_opts:
error_bound_mode (int32)
abs_error high (big endian float64)
abs_error low
rel_error high (big endian float64)
rel_error low
pw_rel_error high (big endian float64)
pw_rel_error low
psnr high (big endian float64)
psnr low
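Each error bound in the list above occupies two 32-bit integers. One reading of "high (big endian float64)" is that the double is packed in big-endian byte order and split into its most-significant and least-significant 32-bit words; the sketch below follows that reading. The function names are hypothetical, the error_bound_mode value is passed through unchanged, and the 0.0 defaults for unused bounds are an assumption.

```python
import struct


def pack_float64(value):
    """Split a float into (high, low) 32-bit words of its big-endian double."""
    packed = struct.pack(">d", value)
    high, low = struct.unpack(">II", packed)
    return high, low


def unpack_float64(high, low):
    """Inverse of pack_float64."""
    return struct.unpack(">d", struct.pack(">II", high, low))[0]


def sz_opts(error_bound_mode, abs_error=0.0, rel_error=0.0,
            pw_rel_error=0.0, psnr=0.0):
    """Return the 9-value tuple: mode followed by four (high, low) pairs."""
    opts = [error_bound_mode]
    for bound in (abs_error, rel_error, pw_rel_error, psnr):
        opts.extend(pack_float64(bound))
    return tuple(opts)


high, low = pack_float64(1e-3)
print(high, low, unpack_float64(high, low))
print(sz_opts(0, abs_error=1e-3))
```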
The set_local function prepends:
For dim size from 2 to 5:
(dim size, data type, r1, r2, r3 (if dim size >= 3), r4 (if dim size >= 4), r5 (if dim size == 5))
rX values are set up to dim size (e.g., for dim size == 2, only r1 and r2 are used).
For dim size == 1: r1 is stored on 64 bits:
(dim size, data type, r1 most-significant bytes, r1 least-significant bytes)
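The two layouts of the prepended values can be sketched as follows. This is an illustration of the description above, not the set_local implementation: the function name is hypothetical, "dim size" is taken to mean the number of dimensions, the data type code is passed through as an opaque integer, and the 64-bit r1 is assumed to be split into two 32-bit words.

```python
def sz_prepended_values(data_type, sizes):
    """Return the values described above for 1 to 5 dimension sizes.

    sizes holds r1, r2, ... in the order they should be stored.
    """
    ndim = len(sizes)
    if not 1 <= ndim <= 5:
        raise ValueError("between 1 and 5 dimension sizes are expected")
    if ndim == 1:
        # r1 is stored on 64 bits: most-significant word first.
        r1 = sizes[0]
        return (ndim, data_type, r1 >> 32, r1 & 0xFFFFFFFF)
    return (ndim, data_type, *sizes)


print(sz_prepended_values(0, [1000]))
print(sz_prepended_values(0, [4, 5, 6]))
```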
sz3
compression_opts:
mode
abs_error high (big endian float64)
abs_error low
rel_error high (big endian float64)
rel_error low
norm2 high (big endian float64)
norm2 low
psnr high (big endian float64)
psnr low
zfp
For more information, see zfp modes and hdf5-zfp generic interface.
The first value of compression_opts is mode. The following values depend on the value of mode:
Fixed-rate mode: (1, 0, rateHigh, rateLow, 0, 0)
Rate, i.e., number of compressed bits per value, as a double stored as:
rateHigh: High 32-bit word of the rate double.
rateLow: Low 32-bit word of the rate double.
Fixed-precision mode: (2, 0, prec, 0, 0, 0)
prec: Number of uncompressed bits per value.
Fixed-accuracy mode: (3, 0, accHigh, accLow, 0, 0)
Accuracy, i.e., absolute error tolerance, as a double stored as:
accHigh: High 32-bit word of the accuracy double.
accLow: Low 32-bit word of the accuracy double.
Expert mode: (4, 0, minbits, maxbits, maxprec, minexp)
minbits: Minimum number of compressed bits used to represent a block.
maxbits: Maximum number of bits used to represent a block.
maxprec: Maximum number of bit planes encoded.
minexp: Smallest absolute bit plane number encoded.
Reversible mode: (5, 0, 0, 0, 0, 0)
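The integer-only modes listed above can be built directly, as in the sketch below. The function names are hypothetical. Fixed-rate and fixed-accuracy modes are left out on purpose, since the exact word order of the packed double is best taken from the hdf5-zfp generic interface. Converting a negative minexp to its 32-bit two's-complement value is an assumption about how a signed value would be passed through unsigned filter options.

```python
ZFP_MODE_PRECISION = 2
ZFP_MODE_EXPERT = 4
ZFP_MODE_REVERSIBLE = 5


def zfp_precision_opts(prec):
    """Fixed-precision mode: (2, 0, prec, 0, 0, 0)."""
    return (ZFP_MODE_PRECISION, 0, prec, 0, 0, 0)


def zfp_expert_opts(minbits, maxbits, maxprec, minexp):
    """Expert mode: (4, 0, minbits, maxbits, maxprec, minexp)."""
    # Assumption: a negative minexp is stored as a 32-bit two's complement.
    return (ZFP_MODE_EXPERT, 0, minbits, maxbits, maxprec, minexp & 0xFFFFFFFF)


def zfp_reversible_opts():
    """Reversible mode: (5, 0, 0, 0, 0, 0)."""
    return (ZFP_MODE_REVERSIBLE, 0, 0, 0, 0, 0)


print(zfp_precision_opts(16))
print(zfp_expert_opts(1, 16657, 64, -1074))
print(zfp_reversible_opts())
```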
zstd
compression_opts: (clevel,)
clevel: Compression level from 1 (lowest compression) to 22 (maximum compression). Ultra compression extends from 20 through 22. Default: 3.