Usage
hdf5plugin allows using additional HDF5 compression filters with h5py for reading and writing compressed datasets.
Read compressed datasets
In order to read compressed datasets with h5py, use:
import hdf5plugin
This registers the compression filters provided by hdf5plugin with the HDF5 library used by h5py.
Hence, HDF5 compressed datasets can be read as any other dataset (see h5py documentation).
Write compressed datasets
As for reading compressed datasets, importing hdf5plugin is required to enable the supported compression filters.
To create a compressed dataset, use h5py.Group.create_dataset and set the compression and compression_opts arguments.
hdf5plugin provides helpers to prepare those compression options: Bitshuffle, Blosc, Blosc2, BZip2, FciDecomp, LZ4, SZ, SZ3, Zfp, Zstd.
Sample code:
import numpy
import h5py
import hdf5plugin
# Compression
f = h5py.File('test.h5', 'w')
f.create_dataset('data', data=numpy.arange(100), **hdf5plugin.LZ4())
f.close()
# Decompression
f = h5py.File('test.h5', 'r')
data = f['data'][()]
f.close()
Relevant h5py documentation: Filter pipeline and Chunked Storage.
Bitshuffle
- class hdf5plugin.Bitshuffle(nelems=0, cname=None, clevel=3, lz4=None)
h5py.Group.create_dataset’s compression arguments for using the bitshuffle filter. It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'bitshuffle_with_lz4',
    data=numpy.arange(100),
    **hdf5plugin.Bitshuffle(nelems=0, lz4=True))
f.close()
- Parameters:
nelems (int) – The number of elements per block. It needs to be divisible by eight. Default: 0 (for about 8 kB per block).
cname (str) – lz4 (default), none, zstd
clevel (int) – Compression level, used only for zstd compression. Can be negative, and must be below or equal to 22 (maximum compression). Default: 3.
- filter_id = 32008
- filter_name = 'bshuf'
Blosc
- class hdf5plugin.Blosc(cname='lz4', clevel=5, shuffle=1)
h5py.Group.create_dataset’s compression arguments for using the blosc filter. It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'blosc_byte_shuffle_blosclz',
    data=numpy.arange(100),
    **hdf5plugin.Blosc(cname='blosclz', clevel=9, shuffle=hdf5plugin.Blosc.SHUFFLE))
f.close()
- Parameters:
cname (str) – blosclz, lz4 (default), lz4hc, zlib, zstd. Optional: snappy, depending on compilation (requires C++11).
clevel (int) – Compression level from 0 (no compression) to 9 (maximum compression). Default: 5.
shuffle (int) –
One of:
Blosc.NOSHUFFLE (0): No shuffle
Blosc.SHUFFLE (1): byte-wise shuffle (default)
Blosc.BITSHUFFLE (2): bit-wise shuffle
- BITSHUFFLE = 2
Flag to enable bit-wise shuffle pre-compression filter
- NOSHUFFLE = 0
Flag to disable data shuffle pre-compression filter
- SHUFFLE = 1
Flag to enable byte-wise shuffle pre-compression filter
- filter_id = 32001
- filter_name = 'blosc'
Blosc2
- class hdf5plugin.Blosc2(cname='blosclz', clevel=5, filters=1)
h5py.Group.create_dataset’s compression arguments for using the blosc2 filter. WARNING: This is a pre-release version of the HDF5 filter, for testing purposes only.
It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'blosc2_byte_shuffle_blosclz',
    data=numpy.arange(100),
    **hdf5plugin.Blosc2(cname='blosclz', clevel=9, filters=hdf5plugin.Blosc2.SHUFFLE))
f.close()
- Parameters:
cname (str) – blosclz (default), lz4, lz4hc, zlib, zstd
clevel (int) – Compression level from 0 (no compression) to 9 (maximum compression). Default: 5.
filters (int) –
One of:
Blosc2.NOFILTER (0): No pre-compression filter
Blosc2.SHUFFLE (1): Byte-wise shuffle (default)
Blosc2.BITSHUFFLE (2): Bit-wise shuffle
Blosc2.DELTA (3): Stores diff’ed blocks
Blosc2.TRUNC_PREC (4): Zeroes the least significant bits of the mantissa
- BITSHUFFLE = 2
Flag to enable bit-wise shuffle pre-compression filter
- DELTA = 3
Flag to store blocks inside a chunk diff’ed with respect to first block in the chunk
- NOFILTER = 0
Flag to disable pre-compression filter
- SHUFFLE = 1
Flag to enable byte-wise shuffle pre-compression filter
- TRUNC_PREC = 4
Flag to zero the least significant bits of the mantissa of float32 and float64 types
- filter_id = 32026
- filter_name = 'blosc2'
BZip2
- class hdf5plugin.BZip2(blocksize=9)
h5py.Group.create_dataset’s compression arguments for using the BZip2 filter. It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'bzip2',
    data=numpy.arange(100),
    **hdf5plugin.BZip2(blocksize=5))
f.close()
- Parameters:
blocksize (int) – Size of the blocks as a multiple of 100k
- filter_id = 307
- filter_name = 'bzip2'
FciDecomp
- class hdf5plugin.FciDecomp(*args, **kwargs)
h5py.Group.create_dataset’s compression arguments for using the FciDecomp filter. It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'fcidecomp',
    data=numpy.arange(100),
    **hdf5plugin.FciDecomp())
f.close()
- filter_id = 32018
- filter_name = 'fcidecomp'
LZ4
- class hdf5plugin.LZ4(nbytes=0)
h5py.Group.create_dataset’s compression arguments for using the lz4 filter. It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'lz4',
    data=numpy.arange(100),
    **hdf5plugin.LZ4(nbytes=0))
f.close()
- Parameters:
nbytes (int) – The number of bytes per block. It needs to be in the range 0 < nbytes < 2113929216 (~1.9 GB). Default: 0 (for 1 GB per block).
- filter_id = 32004
- filter_name = 'lz4'
SZ
- class hdf5plugin.SZ(absolute=None, relative=None, pointwise_relative=None)
h5py.Group.create_dataset’s compression arguments for using the SZ filter. It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'sz',
    data=numpy.random.random(100),
    **hdf5plugin.SZ())
f.close()
This filter provides different modes:
Absolute mode: To use, set the absolute argument. It ensures that the resulting values will be within the provided absolute tolerance.

f.create_dataset(
    'sz_absolute',
    data=numpy.random.random(100),
    **hdf5plugin.SZ(absolute=0.1))
Relative mode: To use, set the relative argument. It ensures that the resulting values will be within the provided relative tolerance. The tolerance will be computed by multiplying the provided argument by the range of the data values.

f.create_dataset(
    'sz_relative',
    data=numpy.random.random(100),
    **hdf5plugin.SZ(relative=0.01))
Point-wise relative mode: To use, set the pointwise_relative argument. It ensures that each grid point of the resulting values will be within the provided relative tolerance.

f.create_dataset(
    'sz_pointwise_relative',
    data=numpy.random.random(100),
    **hdf5plugin.SZ(pointwise_relative=0.01))
For more details about the compressor, see SZ.
- filter_id = 32017
- filter_name = 'sz'
SZ3
- class hdf5plugin.SZ3(absolute=None, relative=None, norm2=None, peak_signal_to_noise_ratio=None)
h5py.Group.create_dataset’s compression arguments for using the SZ3 filter. It can be passed as keyword arguments:
Absolute mode: To use, set the absolute argument. It ensures that the resulting values will be within the provided absolute tolerance.

f.create_dataset(
    'sz3_absolute',
    data=numpy.random.random(100),
    **hdf5plugin.SZ3(absolute=0.1))
For more details about the compressor, see SZ3.
- filter_id = 32024
- filter_name = 'sz3'
Zfp
- class hdf5plugin.Zfp(rate=None, precision=None, accuracy=None, reversible=False, minbits=None, maxbits=None, maxprec=None, minexp=None)
h5py.Group.create_dataset’s compression arguments for using the ZFP filter. It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'zfp',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp())
f.close()
This filter provides different modes:
Fixed-rate mode: To use, set the rate argument. For details, see zfp fixed-rate mode.

f.create_dataset(
    'zfp_fixed_rate',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp(rate=10.0))
Fixed-precision mode: To use, set the precision argument. For details, see zfp fixed-precision mode.

f.create_dataset(
    'zfp_fixed_precision',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp(precision=10))
Fixed-accuracy mode: To use, set the accuracy argument. For details, see zfp fixed-accuracy mode.

f.create_dataset(
    'zfp_fixed_accuracy',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp(accuracy=0.001))
Reversible (i.e., lossless) mode: To use, set the reversible argument to True. For details, see zfp reversible mode.

f.create_dataset(
    'zfp_reversible',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp(reversible=True))
Expert mode: To use, set the minbits, maxbits, maxprec and minexp arguments. For details, see zfp expert mode.

f.create_dataset(
    'zfp_expert',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp(minbits=1, maxbits=16657, maxprec=64, minexp=-1074))
- Parameters:
rate (float) – Use fixed-rate mode and set the number of compressed bits per value.
precision (float) – Use fixed-precision mode and set the number of uncompressed bits per value.
accuracy (float) – Use fixed-accuracy mode and set the absolute error tolerance.
reversible (bool) – If True, it uses the reversible (i.e., lossless) mode.
minbits (int) – Minimum number of compressed bits used to represent a block.
maxbits (int) – Maximum number of bits used to represent a block.
maxprec (int) – Maximum number of bit planes encoded. It controls the relative error.
minexp (int) – Smallest absolute bit plane number encoded. It controls the absolute error.
- filter_id = 32013
- filter_name = 'zfp'
Zstd
- class hdf5plugin.Zstd(clevel=3)
h5py.Group.create_dataset’s compression arguments for using the Zstd filter. It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'zstd',
    data=numpy.arange(100),
    **hdf5plugin.Zstd())
f.close()
- Parameters:
clevel (int) – Compression level from 1 (lowest compression) to 22 (maximum compression). Ultra compression extends from 20 through 22. Default: 3.
f = h5py.File('test.h5', 'w')
f.create_dataset(
    'zstd',
    data=numpy.arange(100),
    **hdf5plugin.Zstd(clevel=22))
f.close()
- filter_id = 32015
- filter_name = 'zstd'
Get information about hdf5plugin
Constants:
- hdf5plugin.PLUGIN_PATH
Directory where the provided HDF5 filter plugins are stored.
Functions:
- hdf5plugin.get_filters(filters=('bshuf', 'blosc', 'blosc2', 'bzip2', 'fcidecomp', 'lz4', 'sz', 'sz3', 'zfp', 'zstd'))
Returns selected filter classes.
By default it returns all filter classes.
- Parameters:
filters (Union[str, int, Tuple[Union[str, int], ...]]) – Filter name or ID or sequence of filter names or IDs (default: all filters). It also supports the value "registered" which selects currently available filters.
- Returns:
Tuple of filter classes
- hdf5plugin.get_config()
Provides information about build configuration and filters registered by hdf5plugin.
Manage registered filters
When imported, hdf5plugin initialises and registers the filters it embeds, provided no filter is already registered for the corresponding filter IDs.
h5py gives access to HDF5 functions handling registered filters in h5py.h5z. This module allows checking the filter availability and registering/unregistering filters.
hdf5plugin provides an extra register function to register the filters it provides, e.g., to override an already loaded filter. Registering with this function is required to perform additional initialisation and enable writing compressed data with the given filter.
- hdf5plugin.register(filters=('bshuf', 'blosc', 'blosc2', 'bzip2', 'fcidecomp', 'lz4', 'sz', 'sz3', 'zfp', 'zstd'), force=True)
Initialise and register hdf5plugin embedded filters given their names or IDs.
- Parameters:
filters (Union[str, int, Tuple[Union[str, int], ...]]) – Filter name or ID or sequence of filter names or IDs.
force (bool) – True to register the filter even if a corresponding one is already available. False to skip already available filters.
- Returns:
True if all filters were registered successfully, False otherwise.
- Return type:
bool
Use HDF5 filters in other applications
Non-h5py or non-Python users can also benefit from the supplied HDF5 compression filters for reading compressed datasets by setting the HDF5_PLUGIN_PATH environment variable to the value of hdf5plugin.PLUGIN_PATH, which can be retrieved from the command line with:
python -c "import hdf5plugin; print(hdf5plugin.PLUGIN_PATH)"
For instance:
export HDF5_PLUGIN_PATH=$(python -c "import hdf5plugin; print(hdf5plugin.PLUGIN_PATH)")
should allow MATLAB or IDL users to read data compressed using the supported plugins.
Setting the HDF5_PLUGIN_PATH environment variable allows already existing programs or Python code to read compressed data without any modification.