Usage

hdf5plugin allows using additional HDF5 compression filters with h5py for reading and writing compressed datasets.

Available constants:

hdf5plugin.FILTERS: A dictionary mapping the names of the provided filters to their HDF5 filter IDs.
hdf5plugin.PLUGINS_PATH: The directory where the provided filter libraries are stored.
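Because FILTERS maps filter names to filter IDs, inverting it lets you name the filter behind an ID reported by other HDF5 tools. The sketch below uses a stand-in dictionary built from the filter IDs documented on this page. The key names and the helper filter_name_for_id are assumptions for illustration; in real code you would pass hdf5plugin.FILTERS instead.

```python
# Stand-in for hdf5plugin.FILTERS, built from the filter IDs documented on this page.
# The key names are assumptions; use hdf5plugin.FILTERS in real code.
EXAMPLE_FILTERS = {
    "bshuf": 32008,
    "blosc": 32001,
    "fcidecomp": 32018,
    "lz4": 32004,
    "zfp": 32013,
    "zstd": 32015,
}


def filter_name_for_id(filter_id, filters=EXAMPLE_FILTERS):
    """Hypothetical helper: return the filter name for an HDF5 filter ID, or None."""
    # Invert the name -> ID mapping into an ID -> name mapping.
    by_id = {fid: name for name, fid in filters.items()}
    return by_id.get(filter_id)


print(filter_name_for_id(32004))  # lz4
print(filter_name_for_id(1))      # None (ID 1 is HDF5's built-in deflate filter)
```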

Read compressed datasets

To read compressed datasets with h5py, use:

import hdf5plugin

This registers the compression filters supported by hdf5plugin with the HDF5 library used by h5py. HDF5 datasets compressed with these filters can then be read like any other dataset (see the h5py documentation).

Write compressed datasets

As with reading, import hdf5plugin first to enable the supported compression filters.

To create a compressed dataset, use h5py.Group.create_dataset and set its compression and compression_opts arguments.

hdf5plugin provides helpers to prepare those compression options: Bitshuffle, Blosc, FciDecomp, LZ4, Zfp, Zstd.

Sample code:

import numpy
import h5py
import hdf5plugin

# Compression
f = h5py.File('test.h5', 'w')
f.create_dataset('data', data=numpy.arange(100), **hdf5plugin.LZ4())
f.close()

# Decompression
f = h5py.File('test.h5', 'r')
data = f['data'][()]
f.close()

Relevant h5py documentation: Filter pipeline and Chunked Storage.
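The helpers can be unpacked with ** because they behave like mappings that hold the compression and compression_opts keyword arguments. The sketch below imitates that idea with a plain dict subclass and a stand-in for create_dataset so that it runs without h5py. The names LZ4Options and fake_create_dataset, and the assumption that the LZ4 filter receives nbytes as its only option value, are illustrative and not hdf5plugin's actual implementation.

```python
class LZ4Options(dict):
    """Hypothetical imitation of a compression helper such as hdf5plugin.LZ4."""

    filter_id = 32004  # LZ4 filter ID documented on this page

    def __init__(self, nbytes=0):
        # Assumption: the filter receives nbytes as its single option value.
        super().__init__(compression=self.filter_id, compression_opts=(nbytes,))


def fake_create_dataset(name, data=None, compression=None, compression_opts=None):
    """Stand-in for h5py.Group.create_dataset that only echoes its filter settings."""
    return name, compression, compression_opts


# ** unpacks the mapping into the compression and compression_opts keyword arguments.
result = fake_create_dataset("data", data=list(range(100)), **LZ4Options())
print(result)  # ('data', 32004, (0,))
```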

Bitshuffle

class hdf5plugin.Bitshuffle(nelems=0, lz4=True)

h5py.Group.create_dataset’s compression arguments for using the bitshuffle filter.

It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'bitshuffle_with_lz4',
    data=numpy.arange(100),
    **hdf5plugin.Bitshuffle(nelems=0, lz4=True))
f.close()

Parameters

nelems (int) – The number of elements per block. It must be divisible by eight. Default: 0 (about 8 kB per block).
lz4 (bool) – Whether to use LZ4 compression as part of the filter. Default: True

filter_id = 32008
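The nelems rule can be checked before creating a dataset. This sketch validates nelems as documented above and roughly estimates how many elements fit in an 8 kB block for a given item size. Both function names are hypothetical, and the option layout (nelems, 2 if lz4 else 0) is an assumption about how the filter encodes its settings.

```python
def bitshuffle_options(nelems=0, lz4=True):
    """Hypothetical helper: validate Bitshuffle parameters and build filter options."""
    nelems = int(nelems)
    if nelems < 0 or nelems % 8 != 0:
        raise ValueError("nelems must be a non-negative multiple of 8")
    # Assumption: 2 enables LZ4 compression, 0 disables it.
    return (nelems, 2 if lz4 else 0)


def approx_elements_per_8kb_block(itemsize):
    """Rough estimate: 8 kB worth of elements, rounded down to a multiple of 8."""
    return (8192 // itemsize) // 8 * 8


print(bitshuffle_options())               # (0, 2)
print(bitshuffle_options(1024, False))    # (1024, 0)
print(approx_elements_per_8kb_block(8))   # 1024 (float64 elements)
```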

Blosc

class hdf5plugin.Blosc(cname='lz4', clevel=5, shuffle=1)

h5py.Group.create_dataset’s compression arguments for using the blosc filter.

It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'blosc_byte_shuffle_blosclz',
    data=numpy.arange(100),
    **hdf5plugin.Blosc(cname='blosclz', clevel=9, shuffle=hdf5plugin.Blosc.SHUFFLE))
f.close()

Parameters

cname (str) – Name of the compressor: blosclz, lz4 (default), lz4hc, zlib or zstd. snappy may also be available, depending on compilation (it requires C++11).
clevel (int) – Compression level from 0 (no compression) to 9 (maximum compression). Default: 5.
shuffle (int) – One of:
  - Blosc.NOSHUFFLE (0): no shuffle
  - Blosc.SHUFFLE (1): byte-wise shuffle (default)
  - Blosc.BITSHUFFLE (2): bit-wise shuffle

BITSHUFFLE = 2: Flag to enable bit-wise shuffle pre-compression filter

NOSHUFFLE = 0: Flag to disable data shuffle pre-compression filter

SHUFFLE = 1: Flag to enable byte-wise shuffle pre-compression filter

filter_id = 32001
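A quick way to catch invalid Blosc arguments early is to check them against the documented values. This is a minimal sketch under that assumption: check_blosc_args is a hypothetical helper, and hdf5plugin's own validation may differ (for instance, snappy may be missing from a given build).

```python
# Values documented for hdf5plugin.Blosc; snappy depends on how the filter was compiled.
BLOSC_CNAMES = ("blosclz", "lz4", "lz4hc", "snappy", "zlib", "zstd")
NOSHUFFLE, SHUFFLE, BITSHUFFLE = 0, 1, 2


def check_blosc_args(cname="lz4", clevel=5, shuffle=SHUFFLE):
    """Hypothetical helper: reject Blosc arguments outside the documented ranges."""
    if cname not in BLOSC_CNAMES:
        raise ValueError(f"unknown compressor: {cname!r}")
    if not 0 <= clevel <= 9:
        raise ValueError("clevel must be between 0 and 9")
    if shuffle not in (NOSHUFFLE, SHUFFLE, BITSHUFFLE):
        raise ValueError("shuffle must be NOSHUFFLE (0), SHUFFLE (1) or BITSHUFFLE (2)")
    return cname, clevel, shuffle


print(check_blosc_args("blosclz", 9, SHUFFLE))  # ('blosclz', 9, 1)
```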

FciDecomp

class hdf5plugin.FciDecomp(*args, **kwargs)

h5py.Group.create_dataset’s compression arguments for using the FciDecomp filter.

It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'fcidecomp',
    data=numpy.arange(100),
    **hdf5plugin.FciDecomp())
f.close()

filter_id = 32018

LZ4

class hdf5plugin.LZ4(nbytes=0)

h5py.Group.create_dataset’s compression arguments for using the lz4 filter.

It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset('lz4', data=numpy.arange(100),
    **hdf5plugin.LZ4(nbytes=0))
f.close()

Parameters: nbytes (int) – The number of bytes per block. It must be in the range 0 < nbytes < 2113929216 (about 1.9 GB), or 0 to use the default block size. Default: 0 (1 GB per block).

filter_id = 32004
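The LZ4 filter splits each chunk into blocks of nbytes bytes, so nbytes determines how many blocks a chunk produces. The sketch below validates nbytes against the documented range and counts blocks for a chunk of a given size. The helper name is hypothetical, and reading "1 GB" as 2**30 bytes is an assumption.

```python
LZ4_MAX_NBYTES = 2113929216   # exclusive upper bound documented above
LZ4_DEFAULT_NBYTES = 1 << 30  # assumption: the "1 GB" default taken as 2**30 bytes


def lz4_block_count(chunk_nbytes, nbytes=0):
    """Hypothetical helper: number of LZ4 blocks needed for a chunk of chunk_nbytes bytes."""
    if not 0 <= nbytes < LZ4_MAX_NBYTES:
        raise ValueError("nbytes must satisfy 0 <= nbytes < 2113929216")
    block = nbytes or LZ4_DEFAULT_NBYTES  # 0 selects the default block size
    return -(-chunk_nbytes // block)      # integer ceiling division


print(lz4_block_count(800, nbytes=256))  # 4
print(lz4_block_count(800))              # 1
```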

Zfp

class hdf5plugin.Zfp(rate=None, precision=None, accuracy=None, reversible=False, minbits=None, maxbits=None, maxprec=None, minexp=None)

h5py.Group.create_dataset’s compression arguments for using the ZFP filter.

It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'zfp',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp())
f.close()

This filter provides different modes. In the snippets below, f is an h5py.File opened for writing:

Fixed-rate mode: To use, set the rate argument. For details, see zfp fixed-rate mode.

f.create_dataset(
    'zfp_fixed_rate',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp(rate=10.0))

Fixed-precision mode: To use, set the precision argument. For details, see zfp fixed-precision mode.

f.create_dataset(
    'zfp_fixed_precision',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp(precision=10))

Fixed-accuracy mode: To use, set the accuracy argument. For details, see zfp fixed-accuracy mode.

f.create_dataset(
    'zfp_fixed_accuracy',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp(accuracy=0.001))

Reversible (i.e., lossless) mode: To use, set the reversible argument to True. For details, see zfp reversible mode.

f.create_dataset(
    'zfp_reversible',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp(reversible=True))

Expert mode: To use, set the minbits, maxbits, maxprec and minexp arguments. For details, see zfp expert mode.

f.create_dataset(
    'zfp_expert',
    data=numpy.random.random(100),
    **hdf5plugin.Zfp(minbits=1, maxbits=16657, maxprec=64, minexp=-1074))
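Each snippet above selects exactly one mode. The sketch below names the mode a given set of Zfp arguments selects and refuses combinations of modes. It is a hypothetical helper that is stricter than, and not necessarily consistent with, how hdf5plugin.Zfp itself resolves mixed arguments.

```python
def zfp_mode(rate=None, precision=None, accuracy=None, reversible=False,
             minbits=None, maxbits=None, maxprec=None, minexp=None):
    """Hypothetical helper: name the zfp mode selected by the arguments, or None."""
    requested = []
    if rate is not None:
        requested.append("fixed-rate")
    if precision is not None:
        requested.append("fixed-precision")
    if accuracy is not None:
        requested.append("fixed-accuracy")
    if reversible:
        requested.append("reversible")
    # Expert mode is driven by the four low-level parameters.
    if any(v is not None for v in (minbits, maxbits, maxprec, minexp)):
        requested.append("expert")
    if len(requested) > 1:
        raise ValueError("choose only one zfp mode, got: " + ", ".join(requested))
    return requested[0] if requested else None


print(zfp_mode(rate=10.0))         # fixed-rate
print(zfp_mode(reversible=True))   # reversible
```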

Parameters

rate (float) – Use fixed-rate mode and set the number of compressed bits per value.
precision (float) – Use fixed-precision mode and set the number of uncompressed bits per value.
accuracy (float) – Use fixed-accuracy mode and set the absolute error tolerance.
reversible (bool) – If True, it uses the reversible (i.e., lossless) mode.
minbits (int) – Minimum number of compressed bits used to represent a block.
maxbits (int) – Maximum number of bits used to represent a block.
maxprec (int) – Maximum number of bit planes encoded. It controls the relative error.
minexp (int) – Smallest absolute bit plane number encoded. It controls the absolute error.

filter_id = 32013
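The rate and accuracy parameters have simple numeric readings. In fixed-rate mode each value is stored in rate bits, so the compression ratio is roughly the bits per uncompressed value divided by rate (ignoring any storage overhead); float64 data with rate=10.0 gives about 64 / 10 = 6.4. Fixed-accuracy mode targets an absolute error tolerance, which you can check after reading data back. Both helpers are hypothetical, and the decoded values below are made up for illustration.

```python
def zfp_fixed_rate_ratio(rate, bits_per_value=64):
    """Approximate compression ratio in fixed-rate mode, ignoring any storage overhead."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return bits_per_value / rate


def max_abs_error(original, decoded):
    """Largest absolute difference between two equal-length sequences of numbers."""
    if len(original) != len(decoded):
        raise ValueError("sequences must have the same length")
    return max((abs(a - b) for a, b in zip(original, decoded)), default=0.0)


# float64 data (as produced by numpy.random.random) compressed with rate=10.0
print(zfp_fixed_rate_ratio(10.0))  # 6.4

# Made-up decoded values standing in for data read back with accuracy=0.001
accuracy = 0.001
original = [0.25, 0.5, 0.75]
decoded = [0.2502, 0.4999, 0.7504]
print(max_abs_error(original, decoded) <= accuracy)  # True
```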

Zstd

class hdf5plugin.Zstd(clevel=3)

h5py.Group.create_dataset’s compression arguments for using the Zstd filter.

It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'zstd',
    data=numpy.arange(100),
    **hdf5plugin.Zstd())
f.close()

Parameters: clevel (int) – Compression level from 1 (lowest compression) to 22 (maximum compression). Ultra compression extends from 20 through 22. Default: 3.

For example, to use the maximum (ultra) compression level:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'zstd',
    data=numpy.arange(100),
    **hdf5plugin.Zstd(clevel=22))
f.close()

filter_id = 32015
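The documented clevel ranges can be checked before writing. This minimal sketch rejects out-of-range levels and flags the ultra range (20 to 22). The helper name and the label "regular" for levels below 20 are hypothetical.

```python
ZSTD_MIN_LEVEL, ZSTD_MAX_LEVEL = 1, 22
ZSTD_ULTRA_START = 20  # ultra compression covers levels 20 through 22


def zstd_level_kind(clevel=3):
    """Hypothetical helper: classify a Zstd level using the ranges documented above."""
    if not ZSTD_MIN_LEVEL <= clevel <= ZSTD_MAX_LEVEL:
        raise ValueError("clevel must be between 1 and 22")
    return "ultra" if clevel >= ZSTD_ULTRA_START else "regular"


print(zstd_level_kind())    # regular (default level 3)
print(zstd_level_kind(22))  # ultra
```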

Use HDF5 filters in other applications

Non-h5py and non-Python users can also use the supplied HDF5 compression filters to read compressed datasets by setting the HDF5_PLUGIN_PATH environment variable to the value of hdf5plugin.PLUGINS_PATH, which can be retrieved from the command line with:

python -c "import hdf5plugin; print(hdf5plugin.PLUGINS_PATH)"

For instance:

export HDF5_PLUGIN_PATH=$(python -c "import hdf5plugin; print(hdf5plugin.PLUGINS_PATH)")

should allow MATLAB or IDL users to read data compressed with the supported plugins.

Setting the HDF5_PLUGIN_PATH environment variable allows already existing programs or Python code to read compressed data without any modification.
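From Python, you can also launch another program with HDF5_PLUGIN_PATH set for that process only, instead of exporting it in the shell. This is a sketch: the helper names are hypothetical, and in real code plugins_path would be hdf5plugin.PLUGINS_PATH.

```python
import os
import subprocess


def env_with_plugin_path(plugins_path, base_env=None):
    """Return a copy of base_env (default: os.environ) with HDF5_PLUGIN_PATH set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HDF5_PLUGIN_PATH"] = plugins_path
    return env


def run_with_plugins(command, plugins_path):
    """Run an external command so that its HDF5 library can find the provided filters."""
    return subprocess.run(command, env=env_with_plugin_path(plugins_path), check=True)
```

For instance, run_with_plugins(["h5dump", "test.h5"], hdf5plugin.PLUGINS_PATH) would dump the compressed file from the earlier examples, assuming the HDF5 command-line tools are installed. The parent process environment is left unchanged.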