histogram: Multidimensional histograms

This module provides two classes to compute multidimensional histograms.

Classes

  • Histogramnd : multidimensional histogram.
  • HistogramndLut : optimized to compute several histograms from data sharing the same coordinates.

Examples

Single histogram

Given some 3D data:

>>> import numpy as np
>>> shape = (10**7, 3)
>>> sample = np.random.random(shape) * 500
>>> weights = np.random.random((shape[0],))

Computing the histogram with Histogramnd :

>>> from silx.math import Histogramnd
>>> n_bins = 35
>>> ranges = [[40., 150.], [-130., 250.], [0., 505]]
>>> histo, w_histo, edges = Histogramnd(sample, n_bins=n_bins, histo_range=ranges, weights=weights)

Histogramnd can accumulate sets of data that don’t have the same coordinates :

>>> from silx.math import Histogramnd
>>> histo_obj = Histogramnd(sample, n_bins=n_bins, histo_range=ranges, weights=weights)
>>> sample_2 = np.random.random(shape) * 200
>>> weights_2 = np.random.random((shape[0],))
>>> histo_obj.accumulate(sample_2, weights=weights_2)

And then access the results:

>>> histo = histo_obj.histo
>>> weighted_histo = histo_obj.weighted_histo

or even:

>>> histo, w_histo, edges = histo_obj
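The accumulation behaviour described above can be pictured with a small pure-Python sketch. AccumulatingHistogram1D is a hypothetical 1D stand-in written for illustration; it is not the silx implementation and assumes equally spaced, half-open bins.

```python
class AccumulatingHistogram1D:
    """Hypothetical 1D stand-in that keeps running totals between calls."""

    def __init__(self, lo, hi, n_bins):
        self.lo, self.hi, self.n_bins = lo, hi, n_bins
        self.histo = [0] * n_bins
        self.weighted_histo = [0.0] * n_bins

    def accumulate(self, sample, weights):
        for x, weight in zip(sample, weights):
            if not self.lo <= x < self.hi:
                continue  # outside the histogram range: ignored
            scaled = (x - self.lo) / (self.hi - self.lo) * self.n_bins
            index = min(int(scaled), self.n_bins - 1)
            self.histo[index] += 1
            self.weighted_histo[index] += weight


acc = AccumulatingHistogram1D(lo=0.0, hi=4.0, n_bins=2)
acc.accumulate([0.5, 3.0], [1.0, 2.0])             # first data set
acc.accumulate([1.0, 1.5, 10.0], [4.0, 5.0, 6.0])  # different coordinates
print(acc.histo)           # -> [3, 1]
print(acc.weighted_histo)  # -> [10.0, 2.0]
```

Each call bins its own coordinates, so the two data sets do not need to share anything except the histogram range and the number of bins.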

Accumulating histograms (LUT)

In some situations we need to compute the weighted histogram of several sets of data (weights) that have the same coordinates (sample).

Again, some data (2 sets of weights) :

>>> import numpy as np
>>> shape = (10**7, 3)
>>> sample = np.random.random(shape) * 500
>>> weights_1 = np.random.random((shape[0],))
>>> weights_2 = np.random.random((shape[0],))

And getting the result with HistogramndLut :

>>> from silx.math import HistogramndLut
>>> n_bins = 35
>>> ranges = [[40., 150.], [-130., 250.], [0., 505]]
>>> histo_lut = HistogramndLut(sample, ranges, n_bins)

First call, with weights_1 :

>>> histo_lut.accumulate(weights_1)

Second call, with weights_2 :

>>> histo_lut.accumulate(weights_2)

Retrieving the results (this is a copy of what’s actually stored in this instance) :

>>> histo = histo_lut.histo()
>>> w_histo = histo_lut.weighted_histo()

Note that the following code gives the same result, but the HistogramndLut instance does not store the accumulated weighted histogram.

First call, with weights_1 :

>>> histo, w_histo = histo_lut.apply_lut(weights_1)

Second call, with weights_2 :

>>> histo, w_histo = histo_lut.apply_lut(weights_2, histo=histo, weighted_histo=w_histo)
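To make the look-up table idea more concrete, here is a minimal pure-Python sketch, assuming 1D coordinates and equally spaced, half-open bins. The helpers build_lut_sketch and apply_lut_sketch are hypothetical names and do not reproduce the silx implementation; they only show why the coordinates need to be binned once.

```python
def build_lut_sketch(sample, lo, hi, n_bins):
    """Map each 1D coordinate to its bin index (None if outside [lo, hi[)."""
    lut = []
    for x in sample:
        if lo <= x < hi:
            lut.append(min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1))
        else:
            lut.append(None)
    return lut


def apply_lut_sketch(lut, weights, n_bins, histo=None, weighted_histo=None):
    """Add one set of weights to new or caller-provided histogram lists."""
    if histo is None:
        histo = [0] * n_bins
    if weighted_histo is None:
        weighted_histo = [0.0] * n_bins
    for index, weight in zip(lut, weights):
        if index is not None:
            histo[index] += 1
            weighted_histo[index] += weight
    return histo, weighted_histo


sample = [0.5, 1.5, 1.7, 3.2, 9.0]           # shared coordinates
weights_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
weights_2 = [10.0, 20.0, 30.0, 40.0, 50.0]

lut = build_lut_sketch(sample, lo=0.0, hi=4.0, n_bins=4)  # computed once
histo, w_histo = apply_lut_sketch(lut, weights_1, 4)
histo, w_histo = apply_lut_sketch(lut, weights_2, 4,
                                  histo=histo, weighted_histo=w_histo)
print(lut)      # -> [0, 1, 1, 3, None]
print(histo)    # -> [2, 4, 0, 2]
print(w_histo)  # -> [11.0, 55.0, 0.0, 44.0]
```

The binning cost is paid once in build_lut_sketch; every further set of weights only needs a pass over the precomputed indices.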

Bin edges

When computing a histogram the caller is asked to provide the histogram range along each coordinate (parameter histo_range). This parameter must be given a [N, 2] array where N is the number of dimensions of the histogram.

In other words, the caller must provide, for each dimension, the left edge of the first (leftmost) bin, and the right edge of the last (rightmost) bin.

E.g. for a 1D sample, with a histo_range equal to [0, 10] and n_bins=4, the bin ranges will be :

  • [0, 2.5[, [2.5, 5[, [5, 7.5[, [7.5, 10[ if last_bin_closed = False
  • [0, 2.5[, [2.5, 5[, [5, 7.5[, [7.5, 10] if last_bin_closed = True
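The edge rule above can be checked with a short pure-Python sketch. The helpers bin_edges and bin_index are hypothetical names used for illustration only; they are not part of silx and assume equally spaced bins.

```python
def bin_edges(lo, hi, n_bins):
    """Return the n_bins + 1 equally spaced edges between lo and hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(x, lo, hi, n_bins, last_bin_closed=False):
    """Return the index of the bin containing x, or None if x is out of range."""
    if x == hi:
        # The right edge of the last bin only counts when that bin is closed.
        return n_bins - 1 if last_bin_closed else None
    if x < lo or x > hi:
        return None
    # Clamp to guard against floating-point rounding just below hi.
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)


edges = bin_edges(0.0, 10.0, 4)
print(edges)                                                # -> [0.0, 2.5, 5.0, 7.5, 10.0]
print(bin_index(7.5, 0.0, 10.0, 4))                         # -> 3
print(bin_index(10.0, 0.0, 10.0, 4))                        # -> None
print(bin_index(10.0, 0.0, 10.0, 4, last_bin_closed=True))  # -> 3
```

A value sitting exactly on an inner edge, such as 7.5, always belongs to the bin on its right; only the outermost right edge depends on last_bin_closed.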

Classes

class silx.math.histogram.Histogramnd(sample, histo_range, n_bins, weights=None, weight_min=None, weight_max=None, last_bin_closed=False, wh_dtype=None)

Computes the multidimensional histogram of some data.

__init__(sample, histo_range, n_bins, weights=None, weight_min=None, weight_max=None, last_bin_closed=False, wh_dtype=None)
Parameters:
  • sample (numpy.array) –

    The data to be histogrammed. Its shape must be either (N,) if it contains one dimensional coordinates, or an (N,D) array where the rows are the coordinates of points in a D dimensional space. The following dtypes are supported : numpy.float64, numpy.float32, numpy.int32.

    Warning

    If sample is not a C_CONTIGUOUS ndarray (e.g. a non-contiguous slice), then Histogramnd will have to make an internal copy.

  • histo_range (array_like) – A (D, 2) array containing the histogram range along each dimension, where D is the sample’s number of dimensions.
  • n_bins (scalar or array_like) –
    The number of bins :
    • a scalar (same number of bins for all dimensions)
    • a D-element array (number of bins for each dimension)
  • weights (optional, numpy.array) –

    An N-element numpy array of values associated with each sample. The values of the weighted_histo array are equal to the sum of the weights associated with the samples falling into each bin. The following dtypes are supported : numpy.float64, numpy.float32, numpy.int32.

    Note

    If None, the weighted histogram returned will be None.

  • weight_min (optional, scalar) –

    Use this parameter to filter out all samples whose weights are lower than this value.

    Note

    This value will be cast to the same type as weights.

  • weight_max (optional, scalar) –

    Use this parameter to filter out all samples whose weights are higher than this value.

    Note

    This value will be cast to the same type as weights.

  • last_bin_closed (optional, python.boolean) – By default the last bin is half open (i.e. [x,y) ; x included, y excluded), like all the other bins. Set this parameter to True if you want the LAST bin to be closed.
  • wh_dtype (optional, numpy data type) – Type of the weighted histogram array. If not provided, the weighted histogram array will contain values of type numpy.double. Allowed values are : numpy.double and numpy.float32.
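As a rough illustration of how the sample, histo_range and n_bins parameters fit together, the following pure-Python sketch bins (N, D) points into a sparse dictionary of counts. histogram_nd_sketch is a hypothetical helper, not the silx implementation, and it assumes equally spaced, half-open bins.

```python
def histogram_nd_sketch(sample, histo_range, n_bins):
    """Count points per D-dimensional bin; returns {bin index tuple: count}."""
    n_dims = len(histo_range)
    if isinstance(n_bins, int):
        n_bins = [n_bins] * n_dims  # a scalar applies to every dimension
    counts = {}
    for point in sample:
        index = []
        for x, (lo, hi), nb in zip(point, histo_range, n_bins):
            if not lo <= x < hi:
                break  # out of range along one dimension: drop the point
            index.append(min(int((x - lo) / (hi - lo) * nb), nb - 1))
        else:
            key = tuple(index)
            counts[key] = counts.get(key, 0) + 1
    return counts


# Four points in a 2D space: one row per point, one column per dimension.
sample = [(1.0, 5.0), (1.5, 5.5), (3.0, 9.0), (5.0, 1.0)]
histo_range = [(0.0, 4.0), (0.0, 10.0)]  # one (min, max) pair per dimension

counts = histogram_nd_sketch(sample, histo_range, n_bins=[2, 2])
print(counts)  # -> {(0, 1): 2, (1, 1): 1}
```

Passing n_bins=2 instead of [2, 2] gives the same result here, since the scalar is expanded to one value per dimension.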
__getitem__(key)

If necessary, results can be unpacked from an instance of Histogramnd : histogram, weighted histogram, bin edges.

Example :

histo, w_histo, edges = Histogramnd(sample, histo_range, n_bins, weights)
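Unpacking of this kind relies on Python's sequence protocol: when an object defines __getitem__ but not __iter__, Python calls __getitem__ with 0, 1, 2, ... until IndexError is raised. The sketch below uses a hypothetical ResultHolder class to show the mechanism; it makes no claim about how Histogramnd implements it internally.

```python
class ResultHolder:
    """Hypothetical container exposing three results through __getitem__."""

    def __init__(self, histo, weighted_histo, edges):
        self.histo = histo
        self.weighted_histo = weighted_histo
        self.edges = edges

    def __getitem__(self, key):
        # Indexing a 3-tuple raises IndexError for key >= 3,
        # which is what ends the implicit iteration.
        return (self.histo, self.weighted_histo, self.edges)[key]


holder = ResultHolder([2, 1], [3.0, 4.5], [0.0, 1.0, 2.0])
histo, w_histo, edges = holder  # implicit calls to __getitem__(0..3)
print(histo)    # -> [2, 1]
print(w_histo)  # -> [3.0, 4.5]
print(edges)    # -> [0.0, 1.0, 2.0]
```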
accumulate(sample, weights=None, weight_min=None, weight_max=None)

Computes the multidimensional histogram of some data and accumulates it into the histogram held by this instance of Histogramnd.

Parameters:
  • sample (numpy.array) –

    The data to be histogrammed. Its shape must be either (N,) if it contains one dimensional coordinates, or an (N,D) array where the rows are the coordinates of points in a D dimensional space. The following dtypes are supported : numpy.float64, numpy.float32, numpy.int32.

    Warning

    If sample is not a C_CONTIGUOUS ndarray (e.g. a non-contiguous slice), then Histogramnd will have to make an internal copy.

  • weights (optional, numpy.array) –

    An N-element numpy array of values associated with each sample. The values of the weighted_histo array are equal to the sum of the weights associated with the samples falling into each bin. The following dtypes are supported : numpy.float64, numpy.float32, numpy.int32.

    Note

    If None, the weighted histogram returned will be None.

  • weight_min (optional, scalar) –

    Use this parameter to filter out all samples whose weights are lower than this value.

    Note

    This value will be cast to the same type as weights.

  • weight_max (optional, scalar) –

    Use this parameter to filter out all samples whose weights are higher than this value.

    Note

    This value will be cast to the same type as weights.
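The effect of weights, weight_min and weight_max can be sketched in pure Python as follows. The function filtered_histogram_sketch is hypothetical, takes precomputed bin indices for brevity, and assumes that a filtered-out sample contributes to neither the histogram nor the weighted histogram.

```python
def filtered_histogram_sketch(bin_indices, weights, n_bins,
                              weight_min=None, weight_max=None):
    """Sum weights per bin, skipping samples outside [weight_min, weight_max]."""
    histo = [0] * n_bins
    weighted_histo = [0.0] * n_bins
    for index, weight in zip(bin_indices, weights):
        if weight_min is not None and weight < weight_min:
            continue  # weight lower than weight_min: sample filtered out
        if weight_max is not None and weight > weight_max:
            continue  # weight higher than weight_max: sample filtered out
        histo[index] += 1
        weighted_histo[index] += weight
    return histo, weighted_histo


bin_indices = [0, 0, 1, 1, 2]          # bin of each sample (assumed known)
weights = [0.5, 2.0, 3.0, 8.0, 1.0]

histo, w_histo = filtered_histogram_sketch(
    bin_indices, weights, n_bins=3, weight_min=1.0, weight_max=5.0)
print(histo)    # -> [1, 1, 1]
print(w_histo)  # -> [2.0, 3.0, 1.0]
```

Weights equal to weight_min or weight_max are kept in this sketch, since only strictly lower or strictly higher values are described as filtered out.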

histo

Histogram array, or None if this instance was initialized without <sample> and accumulate has not been called yet.

Note

This is a reference to the array stored in this Histogramnd instance; use with caution.

weighted_histo

Weighted histogram, or None if this instance was initialized without <sample>, or if no weights have been passed to __init__ or accumulate.

Note

This is a reference to the array stored in this Histogramnd instance; use with caution.

edges

Bin edges, or None if this instance was initialized without <sample> and accumulate has not been called yet.

class silx.math.histogram.HistogramndLut(sample, histo_range, n_bins, last_bin_closed=False, dtype=None)

The HistogramndLut class allows you to bin data onto a regular grid. HistogramndLut is useful when several sets of data that share the same coordinates (sample) have to be mapped onto the same grid.

__init__(sample, histo_range, n_bins, last_bin_closed=False, dtype=None)
Parameters:
  • sample (numpy.array) – The coordinates of the data to be histogrammed. Its shape must be either (N,) if it contains one dimensional coordinates, or an (N, D) array where the rows are the coordinates of points in a D dimensional space. The following dtypes are supported : numpy.float64, numpy.float32, numpy.int32.
  • histo_range (array_like) – A (D, 2) array containing the histogram range along each dimension, where D is the sample’s number of dimensions.
  • n_bins (scalar or array_like) –
    The number of bins :
    • a scalar (same number of bins for all dimensions)
    • a D-element array (number of bins for each dimension)
  • dtype (numpy.dtype) – data type of the weighted histogram. If None, the data type will be the same as the first weights array provided (on first call of the instance).
  • last_bin_closed (optional, python.boolean) – By default the last bin is half open (i.e. [x,y) ; x included, y excluded), like all the other bins. Set this parameter to True if you want the LAST bin to be closed.
clear()

Resets the instance (zeroes the histograms).

lut

Copy of the LUT.

histo(copy=True)

Histogram (a copy of it), or None if accumulate has not been called yet (or clear was just called). If copy is set to False then the actual reference to the array is returned (use with caution).

weighted_histo(copy=True)

Weighted histogram (a copy of it), or None if accumulate has not been called yet (or clear was just called). If copy is set to False then the actual reference to the array is returned (use with caution).
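The reason for the "use with caution" warning is that a reference lets the caller modify the stored data in place. A minimal sketch with a hypothetical CountHolder class (plain Python lists standing in for arrays) shows the difference between the two modes:

```python
class CountHolder:
    """Hypothetical holder mimicking a histo(copy=True) style accessor."""

    def __init__(self, n_bins):
        self._histo = [0] * n_bins

    def histo(self, copy=True):
        # A copy protects the internal state; a reference does not.
        return list(self._histo) if copy else self._histo


holder = CountHolder(3)

safe = holder.histo()             # independent copy
safe[0] = 99
print(holder.histo())             # -> [0, 0, 0]

alias = holder.histo(copy=False)  # same object as the internal list
alias[0] = 99
print(holder.histo())             # -> [99, 0, 0]
```

Skipping the copy avoids duplicating a potentially large array, at the cost of letting any in-place change silently alter the accumulated result.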

histo_range

Bin ranges.

n_bins

Number of bins in each direction.

bins_edges

Bin edges of the histograms, one array for each dimension.

last_bin_closed

Returns True if the rightmost bin in each dimension is closed (i.e. values equal to the rightmost bin edge are included in the bin).

accumulate(weights, weight_min=None, weight_max=None)

Computes the multidimensional histogram of some data and adds it to the current histogram stored by this instance. The results can be retrieved with the histo and weighted_histo methods.

Parameters:
  • weights – A numpy array of values associated with each sample. The number of elements in the array must be the same as the number of samples provided at instantiation time.
  • weight_min (optional, scalar) –

    Use this parameter to filter out all samples whose weights are lower than this value.

    Note

    This value will be cast to the same type as weights.

  • weight_max (optional, scalar) –

    Use this parameter to filter out all samples whose weights are higher than this value.

    Note

    This value will be cast to the same type as weights.

apply_lut(weights, histo=None, weighted_histo=None, weight_min=None, weight_max=None)

Computes the multidimensional histogram of some data and returns the result (it is NOT added to the current histogram stored by this instance).

Parameters:
  • weights – A numpy array of values associated with each sample. The number of elements in the array must be the same as the number of samples provided at instantiation time.
  • histo (optional, numpy.array) – Use this parameter if you want to pass your own histogram array instead of the one created by this function. New values will be added to this array. The returned array will then be this one.
  • weighted_histo (optional, numpy.array) – Use this parameter if you want to pass your own weighted histogram array instead of the one created by this function. New values will be added to this array. The returned array will then be this one (same reference).
  • weight_min (optional, scalar) –

    Use this parameter to filter out all samples whose weights are lower than this value.

    Note

    This value will be cast to the same type as weights.

  • weight_max (optional, scalar) –

    Use this parameter to filter out all samples whose weights are higher than this value.

    Note

    This value will be cast to the same type as weights.