Architecture and design decisions¶
This page gives a brief overview of nabu's architecture and design decisions.
Architecture overview¶
Nabu consists of a series of modules, each with a defined processing scope: pre-processing, reconstruction, I/O, parameter estimation, and pipelining.
Importantly, these modules are as decoupled as possible: the features of one module can generally be used without pulling in the others.
Module breakdown¶
preproc
: processing happening before reconstruction

reconstruction
: sinogram denoising, filtering, and FBP

io
: read and write data

misc
: miscellaneous image processing functions (convolution, histogram, filtering)

pipeline
: reconstruction pipeline, for now full-field

app
: command-line tools

cuda
: CUDA-specific utilities

opencl
: OpenCL-specific utilities

stitching
: utilities related to volume stitching. Might be moved to another module in the future.

resources
: mostly dataset parsing and logger. May be removed in the future.
Each module has a `tests` submodule containing unit tests. These tests can be run either with `pytest tests/file.py`, or with the `nabu-test` CLI tool.
Backends and API¶
Each processing function/class is first implemented in python/numpy so that it can be tested easily. Additionally, some functions/classes can have other backends for performance (e.g. CUDA, OpenCL). In this case, the API must be the same (possibly with additional backend-specific keyword arguments).
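As an illustration (hypothetical class names, not nabu's actual API), a numpy reference implementation and a GPU backend sharing the same interface might look like this:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

class GaussianBlur:
    """Reference implementation: pure numpy/scipy, easy to unit-test."""

    def __init__(self, shape, sigma):
        self.shape = shape
        self.sigma = sigma

    def blur(self, image, output=None):
        res = gaussian_filter(image, self.sigma)
        if output is not None:
            output[:] = res
            return output
        return res

class CudaGaussianBlur(GaussianBlur):
    """Same API as the reference, plus backend-specific keyword arguments."""

    def __init__(self, shape, sigma, cuda_options=None):
        super().__init__(shape, sigma)
        self.cuda_options = cuda_options or {}
        # ... here: compile kernels, allocate device buffers, etc.

    def blur(self, image, output=None):
        # ... here: launch the CUDA kernel, with the same signature as above.
        # The numpy version stands in for it in this sketch.
        return super().blur(image, output=output)
```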
Design decisions¶
Nabu aims at being simple and versatile, while offering high-performance processing capabilities. These goals have an impact on the overall design. More generally, we try to avoid pitfalls commonly found in some scientific software.
The following design decisions are listed as a series of aphorisms.
Decouple I/O code from processing code¶
Many scientific codes mix reading/writing data with the processing itself. Such codes are usually not re-usable, as they make assumptions about file paths and formats.
More generally, decouple functions as much as possible, so that they can be used and tested separately.
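For instance, a hedged sketch (the HDF5 layout and function names are made up): processing code takes arrays in and returns arrays out, while I/O lives in a thin, separate layer.

```python
import h5py

def flatfield_normalize(radios, flat, dark):
    """Pure processing: no file paths or formats, testable with synthetic arrays."""
    return (radios - dark) / (flat - dark)

def normalize_from_file(path):
    """Thin, separate I/O layer (the dataset layout here is illustrative)."""
    with h5py.File(path, "r") as f:
        radios = f["/entry/radios"][()]
        flat = f["/entry/flat"][()]
        dark = f["/entry/dark"][()]
    return flatfield_normalize(radios, flat, dark)
```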
Allocate resources once, use them many times (stateful computations)¶
It is common in scientific software to write functions which we “fire and forget” on data. It is also the default approach of most workflow engines: build a computational graph where each node is a stateless function.
However, when performance matters, this approach is not viable. Usually, memory has to be allocated and some pre-computations have to be done. For example, Fast Fourier Transform (FFT) libraries internally rely on a “plan”: a data structure holding pre-computations tailored to the kind of data that will be processed. Re-doing these allocations and pre-computations at each call dramatically hampers performance, especially if the function is used many times; allocating a large chunk of memory for each call is notably costly in GPU programming.
In nabu, the default approach is to:

1. Instantiate a class with some data description
2. Use it many times
For example:
```python
from nabu.preproc.phase import PaganinPhaseRetrieval

# The "plan" is built once: filter and buffers are pre-computed here
phase_retriever = PaganinPhaseRetrieval(
    radio_shape,
    distance=distance_m,
    energy=energy_kev,
    delta_beta=delta_beta,
    pixel_size=1e-6,
)
# ... and re-used for every radiography
for radio in radios:
    phase_retriever.apply_filter(radio, output=radio)
```
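For contrast, here is a hedged sketch of the “fire-and-forget” anti-pattern this design avoids (reusing the variables of the snippet above): every call rebuilds, then throws away, the pre-computed plan.

```python
# Anti-pattern (sketch): everything pre-computed by the constructor
# (filter, buffers) is rebuilt and discarded on every call.
def retrieve_phase_stateless(radio):
    phase_retriever = PaganinPhaseRetrieval(
        radio.shape,
        distance=distance_m,
        energy=energy_kev,
        delta_beta=delta_beta,
        pixel_size=1e-6,
    )
    phase_retriever.apply_filter(radio, output=radio)
    return radio
```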
Minimize data transfers¶
A tomography pipeline oriented toward high performance should avoid memory transfers (CPU ↔ GPU, node ↔ node) whenever possible. Note that our stateful approach simplifies this issue, as we have more control over memory (it is bound to a class instance).
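As a hedged sketch of what this looks like in practice (a hypothetical class, assuming pycuda is installed and a CUDA device is available): the device buffer is allocated once, and only the data moves on each call.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import pycuda.gpuarray as garray

class GpuProcessor:
    """Hypothetical class: the device buffer is allocated once and reused."""

    def __init__(self, shape):
        self.d_radio = garray.zeros(shape, np.float32)  # one allocation

    def process(self, radio):
        self.d_radio.set(radio)    # host -> device, into the existing buffer
        self.d_radio *= 0.5        # stand-in for the actual GPU processing
        return self.d_radio.get()  # device -> host

proc = GpuProcessor((2048, 2048))
radios = [np.ones((2048, 2048), dtype=np.float32) for _ in range(10)]
results = [proc.process(radio) for radio in radios]
```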
Synchrotron X-rays have the nice property of forming a parallel beam, so let's use this many-million-euro investment: each horizontal slice/slab can be reconstructed independently, without exchanging any data. For cone-beam geometry, excellent reconstruction software is already available.
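A minimal sketch of what this independence buys (the helper functions are hypothetical): slabs can be processed one after the other, or on different machines, with no communication between iterations.

```python
n_z, slab_size = 2048, 128  # volume height and slab thickness, in slices

for start in range(0, n_z, slab_size):
    end = min(start + slab_size, n_z)
    sinos = read_sinograms(start, end)  # hypothetical reader
    slab = reconstruct_slab(sinos)      # hypothetical per-slab FBP
    write_slab(slab, start)             # hypothetical writer
```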
In its current state, nabu spends almost half of the total reconstruction time reading/writing data when reconstructing a volume on a single machine, even on GPFS or fast SSDs. This means that any optimization of the processing software can bring at most a factor-of-two speed-up (although this is less and less true as detectors grow in pixel count: FBP then becomes a bottleneck).
Generally speaking, much high-performance scientific software is I/O-bound rather than compute-bound. Even in the compute-critical parts, optimization is mostly about memory access patterns on the GPU or minimizing cache misses on the CPU.
Don’t reinvent a generic processing pipeline, focus on what matters¶
Off-the-shelf solutions for distributing computations over many nodes are available, for example `distributed` and `dask_jobqueue` (see the sketch below).
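As a hedged sketch, assuming a SLURM cluster and the hypothetical `reconstruct_slab` function from the previous section, here taking a `(start, end)` slab range (resource figures are illustrative):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(cores=4, memory="16GB", walltime="01:00:00")
cluster.scale(jobs=8)  # ask the scheduler for 8 jobs
client = Client(cluster)

# Slabs are independent, so they map naturally onto separate workers
slabs = [(start, start + 128) for start in range(0, 2048, 128)]
futures = client.map(reconstruct_slab, slabs)  # hypothetical function
volumes = client.gather(futures)
```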
Therefore, we focus on the added value of scientific software, which is data processing/analysis algorithms.
Writing yet another generic pipeline is a liability in the codebase, as it has to be maintained in addition to the processing part.
Our first goal is to provide a collection of building blocks for tomography (processing functions and classes) as done by tomopy. But these have to be assembled to form a complete processing pipeline which can be used from the command line. The “assembling” part should be kept as simple as possible.
Admittedly, when writing a processing pipeline, the trade-off between simplicity/maintainability and versatility/complexity is difficult to find.
In nabu, we use a submodule, `nabu.pipeline`, which tries to be as small as possible (and is probably already too complicated).
Minimize the barrier to entry for users and developers¶
The code should be accessible to “scientists who can write some code”, not only to professional developers. Software that can be extended by many people has a higher life expectancy.
- Use native data structures whenever possible (`dict`, lists, etc.). No `Enum`/`namedtuple` or other constructs that are abstruse for non-developers (see the sketch below).
- Use a simple design: functions/classes as building blocks, and write a pipeline on top of them. No scheduler/core system with plugins all over the place. Most of the code should be about tomography processing.
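A small illustration of the first point, with made-up option names: a plain `dict` is immediately readable and trivially serialized to/from a configuration file.

```python
# Options as plain built-in types: readable by any Python beginner,
# and directly (de)serializable from a configuration file.
options = {
    "method": "FBP",
    "padding_mode": "edges",
    "angles_deg": [0.0, 0.5, 1.0],
}
padding = options.get("padding_mode", "zeros")
```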
Simplify code distribution¶
Prefer just-in-time compilation (pyopencl, pycuda, numba?) to ahead-of-time compilation (e.g. Cython extensions). “Pure-Python” packages are much easier to distribute on many platforms. By contrast, packages with native extensions require extensive efforts to work everywhere.
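A hedged sketch of the just-in-time approach with pycuda (toy kernel, assuming a CUDA device is available): the CUDA source is compiled when the program runs, so nothing needs to be built at installation time.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import pycuda.gpuarray as garray
from pycuda.compiler import SourceModule

# Compiled at runtime, not when the package is built/installed
mod = SourceModule("""
__global__ void scale(float *arr, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) arr[i] *= factor;
}
""")
scale = mod.get_function("scale")

n = 1024
d_arr = garray.to_gpu(np.ones(n, dtype=np.float32))
scale(d_arr.gpudata, np.float32(2.0), np.int32(n), block=(256, 1, 1), grid=(n // 256, 1))
assert np.allclose(d_arr.get(), 2.0)
```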
“Explicit is better than implicit”¶
We take the opposite approach of “it's a GPU array with the exact same interface as `numpy.ndarray`, please do as if it were one!”. This approach is used by cupy and reikna/cluda.
Although duck typing has been a factor in Python's success, it does have limitations; the rising trend of using `typing` in Python codebases is an indication of that. Using objects interchangeably is powerful, but very difficult to debug when it goes wrong, especially in GPU programming.
In nabu, a GPU array is a GPU array, not a numpy array, and it has to be handled as such.
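To illustrate (a sketch assuming pycuda), being explicit means that every host/device transfer is a visible function call:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import pycuda.gpuarray as garray

h_arr = np.arange(10, dtype=np.float32)  # a numpy array, on the host
d_arr = garray.to_gpu(h_arr)             # explicit host -> device copy
d_arr *= 2                               # computed on the GPU
result = d_arr.get()                     # explicit device -> host copy
# Passing d_arr to an arbitrary numpy/scipy function is a visible mistake
# here, rather than something that silently "mostly works".
```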