ZSpy - HyperSpy’s Zarr Specification
Like the hspy format, the .zspy format guarantees that no
information is lost in the writing process and supports saving data
of arbitrary dimensions. It is based on the Zarr project, which is intended as a
drop-in replacement for HDF5 that fixes some of the speed and scaling
issues of the HDF5 format, and is therefore suitable for saving
big data. Example using HyperSpy:
>>> import hyperspy.api as hs
>>> s = hs.signals.BaseSignal([0])
>>> s.save('test.zspy') # will save in nested directory
>>> hs.load('test.zspy') # loads the directory
When saving to zspy, all supported objects in the signal’s
metadata are stored. This includes lists, tuples and signals.
Please note that, to increase saving efficiency and speed, the inner-most
structures are converted to numpy arrays when saved, if possible. This
procedure homogenizes the types of the objects inside, most notably casting
numbers to strings if any strings are present.
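The effect of this homogenization can be illustrated with numpy alone. The sketch below is not the writer's actual conversion code; it only shows how numpy unifies the element type when a list mixing numbers and strings is turned into an array. The variable names are made up for the illustration.

```python
import numpy as np

# A list mixing numbers and a string, as might appear in metadata.
mixed = [1, 2.5, "3"]

# numpy picks one common dtype for all elements; with a string present,
# the common dtype is a unicode string type.
as_array = np.array(mixed)
restored = as_array.tolist()

print(as_array.dtype.kind)  # 'U' (unicode string)
print(restored)             # ['1', '2.5', '3']

# A purely numeric list keeps a numeric dtype (here: float).
numeric = np.array([1, 2.5]).tolist()
print(numeric)              # [1.0, 2.5]
```

If the original numeric types matter, one option is to keep numbers and strings in separate metadata entries.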
By default, a zarr.storage.NestedDirectoryStore is used, but other
zarr stores can be used by passing a zarr.storage store
instead of a filename as argument to the save() or the
load() function. If a .zspy file has been saved with a different
store, it must be loaded by passing a store of the same type:
>>> import zarr
>>> filename = 'test.zspy'
>>> store = zarr.LMDBStore(filename)
>>> s.save(store) # saved to LMDB
To load this file again:
>>> import zarr
>>> filename = 'test.zspy'
>>> store = zarr.LMDBStore(filename)
>>> s = hs.load(store) # load from LMDB
API functions
- rsciio.zspy.file_reader(filename, lazy=False, **kwds)
Read data from zspy files saved with the HyperSpy zarr format specification.
- Parameters:
filename (str, pathlib.Path) – Filename of the file to read or corresponding pathlib.Path.
lazy (bool, default=False) – Whether to open the file lazily or not.
**kwds (optional) – Pass keyword arguments to the zarr.open() function.
- Returns:
List of dictionaries containing the following fields:
‘data’ – multidimensional numpy array
‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
‘metadata’ – dictionary containing the parsed metadata
‘original_metadata’ – dictionary containing the full metadata tree from the input file
- Return type:
list of dicts
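As a rough illustration of the structure described above, the sketch below builds one such dictionary by hand and reconstructs the coordinate vector of an axis from either representation. The dictionary contents and the `axis_values` helper are hypothetical and not part of rsciio; the uniform-axis formula `offset + scale * index` is the usual linear calibration and is assumed here.

```python
import numpy as np

# Hand-built stand-in for one entry of the returned list (values are made up).
signal_dict = {
    "data": np.zeros((4, 10)),
    "axes": [
        # Uniform axis: described by size, offset and scale.
        {"name": "y", "units": "nm", "index_in_array": 0,
         "size": 4, "offset": 0.0, "scale": 0.5},
        # Non-uniform axis: described by the full axis vector.
        {"name": "x", "units": "nm", "index_in_array": 1,
         "axis": np.arange(10) ** 2},
    ],
    "metadata": {"General": {"title": "example"}},
    "original_metadata": {},
}


def axis_values(axis):
    """Return the coordinate vector of an axis dictionary (hypothetical helper)."""
    if "axis" in axis:
        return np.asarray(axis["axis"])
    return axis["offset"] + axis["scale"] * np.arange(axis["size"])


for axis in signal_dict["axes"]:
    values = axis_values(axis)
    # The length of each axis matches the corresponding data dimension.
    assert len(values) == signal_dict["data"].shape[axis["index_in_array"]]
```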
- rsciio.zspy.file_writer(filename, signal, close_file=True, **kwds)
Writes data to HyperSpy’s zarr format.
- Parameters:
filename (str, pathlib.Path) – Filename of the file to write to or corresponding pathlib.Path.
signal (dict) –
Dictionary containing the signal object. Should contain the following fields:
‘data’ – multidimensional numpy array
‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
‘metadata’ – dictionary containing the metadata tree
close_file (bool, default=True) – Close the file after writing. Only relevant for some zarr storages (zarr.storage.ZipStore, zarr.storage.DBMStore) that require the store to flush data to disk. If False, doesn’t close the file after writing. The file should not be closed if the data needs to be accessed lazily after saving.
chunks (tuple of integer or None, default=None) – Define the chunking used for saving the dataset. If None, calculates chunks for the signal, with preferably at least one chunk per signal space.
compressor (numcodecs compression, optional) – A compressor can be passed to the save function to compress the data efficiently, see Numcodecs codec. The default is to use a Blosc compressor.
write_dataset (bool, default=True) – If False, doesn’t write the dataset when writing the file. This can be useful to overwrite signal attributes only (for example axes_manager) without having to write the whole dataset, which can take time.
**kwds – The keyword arguments are passed to the zarr.hierarchy.Group.require_dataset() function.
Examples
>>> from numcodecs import Blosc
>>> from rsciio.zspy import file_writer
>>> compressor = Blosc(cname='zstd', clevel=1, shuffle=Blosc.SHUFFLE) # Used by default
>>> file_writer('test.zspy', s, compressor=compressor) # will save with Blosc compression
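To make the chunks parameter more concrete, the sketch below shows how a chunk shape partitions a dataset. It does not reproduce the chunk calculation used by the writer: `count_chunks` is a hypothetical helper, the shapes are made up, and reading "one chunk per signal space" as chunks that span the full signal dimensions is an assumption.

```python
import math


def count_chunks(shape, chunks):
    """Number of chunks along each dimension for a given chunk shape."""
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunks))


# Hypothetical 4D dataset: 64 x 64 navigation positions,
# each holding a 128 x 128 signal.
shape = (64, 64, 128, 128)

# Each chunk spans the full signal dimensions and an 8 x 8 block of positions.
chunks = (8, 8, 128, 128)

per_dimension = count_chunks(shape, chunks)
total = math.prod(per_dimension)
print(per_dimension, total)  # (8, 8, 1, 1) 64
```

A tuple like this would be passed through the chunks argument described above, for example chunks=(8, 8, 128, 128).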
Note
Lazy operations are often I/O-bound. Reading and writing the data creates a bottleneck because of the slow read/write speed of many hard disks. In these cases, compressing your data often speeds up some operations: there is less to read and write, at the cost of slightly more computational work on the CPU.
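The trade-off can be sketched with a lossless compressor from the Python standard library. Note that zlib is used here only as a stand-in for Blosc so that the example has no dependencies; the data is artificial and highly repetitive, so the size reduction is much larger than for typical experimental data.

```python
import zlib

# About 4 MB of highly repetitive bytes, standing in for a dataset
# that compresses well.
raw = b"\x00\x01\x02\x03" * 1_000_000

# Low compression level: less CPU work, still far fewer bytes to write.
compressed = zlib.compress(raw, 1)

# Compression is lossless: decompressing restores the original bytes.
roundtrip = zlib.decompress(compressed)

print(f"{len(raw)} bytes uncompressed, {len(compressed)} bytes compressed")
```

Fewer bytes on disk means less time spent waiting on the disk, which is where I/O-bound lazy operations spend most of their time.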