HSpy - HyperSpy’s HDF5 Specification

This is HyperSpy’s default format. For data processed in HyperSpy, it is the only format that guarantees that no information is lost in the writing process and that supports saving data of arbitrary dimensions. It is based on the open HDF5 standard, which is supported by many applications. Parts of the specification are documented in Metadata structure.

New in version HyperSpy_v1.2: Enable saving HSpy files with the .hspy extension. Previously only the .hdf5 extension was recognised.

Changed in version HyperSpy_v1.3: The default extension for the HyperSpy HDF5 specification is now .hspy. The option to change the default is no longer present in preferences.

Only loading of HDF5 files following the HyperSpy specification is supported by this plugin. Such files usually have the .hspy extension, but older versions of HyperSpy saved them with the .hdf5 extension. Both extensions are recognised by HyperSpy since version 1.2; however, HyperSpy versions older than 1.2 won’t recognise the .hspy extension. To work around this when using an old HyperSpy installation, either change the extension manually to .hdf5 or save the file with this extension directly by including it in the filename, e.g.:

>>> import hyperspy.api as hs
>>> s = hs.signals.BaseSignal([0])
>>> s.save('test.hdf5')

When saving to .hspy, all supported objects in the signal’s hyperspy.signal.BaseSignal.metadata are stored, including lists, tuples and signals. Note that, to increase saving efficiency and speed, the innermost structures are converted to numpy arrays when possible. This homogenizes the types of the objects they contain; most notably, numbers are cast to strings if any strings are present:

>>> # before saving:
>>> somelist
[1, 2.0, 'a name']
>>> # after saving and loading again:
>>> somelist
['1', '2.0', 'a name']

The type conversion follows numpy’s “safe” casting rules, so no information is lost: numbers are represented to full machine precision.
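The homogenization can be reproduced with plain numpy. This is a minimal sketch that only illustrates numpy’s own array-construction behaviour, assuming the writer relies on it; it is not the code path used by the hspy writer itself.

```python
import numpy as np

# A mixed-type list such as one stored in metadata
somelist = [1, 2.0, "a name"]

# Building an array promotes all elements to a common unicode string dtype,
# because a string is present in the list
arr = np.array(somelist)
print(arr.dtype.kind)  # U
print(arr.tolist())    # ['1', '2.0', 'a name']

# Floats are written with a round-trip representation,
# so the original value can be recovered exactly from the string
value = 0.1 + 0.2
as_text = np.array([value, "label"])[0]
recovered = float(as_text)
print(recovered == value)  # True
```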

Storing lists of signals in the metadata is particularly useful when using hyperspy._signals.eds.EDSSpectrum.get_lines_intensity(), which returns a list of signals:

>>> s = hs.datasets.example_signals.EDS_SEM_Spectrum()
>>> s.metadata.Sample.intensities = s.get_lines_intensity()
>>> s.save('EDS_spectrum.hspy')

>>> s_new = hs.load('EDS_spectrum.hspy')
>>> s_new.metadata.Sample.intensities
[<BaseSignal, title: X-ray line intensity of EDS SEM Signal1D: Al_Ka at 1.49 keV, dimensions: (|)>,
 <BaseSignal, title: X-ray line intensity of EDS SEM Signal1D: C_Ka at 0.28 keV, dimensions: (|)>,
 <BaseSignal, title: X-ray line intensity of EDS SEM Signal1D: Cu_La at 0.93 keV, dimensions: (|)>,
 <BaseSignal, title: X-ray line intensity of EDS SEM Signal1D: Mn_La at 0.63 keV, dimensions: (|)>,
 <BaseSignal, title: X-ray line intensity of EDS SEM Signal1D: Zr_La at 2.04 keV, dimensions: (|)>]

Chunking

New in version HyperSpy_v1.3.1: chunks keyword argument

The HyperSpy HDF5 format supports chunking the data into smaller pieces to make it possible to load only part of a dataset at a time. By default, the data is saved in chunks that are optimised to contain at least one full signal. It is possible to customise the chunk shape using the chunks keyword. For example, to save the data with (20, 20, 256) chunks instead of the default (7, 7, 2048) chunks for this signal:

>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.random((100, 100, 2048)))
>>> s.save("test_chunks", chunks=(20, 20, 256))  # saved as test_chunks.hspy
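To compare the default (7, 7, 2048) and the custom (20, 20, 256) chunk shapes, it helps to compute how many bytes one chunk holds. This sketch assumes the float64 data of the example (8 bytes per value, as returned by np.random.random) and ignores compression; chunk_nbytes is a hypothetical helper, not part of HyperSpy.

```python
import math


def chunk_nbytes(chunk_shape, itemsize=8):
    """Uncompressed size in bytes of one chunk (itemsize=8 for float64)."""
    return math.prod(chunk_shape) * itemsize


for name, chunks in [("default", (7, 7, 2048)), ("custom", (20, 20, 256))]:
    print(name, chunks, chunk_nbytes(chunks), "bytes")
# default (7, 7, 2048) 802816 bytes
# custom (20, 20, 256) 819200 bytes
```

Both shapes give chunks of roughly 800 kB; what differs is which parts of the dataset are stored together.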

Note that it is currently not possible to pass different customised chunk shapes for each of the signals and arrays contained in a signal and its metadata. Therefore, the value of chunks provided on saving is applied to all arrays contained in the signal.

By passing True to chunks, the chunk shape is guessed using the guess_chunk function of h5py. For large signal spaces, this autochunking usually leads to smaller chunks, because guess_chunk does not impose the constraint of storing at least one signal per chunk. For example, for the signal in the example above, passing chunks=True results in chunks of (7, 7, 256).
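The autochunking heuristic can be sketched in pure Python. This is our own re-implementation modelled on the axis-halving approach of h5py’s guess_chunk; the constants CHUNK_BASE, CHUNK_MIN and CHUNK_MAX and the exact stopping rule are assumptions taken from the h5py source and may differ between h5py versions. Because it halves every axis in turn until a chunk is close to a target size, it makes no attempt to keep a whole signal in each chunk.

```python
import math

# Assumed constants, modelled on h5py's chunking heuristic
CHUNK_BASE = 16 * 1024   # multiplier used to derive the target chunk size
CHUNK_MIN = 8 * 1024     # soft lower limit on the target (8 KiB)
CHUNK_MAX = 1024 * 1024  # hard upper limit on chunk size (1 MiB)


def guess_chunk_sketch(shape, typesize):
    """Guess a chunk shape by halving axes in turn (all dims must be > 0)."""
    chunks = [float(n) for n in shape]
    ndims = len(chunks)

    # Target chunk size grows slowly with the total dataset size
    dset_size = math.prod(chunks) * typesize
    target = CHUNK_BASE * 2 ** math.log10(dset_size / (1024.0 * 1024))
    target = min(max(target, CHUNK_MIN), CHUNK_MAX)

    idx = 0
    while True:
        chunk_bytes = math.prod(chunks) * typesize
        # Stop when below the target or within 50% of it, and under the hard limit
        close_enough = chunk_bytes < target or abs(chunk_bytes - target) / target < 0.5
        if close_enough and chunk_bytes < CHUNK_MAX:
            break
        if math.prod(chunks) == 1:
            break  # a single element already exceeds the limit
        # Halve the next axis (rounding up), cycling through all axes
        axis = idx % ndims
        chunks[axis] = math.ceil(chunks[axis] / 2.0)
        idx += 1
    return tuple(int(c) for c in chunks)


# Float64 signal from the example above: 8 bytes per value
print(guess_chunk_sketch((100, 100, 2048), 8))  # (7, 7, 256)
```

Under these assumptions the sketch reproduces the (7, 7, 256) shape quoted above, and it shows that the signal axis is split just like the navigation axes.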

Choosing the correct chunk size can significantly affect the speed of reading and writing, as well as the performance of many HyperSpy algorithms. See the HyperSpy chunking section for more information.
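One way to see why the chunk shape matters: HDF5 reads and decompresses whole chunks, so the number of chunks a selection overlaps is a rough proxy for the cost of reading it. The helper below is a hypothetical illustration, not a HyperSpy function, applied to the (100, 100, 2048) signal from the example above.

```python
import math


def chunks_touched(selection, chunk_shape):
    """Count chunks overlapped by a selection given as (start, stop) per axis."""
    counts = []
    for (start, stop), c in zip(selection, chunk_shape):
        # Index of the last overlapped chunk minus index of the first, plus one
        counts.append((stop - 1) // c - start // c + 1)
    return math.prod(counts)


# One full spectrum: navigation pixel (0, 0), all 2048 channels
spectrum = [(0, 1), (0, 1), (0, 2048)]
# One energy channel across the whole 100 x 100 navigation space
image = [(0, 100), (0, 100), (500, 501)]

for chunks in [(7, 7, 2048), (20, 20, 256)]:
    print(chunks, chunks_touched(spectrum, chunks), chunks_touched(image, chunks))
# (7, 7, 2048) 1 225
# (20, 20, 256) 8 25
```

With the default shape a single spectrum sits in one chunk, but a single-channel image overlaps 225 chunks, which together hold the whole dataset; the (20, 20, 256) shape trades the other way.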

Note

Also see the HDF5 utility functions for inspecting HDF5 files.

API functions

rsciio.hspy.file_reader(filename, lazy=False, **kwds)

Read data from hdf5-files saved with the HyperSpy hdf5-format specification (.hspy).

Parameters:

filename (str, pathlib.Path) – Filename of the file to read or corresponding pathlib.Path.
lazy (bool, Default=False) – Whether to open the file lazily or not.
**kwds –

Returns:

List of dictionaries containing the following fields:

'data' – multidimensional numpy array
'axes' – list of dictionaries describing the axes containing the fields 'name', 'units', 'index_in_array', and either 'size', 'offset', and 'scale' or a numpy array 'axis' containing the full axes vector
'metadata' – dictionary containing the parsed metadata
'original_metadata' – dictionary containing the full metadata tree from the input file

Return type:

list of dicts
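The 'axes' entries come in two forms. This minimal sketch turns either form into the full axis vector, assuming a uniform axis follows the usual HyperSpy convention value = offset + index * scale; axis_vector and the example dictionaries are hypothetical and not part of RosettaSciIO.

```python
import numpy as np


def axis_vector(axis):
    """Return the full axis vector from one entry of the 'axes' list."""
    if "axis" in axis:
        # Non-uniform axis: the vector is stored explicitly
        return np.asarray(axis["axis"])
    # Uniform axis: reconstruct the vector from size, offset and scale
    return axis["offset"] + axis["scale"] * np.arange(axis["size"])


uniform = {"name": "Energy", "units": "keV", "index_in_array": 0,
           "size": 4, "offset": 0.5, "scale": 0.25}
explicit = {"name": "x", "units": "nm", "index_in_array": 0,
            "axis": np.array([0.0, 1.0, 4.0])}

print(axis_vector(uniform).tolist())   # [0.5, 0.75, 1.0, 1.25]
print(axis_vector(explicit).tolist())  # [0.0, 1.0, 4.0]
```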

rsciio.hspy.file_writer(filename, signal, close_file=True, **kwds)

Writes data to HyperSpy’s hdf5-format (.hspy).

Parameters:

filename (str, pathlib.Path) – Filename of the file to write to or corresponding pathlib.Path.
signal (dict) –
Dictionary containing the signal object. Should contain the following fields:
- 'data' – multidimensional numpy array
- 'axes' – list of dictionaries describing the axes containing the fields 'name', 'units', 'index_in_array', and either 'size', 'offset', and 'scale' or a numpy array 'axis' containing the full axes vector
- 'metadata' – dictionary containing the metadata tree
compression (None, 'gzip', 'szip', 'lzf', Default='gzip') – Compression can significantly increase the time needed to save a file. If file size is not an issue, it can be disabled by setting compression=None. RosettaSciIO uses h5py for reading and writing HDF5 files and therefore supports all compression filters supported by h5py. Also see the notes below.
chunks (tuple of integer or None, Default=None) – Define the chunking used for saving the dataset. If None, the chunk shape is calculated for the signal, preferably such that each chunk contains at least one full signal.
close_file (bool, Default=True) – Close the file after writing. The file should not be closed if the data needs to be accessed lazily after saving.
write_dataset (bool, Default=True) – If True, write the dataset; otherwise, skip it. This is useful to overwrite only attributes (for example axes_manager) without having to write the whole dataset.
**kwds – The keyword arguments are passed to the h5py.Group.require_dataset() function.

Notes

It is possible to enable other compression filters, such as blosc, by installing a package that provides them, e.g. hdf5plugin. Similarly, the availability of 'szip' depends on the HDF5 installation; if it is not available, an error is raised. Be aware that loading files written with these filters requires installing the package providing the compression filter, so it may not be possible to load them on some platforms. Only compression=None and compression='gzip' are available on all platforms. For more details, see the h5py documentation.
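HDF5’s 'gzip' filter applies DEFLATE compression to each chunk. Python’s standard-library zlib module implements the same DEFLATE algorithm, so it can give a rough, platform-independent feel for the trade-off between file size and saving time. This is an illustration only: it does not go through h5py or HDF5, and the compression level used here is an arbitrary choice for the demo.

```python
import os
import time
import zlib

# 1 MiB of a repeating byte pattern and 1 MiB of random bytes
repetitive = bytes(range(256)) * 4096
random_data = os.urandom(len(repetitive))

sizes = {}
for label, raw in [("repetitive", repetitive), ("random", random_data)]:
    start = time.perf_counter()
    packed = zlib.compress(raw, 4)  # DEFLATE at level 4, chosen for this demo
    elapsed_ms = (time.perf_counter() - start) * 1e3
    # DEFLATE is lossless: decompressing restores the original bytes exactly
    assert zlib.decompress(packed) == raw
    sizes[label] = len(packed)
    print(f"{label}: {len(raw)} -> {len(packed)} bytes in {elapsed_ms:.1f} ms")
```

Repetitive data shrinks substantially while random data does not shrink at all, yet both cost CPU time; that cost is what setting compression=None avoids.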