zea.data.file

zea H5 file functionality.

Functions

assert_key(file, key)

Asserts that key is in an h5py.File.

dict_to_sorted_list(dictionary)

Convert a dictionary with sortable keys to a list of values sorted by key.

load_file(path[, data_type, indices, ...])

Loads a zea data file (h5py file).

load_file_all_data_types(path[, indices, ...])

Loads all data types from a zea data file (h5py file).

validate_file([path, file])

Validate the structure and data of a zea HDF5 file.

Classes

File(name[, mode])

h5py.File in zea format.

GroupProxy(group)

Lazy proxy for an h5py.Group that exposes children as attributes.

class zea.data.file.File(name, mode='r', *args, **kwargs)[source]

Bases: File

h5py.File in zea format.

Initialize the file.

Parameters:
  • name (str, Path, HFPath) – The path to the file. Can be a string or a Path object. It can also be a string with the prefix ‘hf://’, in which case it is resolved to a Hugging Face path.

  • mode (str, optional) – The mode to open the file in. Defaults to “r”.

  • *args – Additional arguments to pass to h5py.File.

  • **kwargs – Additional keyword arguments to pass to h5py.File.

copy_key(key, dst)[source]

Copy a specific key to another file.

Will always copy the attributes and the scan data if it exists. Will warn if the key is not in this file or if the key already exists in the destination file.

Parameters:
  • key (str) – The key to copy.

  • dst (File) – The destination file to copy the key to.

classmethod create(path, data, scan=None, metadata=None, metrics=None, probe_name=None, us_machine=None, description=None, compression='gzip', overwrite=False)[source]

Create a new zea HDF5 file from data, scan, and optional metadata.

All inputs are validated against the FileSpec schema (dtypes, shapes, dimension consistency) before anything is written to disk.

Parameters:
  • path – Destination file path.

  • data (dict) – Data dict accepted by DataSpec.

  • scan (dict | None) – Scan-parameter dict accepted by ScanSpec.

  • metadata (dict | None) – Optional metadata dict accepted by MetadataSpec.

  • metrics (dict | None) – Optional metrics dict accepted by MetricsSpec.

  • probe_name (str | None) – Name of the probe.

  • us_machine (str | None) – Name of the ultrasound machine.

  • description (str | None) – Free-text description of the acquisition.

  • compression (str) – HDF5 compression filter (default "gzip").

  • overwrite (bool) – If False (default), raise if the file exists.

Returns:

The closed File handle (re-open with File(path) to read).

Return type:

File

>>> import os, tempfile
>>> import numpy as np
>>> from zea import File

>>> n_frames, n_tx, n_el, n_ax = 2, 4, 8, 64
>>> raw = np.zeros((n_frames, n_tx, n_ax, n_el, 1), dtype=np.float32)
>>> geom = np.zeros((n_el, 3), dtype=np.float32)
>>> scan = {
...     "probe_geometry": geom,
...     "sampling_frequency": np.float32(40e6),
...     "center_frequency": np.float32(5e6),
...     "demodulation_frequency": np.float32(5e6),
...     "initial_times": np.zeros(n_tx, dtype=np.float32),
...     "t0_delays": np.zeros((n_tx, n_el), dtype=np.float32),
...     "tx_apodizations": np.ones((n_tx, n_el), dtype=np.float32),
...     "focus_distances": np.full(n_tx, np.inf, dtype=np.float32),
...     "transmit_origins": np.zeros((n_tx, 3), dtype=np.float32),
...     "polar_angles": np.zeros(n_tx, dtype=np.float32),
...     "time_to_next_transmit": np.ones((n_frames, n_tx), dtype=np.float32) * 1e-4,
... }

>>> _, path = tempfile.mkstemp(suffix=".hdf5")
>>> f = File.create(
...     path, data={"raw_data": raw}, scan=scan, probe_name="L11-4v", overwrite=True
... )
>>> f.probe_name
'L11-4v'
>>> f.close()
>>> os.unlink(path)
property data: GroupProxy

Lazy proxy for the data group.

Returns a GroupProxy so individual datasets can be accessed as attributes without loading everything into RAM:

with File(path) as f:
    f.data.raw_data[:, :n_tx]  # read a slice
    f.data.image.values[0]  # nested group access
property description

Reads the description from the data file and returns it.

format_key(key)[source]

Format the key to match the data type.

get_parameters()[source]

Returns a dictionary of the parameters stored inside the data file that are used to initialize a scan object.

If there are no scan parameters in the hdf5 file, returns an empty dictionary.

Returns:

The scan parameters.

Return type:

dict

get_probe_parameters()[source]

Returns a dictionary of the probe parameters stored inside the data file that are used to initialize a probe object.

Returns:

The probe parameters.

Return type:

dict

get_scan_parameters()[source]

Returns a dictionary of scan parameters stored in the file.

Return type:

dict

classmethod get_shape(path, key)[source]

Get the shape of a key in a file.

Parameters:
  • path (str) – The path to the file.

  • key (str) – The key to get the shape of.

Returns:

The shape of the key.

Return type:

tuple

has_key(key)[source]

Check if the file has a specific key.

Parameters:

key (str) – The key to check.

Returns:

True if the key exists, False otherwise.

Return type:

bool

static key_to_data_type(key)[source]

Convert the key to a data type.

load_data(data_type, indices=None)[source]

Load data from the file.

Deprecated: Use file.data.<key> with standard h5py slice indexing instead:

with File(path) as f:
    raw = f.data.raw_data[:]  # all frames
    raw = f.data.raw_data[0]  # first frame
    raw = f.data.raw_data[0, [0, 2]]  # frame 0, transmits 0 and 2

The indices parameter can be used to load a subset of the data. This can be

  • 'all' or None to load all data

  • an int to load a single frame

  • a List[int] to load specific frames

  • a Tuple[Union[list, slice, int], ...] to index multiple axes (i.e. frames and transmits). Note that indexing with lists of indices for multiple axes is not supported. In that case, try to define one of the axes with a slice for optimal performance. Alternatively, slice the data after loading.

For more information on the indexing options, see indexing on ndarrays and fancy indexing in h5py.

Parameters:
  • data_type (str) – The type of data to load. Options are ‘raw_data’, ‘aligned_data’, ‘beamformed_data’, ‘envelope_data’, ‘image’ and ‘image_sc’.

  • indices (Union[Tuple[Union[list, slice, int], ...], List[int], int, None]) – The indices to load. Defaults to None in which case all data is loaded.

Return type:

ndarray

load_scan()[source]

Alias for get_scan_parameters.

load_transmits(key, selected_transmits)[source]

Load raw_data or aligned_data for a given list of transmits.

Parameters:
  • key (str) – The type of data to load. Options are ‘raw_data’ and ‘aligned_data’.

  • selected_transmits (list, np.ndarray) – The transmits to load.

metadata()[source]

Return a validated MetadataSpec object from the file.

Returns:

The validated metadata spec.

Return type:

MetadataSpec

Raises:

KeyError – If the file has no metadata group.

Example:

>>> with File("my_file.hdf5") as f:
...     meta = f.metadata()
...     print(meta.subject.id)
metrics()[source]

Return a validated MetricsSpec object from the file.

Returns:

The validated metrics spec.

Return type:

MetricsSpec

Raises:

KeyError – If the file has no metrics group.

Example:

>>> with File("my_file.hdf5") as f:
...     met = f.metrics()
...     print(met.coherence_factor.shape)
property n_ax: int

Number of axial samples.

property n_frames

Return number of frames in a file.

property name

Return the name of the file.

property path

Return the path of the file.

probe()[source]

Returns a Probe object initialized with the parameters from the file.

Returns:

The probe object.

Return type:

Probe

>>> from zea import File
>>> path = (
...     "hf://zeahub/picmus/database/experiments/contrast_speckle/"
...     "contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"
... )
>>> with File(path) as f:
...     probe = f.probe()
>>> type(probe).__name__
'Verasonics_l11_4v'
property probe_name

Reads the probe name from the data file and returns it.

recursively_load_dict_contents_from_group(path)[source]

Load a dict from the contents of a group.

Values inside the group are converted to numpy arrays or primitive types (int, float, str).

Parameters:

path (str) – path to group

Returns:

dictionary with contents of group

Return type:

dict

scan(safe=True, **kwargs)[source]

Returns a Scan object initialized with the parameters from the file.

Parameters:
  • safe (bool, optional) – If True, will only use parameters that are defined in the Scan class. If False, will use all parameters from the file. Defaults to True.

  • **kwargs – Additional keyword arguments to pass to the Scan object. These will override the parameters from the file if they are present in the file.

Returns:

The scan object.

Return type:

Scan

>>> from zea import File
>>> path = (
...     "hf://zeahub/picmus/database/experiments/contrast_speckle/"
...     "contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"
... )
>>> with File(path) as f:
...     scan = f.scan()
>>> type(scan).__name__
'Scan'
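
The interplay between safe and **kwargs can be illustrated with a toy class. This is only a sketch of the described behaviour, not the zea implementation: ToyScan, build_scan and the vendor_note key are hypothetical names introduced for the example.

```python
import inspect


class ToyScan:
    """Hypothetical stand-in for a scan class with a fixed set of parameters."""

    def __init__(self, n_tx=1, center_frequency=None):
        self.n_tx = n_tx
        self.center_frequency = center_frequency


def build_scan(file_params, safe=True, **overrides):
    """Merge file parameters with overrides, optionally dropping unknown keys."""
    params = {**file_params, **overrides}  # keyword arguments win over file values
    if safe:
        # Keep only the parameters that the class constructor accepts.
        accepted = set(inspect.signature(ToyScan).parameters)
        params = {k: v for k, v in params.items() if k in accepted}
    return ToyScan(**params)


file_params = {"n_tx": 4, "center_frequency": 5e6, "vendor_note": "unused"}
scan = build_scan(file_params, n_tx=2)
print(scan.n_tx, scan.center_frequency)
```

With safe=True the unknown vendor_note entry is dropped before construction, and the n_tx override replaces the value read from the file.
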
shape(key)[source]

Return shape of some key.

Return type:

tuple

property stem

Return the stem of the file.

summary()[source]

Print the contents of the file.

to_iterator(key)[source]

Convert the data to an iterator over all frames.
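
Iterating frame by frame keeps only one frame in memory at a time. A minimal sketch of that idea follows, assuming the first axis indexes frames; iter_frames is a hypothetical helper and a NumPy array stands in for the dataset.

```python
import numpy as np


def iter_frames(dataset):
    """Yield one frame at a time along the first axis."""
    for i in range(dataset.shape[0]):
        yield dataset[i]


# Stand-in for a dataset with 3 frames of shape (4, 2).
stack = np.arange(24).reshape(3, 4, 2)
frame_sums = [int(frame.sum()) for frame in iter_frames(stack)]
print(frame_sums)
```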

property us_machine

Reads the ultrasound machine name from the data file and returns it.

validate()[source]

Lightweight structural validation — no array data is loaded into RAM.

Checks that the file has a data group and that all keys within it are recognised zea data types. For legacy files (before zea v0.1.0) a minimal key-name check is performed. For files created with zea v0.1.0 and later (via File.create()) the keys are checked against the DataSpec schema.

Use validate_spec() for a full validation that loads all data and checks dtypes, shapes, and cross-field dimension consistency.

Returns:

{"status": "success"} on success.

Return type:

dict

Raises:

AssertionError – If the file is missing required groups or contains unrecognised data keys.
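
The kind of key-name check described here can be sketched without reading any array data. The example below is an assumption-laden illustration, not the zea implementation: check_structure is a hypothetical function, a nested dict stands in for the file, and the set of recognised names is taken from the data_type options listed for load_data.

```python
# Data type names as listed for load_data; assumed here to be the recognised set.
KNOWN_DATA_TYPES = {
    "raw_data",
    "aligned_data",
    "beamformed_data",
    "envelope_data",
    "image",
    "image_sc",
}


def check_structure(contents):
    """Check group and key names only; values are never inspected."""
    if "data" not in contents:
        raise AssertionError("missing 'data' group")
    unknown = set(contents["data"]) - KNOWN_DATA_TYPES
    if unknown:
        raise AssertionError(f"unrecognised data keys: {sorted(unknown)}")
    return {"status": "success"}


# Nested dict standing in for an open file; values are placeholders.
result = check_structure({"data": {"raw_data": None, "image": None}})
print(result)
```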

validate_spec()[source]

Full schema validation — loads all data into RAM.

Reads every dataset in the file and runs dtype, shape, and cross-dimension consistency checks as defined by FileSpec. Use this to confirm a file is fully spec-compliant before sharing or processing it.

For a fast, zero-IO structural check use validate() instead.

Note

This method only works on files created with zea v0.1.0 and later. Files written before zea v0.1.0 should be re-saved through File.create().

Returns:

The fully validated spec object, with all data accessible as typed attributes (e.g. spec.data.raw_data, spec.scan.n_tx).

Return type:

FileSpec

Raises:

TypeError, ValueError – If the file does not conform to the spec.

>>> with File("my_file.hdf5") as f:
...     spec = f.validate_spec()
...     print(spec.scan.n_tx)
property zea_version: str | None

Return the zea version that wrote this file, or None for legacy files.

Files created with zea v0.1.0 and later store a zea_version root attribute. Files written before zea v0.1.0 return None.

class zea.data.file.GroupProxy(group)[source]

Bases: object

Lazy proxy for an h5py.Group that exposes children as attributes.

Datasets are returned as-is (h5py.Dataset supports slicing without loading everything into RAM). Sub-groups are wrapped in another GroupProxy so the dot-access pattern works recursively:

with File(path) as f:
    # returns h5py.Dataset – no data loaded yet
    f.data.raw_data
    # slicing triggers the actual read, just like plain h5py
    f.data.raw_data[:, :n_tx]
    # nested groups work too
    f.data.image.values[0]
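
The dot-access pattern itself can be sketched in a few lines. The following toy class is not the GroupProxy implementation; AttrProxy is a hypothetical name, and a nested dict stands in for an h5py.Group so the example runs without a file.

```python
from collections.abc import Mapping


class AttrProxy:
    """Toy proxy that exposes the children of a nested mapping as attributes."""

    def __init__(self, group):
        self._group = group

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            child = self._group[name]
        except KeyError:
            raise AttributeError(name) from None
        # Sub-groups are wrapped again; leaf "datasets" are returned as-is.
        return AttrProxy(child) if isinstance(child, Mapping) else child

    def keys(self):
        return list(self._group.keys())


tree = {"raw_data": [[1, 2], [3, 4]], "image": {"values": [10, 20]}}
data = AttrProxy(tree)
first_frame = data.raw_data[0]       # leaf returned untouched, then indexed
nested_value = data.image.values[0]  # sub-group wrapped, then leaf access
print(data.keys(), first_frame, nested_value)
```

Nothing is copied when an attribute is resolved: the proxy only looks up the child and hands it back, which is what makes the access lazy.
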
keys()[source]

Return the keys of the underlying group.

zea.data.file.assert_key(file, key)[source]

Asserts that key is in an h5py.File.

zea.data.file.dict_to_sorted_list(dictionary)[source]

Convert a dictionary with sortable keys to a list of values sorted by key.

Note

This function operates on the top level of the dictionary only. If the dictionary contains nested dictionaries, those will not be sorted.

Example

>>> from zea.data.file import dict_to_sorted_list
>>> input_dict = {"number_000": 5, "number_001": 1, "number_002": 23}
>>> dict_to_sorted_list(input_dict)
[5, 1, 23]
Parameters:

dictionary (dict) – The dictionary to convert. The keys must be sortable.

Returns:

The sorted list of values.

Return type:

list
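
The documented example output is consistent with ordering the values by their keys. A pure-Python sketch of that behaviour follows; sorted_values is a hypothetical equivalent written for illustration, not the library function.

```python
def sorted_values(dictionary):
    """Return the values ordered by their (sortable) keys, top level only."""
    return [dictionary[key] for key in sorted(dictionary)]


flat = sorted_values({"number_001": 1, "number_000": 5, "number_002": 23})

# Nested dictionaries are returned unchanged, matching the note above.
nested = sorted_values({"b": {"z": 1, "a": 2}, "a": 0})
print(flat, nested)
```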

zea.data.file.load_file(path, data_type='raw_data', indices=None, scan_kwargs=None)[source]

Loads a zea data file (h5py file).

Returns the data together with a scan object containing the parameters of the acquisition and a probe object containing the parameters of the probe.

Additionally, it can load a specific subset of frames / transmits.

The indices parameter can be used to load a subset of the data. This can be

  • 'all' or None to load all data

  • an int to load a single frame

  • a List[int] to load specific frames

  • a Tuple[Union[list, slice, int], ...] to index multiple axes (i.e. frames and transmits). Note that indexing with lists of indices for multiple axes is not supported. In that case, try to define one of the axes with a slice for optimal performance. Alternatively, slice the data after loading.

For more information on the indexing options, see indexing on ndarrays and fancy indexing in h5py.
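
Before passing indices on, it can help to normalise the accepted forms and reject the unsupported case early. The helper below is a hypothetical sketch, not part of zea; the exception types it raises are assumptions.

```python
def normalize_indices(indices):
    """Turn the documented `indices` forms into a tuple index."""
    if indices is None or (isinstance(indices, str) and indices == "all"):
        return (slice(None),)  # load everything
    if isinstance(indices, (int, list, slice)):
        return (indices,)  # index the frame axis only
    if isinstance(indices, tuple):
        n_list_axes = sum(isinstance(axis, list) for axis in indices)
        if n_list_axes > 1:
            # Lists on several axes are not supported; use a slice on one of them.
            raise ValueError("use a list on at most one axis")
        return indices
    raise TypeError(f"unsupported indices: {indices!r}")


print(normalize_indices("all"))
print(normalize_indices((slice(0, 2), [0, 2])))
```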

Parameters:
  • path (str, pathlike) – The path to the hdf5 file.

  • data_type (str, optional) – The type of data to load. Defaults to ‘raw_data’. Other options are ‘aligned_data’, ‘beamformed_data’, ‘envelope_data’, ‘image’ and ‘image_sc’.

  • indices (Union[Tuple[Union[list, slice, int], ...], List[int], int, None]) – The indices to load. Defaults to None in which case all frames are loaded.

  • scan_kwargs (dict) – Additional keyword arguments to pass to the Scan object. These will override the parameters from the file if they are present in the file. Defaults to None.

Returns:

  • (ndarray) – The raw data of shape (n_frames, n_tx, n_ax, n_el, n_ch).

  • (Scan) – A scan object containing the parameters of the acquisition.

  • (Probe) – A probe object containing the parameters of the probe.

Return type:

Tuple[ndarray, Scan, Probe]

zea.data.file.load_file_all_data_types(path, indices=None, scan_kwargs=None)[source]

Loads all data types from a zea data file (h5py file).

Returns all data types together with a scan object containing the parameters of the acquisition and a probe object containing the parameters of the probe.

Additionally, it can load a specific subset of frames / transmits.

The indices parameter can be used to load a subset of the data. This can be

  • 'all' or None to load all data

  • an int to load a single frame

  • a List[int] to load specific frames

  • a Tuple[Union[list, slice, int], ...] to index multiple axes (i.e. frames and transmits). Note that indexing with lists of indices for multiple axes is not supported. In that case, try to define one of the axes with a slice for optimal performance. Alternatively, slice the data after loading.

For more information on the indexing options, see indexing on ndarrays and fancy indexing in h5py.

Parameters:
  • path (str, pathlike) – The path to the hdf5 file.

  • indices (Union[Tuple[Union[list, slice, int], ...], List[int], int, None]) – The indices to load. Defaults to None in which case all frames are loaded.

  • scan_kwargs (dict) – Additional keyword arguments to pass to the Scan object. These will override the parameters from the file if they are present in the file. Defaults to None.

Returns:

  • (dict) – A dictionary with all data types as keys and the corresponding data as values.

  • (Scan) – A scan object containing the parameters of the acquisition.

  • (Probe) – A probe object containing the parameters of the probe.

Return type:

Tuple[dict, Scan, Probe]

zea.data.file.validate_file(path=None, file=None)[source]

Validate the structure and data of a zea HDF5 file.

For files created with zea v0.1.0 and later, this runs the full FileSpec schema validation (dtypes, shapes, and dimension consistency). Legacy files (before zea v0.1.0) are detected by the presence of the scalar dataset scan/n_frames; for those, only a lightweight structural check of the data group is performed.

Provide either path or file, but not both.

Parameters:
  • path (str) – Path to the HDF5 file.

  • file (File) – An already-open File instance.

Returns:

{"status": "success"} on success.

Return type:

dict

Raises:
  • AssertionError – If the file is missing the data group.

  • TypeError, ValueError – If spec validation fails on files created with zea v0.1.0 and later.
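
The two decisions described for validate_file (exactly one of path or file, and legacy detection via scan/n_frames) can be sketched as follows. Both helpers are hypothetical, a nested dict stands in for the file contents, and the ValueError raised for a bad argument combination is an assumption.

```python
def pick_source(path=None, file=None):
    """Require exactly one of `path` and `file`."""
    if (path is None) == (file is None):
        raise ValueError("provide either path or file, but not both")
    return "path" if path is not None else "file"


def is_legacy(contents):
    """Legacy files are recognised by a scan/n_frames entry."""
    return "n_frames" in contents.get("scan", {})


source = pick_source(path="my_file.hdf5")
legacy = is_legacy({"data": {}, "scan": {"n_frames": 2}})
modern = is_legacy({"data": {}, "scan": {"n_tx": 4}})
print(source, legacy, modern)
```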