Tutorial 4: Loading existing `s_cube` objects and export options

Outline

Loading existing `s_cube` objects
Creating a new HDF5 file for each exported field
Appending fields to existing HDF5 files
Exporting data in batches or snapshot-by-snapshot

In this tutorial, we will briefly look at different options for exporting data from \(S^3\). These options are especially useful when dealing with large datasets, for which \(S^3\) was originally designed. The first steps are the same as presented in tutorial 1.

Prerequisites: Execution of the cylinder2D simulation from tutorial 1.

1. Loading existing `s_cube` objects

[1]:

import sys
import torch as pt

from os import environ
from typing import Union
from os.path import join

environ["sparseSpatialSampling"] = join("..", "..", "..")
sys.path.insert(0, environ["sparseSpatialSampling"])

from sparseSpatialSampling.export import ExportData
from sparseSpatialSampling.utils import load_foam_data, load_original_Foam_fields

Warning: TecplotDataloader can't be loaded. Most likely, the 'paraview' module is missing.
Refer to the installation instructions at https://github.com/FlowModelingControl/flowtorch
If you are not using the TecplotDataloader, ignore this warning.

[2]:

# paths to the results of tutorial 1 (s_cube object) and to the CFD data, assuming both are located relative to the top-level of the repository
load_path = join("..", "..", "..", "run", "tutorials", "tutorial_1")
load_path_cfd = join("..", "..", "..", "flowTorch_Workshop_2025", "cylinder_2D_Re100")

# define the directory in which we want to save the results
save_path = join("..", "..", "..", "run", "tutorials", "tutorial_4")

[3]:

# load the s_cube object
s_cube = pt.load(join(load_path, "s_cube_cylinder2D_metric_0.75.pt"), weights_only=False)

# load the velocity and pressure fields of the simulation
bounds = [[0, 0], [2.2, 0.41]]
field_U, coord, _, write_times = load_foam_data(load_path_cfd, bounds, field_name="U", t_start=8, scalar=False)
field_p, _, _, _ = load_foam_data(load_path_cfd, bounds, t_start=8)

[2026-02-19 15:39:43] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2026-02-19 15:39:43] INFO     Loading precomputed cell centers and volumes from processor1/constant
[2026-02-19 15:39:43] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2026-02-19 15:39:43] INFO     Loading precomputed cell centers and volumes from processor1/constant
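The `bounds` argument appears to be given as `[[x_min, y_min], [x_max, y_max]]`, i.e., the lower-left and upper-right corners of the region from which cells are loaded. The following sketch, assuming that interpretation and an inclusive comparison, shows how such a box selects cell centers. `inside_bounds` and `mask_cells` are hypothetical helpers for illustration, not part of \(S^3\).

```python
from typing import List, Sequence


def inside_bounds(point: Sequence[float], bounds: List[List[float]]) -> bool:
    """Return True if the point lies inside the axis-aligned box [lower, upper] (inclusive)."""
    lower, upper = bounds
    return all(lo <= p <= up for p, lo, up in zip(point, lower, upper))


def mask_cells(cell_centers: List[List[float]], bounds: List[List[float]]) -> List[int]:
    """Return the indices of all cell centers located inside the bounds."""
    return [i for i, c in enumerate(cell_centers) if inside_bounds(c, bounds)]


bounds = [[0, 0], [2.2, 0.41]]
# hypothetical cell centers: inside, right of x_max, on the upper boundary, below y_min
centers = [[0.1, 0.2], [2.5, 0.2], [1.0, 0.41], [1.0, -0.01]]
print(mask_cells(centers, bounds))  # [0, 2]
```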

2. Creating a new file for each field

In tutorial 1, we created a single HDF5 file containing all the data from our simulation. However, especially when dealing with large amounts of data, a single large HDF5 file may be impractical. Instead, \(S^3\) allows us to create a separate HDF5 file for each field, so we end up with several smaller files, which are easier to handle. The corresponding field name is appended to each HDF5 file name.
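Judging from the log output of this tutorial (e.g., `cylinder2D_Re100_new_file_U.h5` versus `cylinder2D_Re100.h5`), the file names follow a simple pattern. The sketch below mimics that pattern; `output_file_names` is a hypothetical helper inferred from the logs, not the actual \(S^3\) implementation.

```python
from typing import List


def output_file_names(save_name: str, fields: List[str], new_file_for_each_field: bool) -> List[str]:
    """Return the HDF5 file names produced for the given fields (hypothetical naming helper)."""
    if new_file_for_each_field:
        # one file per field, with the field name appended to the base name
        return [f"{save_name}_{field}.h5" for field in fields]
    # a single file containing all fields
    return [f"{save_name}.h5"]


print(output_file_names("cylinder2D_Re100_new_file", ["U", "p"], True))
# ['cylinder2D_Re100_new_file_U.h5', 'cylinder2D_Re100_new_file_p.h5']
```

According to the logs, each HDF5 file is additionally accompanied by its own XDMF file.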

[4]:

# instantiate an export object, here we want to create a new HDF5 file for each field, for demonstration purposes we only export a single snapshot
export = ExportData(s_cube, write_new_file_for_each_field=True, write_times=write_times[-1])

# we have to overwrite save_dir and save_name, since we want to save the results in another directory
export.save_dir = save_path
export.save_name = "cylinder2D_Re100_new_file"

# now export the last snapshot of the velocity field
export.export(coord, field_U[:, :, -1].unsqueeze(-1), "U")

# now export the last snapshot of the pressure field into a new file
export.export(coord, field_p[:, -1].unsqueeze(-1).unsqueeze(-1), "p")

[2026-02-19 15:39:48] INFO     Initializing KNN and computing interpolation weights.
[2026-02-19 15:39:48] INFO     Starting interpolation and export of field U.
[2026-02-19 15:39:48] INFO     Writing HDF5 file for field U.
[2026-02-19 15:39:48] INFO     Writing XDMF file for file cylinder2D_Re100_new_file_U.h5
[2026-02-19 15:39:48] INFO     Finished export of field U in 0.034s.
[2026-02-19 15:39:48] INFO     Starting interpolation and export of field p.
[2026-02-19 15:39:48] INFO     Writing HDF5 file for field p.
[2026-02-19 15:39:48] INFO     Writing XDMF file for file cylinder2D_Re100_new_file_p.h5
[2026-02-19 15:39:48] INFO     Finished export of field p in 0.008s.
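Both export() calls above pass a three-dimensional tensor with the snapshot axis last: the velocity as [n_cells, n_components, 1] and the pressure as [n_cells, 1, 1]. This suggests that export() expects data of shape [n_cells, n_components, n_snapshots] also for scalar fields, which is why the pressure needs two unsqueeze(-1) calls. The following pure-Python sketch only tracks the shapes according to PyTorch's indexing and unsqueeze() rules; the helper names, the cell count, and the two velocity components are assumptions for illustration.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def index_shape(shape: Shape, dim: int) -> Shape:
    """Shape after selecting a single index along dim, e.g. field[:, -1]; the dimension is removed."""
    if dim < 0:
        dim += len(shape)
    return shape[:dim] + shape[dim + 1:]


def unsqueeze_shape(shape: Shape, dim: int) -> Shape:
    """Shape after inserting a new dimension of size 1 at dim (negative dims count from the end)."""
    if dim < 0:
        dim += len(shape) + 1
    return shape[:dim] + (1,) + shape[dim:]


n_cells, n_snapshots = 1000, 101  # hypothetical cell count, snapshot count as in this tutorial

# velocity, assuming two components: [n_cells, 2, n_snapshots] -> [n_cells, 2] -> [n_cells, 2, 1]
u_shape = unsqueeze_shape(index_shape((n_cells, 2, n_snapshots), -1), -1)

# pressure: [n_cells, n_snapshots] -> [n_cells] -> [n_cells, 1] -> [n_cells, 1, 1]
p_shape = unsqueeze_shape(unsqueeze_shape(index_shape((n_cells, n_snapshots), -1), -1), -1)

print(u_shape, p_shape)  # (1000, 2, 1) (1000, 1, 1)
```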

3. Appending fields to existing HDF5 files

We can also append a field to an existing HDF5 file, e.g., in case we forgot to export it or want to export it at a later stage. To do so, we load the s_cube object and instantiate an export object as before. The only difference is that we have to set the argument append_existing = True, as the following code illustrates:

[5]:

# load the s_cube object from the previous grid generation
s_cube = pt.load(join(load_path, "s_cube_cylinder2D_metric_0.75.pt"), weights_only=False)

# here we don't overwrite the paths, since we want to append the field directly
export = ExportData(s_cube, write_times=write_times[-1], append_existing=True)

# now export the last snapshot of the pressure field into the same file, but we name it differently so we can check if everything worked
export.export(coord, field_p[:, -1].unsqueeze(-1).unsqueeze(-1), "p_appended")

[2026-02-19 15:39:52] INFO     Appending fields to file ../../../run/tutorials/tutorial_1/cylinder2D_metric_0.75.h5
[2026-02-19 15:39:52] INFO     Initializing KNN and computing interpolation weights.
[2026-02-19 15:39:52] INFO     Starting interpolation and export of field p_appended.
[2026-02-19 15:39:52] INFO     Writing HDF5 file for field p_appended.
[2026-02-19 15:39:52] INFO     Writing XDMF file for file cylinder2D_metric_0.75.h5
[2026-02-19 15:39:52] INFO     Finished export of field p_appended in 0.159s.

Note: If append_existing = True is passed, the argument write_new_file_for_each_field is disabled automatically, because appending a field to an existing file contradicts saving each new field in a separate file.
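The rule from the note above can be expressed as a small flag-resolution step. This is a hypothetical sketch of the described behavior, not the code used inside ExportData.

```python
def resolve_export_flags(append_existing: bool, write_new_file_for_each_field: bool) -> dict:
    """Hypothetical sketch: reconcile the two export options as described in the note."""
    if append_existing:
        # appending targets one existing file, so writing one file per field is switched off
        write_new_file_for_each_field = False
    return {"append_existing": append_existing,
            "write_new_file_for_each_field": write_new_file_for_each_field}


print(resolve_export_flags(append_existing=True, write_new_file_for_each_field=True))
# {'append_existing': True, 'write_new_file_for_each_field': False}
```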

4. Exporting data in batches or snapshot-by-snapshot

So far, we always loaded and exported the complete data matrix at once. For larger datasets, however, the complete data matrix will often not fit into memory. To avoid this issue, we can load and export the data in batches or, in the case of very large snapshots, even snapshot-by-snapshot.

To make use of this functionality, we only have to pass the additional argument n_snapshots_total=len(write_times) to the export() method. This is required so that the export() method knows how many snapshots to expect in total.
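How export() uses n_snapshots_total internally is not shown in this tutorial. One plausible reason a writer needs the total count is that it allocates the full time axis of the output dataset on the first call and then writes each incoming batch at a running offset. The sketch below illustrates this pattern with a plain Python list standing in for an HDF5 dataset; SnapshotBuffer is hypothetical.

```python
from typing import List, Optional


class SnapshotBuffer:
    """Hypothetical stand-in for an output dataset that is filled batch by batch."""

    def __init__(self, n_snapshots_total: int):
        # the total size must be known up front to allocate the full time axis
        self.data: List[Optional[str]] = [None] * n_snapshots_total
        self.offset = 0  # index of the next snapshot to write

    def write_batch(self, batch: List[str]) -> None:
        end = self.offset + len(batch)
        if end > len(self.data):
            raise ValueError("more snapshots received than announced by n_snapshots_total")
        self.data[self.offset:end] = batch
        self.offset = end

    @property
    def complete(self) -> bool:
        return self.offset == len(self.data)


buffer = SnapshotBuffer(n_snapshots_total=5)
buffer.write_batch(["t0", "t1"])
buffer.write_batch(["t2", "t3", "t4"])
print(buffer.complete)  # True
```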

The overall approach can be summarized as follows:

1. Load a batch of \(N\) snapshots, where \(1 \le N \le N_\mathrm{snapshots}\) has to be chosen based on the available memory.
2. Pass them to the export() method as before, but with the additional argument n_snapshots_total=len(write_times) (the total number of snapshots to export).
3. Repeat from step 1 until all snapshots are exported.
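The three steps can be sketched in plain Python. The memory estimate only counts the raw snapshot values (assumed single precision, 4 bytes each) and ignores any overhead from loading and interpolation, so in practice \(N\) should be chosen with a safety margin. max_batch_size and batches are hypothetical helpers, not part of \(S^3\).

```python
from typing import Iterator, List


def max_batch_size(n_cells: int, n_components: int, memory_budget_bytes: int,
                   bytes_per_value: int = 4) -> int:
    """Largest N such that N snapshots of one field fit into the memory budget (at least 1)."""
    bytes_per_snapshot = n_cells * n_components * bytes_per_value
    return max(1, memory_budget_bytes // bytes_per_snapshot)


def batches(write_times: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of write_times with at most batch_size entries each."""
    for start in range(0, len(write_times), batch_size):
        yield write_times[start:start + batch_size]


# hypothetical 3D case: 1e6 cells, 3 velocity components, 2 GiB budget -> 12 MB per snapshot
print(max_batch_size(1_000_000, 3, 2 * 1024**3))  # 178

# 101 hypothetical write times in batches of 50
write_times = [str(t) for t in range(101)]
print([len(batch) for batch in batches(write_times, 50)])  # [50, 50, 1]
```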

This procedure is shown in the following. The function export_fields_snapshot_wise below wraps it for easier usage.

Note: If memory is still a limiting factor, we can decrease the chunk_size, which controls the number of cells interpolated at once. The memory requirements increase linearly with the chunk_size, while the execution time decreases with increasing chunk_size. The default is chunk_size = 100 000 cells. The chunk_size can be set when calling the export() method, e.g., export.export(coord, field_p[:, -1].unsqueeze(-1).unsqueeze(-1), "p", chunk_size=XXX).
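The effect of chunk_size can be illustrated by how it partitions the cells: only one chunk is processed at a time, so the size of the intermediate results scales with chunk_size, while a larger chunk_size means fewer iterations. The sketch below, with the hypothetical helper cell_chunks, only computes these index ranges and is not the \(S^3\) interpolation code.

```python
from typing import Iterator, Tuple


def cell_chunks(n_cells: int, chunk_size: int = 100_000) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index ranges covering n_cells in chunks of at most chunk_size cells."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_cells, chunk_size):
        yield start, min(start + chunk_size, n_cells)


# 250 000 hypothetical cells with the default chunk size
print(list(cell_chunks(250_000)))
# [(0, 100000), (100000, 200000), (200000, 250000)]

# halving the chunk size halves the cells per chunk but doubles the number of chunks
print(len(list(cell_chunks(1_000_000, 100_000))), len(list(cell_chunks(1_000_000, 50_000))))  # 10 20
```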

[6]:

def export_fields_snapshot_wise(load_dir: str, datawriter: ExportData, field_names: Union[str, list], boundaries: list,
                                write_times: Union[str, list], batch_size: int = 25) -> None:
    """
    For each field specified, interpolate all snapshots onto the generated grid and export it to HDF5 & XDMF. The
    interpolation and export of the data is performed snapshot-by-snapshot (batch_size = 1) or in batches to avoid out
    of memory issues for large datasets.

    :param load_dir: path to the simulation data
    :param datawriter: ExportData instance created after executing the S^3 algorithm
    :param field_names: names of the fields to export
    :param boundaries: boundaries of the masked area of the domain (needs to be the same as used for loading the
                       vertices and computing the metric)
    :param write_times: the write times of the simulation
    :param batch_size: batch size, number of snapshots which should be interpolated and exported at once
    :return: None
    """
    # make sure the type is correct
    write_times = write_times if isinstance(write_times, list) else [write_times]
    field_names = field_names if isinstance(field_names, list) else [field_names]

    # set the write times in case we haven't done that already
    if datawriter.write_times is None:
        datawriter.write_times = write_times

    # now loop over all fields
    for f in field_names:
        counter = 1

        # compute the required number of batches
        if not len(datawriter.write_times) % batch_size:
            n_batches = int(len(datawriter.write_times) / batch_size)
        else:
            n_batches = int(len(datawriter.write_times) / batch_size) + 1

        # now loop over all batches
        for i in pt.arange(0, len(datawriter.write_times), step=batch_size).tolist():
            print(f"Exporting batch {counter} / {n_batches}")

            # load the required number of snapshots
            coordinates, data = load_original_Foam_fields(load_dir, datawriter.n_dimensions, boundaries, field_names=f,
                                                          write_times=datawriter.write_times[i:i + batch_size])

            # in case the field is not available, the loaded data is None, so we skip the export
            if data is not None:
                # export the current batch
                datawriter.export(coordinates, data, f, n_snapshots_total=len(datawriter.write_times))
            counter += 1
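The if/else block computing n_batches in export_fields_snapshot_wise is a ceiling division. The sketch below checks it against an integer-only form, which could replace the branch; both function names are hypothetical. It also shows why the export below reports three batches: slicing write_times[-500:] on a list of only 101 entries returns all 101 entries, which split into batches of 50, 50, and 1.

```python
def n_batches_if_else(n_snapshots: int, batch_size: int) -> int:
    """Batch count as computed in export_fields_snapshot_wise."""
    if not n_snapshots % batch_size:
        return int(n_snapshots / batch_size)
    return int(n_snapshots / batch_size) + 1


def n_batches_ceil(n_snapshots: int, batch_size: int) -> int:
    """Equivalent ceiling division using integer arithmetic only."""
    return (n_snapshots + batch_size - 1) // batch_size


for n, b in [(101, 50), (100, 50), (101, 1), (101, 25)]:
    assert n_batches_if_else(n, b) == n_batches_ceil(n, b)

# slicing beyond the list length simply returns all available entries
write_times = [str(t) for t in range(101)]  # hypothetical write times
print(len(write_times[-500:]), n_batches_ceil(len(write_times[-500:]), 50))  # 101 3
```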

[7]:

# check how many snapshots we have
print(f"Number of snapshots: {len(write_times)}")

Number of snapshots: 101

[9]:

# now we export the velocity field in batches; write_times[-500:] selects the last 500 snapshots (here all 101, since fewer are available)
export = ExportData(s_cube)
export.save_name = "cylinder2D_Re100"
export.save_dir = save_path

# batch_size = 1 would mean we export the data snapshot-by-snapshot. Since our data is very small we choose a larger batch size
export_fields_snapshot_wise(load_path_cfd, export, "U", bounds, write_times[-500:], batch_size=50)

[2026-02-19 15:40:18] WARNING  Argument ``write_times`` is ``None``. Make sure to set the ``write_times`` before calling the ``export()`` method.
[2026-02-19 15:40:18] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2026-02-19 15:40:18] INFO     Loading precomputed cell centers and volumes from processor1/constant

Exporting batch 1 / 3

[2026-02-19 15:40:19] INFO     Initializing KNN and computing interpolation weights.
[2026-02-19 15:40:19] INFO     Starting interpolation and export of field U.
[2026-02-19 15:40:19] INFO     Writing HDF5 file for field U.
[2026-02-19 15:40:19] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2026-02-19 15:40:19] INFO     Loading precomputed cell centers and volumes from processor1/constant

Exporting batch 2 / 3

[2026-02-19 15:40:19] INFO     Loading precomputed cell centers and volumes from processor0/constant
[2026-02-19 15:40:19] INFO     Loading precomputed cell centers and volumes from processor1/constant
[2026-02-19 15:40:19] INFO     Writing XDMF file for file cylinder2D_Re100.h5
[2026-02-19 15:40:19] INFO     Finished export of field U in 0.68s.

Exporting batch 3 / 3
