{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9e954576-1b51-4afc-a7c4-fb55259f84ee",
   "metadata": {},
   "source": [
    "# Tutorial 4: Loading existing `s_cube` objects and export options\n",
    "## flowTorch workshop 29.09.2025 - 02.10.2025\n",
    "\n",
    "### Outline\n",
    "1. Loading existing `s_cube` objects\n",
    "2. Creating a new HDF5 file for each exported field\n",
    "3. Exporting data in batches or snapshot-by-snapshot\n",
    "\n",
    "In this tutorial, we will briefly look at different options for exporting data from $S^3$. These options are especially useful when dealing with large datasets, for which $S^3$ was originally designed. The first steps are the same as presented in tutorial 1.\n",
    "\n",
    "**Prerequisites:** Execution of the cylinder2D simulation from tutorial 1.\n",
    "\n",
    "## 1. Loading existing `s_cube` objects"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "bac87c9d-3c95-4509-92d8-0182e6ba1ed7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warning: TecplotDataloader can't be loaded. Most likely, the 'paraview' module is missing.\n",
      "Refer to the installation instructions at https://github.com/FlowModelingControl/flowtorch\n",
      "If you are not using the TecplotDataloader, ignore this warning.\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "import torch as pt\n",
    "\n",
    "from typing import Union\n",
    "from stl import mesh\n",
    "from os.path import join\n",
    "from os import environ, system\n",
    "\n",
    "environ[\"sparseSpatialSampling\"] = \"../../..\"\n",
    "sys.path.insert(0, environ[\"sparseSpatialSampling\"])\n",
    "\n",
    "from sparseSpatialSampling.export import ExportData\n",
    "from sparseSpatialSampling.utils import load_foam_data, load_original_Foam_fields"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2eec0519-e390-4101-86f2-a1c7725fe75d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# path to the CFD data and settings, assuming they are in the top-level of the repository\n",
    "load_path = join(\"..\", \"..\", \"..\", \"run\", \"tutorials\", \"tutorial_1\")\n",
    "load_path_cfd = join(\"..\", \"..\", \"..\", \"flowTorch_Workshop_2025\", \"cylinder_2D_Re100\")\n",
    "\n",
    "# define the path to where we want to save the results and the name of the file\n",
    "save_path = join(\"..\",\"..\", \"..\", \"run\", \"tutorials\", \"tutorial_4\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "3e2ba9d4-a04b-406f-b366-3a10eb5f77fc",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2025-08-15 11:42:17] INFO     Loading precomputed cell centers and volumes from processor0/constant\n",
      "[2025-08-15 11:42:17] INFO     Loading precomputed cell centers and volumes from processor1/constant\n",
      "[2025-08-15 11:42:21] INFO     Loading precomputed cell centers and volumes from processor0/constant\n",
      "[2025-08-15 11:42:21] INFO     Loading precomputed cell centers and volumes from processor1/constant\n"
     ]
    }
   ],
   "source": [
    "# load the s_cube object\n",
    "s_cube = pt.load(join(load_path, \"s_cube_cylinder2D_metric_0.75.pt\"), weights_only=False)\n",
    "\n",
    "# load the velocity and pressure fields of the simulation\n",
    "bounds = [[0, 0], [2.2, 0.41]]\n",
    "field_U, coord, _, write_times = load_foam_data(load_path_cfd, bounds, field_name=\"U\", t_start=8, scalar=False)\n",
    "field_p, _, _, _ = load_foam_data(load_path_cfd, bounds, t_start=8)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83c1436f-5c2a-4c20-9ca3-4dce9af0807e",
   "metadata": {},
   "source": [
    "## 2. Creating a new file for each field\n",
    "In tutorial 1, we created a single HDF5 file containing all the data from our simulation. However, especially when dealing with large amounts of data, a single large HDF5 file may be impractical. Instead, $S^3$ allows us to create a separate HDF5 file for each field, so we end up with several smaller files that are easier to handle. The corresponding field name is appended to each HDF5 file name."
   ]
  },
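  {
   "cell_type": "markdown",
   "id": "a1f4c2d0-sketch-per-field-names",
   "metadata": {},
   "source": [
    "The naming scheme can be illustrated with a minimal sketch. The helper `per_field_file_name` is hypothetical and not part of $S^3$; it only mirrors the file names that appear in the log output of the export cell in this section (e.g. `cylinder2D_Re100_new_file_U.h5`), assuming that the field name is joined to `save_name` with an underscore.\n",
    "\n",
    "```python\n",
    "def per_field_file_name(save_name: str, field_name: str) -> str:\n",
    "    # hypothetical helper (assumption): base name + underscore + field name + .h5\n",
    "    return f'{save_name}_{field_name}.h5'\n",
    "\n",
    "\n",
    "# one file name per exported field\n",
    "file_names = [per_field_file_name('cylinder2D_Re100_new_file', f) for f in ('U', 'p')]\n",
    "print(file_names)\n",
    "```"
   ]
  },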
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "576acb85-81ad-4ed0-8ef1-f0d396fa7292",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2025-08-15 11:42:23] INFO     Starting interpolation and export of field U.\n",
      "[2025-08-15 11:42:23] INFO     Writing HDF5 file for field U.\n",
      "[2025-08-15 11:42:23] INFO     Writing XDMF file for file cylinder2D_Re100_new_file_U.h5\n",
      "[2025-08-15 11:42:23] INFO     Finished export of field U in 0.101s.\n",
      "[2025-08-15 11:42:23] INFO     Starting interpolation and export of field p.\n",
      "[2025-08-15 11:42:23] INFO     Writing HDF5 file for field p.\n",
      "[2025-08-15 11:42:23] INFO     Writing XDMF file for file cylinder2D_Re100_new_file_p.h5\n",
      "[2025-08-15 11:42:23] INFO     Finished export of field p in 0.037s.\n"
     ]
    }
   ],
   "source": [
    "# instantiate an export object, here we want to create a new HDF5 file for each field\n",
    "export = ExportData(s_cube, write_new_file_for_each_field=True)\n",
    "\n",
    "# we have to overwrite save_dir and save_name, since we want to save this in another directory\n",
    "export.save_dir = save_path\n",
    "export.save_name = \"cylinder2D_Re100_new_file\"\n",
    "\n",
    "# for demonstration purposes we only export a single snapshot\n",
    "export.write_times = write_times[-1]\n",
    "\n",
    "# now export the last snapshot of the velocity field\n",
    "export.export(coord, field_U[:, :, -1].unsqueeze(-1), \"U\")\n",
    "\n",
    "# now export the last snapshot of the pressure field into a new file\n",
    "export.export(coord, field_p[:, -1].unsqueeze(-1).unsqueeze(-1), \"p\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9897b28f-b218-4121-9196-dbd7acaac919",
   "metadata": {},
   "source": [
    "## 3. Exporting data in batches or snapshot-by-snapshot\n",
    "So far, we have always loaded and exported the complete data matrix at once. However, larger datasets are unlikely to fit into memory all at once. To avoid this issue, we can load and export the data in batches or, in the case of very large snapshots, even snapshot-by-snapshot.\n",
    "\n",
    "To make use of this functionality, we only have to set the parameter `n_snapshots_total` of the `export()` method to `n_snapshots_total=len(write_times)`. This is required so that the `export()` method knows how many snapshots to expect in total.\n",
    "\n",
    "The overall approach can be summarized as follows:\n",
    "1. Load a certain number of snapshots $N$, where $1 \\le N \\le N_\\mathrm{snapshots}$ has to be chosen based on the available memory\n",
    "2. Pass them to the `export()` method as before, but with the additional argument `n_snapshots_total=len(write_times)` (total number of snapshots to export)\n",
    "3. Repeat from *1.* until all snapshots are exported\n",
    "\n",
    "This procedure is shown in the following. The function `export_fields_snapshot_wise` below wraps it for easier usage.\n",
    "\n",
    "**Note:** When executed in a Jupyter notebook, the export code in this section creates an HDF5 and XDMF file that can't be opened in ParaView, for reasons that are not yet clear. If you want to use this code productively, copy it into a Python script and execute it separately. Then everything works as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "258d2832-613d-4a5b-b4d4-e9803d522c9d",
   "metadata": {},
   "outputs": [],
   "source": [
    "def export_fields_snapshot_wise(load_dir: str, datawriter: ExportData, field_names: Union[str, list], boundaries: list,\n",
    "                                write_times: Union[str, list], batch_size: int = 25) -> None:\n",
    "    \"\"\"\n",
    "    For each field specified, interpolate all snapshots onto the generated grid and export it to HDF5 & XDMF. The\n",
    "    interpolation and export of the data is performed snapshot-by-snapshot (batch_size = 1) or in batches to avoid\n",
    "    out-of-memory issues for large datasets.\n",
    "\n",
    "    :param load_dir: path to the simulation data\n",
    "    :param datawriter: ExportData instance, created after executing the S^3 algorithm\n",
    "    :param field_names: names of the fields to export\n",
    "    :param boundaries: boundaries of the masked area of the domain (needs to be the same as used for loading the\n",
    "                       vertices and computing the metric)\n",
    "    :param write_times: the write times of the simulation\n",
    "    :param batch_size: batch size, number of snapshots which should be interpolated and exported at once\n",
    "    :return: None\n",
    "    \"\"\"\n",
    "    # make sure the type is correct\n",
    "    write_times = write_times if isinstance(write_times, list) else [write_times]\n",
    "    field_names = field_names if isinstance(field_names, list) else [field_names]\n",
    "\n",
    "    # set the write times in case we haven't done that already\n",
    "    if datawriter.write_times is None:\n",
    "        datawriter.write_times = write_times\n",
    "\n",
    "    # now loop over all fields\n",
    "    for f in field_names:\n",
    "        counter = 1\n",
    "\n",
    "        # compute the required number of batches\n",
    "        if not len(datawriter.write_times) % batch_size:\n",
    "            n_batches = int(len(datawriter.write_times) / batch_size)\n",
    "        else:\n",
    "            n_batches = int(len(datawriter.write_times) / batch_size) + 1\n",
    "\n",
    "        # now loop over all batches\n",
    "        for i in pt.arange(0, len(datawriter.write_times), step=batch_size).tolist():\n",
    "            print(f\"Exporting batch {counter} / {n_batches}\")\n",
    "\n",
    "            # load the required number of snapshots\n",
    "            coordinates, data = load_original_Foam_fields(load_dir, datawriter.n_dimensions, boundaries, field_names=f,\n",
    "                                                          write_times=datawriter.write_times[i:i + batch_size])\n",
    "\n",
    "            # in case the field is not available, load_original_Foam_fields() returns None for the data\n",
    "            if data is not None:\n",
    "                # export the current batch\n",
    "                datawriter.export(coordinates, data, f, n_snapshots_total=len(datawriter.write_times))\n",
    "            counter += 1"
   ]
  },
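  {
   "cell_type": "markdown",
   "id": "b7e3d9a2-sketch-batch-bookkeeping",
   "metadata": {},
   "source": [
    "The batch bookkeeping of `export_fields_snapshot_wise` can be checked in isolation. The following is a minimal sketch in plain Python; the helper `batch_slices` is hypothetical and not part of $S^3$. It computes the number of batches by ceiling division and the index range of each batch, assuming 500 snapshots and a batch size of 100.\n",
    "\n",
    "```python\n",
    "from math import ceil\n",
    "\n",
    "\n",
    "def batch_slices(n_snapshots: int, batch_size: int) -> list:\n",
    "    # hypothetical helper: (start, end) index pairs, the last batch may be smaller\n",
    "    return [(i, min(i + batch_size, n_snapshots)) for i in range(0, n_snapshots, batch_size)]\n",
    "\n",
    "\n",
    "# assumed values for illustration: 500 snapshots, 100 snapshots per batch\n",
    "slices = batch_slices(500, 100)\n",
    "n_batches = ceil(500 / 100)\n",
    "print(n_batches, slices)\n",
    "```\n",
    "\n",
    "With `batch_size=1`, each pair covers exactly one snapshot, which corresponds to the snapshot-by-snapshot export."
   ]
  },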
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "bf1cd964-4e3d-4107-93ca-3a6dab08e696",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of snapshots: 1001\n"
     ]
    }
   ],
   "source": [
    "# check how many snapshots we have\n",
    "print(f\"Number of snapshots: {len(write_times)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "00d77ed6-8702-44ca-b367-9a48074c54f2",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2025-08-15 11:42:23] INFO     Loading precomputed cell centers and volumes from processor0/constant\n",
      "[2025-08-15 11:42:23] INFO     Loading precomputed cell centers and volumes from processor1/constant\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Exporting batch 1 / 5\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2025-08-15 11:42:23] INFO     Starting interpolation and export of field U.\n",
      "[2025-08-15 11:42:24] INFO     Writing HDF5 file for field U.\n",
      "[2025-08-15 11:42:24] INFO     Loading precomputed cell centers and volumes from processor0/constant\n",
      "[2025-08-15 11:42:24] INFO     Loading precomputed cell centers and volumes from processor1/constant\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Exporting batch 2 / 5\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2025-08-15 11:42:25] INFO     Loading precomputed cell centers and volumes from processor0/constant\n",
      "[2025-08-15 11:42:25] INFO     Loading precomputed cell centers and volumes from processor1/constant\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Exporting batch 3 / 5\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2025-08-15 11:42:25] INFO     Loading precomputed cell centers and volumes from processor0/constant\n",
      "[2025-08-15 11:42:25] INFO     Loading precomputed cell centers and volumes from processor1/constant\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Exporting batch 4 / 5\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2025-08-15 11:42:26] INFO     Loading precomputed cell centers and volumes from processor0/constant\n",
      "[2025-08-15 11:42:26] INFO     Loading precomputed cell centers and volumes from processor1/constant\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Exporting batch 5 / 5\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2025-08-15 11:42:27] INFO     Writing XDMF file for file cylinder2D_Re100.h5\n",
      "[2025-08-15 11:42:27] INFO     Finished export of field U in 4.153s.\n"
     ]
    }
   ],
   "source": [
    "# now we want to export the data for the last 500 snapshots of the velocity field in batches\n",
    "export = ExportData(s_cube)\n",
    "export.save_name = \"cylinder2D_Re100\"\n",
    "export.save_dir = save_path\n",
    "\n",
    "# batch_size = 1 would mean we export the data snapshot-by-snapshot; since our dataset is very small, we choose a larger batch size\n",
    "export_fields_snapshot_wise(load_path_cfd, export, \"U\", bounds, write_times[-500:], batch_size=100)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}