Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mesh Timeseries Output #53

Merged
merged 3 commits into from
Jun 17, 2024
Merged

Mesh Timeseries Output #53

merged 3 commits into from
Jun 17, 2024

Conversation

thwllms
Copy link
Contributor

@thwllms thwllms commented Jun 11, 2024

Notes

  • DataArray is returned if reading a single timeseries output variable, e.g., plan_hdf.mesh_timeseries_output("MeshName", "Water Surface")
    • Dataset is returned if reading out multiple timeseries output variables, e.g., plan_hdf.mesh_timeseries_output_cells("MeshName"), plan_hdf.mesh_timeseries_output_faces("MeshName")
    • Note that output variables based on mesh cells are incompatible with outputs based on mesh faces, since they are based on different array indices
  • If dask is installed, DataArray/Dataset objects are based on Dask arrays. If not, default to numpy arrays.
    • When an array is read from the HDF, rashdf will set the dask array chunk size to match the HDF chunk size. I don't know if it's optimal to let the user set the dask array chunk size upfront.
  • I looked into Xvec a little bit to see if we could export geometry-aware Datasets / DataArrays. Seems possible, but with only two releases Xvec doesn't seem very mature yet. Importantly, doesn't seem like Xvec Datasets / DataArrays could be easily saved to disk.

Examples

Read cell water surface output as DataArray

>>> from rashdf import RasPlanHdf, TimeSeriesOutputVar
>>> plan_hdf.mesh_area_names()
['BaldEagleCr', 'Upper 2D Area']
>>> # read out a DataArray for a single timeseries output variable
>>> ws = plan_hdf.mesh_timeseries_output("BaldEagleCr", TimeSeriesOutputVar.WATER_SURFACE)
>>> ws  # note below that this is a dask-based DataArray, since dask was installed
<xarray.DataArray 'BaldEagleCr - Water Surface' (time: 37, cell_id: 3359)> Size: 497kB
dask.array<getitem, shape=(37, 3359), dtype=float32, chunksize=(12, 3359), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 296B 1999-01-01T12:00:00 ... 1999-01-04T12...
  * cell_id  (cell_id) int64 27kB 0 1 2 3 4 5 ... 3353 3354 3355 3356 3357 3358
Attributes:
    mesh_name:  BaldEagleCr
    variable:   Water Surface
    units:      ft
>>> # select the output for a single cell
>>> ws.sel(cell_id=123)
<xarray.DataArray 'BaldEagleCr - Water Surface' (time: 37)> Size: 148B
dask.array<getitem, shape=(37,), dtype=float32, chunksize=(12,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 296B 1999-01-01T12:00:00 ... 1999-01-04T12...
    cell_id  int64 8B 123
Attributes:
    mesh_name:  BaldEagleCr
    variable:   Water Surface
    units:      ft
>>> # return the values for a single cell
>>> ws.sel(cell_id=123).values
array([537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 ,
       537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 ,
       537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 ,
       537.2959 , 537.2959 , 537.2959 , 537.39996, 543.78345, 546.0193 ,
       547.2876 , 548.71246, 549.9208 , 550.59985, 550.90826, 550.9861 ,
       550.92456, 550.7785 , 550.5764 , 550.3424 , 550.0916 , 549.85077,
       549.6207 ], dtype=float32)
>>> # check out the time dimension
>>> ws.time
<xarray.DataArray 'time' (time: 37)> Size: 296B
array(['1999-01-01T12:00:00.000000000', '1999-01-01T14:00:00.000000000',
       '1999-01-01T16:00:00.000000000', '1999-01-01T18:00:00.000000000',
       '1999-01-01T20:00:00.000000000', '1999-01-01T22:00:00.000000000',
       '1999-01-02T00:00:00.000000000', '1999-01-02T02:00:00.000000000',
       '1999-01-02T04:00:00.000000000', '1999-01-02T06:00:00.000000000',
       '1999-01-02T08:00:00.000000000', '1999-01-02T10:00:00.000000000',
       '1999-01-02T12:00:00.000000000', '1999-01-02T14:00:00.000000000',
       '1999-01-02T16:00:00.000000000', '1999-01-02T18:00:00.000000000',
       '1999-01-02T20:00:00.000000000', '1999-01-02T22:00:00.000000000',
       '1999-01-03T00:00:00.000000000', '1999-01-03T02:00:00.000000000',
       '1999-01-03T04:00:00.000000000', '1999-01-03T06:00:00.000000000',
       '1999-01-03T08:00:00.000000000', '1999-01-03T10:00:00.000000000',
       '1999-01-03T12:00:00.000000000', '1999-01-03T14:00:00.000000000',
       '1999-01-03T16:00:00.000000000', '1999-01-03T18:00:00.000000000',
       '1999-01-03T20:00:00.000000000', '1999-01-03T22:00:00.000000000',
       '1999-01-04T00:00:00.000000000', '1999-01-04T02:00:00.000000000',
       '1999-01-04T04:00:00.000000000', '1999-01-04T06:00:00.000000000',
       '1999-01-04T08:00:00.000000000', '1999-01-04T10:00:00.000000000',
       '1999-01-04T12:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 296B 1999-01-01T12:00:00 ... 1999-01-04T12...

Read all available cell-based timeseries output as DataArray

>>> # Read all cell timeseries outputs as Dataset
>>> plan_hdf = RasPlanHdf.open_uri("s3://kanawha-pilot/FFRD_Kanawha_Compute/runs/13/ras/ElkMiddle/ElkMiddle.p01.hdf")
>>> ds = plan_hdf.mesh_timeseries_output_cells("ElkMiddle")
>>> ds
<xarray.Dataset> Size: 66MB
Dimensions:                              (time: 577, cell_id: 14188)
Coordinates:
  * time                                 (time) datetime64[ns] 5kB 1996-01-14...
  * cell_id                              (cell_id) int64 114kB 0 1 ... 14187
Data variables:
    Water Surface                        (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
    Cell Cumulative Precipitation Depth  (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
Attributes:
    mesh_name:  ElkMiddle
>>> # get a DataArray for a single output variable
>>> ws = ds["Water Surface"]
>>> ws
<xarray.DataArray 'Water Surface' (time: 577, cell_id: 14188)> Size: 33MB
dask.array<getitem, shape=(577, 14188), dtype=float32, chunksize=(3, 14188), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 5kB 1996-01-14T12:00:00 ... 1996-02-07T12:...
  * cell_id  (cell_id) int64 114kB 0 1 2 3 4 5 ... 14183 14184 14185 14186 14187
Attributes:
    mesh_name:  ElkMiddle
    variable:   Water Surface
    units:      ft
>>> # select cells from the DataArray
>>> ws.sel(cell_id=slice(100, 105))
<xarray.DataArray 'Water Surface' (time: 577, cell_id: 6)> Size: 14kB
dask.array<getitem, shape=(577, 6), dtype=float32, chunksize=(3, 6), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 5kB 1996-01-14T12:00:00 ... 1996-02-07T12:...
  * cell_id  (cell_id) int64 48B 100 101 102 103 104 105
Attributes:
    mesh_name:  ElkMiddle
    variable:   Water Surface
    units:      ft

Read all available mesh face timeseries outputs as Dataset

>>> plan_hdf = RasPlanHdf("/mnt/c/temp/Example_Projects_6_5/Example_Projects/2D Unsteady Flow Hydraulics/BaldEagleCrkMulti2D/BaldEagleDamBrk.p18.hdf")
>>> plan_hdf.mesh_timeseries_output_faces("Upper 2D Area")
<xarray.Dataset> Size: 4MB
Dimensions:                      (time: 37, face_id: 2286)
Coordinates:
  * time                         (time) datetime64[ns] 296B 1999-01-01T12:00:...
  * face_id                      (face_id) int64 18kB 0 1 2 3 ... 2283 2284 2285
Data variables:
    Face Courant                 (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Cumulative Volume       (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Eddy Viscosity          (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Flow                    (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Flow Period Average     (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Friction Term           (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Pressure Gradient Term  (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Shear Stress            (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Tangential Velocity     (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Velocity                (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Water Surface           (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
Attributes:
    mesh_name:  Upper 2D Area

@thwllms thwllms requested review from slawler, zherbz and sray014 June 11, 2024 16:23
This was linked to issues Jun 11, 2024
Copy link
Contributor

@sray014 sray014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a question about the testing of the dataset creation but I dont think its a big deal.

@thwllms thwllms mentioned this pull request Jun 13, 2024
@zherbz
Copy link
Contributor

zherbz commented Jun 13, 2024

Just one comment on the mentioned topic of saving DataArrays to disc. I recently ran into this issue dealing with large raster datasets and was able to get by using rioxarray: https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray.raster_dataset.RasterDataset.to_raster.

Using the windowed parameter of the .to_raster() method writes the xarray DataArray to disc in the same size as it was chunked when first being read.

@thwllms
Copy link
Contributor Author

thwllms commented Jun 13, 2024

Just one comment on the mentioned topic of saving DataArrays to disc. I recently ran into this issue dealing with large raster datasets and was able to get by using rioxarray: https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray.raster_dataset.RasterDataset.to_raster.

Using the windowed parameter of the .to_raster() method writes the xarray DataArray to disc in the same size as it was chunked when first being read.

Gotcha. I've used rioxarray as well. Since this isn't raster data I think we have to write out to Zarr or another format. I hope that Xvec continues to be developed since their data model seems like it would be ideal for our use case (i.e., vector GIS data with multidimensional values)

Base automatically changed from feature/summary-output to main June 17, 2024 15:34
@thwllms thwllms changed the title WIP / Preview: Mesh Timeseries Output Mesh Timeseries Output Jun 17, 2024
Copy link

codecov bot commented Jun 17, 2024

Codecov Report

Attention: Patch coverage is 97.82609% with 2 lines in your changes missing coverage. Please review.

Files Coverage Δ
src/rashdf/utils.py 90.62% <100.00%> (+0.51%) ⬆️
src/rashdf/plan.py 98.85% <97.70%> (-0.59%) ⬇️

@thwllms thwllms merged commit af72a1a into main Jun 17, 2024
5 checks passed
@thwllms thwllms deleted the feature/mesh-timeseries-output branch June 17, 2024 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Timeseries Output: Face Velocity Timeseries Output: Water Surface
3 participants