Add support for distributed cholla datasets. #4702
base: main
you'll need to
Maybe that's a bug with
I think unit tests should be preferred whenever they suffice for a couple reasons:
That said, if what you need is some answer tests, go for it!
2806b17 to 4428058
from yt.geometry.api import Geometry
from yt.geometry.grid_geometry_handler import GridIndex
from yt.utilities.on_demand_imports import _h5py as h5py

from .fields import ChollaFieldInfo


def _split_fname_proc_suffix(filename: str):
Could you put a short note about how this is different from os.path.splitext? Just to avoid future confusion.
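For reference, a minimal sketch of the distinction such a note could describe. It assumes distributed Cholla outputs end in an integer process id (for example `1.h5.0`). The helper name `split_proc_suffix_sketch` and its exact behaviour are hypothetical, not the PR's implementation.

```python
import os.path


def split_proc_suffix_sketch(filename: str):
    """Hypothetical helper: split off a trailing ".<integer>" process suffix.

    Returns (stem, proc_id); proc_id is None when no numeric suffix exists,
    in which case the filename is returned unchanged.
    """
    stem, sep, tail = filename.rpartition(".")
    if sep and stem and tail.isdecimal():
        return stem, int(tail)
    return filename, None


# os.path.splitext always splits at the last dot, whatever follows it.
splitext_distributed = os.path.splitext("1.h5.0")
splitext_single = os.path.splitext("1.h5")

# The sketch only splits when the trailing piece is an integer.
sketch_distributed = split_proc_suffix_sketch("1.h5.0")
sketch_single = split_proc_suffix_sketch("1.h5")
```

The point of the comparison: `os.path.splitext` would treat `.h5` as a removable extension on a concatenated file, while a process-suffix splitter should leave that name alone.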
only minor stuff -- looks good otherwise
self.grid_left_edge[i] = left_frac
self.grid_right_edge[i] = right_frac
self.grid_dimensions[i] = dims_local
- self.grid_left_edge[i] = left_frac
- self.grid_right_edge[i] = right_frac
- self.grid_dimensions[i] = dims_local
+ self.grid_left_edge[i,:] = left_frac
+ self.grid_right_edge[i,:] = right_frac
+ self.grid_dimensions[i,:] = dims_local
Just for clarity, could we make it obvious that it's setting a slice to the values?
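As a quick check that the suggestion is purely cosmetic, here is a small sketch with stand-in arrays. The shapes and values are assumptions for illustration; the real `grid_left_edge` is allocated by yt's index machinery.

```python
import numpy as np

# Stand-in for a (num_grids, 3) edge array; shape and values are illustrative.
grid_left_edge = np.zeros((3, 3))
left_frac = np.array([0.0, 0.5, 0.25])

implicit = grid_left_edge.copy()
explicit = grid_left_edge.copy()

implicit[1] = left_frac     # indexes row 1, assigns across all columns
explicit[1, :] = left_frac  # same assignment, with the slice spelled out

same = np.array_equal(implicit, explicit)
```

For a 2D array both forms write the same row, so the explicit slice only changes how the intent reads.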
def io_iter(self, chunks, fields):
    # this is loosely inspired by the implementation used for Enzo/Enzo-E
    # - those other options use the lower-level hdf5 interface. Unclear
    #   whether that affords any advantages...
Good question. I think in the past it did because we avoided having to re-allocate temporary scratch space, but I am not sure that would hold up to current inquiries. I think the big advantage those have is tracking the groups within the iteration.
fh, filename = None, None
for chunk in chunks:
    for obj in chunk.objs:
        if obj.filename is None:  # unclear when this case arises...
likely it will not here, unless you manually construct virtual grids
Out of curiosity, what is a virtual grid?
I realize this may be an involved answer - so if you could just point me to a frontend (or other area of the code) using virtual grids, I can probably investigate that on my own.
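For context on the loop under discussion, a rough sketch of the file-handle reuse pattern it appears to follow. This is not the PR's code: `iter_with_cached_handle`, `FakeHandle`, and the stand-in grid objects are hypothetical, and skipping objects whose `filename` is `None` is an assumption about the intended handling.

```python
from types import SimpleNamespace


def iter_with_cached_handle(chunks, open_file):
    """Yield (obj, handle), reopening only when obj.filename changes."""
    fh, filename = None, None
    try:
        for chunk in chunks:
            for obj in chunk.objs:
                if obj.filename is None:
                    continue  # assumed: nothing on disk to read for this object
                if obj.filename != filename:
                    if fh is not None:
                        fh.close()
                    fh, filename = open_file(obj.filename), obj.filename
                yield obj, fh
    finally:
        # Close the last handle even if the consumer stops early.
        if fh is not None:
            fh.close()


opened = []  # records every filename the fake opener is asked to open


class FakeHandle:
    """Stand-in for a file object so the sketch runs without any data files."""

    def __init__(self, name):
        opened.append(name)

    def close(self):
        pass


names = ["a.h5.0", "a.h5.0", None, "a.h5.1"]
chunks = [SimpleNamespace(objs=[SimpleNamespace(filename=n) for n in names])]
visited = [obj.filename for obj, _ in iter_with_cached_handle(chunks, FakeHandle)]
```

In a real reader the opener would presumably be something like `h5py.File` in read mode. Consecutive grids stored in the same file then share one open handle instead of reopening it per grid.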
My apologies for taking a while to follow up on this. I plan to circle back in the next week or so.
PR Summary
This PR adds support for loading Cholla datasets that are distributed over multiple files. Previously, the frontend could only load Cholla datasets after they were concatenated into a single large dataset.
This functionality is currently a little inefficient: we need to read every HDF5 file to figure out the mapping between spatial locations and locations on disk. This seems like something we can easily improve in the future (possibly by having Cholla write out an extra attribute describing how 3D locations are mapped into 1D).
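To illustrate what such an attribute could enable, here is a sketch that converts a 1D process id into a 3D block index without opening any files. The ordering used (x varies fastest) and both function names are assumptions for illustration; Cholla's actual ordering is not stated here.

```python
def proc_id_to_block_index(proc_id, nprocs_xyz):
    """Map a 1D process id to (ix, iy, iz), assuming x varies fastest."""
    nx, ny, nz = nprocs_xyz
    if not 0 <= proc_id < nx * ny * nz:
        raise ValueError(f"proc_id {proc_id} out of range for {nprocs_xyz}")
    iz, rem = divmod(proc_id, nx * ny)
    iy, ix = divmod(rem, nx)
    return ix, iy, iz


def block_index_to_proc_id(index, nprocs_xyz):
    """Inverse mapping under the same assumed ordering."""
    ix, iy, iz = index
    nx, ny, _ = nprocs_xyz
    return ix + nx * (iy + ny * iz)


# Round trip over a hypothetical 2 x 2 x 2 decomposition.
layout = (2, 2, 2)
round_trip = [
    block_index_to_proc_id(proc_id_to_block_index(p, layout), layout)
    for p in range(8)
]
```

With the decomposition shape and ordering known up front, each grid's edges could be computed from its process id alone, instead of being read from every file.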
PR Checklist
For this PR, I suspect that we will need to upload a new test dataset. I just had a few questions: