
Linear interpolators updates: refactor and new functionality #4710

Status: Open — wants to merge 11 commits into main

Conversation

chrishavlin (Contributor):

I started using the linear interpolators yesterday and for my purposes it'd be a tad easier to re-use the interpolator objects but swap out different table data, so this PR adds that functionality. To do that, it was easier to refactor the interpolators to have a common base class (and cut out some repetitive code structures in the process). Everything here should be fully backwards compatible. I'm also happy to split this PR to separate the refactor from the new functionality (but the new functionality is pretty simple; most of the PR is the refactor).
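The exact API this PR adds isn't quoted in the thread, so as a purely hypothetical sketch (the table re-assignment below is an assumption, not necessarily how the PR exposes it), re-using an interpolator with new table data might look something like:

import numpy as np
from yt.utilities.linear_interpolators import TrilinearFieldInterpolator

bounds = (0.0, 1.0, 0.0, 1.0, 0.0, 1.0)
table_a = np.random.random((16, 16, 16))
table_b = np.random.random((16, 16, 16))  # same shape, different values

# sample points as a dict of coordinate arrays, keyed by field name
sample_points = {ax: np.random.random(100) for ax in "xyz"}

interp = TrilinearFieldInterpolator(table_a, bounds, "xyz", truncate=True)
values_a = interp(sample_points)

# hypothetical: swap in the new table and re-use the existing bins/boundaries
# instead of constructing a second interpolator object
interp.table = table_b.astype("float64")
values_b = interp(sample_points)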

@chrishavlin added the enhancement (Making something better) and refactor (improve readability, maintainability, modularity) labels on Oct 20, 2023.
chrishavlin (Contributor, Author) commented on Oct 20, 2023:

Note that the test for the 4D interpolator is kind of slow because it uses quite a big array (random_data = np.random.random((64, 64, 64, 64))). I'm inclined to drop that down in size to speed it up; I can do that after an initial review (and after the tests pass as-is).

matthewturk (Member):

Hmm, how slow is it? Is it a lot slower than doing a 3D interpolator on (256, 256, 256) data?

chrishavlin (Contributor, Author):

Will check if it's slower than the 3D case, but 4D with (64, 64, 64, 64) took maybe 4-6 seconds if I remember correctly.

chrishavlin (Contributor, Author):

@matthewturk so the 3D and 4D scale at the same rate with the total array size:
[plot: lin_interp_scaling — interpolation time vs. total array size for the 3D and 4D interpolators]

Code here: https://gist.github.com/chrishavlin/9370e2a4a1895745a40c46f39d6c44f4

The full test times (pytest --durations=10 yt/utilities/tests/test_interpolators.py):

3.20s call     yt/utilities/tests/test_interpolators.py::test_linear_interpolator_4d
0.31s call     yt/utilities/tests/test_interpolators.py::test_ghost_zone_extrapolation
0.25s call     yt/utilities/tests/test_interpolators.py::test_get_vertex_centered_data
0.05s setup    yt/utilities/tests/test_interpolators.py::test_linear_interpolator_1d
0.04s call     yt/utilities/tests/test_interpolators.py::test_linear_interpolator_3d

So only 3ish seconds, not as bad as I remembered, but two orders of magnitude longer than the other interpolator tests (and I don't see a particular reason to test with N=64 vs., say, N=16).
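For reference, a rough sketch of what the 4D round-trip test looks like at a smaller N (modeled on the structure of the existing test; the evenly spaced [0, 1] bins and the exact assertion are assumptions here, not a quote of the test file):

import numpy as np
from numpy.testing import assert_allclose
from yt.utilities.linear_interpolators import QuadrilinearFieldInterpolator

n = 16  # much faster than 64 while still exercising the 4D code path
random_data = np.random.random((n, n, n, n))

# sample exactly at the bin locations so interpolation should return
# the table values (to within floating point)
step = n * 1j
fv = dict(
    zip("xyzw", np.mgrid[0.0:1.0:step, 0.0:1.0:step, 0.0:1.0:step, 0.0:1.0:step])
)
qfi = QuadrilinearFieldInterpolator(
    random_data, (0.0, 1.0) * 4, "xyzw", truncate=True
)
assert_allclose(qfi(fv), random_data)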

matthewturk (Member) previously approved these changes on Nov 2, 2023, leaving this comment:

I think as long as the tests pass, and the API is consistent/unbroken if not unchanged, this is good.

neutrinoceros (Member) left a comment:

Overall looks sound. I left a couple of minor questions and suggestions.

(Resolved review threads: five on yt/utilities/tests/test_interpolators.py and four on yt/utilities/linear_interpolators.py.)
Review thread on yt/utilities/linear_interpolators.py, near the new base class and its __init__(self, table, boundaries, field_names, truncate=False):

class _LinearInterpolator(abc.ABC):
    _ndim: int
    _dim_i_type = "int32"
Member:

Wow, I'm sure this is just your refactor making this more obvious, but do you have any idea why we are not using the same dtype in all interpolators?

chrishavlin (Contributor, Author):

I do not know :) I meant to look more closely; maybe there's some reason at the Cython level? Or...?

chrishavlin (Contributor, Author):

Ok, I looked back, and the superficial reason is that the Cython functions for the different interpolators have different int declarations in their call signatures: UnilinearlyInterpolate and BilinearlyInterpolate are declared with int32_t, e.g.:

def BilinearlyInterpolate(np.ndarray[np.float64_t, ndim=2] table,
                          np.ndarray[np.float64_t, ndim=1] x_vals,
                          np.ndarray[np.float64_t, ndim=1] y_vals,
                          np.ndarray[np.float64_t, ndim=1] x_bins,
                          np.ndarray[np.float64_t, ndim=1] y_bins,
                          np.ndarray[np.int32_t, ndim=1] x_is,
                          np.ndarray[np.int32_t, ndim=1] y_is,
                          np.ndarray[np.float64_t, ndim=1] output):

while TrilinearlyInterpolate and QuadrilinearlyInterpolate use int_t:

def TrilinearlyInterpolate(np.ndarray[np.float64_t, ndim=3] table,
                           np.ndarray[np.float64_t, ndim=1] x_vals,
                           np.ndarray[np.float64_t, ndim=1] y_vals,
                           np.ndarray[np.float64_t, ndim=1] z_vals,
                           np.ndarray[np.float64_t, ndim=1] x_bins,
                           np.ndarray[np.float64_t, ndim=1] y_bins,
                           np.ndarray[np.float64_t, ndim=1] z_bins,
                           np.ndarray[np.int_t, ndim=1] x_is,
                           np.ndarray[np.int_t, ndim=1] y_is,
                           np.ndarray[np.int_t, ndim=1] z_is,
                           np.ndarray[np.float64_t, ndim=1] output):

As to why they differ, I'm really not sure...

I could try switching UnilinearlyInterpolate and BilinearlyInterpolate over to use int_t as well?
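For context on why the class-level _dim_i_type matters at all: the Python wrapper has to cast its bin-index arrays to whatever dtype the Cython signature declares before calling in. A minimal sketch of that pattern (assuming the np.digitize-based bin lookup that linear_interpolators.py uses; not the exact yt code):

import numpy as np

x_bins = np.linspace(0.0, 1.0, 32)
x_vals = np.random.random(1000)

# map each value to a bin index, then cast to the dtype the Cython
# function expects: int32 for the 1D/2D paths today...
x_i = (np.digitize(x_vals, x_bins) - 1).astype("int32")

# ...vs. the platform "long" that np.int_t maps to (int64 on most
# 64-bit Linux/macOS builds) for the 3D/4D paths
x_i_long = (np.digitize(x_vals, x_bins) - 1).astype("int_")

# a mismatch raises a Cython buffer dtype error at call time, which is
# why the declared dtype and the Python-side cast have to agree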

Member:

> I could try switching UnilinearlyInterpolate and BilinearlyInterpolate over to use int_t as well?

Now might be a good time to try, indeed, but it's not critical!

chrishavlin (Contributor, Author):

Just switched all the interpolators to use int_t.

chrishavlin (Contributor, Author):

Oh, one more push coming... I'm going to go through all the other tests and make the booleans explicit in the interpolator calls.

chrishavlin (Contributor, Author):

Ok, I just pushed up a change that adds explicit keywords to all the other test calls, and I dropped the resolution in the 4D test case to (32, 32, 32, 32) (rather than (64, 64, 64, 64)).
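For example, an illustrative before/after of what "make the booleans explicit" means at a call site (assuming the public UnilinearFieldInterpolator signature; not a literal diff from the PR):

import numpy as np
from yt.utilities.linear_interpolators import UnilinearFieldInterpolator

random_data = np.random.random(64)

# before: a bare positional boolean, easy to misread at the call site
ufi = UnilinearFieldInterpolator(random_data, (0.0, 1.0), "x", True)

# after: the same call with the flag spelled out as a keyword
ufi = UnilinearFieldInterpolator(random_data, (0.0, 1.0), "x", truncate=True)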

Pretty sure the only remaining question is that of the differing types for the index-array inputs to the Cython interpolators: #4710 (comment)

Labels: enhancement (Making something better), refactor (improve readability, maintainability, modularity)
3 participants