Faster implementation for get_points_in_sphere() using ogrid #196
When profiling the performance of diffraction pattern simulation, most time was spent in this function. This implementation runs more than three times faster since it reduces the calculation effort through broadcasting and re-uses coordinate and distance calculations. Instead of using diffpy, it performs the equivalent transformation to cartesian and calculation of the norm directly to allow these optimizations.
Thank you for this improvement, @uellue! The changes look good and the tests pass, so I'm happy to merge this PR. Is it important for you to have this released soon? We don't have a release planned in the near future, as far as I'm aware, since the only unreleased changes at the moment are maintenance updates. I plan to add some functionality to downstream kikuchipy in the coming months, though. If I find anything to add or update in diffsims, I'll probably make a diffsims v0.6.0 release in connection with that. A thought regarding speed of computations: while it's convenient to use class methods from diffpy.structure, they are not optimized for larger arrays, as you show here. If we replace more of these calls with our own computations, we could consider dropping diffpy.structure as a dependency.
Releasing can wait until it is convenient, no worries! About replacing diffpy: their implementations didn't look so bad. This method was just a particular case because it is still THE hotspot of the entire simulation, intermediate values can be used for the return values instead of being recalculated, and the structure of the problem was perfect for broadcasting. The implementation here is pretty much equivalent to inlining the diffpy implementation. I also tried Numba, but that didn't help. I guess that appending to a list of selected reflections depending on the norm is hard for the compiler to vectorize, so creating and then filtering the large intermediate arrays works well, too, in particular if they fit into the CPU caches. Broadcasting makes it possible to perform only a minimum of operations on the full-size arrays. If one wanted to speed this up further, I'd probably look into the structure of the whole simulation. I was wondering, shouldn't the set of points in the sphere be the same for each rotation of the lattice? It feels so symmetric... In that case one could calculate a base version, select once, and then generate the rotated versions by rotating the selected part of the base? This sounds like a perfect job for GPUs, by the way.
...or rather for GEMM on whatever platform: it should perform very well, since it is a straight dot product with a rotation matrix.
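A minimal NumPy sketch of the "select once, then rotate" idea, under stated assumptions: the simple cubic grid, the helper `rotation_about_z`, and the radius are hypothetical stand-ins, not diffsims code. It only illustrates the geometric point that rotations preserve the norm, so the selection can be done on the unrotated lattice and all orientations handled in one batched matrix product. Whether the rest of the simulation allows this reordering is a separate question.

```python
import numpy as np


def rotation_about_z(angle):
    """Rotation matrix for a rotation by `angle` (radians) about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Hypothetical stand-in for the lattice: a simple cubic grid of points.
axis = np.arange(-3, 4)
points = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
points = points.reshape(-1, 3).astype(float)

# Select once on the unrotated lattice.
radius = 2.5
selected = points[np.linalg.norm(points, axis=1) <= radius]

# Rotate only the selected points, for all orientations in one batched
# matrix product: (N, 3) @ (R, 3, 3) -> (R, N, 3).
rotations = np.stack([rotation_about_z(a) for a in np.linspace(0.0, np.pi, 5)])
rotated = selected @ rotations.transpose(0, 2, 1)

# Rotations preserve the norm, so every rotated point is still in the sphere
# (up to floating-point rounding).
assert np.allclose(np.linalg.norm(rotated, axis=-1),
                   np.linalg.norm(selected, axis=1))
```

The batched product is the part that maps onto GEMM; points lying exactly on the sphere surface may need a small tolerance because of rounding.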
Is there even a good reason that this is slow? I've been looking at this and I think @uellue is correct. If you actually look at the code: first the orientations are iterated through (diffsims/diffsims/generators/library_generator.py, lines 119 to 128 in fae5ca5).
Then the function (diffsims/diffsims/generators/diffraction_generator.py, lines 243 to 259 in fae5ca5)
and then it is rotated... So we can just call
@CSSFrancis, regardless, should we merge this and move on?
@hakonanes The It might be worth looking at the
Description of the change
Equivalent calculation using numpy.ogrid and broadcasting to calculate the hkl map, Cartesian vectors, norms, and the selection within the radius.
Progress of the PR
Code style is dead link:
Minimal example of the bug fix or new feature
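A minimal sketch of the approach described in this PR, not the code that was merged. The function name `points_in_sphere`, the `base` argument (a plain 3x3 array of basis vectors standing in for the diffpy lattice), and the way the index bounds are derived are all assumptions made for illustration.

```python
import numpy as np


def points_in_sphere(base, radius):
    """Lattice points within `radius` of the origin.

    `base` is a hypothetical (3, 3) array whose rows are the lattice basis
    vectors in Cartesian coordinates; it stands in for the diffpy lattice.
    Returns integer indices (N, 3), Cartesian vectors (N, 3) and norms (N,).
    """
    base = np.asarray(base, dtype=float)
    # Conservative index bound per axis: h = x @ inv(base), so
    # |h_i| <= |x| * norm(column i of inv(base)).
    bounds = np.ceil(radius * np.linalg.norm(np.linalg.inv(base), axis=0))
    h_max, k_max, l_max = (int(b) for b in bounds)

    # Open grid: shapes (H, 1, 1), (1, K, 1) and (1, 1, L) instead of three
    # full (H, K, L) index arrays.
    h, k, l = np.ogrid[-h_max:h_max + 1, -k_max:k_max + 1, -l_max:l_max + 1]

    # Scale each basis vector along its own axis only; just the final sum
    # produces the full (H, K, L, 3) array.
    cart = (h[..., np.newaxis] * base[0]
            + k[..., np.newaxis] * base[1]
            + l[..., np.newaxis] * base[2])

    # Compute the norm once and reuse it for the selection and the return value.
    dist = np.linalg.norm(cart, axis=-1)
    inside = dist <= radius

    hkl = np.stack(np.broadcast_arrays(h, k, l), axis=-1)
    return hkl[inside], cart[inside], dist[inside]


indices, vectors, norms = points_in_sphere(np.eye(3), 1.0)
print(len(indices))  # 7: the origin and its six nearest neighbours
```

The large intermediate arrays are created first and filtered afterwards with a boolean mask, which matches the "create, then filter" strategy discussed in the conversation.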
For reviewers
Checklist items refer to __init__.py, the unreleased section in CHANGELOG.rst, and the credits in diffsims/release_info.py and in .zenodo.json.