Faster implementation for get_points_in_sphere() using ogrid #196

uellue · 2023-09-22T09:42:52Z

When profiling the performance of diffraction pattern simulation, most time was spent in this function.

This implementation runs more than three times faster since it reduces the calculation effort through broadcasting and re-uses coordinate and distance calculations.

Instead of using diffpy, it performs the equivalent transformation to cartesian and calculation of the norm directly to allow these optimizations.

Description of the change

Equivalent calculation using numpy.ogrid() and broadcasting to calculate hkl map, cartesian vectors, norm and selection to radius.

Progress of the PR

Docstrings for all functions
Unit tests with pytest for all lines
- Change should not change coverage
Clean code style by running black

Code style is dead link:

Minimal example of the bug fix or new feature

For reviewers

The PR title is short, concise, and will make sense 1 year later.
New functions are imported in corresponding __init__.py.
New features, API changes, and deprecations are mentioned in the
unreleased section in CHANGELOG.rst.
Contributor(s) are listed correctly in credits in diffsims/release_info.py and
in .zenodo.json.

When profiling the performance of diffraction pattern simulation, most time was spent in this function. This implementation runs more than three times faster since it reduces the calculation effort through broadcasting and re-uses coordinate and distance calculations. Instead of using diffpy, it performs the equivalent transformation to cartesian and calculation of the norm directly to allow these optimizations.

uellue · 2023-09-22T11:11:31Z

The previous tests were run with parameters that turned out unusual, i.e. way too many reflections. Here's a more typical result, twice as fast for the whole simulation:

hakonanes · 2023-09-23T09:42:33Z

Thank you for this improvement, @uellue! The changes look good and the tests pass, so I'm happy to merge this PR.

Is it important for you to have this released soon? We don't have a release planned in the near future, as far as I'm aware, as the only current unreleased changes are maintenance updates. I plan to add some functionality to downstream kikuchipy in the next months, though. If I find anything to add or update to diffsims, I'll probably make a diffsims v0.6.0 release in connection with that.

A thought regarding speed of computations. While it's convenient to use class methods from diffpy.structure, they are not optimized for larger arrays, as you show here. If we replace more of these calls with our own computations, we could consider dropping diffpy.structure as a dependency.

uellue · 2023-09-25T07:23:38Z

Releasing can wait until it is convenient, no worries!

About replacing diffpy, their implementations didn't look so bad. This method was just a particular case because it still is THE hotspot of the entire simulation, intermediate values can be used for the return values instead of being recalculated, and the structure of the problem was perfect for broadcasting. The implementation here is pretty much equivalent to inlining the diffpy implementation. I also tried a Numba, but that didn't help. I guess that appending to a list of selected reflections depending on the norm is hard to vectorize for the compiler, so creating and then filtering the large intermediate arrays works well, too, in particular if they fit into the CPU caches. Broadcasting allows to only perform a minimum of operations on the full-size arrays.

If one wanted to speed this up further, I'd probably look into the structure of the whole simulation. I was wondering, shouldn't be the set of points in the sphere be the same for each rotation of the lattice? It feels so symmetric... In that case one could calculate a base version, select once, and then generate the rotated versions by rotating the selected part of the base? This sounds like a perfect job for GPUs, by the way.

uellue · 2023-09-25T07:28:46Z

This sounds like a perfect job for GPUs, by the way.

...resp. for GEMM on whatever platform -- should perform very well since it is a straight dot product with a rotation matrix.

CSSFrancis · 2023-10-17T18:53:09Z

If one wanted to speed this up further, I'd probably look into the structure of the whole simulation. I was wondering, shouldn't be the set of points in the sphere be the same for each rotation of the lattice? It feels so symmetric... In that case one could calculate a base version, select once, and then generate the rotated versions by rotating the selected part of the base? This sounds like a perfect job for GPUs, by the way.

Is there even a good reason that this is slow? I've been looking at this and I think @uellue is correct.

If you actually look at the code:

First the orientations are iterated through

diffsims/diffsims/generators/library_generator.py

Lines 119 to 128 in fae5ca5

    
           for i, orientation in enumerate(tqdm(orientations, leave=False)): 
        
               simulation = diffractor.calculate_ed_data( 
        
                   structure, 
        
                   reciprocal_radius, 
        
                   rotation=orientation, 
        
                   with_direct_beam=with_direct_beam, 
        
                   max_excitation_error=max_excitation_error, 
        
                   shape_factor_width=shape_factor_width, 
        
                   debye_waller_factors=debye_waller_factors, 
        
               )

Then the function get_points_in_sphere is called a bunch of times

diffsims/diffsims/generators/diffraction_generator.py

Lines 243 to 259 in fae5ca5

    
           recip_latt = latt.reciprocal() 
        
           g_indices, cartesian_coordinates, g_hkls = get_points_in_sphere( 
        
               recip_latt, reciprocal_radius 
        
           ) 
        
           ai, aj, ak = ( 
        
               np.deg2rad(rotation[0]), 
        
               np.deg2rad(rotation[1]), 
        
               np.deg2rad(rotation[2]), 
        
           ) 
        
           R = euler2mat(ai, aj, ak, axes="rzxz") 
        
           cartesian_coordinates = np.matmul(R, cartesian_coordinates.T).T 
        
           # Identify the excitation errors of candidate points 
        
           r_sphere = 1 / wavelength 
        
           r_spot = np.sqrt(np.sum(np.square(cartesian_coordinates[:, :2]), axis=1)) 
        
           z_spot = cartesian_coordinates[:, 2]

and then it is rotated....

So we can just call get_points_in_sphere once? ....

hakonanes · 2023-10-25T15:19:16Z

@CSSFrancis, regardless, should we merge this and move on?

CSSFrancis · 2023-10-25T16:00:19Z

@hakonanes The get_points_in_sphere method is going to be removed and the bigger problem is related to the fact that in this method there are repeated calls to get_points_in_sphere when there don't need to be.

It might be worth looking at the ReciporicalLatticeVector.from_min_dspacing method to see if there are changes there that could be sped up.

uellue added 4 commits September 22, 2023 11:30

fix whitespace

2505bc9

Add to credits, Zenodo

c783f6f

Remove unused import

f072acc

hakonanes added the enhancement New feature, request, or improvement label Sep 23, 2023

hakonanes added this to the v0.6.0 milestone Sep 23, 2023

hakonanes removed this from the v0.6.0 milestone Jun 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster implementation for get_points_in_sphere() using ogrid #196

Faster implementation for get_points_in_sphere() using ogrid #196

uellue commented Sep 22, 2023

uellue commented Sep 22, 2023

hakonanes commented Sep 23, 2023

uellue commented Sep 25, 2023

uellue commented Sep 25, 2023

CSSFrancis commented Oct 17, 2023 •

edited

Loading

hakonanes commented Oct 25, 2023

CSSFrancis commented Oct 25, 2023

Faster implementation for get_points_in_sphere() using ogrid #196

Are you sure you want to change the base?

Faster implementation for get_points_in_sphere() using ogrid #196

Conversation

uellue commented Sep 22, 2023

Description of the change

Progress of the PR

Minimal example of the bug fix or new feature

For reviewers

uellue commented Sep 22, 2023

hakonanes commented Sep 23, 2023

uellue commented Sep 25, 2023

uellue commented Sep 25, 2023

CSSFrancis commented Oct 17, 2023 • edited Loading

hakonanes commented Oct 25, 2023

CSSFrancis commented Oct 25, 2023

CSSFrancis commented Oct 17, 2023 •

edited

Loading