Bin-free conditional abundance matching #888
Merged
This PR introduces a new bin-free algorithm for conditional abundance matching, as well as tutorials on how to use it. The way the algorithm works is as follows.
1. For each model galaxy, take a window of nwin ~200 observed galaxies bracketing the observed galaxy that matches it in x; this window defines Prob(< y_obs | x), which allows us to calculate the rank-order y_obs-percentile for each galaxy in the window.
2. Around the model galaxy, take a window of nwin model galaxies; this window defines Prob(< y_halo | x), which allows us to calculate the rank-order y_halo-percentile of our model galaxy, r_1.
3. Find the observed galaxy in the window whose rank-order y_obs-percentile equals r_1, and map its y_obs-value onto our model galaxy.

The implementation is based on a cython kernel, bin_free_cam_kernel.pyx. The simplest way to compute rank-order percentiles is to sort the window. However, this is prohibitively expensive when done for every window around every galaxy. The cython kernel therefore sorts the windows only once at the beginning; as the windows slide along the arrays with increasing index i, elements are popped in and out so as to preserve the sorted order. The rank-order percentile can then be calculated via a binary search of the sorted window, which is also part of the cython kernel.

Finally, to reduce discreteness effects, sub-grid noise can optionally be added: rather than painting y_obs onto the model galaxy, we can instead paint a random number drawn from the interval (y_obs[r_1-1], y_obs[r_1+1]). This is a recommended option that comes at no loss of fidelity because the PDF is not being resolved on scales equal to 1/nwin anyway.

For model galaxy samples with ~1e6 elements, the CAM calculation takes ~500 ms to 1 s, depending on the size of the window.
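The windowed matching described above can be sketched in plain NumPy. This is a minimal, hypothetical illustration and not the cython kernel from this PR: the function name naive_bin_free_cam and its arguments are assumptions made for the example. It re-sorts every window, which is exactly the cost the kernel avoids, and it uses the insertion index in the x-sorted observed sample as a stand-in for the matching observed galaxy.

```python
import numpy as np


def naive_bin_free_cam(x_model, y_model, x_obs, y_obs, nwin=201,
                       add_subgrid_noise=False, seed=0):
    """Hypothetical reference sketch of windowed, bin-free CAM.

    Returns one y_obs-like value per model galaxy, in the input order.
    """
    rng = np.random.default_rng(seed)
    x_model = np.asarray(x_model, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)

    # Sort both samples by the primary property x.
    idx_model = np.argsort(x_model)
    xm, ym = x_model[idx_model], y_model[idx_model]
    idx_obs = np.argsort(x_obs)
    xo, yo = x_obs[idx_obs], y_obs[idx_obs]

    n_model, n_obs = len(xm), len(xo)
    nwin = min(nwin, n_model, n_obs)  # a window cannot exceed either sample
    half = nwin // 2

    mapped_sorted = np.empty(n_model)
    for j in range(n_model):
        # Window of nwin model galaxies around galaxy j, clipped at the edges.
        lo_m = min(max(j - half, 0), n_model - nwin)
        win_m = np.sort(ym[lo_m:lo_m + nwin])
        # Rank of galaxy j within its window (binary search), in [0, nwin - 1].
        rank = np.searchsorted(win_m, ym[j], side="left")

        # Window of nwin observed galaxies around the matching position in x.
        i = min(np.searchsorted(xo, xm[j]), n_obs - 1)
        lo_o = min(max(i - half, 0), n_obs - nwin)
        win_o = np.sort(yo[lo_o:lo_o + nwin])

        if add_subgrid_noise:
            # Draw uniformly between the neighbours of the matched rank.
            low = win_o[max(rank - 1, 0)]
            high = win_o[min(rank + 1, nwin - 1)]
            mapped_sorted[j] = rng.uniform(low, high)
        else:
            mapped_sorted[j] = win_o[rank]

    # Undo the x-sort so the output lines up with the input model arrays.
    mapped = np.empty(n_model)
    mapped[idx_model] = mapped_sorted
    return mapped
```

When nwin is at least as large as both samples and the samples have equal size, each window covers the whole sample and the sketch reduces to ordinary (unconditional) rank-order abundance matching, which is a convenient sanity check.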
CC @manodeep @duncandc @h-aung @yymao