Stop quicksort earlier for faster indexing #29
This is a great idea. So in the quicksort once it recurses to a value range <= The only comment I would make is for the case of
@jbuckmccready sure, this was supposed to be complementary.
@jbuckmccready let me know if you port this change over; I'm curious to hear how much it improves performance for a real-world case.
I ported this change over. For my use case the improvement isn't really measurable, because the other computations involved grow much faster as the input size increases. However, when benchmarking just the indexing of the spatial index, the difference is significant (e.g. 250 microseconds vs. 300 microseconds to index 4400 bounding boxes of line segments along a circle). Also
@jbuckmccready interesting!
Ah, right, because the sub-sort buckets may not align with the nodes. I didn't catch it because the loss in query performance was tiny in the benchmark I was running: for that circle case, the difference in sorting results in about a 1% increase in time to query (querying every value box + some delta) but a 12% decrease in time to index. I wonder if another sorting algorithm, e.g. a type of radix/bin sort, could be used to partially sort down to the nodeSize buckets more quickly, since we're sorting integers and we know the bucket size ahead of time.
@jbuckmccready I'll investigate radix sort! This implementation looks promising: https://erik.gorset.no/2011/04/radix-sort-is-faster-than-quicksort.html
Instead of doing a full sort of the nodes, only sort them until we recurse down to values within a node (which don't have to be sorted), and stop there. This slightly improves indexing performance, especially for higher nodeSize values. For the default (16), it seems to improve indexing by ~5–8%. The only gotcha was that it changes the return order in kNN queries for equal-distance items (hence the test update), but this should be fine.
@jbuckmccready your #28 PR gave me the idea, check it out.
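For readers following along, the early-stop idea described above can be sketched roughly as follows. This is a simplified illustration and not the code from this PR: the function name `partialSort` and the plain-array input are assumptions, and a real index would also have to swap the associated boxes and indices alongside the sort keys. The recursion simply returns once `left` and `right` fall into the same bucket of `nodeSize` items.

```javascript
// Hypothetical helper (not the library's actual code): partially sorts
// `values` in place so that every value in one bucket of `nodeSize` items
// is <= every value in the next bucket. Order inside a bucket is unspecified.
function partialSort(values, left, right, nodeSize) {
    // Stop once the whole range lies inside a single bucket (node).
    if (Math.floor(left / nodeSize) >= Math.floor(right / nodeSize)) return;

    // Hoare partition around the middle element.
    const pivot = values[(left + right) >> 1];
    let i = left - 1;
    let j = right + 1;

    while (true) {
        do i++; while (values[i] < pivot);
        do j--; while (values[j] > pivot);
        if (i >= j) break;
        const tmp = values[i];
        values[i] = values[j];
        values[j] = tmp;
    }

    // Recurse into both halves; each call stops early on its own.
    partialSort(values, left, j, nodeSize);
    partialSort(values, j + 1, right, nodeSize);
}

// Usage: partially sort 10 distinct values into buckets of 4.
const sample = [9, 3, 7, 1, 8, 2, 6, 0, 5, 4];
partialSort(sample, 0, sample.length - 1, 4);
// sample.slice(0, 4) now holds 0..3 in some order, sample.slice(4, 8)
// holds 4..7 in some order, and sample.slice(8) holds 8 and 9.
```

Because items within a bucket stay unordered, anything that depends on the exact order of items with equal keys or equal distances can see a different order than with a full sort, which matches the kNN gotcha mentioned above.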