Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop quicksort earlier for faster indexing #29

Merged
merged 1 commit into from
Mar 30, 2020
Merged

Stop quicksort earlier for faster indexing #29

merged 1 commit into from
Mar 30, 2020

Conversation

mourner
Copy link
Owner

@mourner mourner commented Mar 30, 2020

Instead of doing a full sort of the nodes, only sort them until we recurse down to values within a node (which don't have to be sorted), and stop there. Slightly improves performance of indexing, especially for higher nodeSize values. For the default (16), it seems to improve indexing by ~5–8%. The only gotcha was that it changes the return order in kNN queries for equal-distance items (hence the test update), but this should be fine.

@jbuckmccready your #28 PR got me the idea, check it out.

@jbuckmccready
Copy link
Collaborator

This is a great idea. So in the quick sort once it recurses to a value range <= nodeSize it just stops since all of those values will be placed under a parent box, I like it!

The only comment I would make is for the case of numItems <= nodeSize it still allocates the array and computes all the Hilbert values when that step can be skipped, so I think including the change in #28 PR is still useful.

@mourner
Copy link
Owner Author

mourner commented Mar 30, 2020

so I think including the change in #28 PR is still useful.

@jbuckmccready sure, this was supposed to be complementary.

@mourner mourner merged commit afc6213 into master Mar 30, 2020
@mourner mourner deleted the partial-sort branch March 30, 2020 22:13
@mourner
Copy link
Owner Author

mourner commented Mar 30, 2020

@jbuckmccready let me know if you port this change over, curious to hear how much it improves performance for a real world case.

@jbuckmccready
Copy link
Collaborator

I ported this change over. For my use case it's not really a measurable improvement as the other computations involved increase much more rapidly as the input size increases.

However, in just benchmarking the indexing of the spatial index it is significant (e.g. 250 microseconds vs. 300 microseconds to index 4400 bounding boxes of line segments along a circle).

Also if (Math.floor(left / nodeSize) >= Math.floor(right / nodeSize)) return; can be changed to if (right - left < nodeSize) return;. In my c++ code that decreases the index time to 220 microseconds since integer division is expensive (relatively speaking).

@mourner
Copy link
Owner Author

mourner commented Mar 31, 2020

@jbuckmccready interesting! right - left < nodeSize isn't equivalent as far as I understand, because left and right may end up in different "nodes" even when the distance between them is small, e.g. left is at the end of the node and right is at the start of the next one.

@jbuckmccready
Copy link
Collaborator

Ah right because the sub sort buckets may not align with the nodes. I didn't catch it because the loss in query performance was tiny in the benchmark I was running due to difference in sorting - for that circle case it results in about a 1% increase in time to query (querying every value box + some delta) but a 12% decrease in time to index.

I wonder if another sorting algorithm, e.g. a type of radix/bin sort, could be used to partially sort down to the nodeSize buckets more quickly. Since we're sorting integers and we know the bucket size ahead of time.

@mourner
Copy link
Owner Author

mourner commented Mar 31, 2020

@jbuckmccready I'll investigate radix sort! This implementation looks promising: https://erik.gorset.no/2011/04/radix-sort-is-faster-than-quicksort.html

jbuckmccready referenced this pull request in FObermaier/NetTopologySuite.SandBox Apr 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants