Stop quicksort earlier for faster indexing #29

mourner · 2020-03-30T13:21:27Z

Instead of doing a full sort of the nodes, only sort them until we recurse down to values within a node (which don't have to be sorted), and stop there. Slightly improves performance of indexing, especially for higher nodeSize values. For the default (16), it seems to improve indexing by ~5–8%. The only gotcha was that it changes the return order in kNN queries for equal-distance items (hence the test update), but this should be fine.

@jbuckmccready your #28 PR got me the idea, check it out.

jbuckmccready · 2020-03-30T20:55:07Z

This is a great idea. So in the quick sort once it recurses to a value range <= nodeSize it just stops since all of those values will be placed under a parent box, I like it!

The only comment I would make is for the case of numItems <= nodeSize it still allocates the array and computes all the Hilbert values when that step can be skipped, so I think including the change in #28 PR is still useful.

mourner · 2020-03-30T22:13:23Z

so I think including the change in #28 PR is still useful.

@jbuckmccready sure, this was supposed to be complementary.

mourner · 2020-03-30T22:14:59Z

@jbuckmccready let me know if you port this change over, curious to hear how much it improves performance for a real world case.

jbuckmccready · 2020-03-31T01:33:02Z

I ported this change over. For my use case it's not really a measurable improvement as the other computations involved increase much more rapidly as the input size increases.

However, in just benchmarking the indexing of the spatial index it is significant (e.g. 250 microseconds vs. 300 microseconds to index 4400 bounding boxes of line segments along a circle).

Also if (Math.floor(left / nodeSize) >= Math.floor(right / nodeSize)) return; can be changed to if (right - left < nodeSize) return;. In my c++ code that decreases the index time to 220 microseconds since integer division is expensive (relatively speaking).

mourner · 2020-03-31T08:16:21Z

@jbuckmccready interesting! right - left < nodeSize isn't equivalent as far as I understand, because left and right may end up in different "nodes" even when the distance between them is small, e.g. left is at the end of the node and right is at the start of the next one.

jbuckmccready · 2020-03-31T20:35:35Z

Ah right because the sub sort buckets may not align with the nodes. I didn't catch it because the loss in query performance was tiny in the benchmark I was running due to difference in sorting - for that circle case it results in about a 1% increase in time to query (querying every value box + some delta) but a 12% decrease in time to index.

I wonder if another sorting algorithm, e.g. a type of radix/bin sort, could be used to partially sort down to the nodeSize buckets more quickly. Since we're sorting integers and we know the bucket size ahead of time.

mourner · 2020-03-31T22:51:43Z

@jbuckmccready I'll investigate radix sort! This implementation looks promising: https://erik.gorset.no/2011/04/radix-sort-is-faster-than-quicksort.html

stop quicksort earlier for faster indexing

c018075

mourner merged commit afc6213 into master Mar 30, 2020

mourner deleted the partial-sort branch March 30, 2020 22:13

jbuckmccready referenced this pull request in FObermaier/NetTopologySuite.SandBox Apr 22, 2020

Add reference to adaptation source.

81b0d28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop quicksort earlier for faster indexing #29

Stop quicksort earlier for faster indexing #29

mourner commented Mar 30, 2020

jbuckmccready commented Mar 30, 2020

mourner commented Mar 30, 2020

mourner commented Mar 30, 2020

jbuckmccready commented Mar 31, 2020

mourner commented Mar 31, 2020

jbuckmccready commented Mar 31, 2020

mourner commented Mar 31, 2020

Stop quicksort earlier for faster indexing #29

Stop quicksort earlier for faster indexing #29

Conversation

mourner commented Mar 30, 2020

jbuckmccready commented Mar 30, 2020

mourner commented Mar 30, 2020

mourner commented Mar 30, 2020

jbuckmccready commented Mar 31, 2020

mourner commented Mar 31, 2020

jbuckmccready commented Mar 31, 2020

mourner commented Mar 31, 2020