-
Notifications
You must be signed in to change notification settings - Fork 448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GEOS C API is slow when doing millions of things #1735
Comments
That would be extremely useful. Not sure if it's helpful, but looking around, it seems Postgis and Shapely use |
@mathieu1 We're doing Our own PiP for simple polygons would be straightforward, but people start putting in Multi* and polygons with holes, etc. and they expect it all to just work. |
pnpoly does that kind of stuff, although it requires polygon parts to be
wrapped in sentinels.
https://wrf.ecse.rpi.edu//Research/Short_Notes/pnpoly.html
Den 28. nov. 2017 6.54 AM skrev "Howard Butler" <[email protected]>:
… @mathieu1 <https://github.com/mathieu1> We're doing PreparedGeometry
right now, even with the C API. The issue has been GEOS C API demands
immutability and forces a bunch of copies of things. The C++ API doesn't
have this restriction, but it has others 😄
Our own PiP for simple polygons would be straightforward, but people start
putting in Multi* and polygons with holes, etc. and they expect it all to
just work.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1735 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AFWTrWPeh9dGexXRaau5VihhFVB84jbgks5s6vcLgaJpZM4Qr5UO>
.
|
Yep, that was the one I was looking at too. |
I think the algorithm referenced above is slow if the polygon is large. If you're going to process lots of tests for the same polygon, which is our typical use case, some preprocessing can improve things a lot, at the expense of using some memory. I may be mistaken. Also, this may be a nice place to try some GPU processing. |
I've implemented a version of this: http://ieeexplore.ieee.org/document/6694387/ This algorithm divides the polygon space up in a grid - there’s some heuristics used to determine the grid size that could probably be tweaked with some experimentation. Each polygon edge that intersects a grid cell is “attached” to that grid cell. When a PiP request is made for a point, we determine which cell the point is in. The first time such a request is made for a cell, it chooses a randomish reference point in the cell and then determines whether the reference point is in/out by checking from outside the polygon or some other reference point by doing the standard crossing counting. In our case, though, we only need to test intersections of edges that are in cells that are traversed traveling to our reference point, rather than all the edges. The reference point is then marked either inside or outside. Once the reference point is established for a cell, it’s done forever. To actually test the point, there are two cases: If there are 0 edges in the cell, then the classification (in/out) of ANY point in the cell is the same as the reference point and we return it. If there are some edges, we just do the standard crossing counting from the requested point to the reference point using only the edges that are “attached” to the cell, which means you may check a couple of edges instead of 100 or 1000 or whatever. I think that for most polygons we deal with, there is a lot of empty space in the interior of the polygon, so my guess is that most cells don’t contain any edges at all, which means a simple lookup. I think the implementation is a little better than the paper because 1) it picks reference points such there are no degenerate cases to be dealt with 2) it postpones creating reference points until a request is made. If you never do a PnP check for a cell, there’s no reason to actually compute a reference point at all. The paper always used cell centers as reference points, but there’s nothing special about cell centers. |
Is there a third option?
|
This is already complete. If you want this implemented in GEOS, you should open a ticket there, but the last time I was aware they wanted GEOS to strictly be a port from JTS, so they may not be interested in alternative implementations. |
Yes, understood this is a closed issue. I am a dev on the GEOS & JTS projects. There's two threads here, both of which I'm interested in following up on:
|
If I were considering integrating this algorithm, I don't think I'd care about the single point in a single polygon use case, because it's probably "fast enough" in that such an operation isn't probably much of your processing time anyway. The only real case where performance might be an issue is where you're doing an operation like:
I wouldn't think this would be a common use case, but you know that SOMEONE will do it and complain. That said, I haven't benchmarked this case because it's not a normal case for our processing. One last note: I've looked at the GEOS code and it tends to use generalized functions. For example, the pip code uses generic polygon crossing code. The method used in that code may be necessary to generate some information users may want WRT crossing, but some of the computation isn't necessary for pip processing and better math can be used if all you care about is pip. |
The original comment on this ticket was that GEOS is slow because the API requires Point objects, which require a few allocs. If that's a barrier to using GEOS then we're interested in improving that. (This will probably benefit PostGIS as well, so it's directly in our target space). As for algorithm performance, understood that this algorithm is only better for large batch PIP scenarios. JTS/GEOS has a similar capability (IndexedPointInAreaLocator). I was assuming that's what PDAL was trying to use (otherwise would be very slow...). Anyway, we'll be looking at improving GEOS API for performance sometime soon, and PIP is a prime candidate for this. Not sure if/when we'll be able to compare the two fast PIP approaches. |
We were using PreparedGeometry and pushing point objects across the C API. There's no IndexedPointInAreaLocator exposed in the GEOS C API unless it is through PreparedGeometry. I'm not sure on the particulars...
IIRC, every PiP call was making at least three heap allocations of a CoordinateSequence objects in the GEOS C API. When we looked at things in the profiler, that's all we saw once we tried to do a million of them 😄 GEOS devs are aggressively dismissive of use of the C++ API in GEOS. They flat deprecated to require obnoxious #define to enable it a few releases ago. That's fine, but the C API is a heap hog and doesn't provide any convenience for users to pass in their own data structures. Any redesign effort of the C API to cut down on the heap thrashing is going to be met with a ton of resistance – the rails are already ensconced. As the JTS developer lead, you certainly have the credibility to overrule the GEOS design choices, and I hope you can make some headway. |
Thanks for the clarification, @hobu. I will do what I can to make everyone happy :). |
Point-in-Polygon in
filters.crop
is rather slow because the GEOS C API demands that we heap allocate/deallocate a CoordinateSequence and a Point for each point. There are two possibilities to attack the issue:pdal::geos
stuff to use the C++ API, and that would make this effort quite a bit larger.The text was updated successfully, but these errors were encountered: