compute takes ages to produce the result. #314
Please post the code you have used, not only its description.
from shapely.geometry import Polygon
from shapely.ops import unary_union


def fill_holes(geometry, min_hole_size):
    """
    Fill holes in a geometry (Polygon or MultiPolygon) if they are smaller than min_hole_size.
    """
    if geometry.geom_type == 'Polygon':
        if geometry.interiors:
            # Keep only holes whose area meets the threshold; smaller holes are dropped (i.e. filled)
            new_interiors = [interior for interior in geometry.interiors if Polygon(interior).area >= min_hole_size]
            return Polygon(geometry.exterior, new_interiors)
        else:
            return geometry
    elif geometry.geom_type == 'MultiPolygon':
        # Iterate over the parts via .geoms (a MultiPolygon itself is not iterable in Shapely 2.x)
        return unary_union([fill_holes(poly, min_hole_size) for poly in geometry.geoms])
    else:
        return geometry


# Apply fill_holes partition by partition (ddf and min_hole_size are defined earlier in the script)
filled = ddf.map_partitions(lambda part: part.geometry.apply(lambda geom: fill_holes(geom, min_hole_size)))
filled_ser = filled.compute()
The reason is probably GIL contention. Could you try creating an explicit cluster for Dask? i.e.
That will use all cores in parallel, and you can use the printed link to observe what's going on.
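The code that followed "i.e." in the comment above did not survive in this copy of the thread. As a hedged sketch of the same idea (not the commenter's exact snippet), assuming the dask.distributed package is installed, the hypothetical helper run_on_local_cluster below starts a LocalCluster with one thread per worker process, so each worker has its own GIL, prints the dashboard link, and computes a collection on that cluster:

```python
from dask.distributed import Client, LocalCluster


def run_on_local_cluster(collection, threads_per_worker=1, processes=True):
    """Compute a Dask collection on an explicit local cluster (hypothetical helper).

    With processes=True and one thread per worker, each worker is a separate
    Python process with its own GIL, so pure-Python per-geometry work can run
    on all cores instead of contending for one GIL in the default threaded
    scheduler.
    """
    with LocalCluster(threads_per_worker=threads_per_worker,
                      processes=processes) as cluster, Client(cluster) as client:
        # Open this URL in a browser to watch task progress and worker load
        # (empty string if the dashboard dependencies are not installed).
        print("Dask dashboard:", client.dashboard_link)
        # client.compute returns a Future; .result() blocks until it is done.
        return client.compute(collection).result()
```

With the code from this thread, the call would be filled_ser = run_on_local_cluster(filled). When this runs from a script, call it under an if __name__ == "__main__": guard, since worker processes may be started with the spawn method. Process-based workers add some serialization overhead per partition, which is usually small compared with hours of GIL-bound Python work.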
I have a Dask GeoDataFrame from which I extracted the geometry and filled holes using Shapely: I used geometry.interiors with an area threshold to decide which holes to fill, and then created a new geometry DataFrame. However, I don't understand why converting the resulting Dask GeoSeries into a GeoSeries takes so long. Whenever I call .compute(), it runs for more than 12 hours. I suspect something is wrong with my approach.