Optionally skip spatial bounds in read_parquet #203
Adds a new `gather_spatial_partitions` keyword to `read_parquet` to disable opening each file to get its spatial bounds. The name was chosen to mimic dask's `gather_statistics` keyword. Also adds a small docs section (I didn't see an easy way to insert a snippet in the docstring). Closes geopandas#194.
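To illustrate the trade-off, here is a minimal toy sketch, not the actual implementation. `FakeParquetFile` and `read_dataset` are hypothetical names invented for this example, and the sketch assumes the default is to keep gathering bounds. It only models the idea that gathering spatial partitions costs one open per file, while passing `gather_spatial_partitions=False` skips that work and leaves the bounds unknown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (minx, miny, maxx, maxy)
Bounds = Tuple[float, float, float, float]


@dataclass
class FakeParquetFile:
    """Hypothetical stand-in for one Parquet file in a dataset."""

    path: str
    bounds: Bounds
    opened: int = 0  # counts how often this file was opened

    def read_bounds(self) -> Bounds:
        # Simulates the per-file open needed to read the spatial bounds.
        self.opened += 1
        return self.bounds


def read_dataset(
    files: List[FakeParquetFile],
    gather_spatial_partitions: bool = True,  # assumed default for this sketch
) -> Optional[List[Bounds]]:
    """Return per-file bounds, or None when gathering is skipped."""
    if not gather_spatial_partitions:
        # No file is touched, so no spatial partitioning information exists.
        return None
    return [f.read_bounds() for f in files]


files = [
    FakeParquetFile("part.0.parquet", (0.0, 0.0, 1.0, 1.0)),
    FakeParquetFile("part.1.parquet", (1.0, 0.0, 2.0, 1.0)),
]

skipped = read_dataset(files, gather_spatial_partitions=False)
opens_after_skip = sum(f.opened for f in files)

gathered = read_dataset(files)
opens_after_gather = sum(f.opened for f in files)
```

In this sketch, skipping leaves every file unopened, while gathering opens each file once. For datasets with many files, that per-file cost is what the new keyword lets you avoid.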
This looks nice! Thanks!
The |
OK, thanks. In that case, I think a For now I'll go with |
We talked about that a bit and |
This reverts commit 5efcc17.
OK, reverted to go back to |
@jorisvandenbossche or @martinfleis any chance you could merge this when you get a chance? And how hard are releases for dask-geopandas to do? We'll have a new dataset later this week / early next week that would benefit from this :)
Hey, I'll have a look later tonight and we can even cut 0.2.0. We already talked about that last week with @jorisvandenbossche.
I'll go ahead and merge this, then we should ideally get #205 in and then can cut 0.2.0.
One note of hesitation: I think Dask is mid-transition in how it handles reading metadata. I'm wondering whether we should just rely on the behavior of dask's `gather_statistics` keyword. IIUC, both it and this new `gather_spatial_partitions` control whether there's a per-file operation in `read_parquet`.

Maybe @jcrist or @rjzamora have a recommendation on whether adding a new keyword here is going against where Dask is headed.