Problem
When a Dask GeoPandas GeoDataFrame is written to Parquet, its spatial partitions do not appear to be persisted: after reading the dataset back, spatial_partitions is None.
Expected Behavior
The spatial partitions of the GeoDataFrame should be persisted in the resulting Parquet files, so that reading the dataset back restores them. Preserving this spatial information (the extent of each partition) would make spatial queries much faster.
Steps to Reproduce
1. Create a GeoDataFrame with Dask GeoPandas and spatially shuffle it.
2. Write the GeoDataFrame to Parquet using Dask GeoPandas.
3. Read the Parquet dataset back into a Dask GeoDataFrame.
4. Observe that spatial_partitions on the reloaded GeoDataFrame is None, so the spatial partitions do not seem to have been persisted.
Example Code
import dask_geopandas as dg
import geopandas as gpd

# Create a GeoDataFrame with Dask GeoPandas
sample_data = dg.from_geopandas(gpd.read_file('path/to/shapefile.shp'), npartitions=4)
sample_data = sample_data.spatial_shuffle(shuffle='tasks')
sample_data.spatial_partitions.explore()  # visualize the spatial partitions here

# Write the GeoDataFrame to Parquet
sample_data.to_parquet('path/to/output', write_metadata_file=True)
sample_data_reloaded = dg.read_parquet('path/to/output', gather_spatial_partitions=True)
sample_data_reloaded.spatial_partitions  # None
My end goal is to query the data quickly and load only the partitions whose bounds overlap the area I'm interested in, for example when using clip or cx.
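To make the goal concrete, here is a minimal pure-Python sketch of the kind of partition pruning I have in mind. It is only an illustration, not how Dask GeoPandas implements it: the helper names (bboxes_intersect, partitions_to_read) and the partition bounds are made up for the example, and each partition is assumed to be described by a simple (minx, miny, maxx, maxy) box.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one bounding box per partition,
# given as (minx, miny, maxx, maxy).
BBox = Tuple[float, float, float, float]


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """Return True if the two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partitions_to_read(partition_bounds: Dict[int, BBox], query: BBox) -> List[int]:
    """Return the ids of partitions whose bounds intersect the query box."""
    return [
        part_id
        for part_id, bounds in sorted(partition_bounds.items())
        if bboxes_intersect(bounds, query)
    ]


# Made-up bounds for four partitions laid out on a 2 x 2 grid.
partition_bounds = {
    0: (0.0, 0.0, 10.0, 10.0),
    1: (10.0, 0.0, 20.0, 10.0),
    2: (0.0, 10.0, 10.0, 20.0),
    3: (10.0, 10.0, 20.0, 20.0),
}

# A query window that lies entirely inside partition 1.
query = (12.0, 2.0, 15.0, 5.0)
print(partitions_to_read(partition_bounds, query))
```

With the partition bounds available after reading, only the selected partitions would need to be loaded. With spatial_partitions set to None, every partition has to be read for each query.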