Broken pipe error in long-lived write-only workload #315
In attempting to run many workloads that write continuously to zarr archives on Google Cloud Storage (GCS) for many hours (6-12 hrs each), I noticed that a substantial fraction of them, perhaps as high as 20%, are failing with a broken pipe error.
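For context, each workload looks roughly like the sketch below. This is only an approximation of the shape of the job; the project name, bucket path, and array sizes are hypothetical stand-ins:

```python
# Rough sketch of the workload: a large dask array written chunk-by-chunk
# through gcsfs to a zarr store on GCS over several hours (names hypothetical).
import dask.array as da
import gcsfs

fs = gcsfs.GCSFileSystem(project="my-project")            # hypothetical project
store = fs.get_mapper("my-bucket/archives/example.zarr")  # hypothetical path

# Many chunked uploads spread over hours keep HTTP connections alive for
# a long time; some fraction of such runs eventually dies with EPIPE.
arr = da.random.random((200_000, 10_000), chunks=(1_000, 10_000))
arr.to_zarr(store)
```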
I could provide some more details on what I'm doing, but I'm not sure how to make this easily reproducible anyway. I would assume this is a common experience for anybody writing to GCS via gcsfs/fsspec with dask and zarr for a long period of time.
Has this already been reported? Are the aiohttp utilities in use here common enough to other file system implementations that this is a bug better logged against fsspec instead? Or perhaps Zarr? I assume the underlying issue is unavoidable, but some library in this call stack ought to be more resilient to it; a user-level stopgap is sketched below.
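One possible stopgap, assuming the failure surfaces as an OSError (BrokenPipeError and aiohttp's ClientOSError are both subclasses of it), is to retry the whole write at the application level. This is a hypothetical sketch, not a confirmed fix, and the names are made up for illustration:

```python
import time

def write_with_retries(write_fn, attempts=3, backoff=30):
    """Call write_fn(), retrying if the connection drops mid-write.

    Catching OSError is deliberately broad: BrokenPipeError and
    aiohttp.ClientOSError are both subclasses of it.
    """
    for attempt in range(attempts):
        try:
            return write_fn()
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (attempt + 1))  # linear backoff, then retry

# e.g. write_with_retries(lambda: arr.to_zarr(store))
```

Note that rerunning a whole to_zarr call re-uploads everything, so finer-grained retries inside the filesystem layer would clearly be preferable.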
Comments

I have been facing this error as well when using gcsfs directly and writing files very slowly. The files aren't big in my case (around 200-700 MB), but we were writing a few bytes every few seconds to the gcsfs API (which buffers internally). This also kept the connection open for a long time, and we noticed the same failures after 10 hrs or so. I haven't been able to fix this specific issue, but I resorted to using the google-cloud-storage client and writing smaller files periodically. That seemed to survive longer, but had a different set of connection errors.
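To illustrate the write pattern that hit this (not an exact reproduction; the bucket, path, and timing are hypothetical):

```python
import time

import gcsfs

fs = gcsfs.GCSFileSystem(project="my-project")  # hypothetical project

# gcsfs buffers writes internally and uploads in blocks, so this single
# handle keeps one upload session open for the entire loop (~14 hrs here).
with fs.open("my-bucket/stream/output.bin", "wb") as f:
    for _ in range(10_000):
        f.write(b"a few bytes")  # tiny writes...
        time.sleep(5)            # ...spread over many hours
```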