Integration test suite before release #163
cc @rabernat, who would like to see some similar workflows at large scales in the context of pangeo-forge.
It might make sense to also consider other downstream projects like spatialpandas and dask-geopandas to help catch issues like holoviz/spatialpandas#68 and geopandas/dask-geopandas#49. BTW, I really appreciate seeing the assistance those projects received to help address those issues; it's really cool to see that kind of community support. Big thanks to everybody contributing to and supporting this awesome ecosystem.
It'd be interesting to think about how you pass a test suite like that. For instance, is performance a part of it? It would be very interesting to publish benchmarks with each release. It seems less common that a release actually breaks the read 10_000 files from parquet case, and more common that it introduces a performance regression.
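Catching that kind of performance regression at release time could be as simple as timing a canonical workload and comparing it to a stored baseline. A minimal sketch, assuming a simple "best of N runs" timing strategy; the workload, baseline value, and tolerance here are all hypothetical placeholders, not part of any existing dask tooling:

```python
import time


def benchmark(workload, repeat=3):
    """Return the best wall-clock time (seconds) over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def check_regression(name, workload, baselines, tolerance=1.5):
    """Raise if the workload runs more than `tolerance`x slower than its baseline."""
    elapsed = benchmark(workload)
    baseline = baselines.get(name)
    if baseline is not None and elapsed > baseline * tolerance:
        raise RuntimeError(
            f"{name}: {elapsed:.3f}s vs baseline {baseline:.3f}s "
            f"(more than {tolerance}x slower)"
        )
    return elapsed


# Example with a stand-in workload; a real suite would time things like
# reading a large parquet dataset instead.
baselines = {"sum-1e6": 10.0}  # generous hypothetical baseline, in seconds
elapsed = check_regression("sum-1e6", lambda: sum(range(1_000_000)), baselines)
print(f"sum-1e6 completed in {elapsed:.3f}s")
```

Publishing the recorded timings alongside each release would then make regressions visible across versions, not just against the immediately previous one.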
Would also suggest adding some Dask projects to this list like Dask-ML, Dask-Image, etc. At least with Dask-ML we have seen a couple of breakages recently that probably could have been avoided with integration testing.
Might even just be worthwhile to do runs of these tests every 24hrs or so. This can help identify issues a bit sooner than waiting for a release, giving people more time to fix and update. Numba did some work in this space that we might be able to borrow from: texasbbq. Also, having nightlies (#76) would help smooth out the integration testing process and aid in local reproducibility.
It looks like @jrbourbeau started getting dask set up with texasbbq a few years back :) https://github.com/jrbourbeau/dask-integration-testing |
A while ago, I asked a number of projects downstream of xarray to run their test suites regularly against xarray HEAD. It has really helped catch issues before release. Perhaps a bunch of downstream projects could do the same with dask HEAD. Here's the current xarray workflow: https://github.com/pydata/xarray/blob/main/.github/workflows/upstream-dev-ci.yaml It's really nice! It even opens an issue when tests fail, with a nice summary.
This raises another good point. Maybe it is worth just adding some jobs to the |
Yeah there are definitely ways to raise issues from GitHub Actions. I wonder where a good place to open the issue would be? For projects like |
Even if they are raised on the projects themselves, that could also be useful. Basically just thinking of how we make the CI failure more visible. Red Xs can easily be missed.
We have already copied the xarray upstream infrastructure on dask/dask. There is an upstream action that runs every night and raises an issue with any failures. Here's the yaml for that https://github.com/dask/dask/blob/main/.github/workflows/upstream.yml |
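For readers who don't want to parse the full linked files, the core of such a nightly upstream workflow is just a cron trigger plus a step that installs the development version of the dependency before running the test suite. A stripped-down sketch; the job name, Python version, and install targets are illustrative, not copied from either linked workflow, and the real versions add a step that files an issue on failure:

```yaml
name: Upstream nightly
on:
  schedule:
    - cron: "0 6 * * *"   # run once a day
  workflow_dispatch:       # also allow manual runs

jobs:
  upstream-dev:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Install dask from HEAD rather than the latest release
      - run: python -m pip install git+https://github.com/dask/dask
      # Install the downstream project itself, then run its tests
      - run: python -m pip install -e ".[test]"
      - run: python -m pytest
```

The linked xarray and dask workflows build on this shape with failure summaries and automatic issue creation.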
We currently run a test suite on every commit. These tests are designed to be focused and fast.
However, when we release we may want to consider running some larger workflows that test holistic behavior. This might include, for example, reading a 10,000-partition parquet dataset from S3: something that is important, but not something that we want to put into our regular test suite. This might also be a good place to include workflows from downstream projects like RAPIDS and Xarray.
This would be something that the release manager would be in charge of kicking off.
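Operationally, the release-time suite could just be a script the release manager runs that executes each large workflow or downstream check in its own subprocess and summarizes the failures. A minimal sketch under that assumption; the check names and commands below are placeholders, and real entries might clone RAPIDS or Xarray and run their integration tests against the release candidate:

```python
import subprocess
import sys

# Hypothetical release-time checks, keyed by name. Each value is a command
# to run; a real suite might invoke pytest in a downstream project's repo.
CHECKS = {
    "python-works": [sys.executable, "-c", "print('ok')"],
    "imports-clean": [sys.executable, "-c", "import json, csv"],
}


def run_checks(checks):
    """Run each command in a subprocess; return {check name: passed?}."""
    results = {}
    for name, cmd in checks.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = proc.returncode == 0
        status = "PASS" if results[name] else "FAIL"
        print(f"{status} {name}")
    return results


if __name__ == "__main__":
    results = run_checks(CHECKS)
    # Exit non-zero if anything failed, so automation can flag the release
    sys.exit(0 if all(results.values()) else 1)
```

Running each check in an isolated subprocess keeps one crashing workflow from taking down the whole run, and the exit code makes the script easy to wire into CI later.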
Some things that we would need to do:
cc @jacobtomlinson @quasiben @jrbourbeau