Improve efficiency of CI checks (so we can add MORE!) #13845

Open · Tracked by #13813
alamb opened this issue Dec 19, 2024 · 20 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Dec 19, 2024

Is your feature request related to a problem or challenge?

There is a tension between adding more tests and PR / code velocity (more tests --> longer CI).

DataFusion runs many tests on every change to every PR. For example, my most recent PR ran 24 tests (link), consuming over an hour of worker time.

This has several challenges:

  • Increases CI cycle time and thus how fast contributors get feedback on their PRs
  • Limits the new tests we can add: as we contemplate adding even more testing (which we should), such as the sqllogictest suite, adding it to CI would make the cycle time even worse
  • Wastes worker credits (we don't run out of credits, largely because the Apache Software Foundation gets approximately unlimited worker time from GitHub, but there are broader concerns, like the amount of CO2 generated by that waste 🙈)

Another observation is that several tests rarely fail in PRs but offer important coverage, such as the Windows and Mac tests. We even disabled the Windows tests due to their (lack of) speed.

Describe the solution you'd like

I would like to improve the efficiency of existing CI jobs, as well as have a mechanism to run both new and existing tests that offer important coverage but take too long to run on every PR.

Describe alternatives you've considered

Here are some options from my past lives:

Option 1: Change some tests to only run on merges to main

In this option, we could change some checks to only run on merges to main. For example, we could run the windows tests only on merges to main rather than also on PRs.

Instead of all jobs being triggered like

# trigger for all PRs and changes to main
on:
  push:
    branches:
      - main
  pull_request:

we would change some tests to run like

# trigger for only changes to main
on:
  push:
    branches:
      - main

pros:

  • simple to implement (change GitHub branch matching rules)

cons:

  • PRs that break the tests could still merge to main, so the tests on main would be broken and someone would have to chase / fix them
  • Someone (maintainers?) would have to triage and fix these tests

We could probably add some sort of job that would automatically make revert PRs for any code that broke the main tests to help triage

Option 2: Implement more sophisticated merge flow

In this option, I would imagine a workflow like

  1. Open a PR
  2. Subset of tests pass
  3. PR is "marked for merge" somehow
  4. Additional longer CI checks run and if pass PR is merged, if fail PR is "unmarked for merge"

This might already be supported by GitHub via Merge Queues.

There are probably bots like https://github.com/rust-lang/homu that could automate something like this
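For reference, opting a workflow into GitHub's merge queue feature is mostly a trigger change; a minimal sketch, assuming merge queues are enabled for the repository (the merge_group event is what GitHub fires for queued merges):

# hypothetical sketch: also run this workflow when a PR sits in the merge queue
on:
  pull_request:
  merge_group:
  push:
    branches:
      - main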

pros:

  • more automation / less human intervention

cons:

  • More complicated to set up / maintain

Option 3: Your idea here

Additional context

No response

@findepi
Member

findepi commented Dec 19, 2024

In this option, we could change some checks to only run on merges to main. For example, we could run the windows tests only on merges to main rather than also on PRs.

Running tests after a merge has a problem: who is going to look at the results?
And who is going to fix things if something breaks?
Nightly tests seem suitable for stress tests and scale benchmarks, which inherently need to take time to be meaningful.

It seems to me that we should either give merge queue a shot, or at least understand why we're not doing that.

@comphead
Contributor

I agree with @findepi, once it is merged it is a deferred problem and much harder to find the root cause.
My initial naive thought was to have smoke tests/correctness tests on PRs and more advanced things like benchmarks before the release, but this approach has the same flaw of deferred problem detection.

The challenge is that we have focused on moving most test cases to sqllogictests, which is good and avoids test duplication. But the other side of the coin is that we cannot unit test some packages without end-to-end tests, which are sqllogictests, and the latter require all of DataFusion to be compiled.

@alamb
Contributor Author

alamb commented Dec 19, 2024

I agree with @findepi, once it is merged it is a deferred problem and much harder to find the root cause.

Yes, I also 100% agree.

My initial naive thought was to have smoke tests/correctness tests on PRs and more advanced things like benchmarks before the release, but this approach has the same flaw of deferred problem detection.

Maybe we could still run the more advanced tests before merging a PR but would not run them all during development, review and iteration.

This would ensure that if a PR introduced a problem it wouldn't be merged to main so the error is not deferred

I expect in most cases the "extended" tests will simply pass and things will be fine. Occasionally the extended tests would fail and we would have to go dig into why before the PR was merged

It seems like wishful thinking though, as I am not sure there is capacity to create and maintain such a workflow 🤔

@alamb
Contributor Author

alamb commented Dec 19, 2024

I was also thinking that this question of "when to run more extended tests" may be what stalled the great work @2010YOUY01 did running SQLancer.

I suspect it is somewhat thankless work to run, triage, and file those tickets, and understandably interests move on.

@findepi
Member

findepi commented Dec 20, 2024

Maybe we could still run the more advanced tests before merging a PR but would not run them all during development, review and iteration.

This would ensure that if a PR introduced a problem it wouldn't be merged to main so the error is not deferred

That's exactly what merge queues are supposed to do.

benchmarks before the release, but this approach has the same flaw of deferred problem detection.

Note that benchmarks are not qualitative, but quantitative.
It's possible to automatically catch big regressions without false positives, but it's harder to catch small regressions.

If we have a merge queue, we should be able to run some benchmarks, but not necessarily all benchmarks.
With a merge queue I would still hope for an "interactive developer experience", i.e. that the PR gets merged not long after it's scheduled for a merge.

@alamb
Contributor Author

alamb commented Dec 21, 2024

It sounds like then "implement a basic merge queue" would be a good next step!

@Omega359
Contributor

Maintaining an extended workflow shouldn't be too bad, tbh. I think having a workflow that runs outside of PRs (i.e., runs nightly) could be useful as well for expensive tests that rarely break (or just take a long time to run). Think the sqlite tests, etc.
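A nightly job along these lines is mostly just a different trigger; a minimal sketch, where the job name and test command are placeholders:

# hypothetical sketch of a nightly extended-test workflow
on:
  schedule:
    - cron: "0 4 * * *"   # every day at 04:00 UTC
  workflow_dispatch:       # also allow manual runs

jobs:
  extended-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run the expensive suites
        run: cargo test --workspace   # placeholder for the long-running tests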

@alamb
Contributor Author

alamb commented Dec 22, 2024

Maintaining an extended workflow shouldn't be too bad, tbh. I think having a workflow that runs outside of PRs (i.e., runs nightly) could be useful as well for expensive tests that rarely break (or just take a long time to run). Think the sqlite tests, etc.

I think the cost / annoyance will be when the tests start failing: someone has to care enough to look into the failure and triage / file tickets.

@edmondop
Contributor

Other alternatives:

Split test execution into multiple jobs

We could generate a Docker image with the code and tests compiled, and then have separate jobs that pull that image from the built-in GitHub container registry and run a subset of the tests. This way the tests could be parallelized.
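A sketch of how that split could look with a plain job matrix instead of (or in addition to) a shared Docker image; the crates listed are only an example subset, and real partitioning would need more thought:

# hypothetical sketch: fan test execution out across parallel jobs
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        partition: [datafusion-common, datafusion-sql, datafusion-functions]   # example subset only
    steps:
      - uses: actions/checkout@v4
      - name: Run one slice of the tests
        run: cargo test -p ${{ matrix.partition }}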

Using more powerful workers

I have no knowledge of the current status, but in general the workers from GitHub are pretty slow. If the Apache Software Foundation Infra has a Kubernetes cluster, we could run self-hosted GitHub runners there (also useful if we want to use GPUs for testing).
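If such runners were registered (e.g. via something like GitHub's actions-runner-controller on that cluster), pointing a job at them would mostly be a label change; a sketch with hypothetical labels:

# hypothetical sketch: target project-provided runners instead of GitHub-hosted ones
jobs:
  heavy-tests:
    runs-on: [self-hosted, linux, x64]   # labels depend on how the runners are registered
    steps:
      - uses: actions/checkout@v4
      - run: cargo test --workspace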

@Omega359
Contributor

Maintaining an extended workflow shouldn't be too bad, tbh. I think having a workflow that runs outside of PRs (i.e., runs nightly) could be useful as well for expensive tests that rarely break (or just take a long time to run). Think the sqlite tests, etc.

I think the cost / annoyance will be when the tests start failing: someone has to care enough to look into the failure and triage / file tickets.

Ah yes, I can definitely see that as a problem for sure.

@alamb
Contributor Author

alamb commented Dec 23, 2024

Thanks for the ideas @edmondop -- FWIW the ASF doesn't have its own hosted workers (individual projects could do that -- arrow did for a time -- but it requires dedicated time / effort from someone / some company to maintain them, which I don't think we have in DataFusion at this time).

@alamb
Contributor Author

alamb commented Jan 1, 2025

BTW I think we may get our first experience with tests that don't run on PRs as part of

What I envision doing is running them on each commit to main, and then we'll have to have the discipline to look at any failures that happen and fix them. I am not 100% sure we'll be able to muster the ability to do so, but I hope to give it a try.

@logan-keede
Contributor

I just made a PR that moves a single .slt file in sqllogictests. #14254
On commit, I can't help but notice that many of the checks do not seem necessary.

@alamb Perhaps we could run a GitHub Actions workflow only if the pushed files are in a specific/relevant folder.
I think this method is used in many open source projects.

Rationale:
Why do datafusion-examples need to be tested when they were already tested in a previous merge and no changes were made to them in this PR? And so on.
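The standard GitHub Actions mechanism for this is a path filter on the trigger; a minimal sketch, where the paths shown are examples only:

# hypothetical sketch: only run this workflow when relevant files change
on:
  pull_request:
    paths:
      - "datafusion/**"
      - "Cargo.toml"
      - "Cargo.lock"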

@alamb
Contributor Author

alamb commented Jan 23, 2025

I just made a PR that moves a single .slt file in sqllogictests. #14254 On commit, I can't help but notice that many of the checks do not seem necessary.

@alamb Perhaps we could run a GitHub Actions workflow only if the pushed files are in a specific/relevant folder. I think this method is used in many open source projects.

Rationale: why do datafusion-examples need to be tested when they were already tested in a previous merge and no changes were made to them in this PR? And so on.

Thanks @logan-keede -- I agree it makes sense

The trick is figuring out and encoding what the dependencies are (in the sense that if a change would break a CI test, you must somehow ensure that CI test is run on the PR). I agree the current DataFusion approach is likely an overly large hammer, but it does satisfy that requirement.

The examples use the DataFusion API, so I think we need to ensure that any PR that changes the API also tests them.

The idea of making a special case for changes only to .slt files is an excellent one.

We do have some custom rules like for docs only changes 🤔
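For the docs-only / .slt-only special cases, the inverse filter is the usual trick: let the expensive Rust build jobs skip themselves when only those files changed. A sketch; the actual DataFusion rules may differ:

# hypothetical sketch: skip heavy build jobs for docs-only changes
on:
  pull_request:
    paths-ignore:
      - "docs/**"
      - "**/*.md"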

@findepi
Member

findepi commented Jan 24, 2025

Perhaps we could run a GitHub Actions workflow only if the pushed files are in a specific/relevant folder.
I think this method is used in many open source projects.

Yes! to test impact analysis.
The Trino project uses it and it has had a tremendous positive impact on CI throughput.
Maven projects have a more standardized structure, but I am sure some good tooling for cargo exists (or will exist) to do this.

The idea of making a special case for changes only to .slt files is an excellent one.

This we can do too. Not sure how often PRs change only slt files, but may still be worth it.

@logan-keede
Contributor

I think we can further reduce the run time of cargo test and cargo build by up to 5 minutes (at least theoretically) by using a cache baked into a Docker image (I mean having an image with cargo build already executed). This depends on the available bandwidth of the GitHub VM/OS on which the tests run, so it becomes compiling ~20 GB (up to 8 min) vs downloading ~20 GB (no idea) + selective installation (observed average of 100 sec).

One more issue is that this image might need to be regenerated regularly, but perhaps building/composing it once a day would work.
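A sketch of how such a pre-built image could be consumed, assuming a daily job publishes an image (the name below is hypothetical) with dependencies already compiled:

# hypothetical sketch: run CI steps inside a pre-built image with a warm cargo cache
jobs:
  test:
    runs-on: ubuntu-latest
    container: ghcr.io/apache/datafusion-ci-base:latest   # hypothetical image built once a day
    steps:
      - uses: actions/checkout@v4
      - run: cargo test --workspace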


I think GitHub has its own cache, so that might make downloading even this chunky image faster in the future.
I am not sure how this all plays out in practice, so let me know your views.

cc @alamb @findepi

@alamb
Contributor Author

alamb commented Jan 24, 2025

Thank you for the good ideas @logan-keede

Given we have

  1. Very limited bandwidth for maintenance (and even less for infrastructure maintenance)
  2. Effectively unlimited GitHub workers (thank you GitHub for the contribution in kind!)

I think we should prioritize ease of maintenance / understandability in the CI scripts.

While using pre-built Docker images seems like it would indeed improve performance, we would have to be careful about how much extra complexity it would add to the build system.

@findepi
Member

findepi commented Jan 24, 2025

GitHub cache is great, but 20 GB of cached state is pretty sizeable, so it could imply quick invalidation.
The cache isn't maintenance free. For example, the build should handle the case where the cache state is no longer there.
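For what it's worth, the stock actions/cache approach degrades gracefully when the cached state is gone: if neither the key nor a restore-key matches, the step restores nothing and the build simply compiles from scratch. A sketch with an illustrative key layout:

# hypothetical sketch: cargo caching that tolerates a missing or evicted cache
- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
      target
    key: cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      cargo-${{ runner.os }}-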

@logan-keede do you maybe know how to do test impact analysis with cargo?

@logan-keede
Contributor

Thank you for the good ideas @logan-keede

Given we have

  1. Very limited bandwidth for maintenance (and even less for infrastructure maintenance)
  2. Effectively unlimited GitHub workers (thank you GitHub for the contribution in kind!)

I think we should prioritize ease of maintenance / understandability in the CI scripts.

While using pre-built Docker images seems like it would indeed improve performance, we would have to be careful about how much extra complexity it would add to the build system.

Maybe we can revisit this in the future when we have more bandwidth for maintenance.

GitHub cache is great, but 20 GB of cached state is pretty sizeable, so it could imply quick invalidation. The cache isn't maintenance free. For example, the build should handle the case where the cache state is no longer there.

Hmm, I am not sure, but it seems like this might require a lot of exploration and testing before we can use such an approach, even if we have more workforce focusing on maintenance.

@logan-keede do you maybe know how to do test impact analysis with cargo?

This is the first time I have heard of this term, but now that I have looked into it, it has certainly piqued my interest; I will read up on it.

@Omega359
Contributor

GitHub's cache is limited to 10 GB. I've tried using the Rust cache action in the past, and while it sometimes helped, it often didn't do a whole lot thanks to cargo/rust's incredibly picky build system.

The largest unit of work left for a PR check to complete testing is still cargo test doc. Perhaps that would be a good place to start, to see if it could be avoided or reduced. For example, non-source-code changes might not trigger that job? I have ideas on how to improve it, but they would likely require the 2024 edition.
