Perf Tests: Type 450 times for `type` metric instead of 20 #47852
base: trunk
Conversation
Size Change: 0 B. Total Size: 1.33 MB.
Flaky tests detected in 17ea27e. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4237827291
Force-pushed the branch from 714949b to eee76d9.
@sgomes @youknowriad seems like the new-page-per-test change dramatically improved the shape of the distribution on the `type` metric.
What happens to the bias if we disable/comment out all the other metrics? I mean, a job that only compares and logs the `type` metric. Is this something you tested in the past? The hypothesis is that if a single-metric job does better than multiple ones, maybe we could rewrite the workflow to perform the metrics in different jobs.
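For illustration, here is a minimal sketch of what a single-metric job could look like, assuming the runner iterates over named measurements; the `PERF_METRICS` variable and the `metrics` map are hypothetical, not the suite's actual API:

```js
// Hypothetical single-metric runner: the `metrics` map and PERF_METRICS
// environment variable are illustrative, not the suite's real interface.
const metrics = {
	type: async () => {
		/* type into the editor, record per-keypress timings */
	},
	load: async () => {
		/* measure editor load time */
	},
};

async function run() {
	// e.g. PERF_METRICS=type yields a job that only measures `type`.
	const enabled = ( process.env.PERF_METRICS || '' )
		.split( ',' )
		.filter( Boolean );

	for ( const [ name, measure ] of Object.entries( metrics ) ) {
		if ( enabled.length && ! enabled.includes( name ) ) {
			continue; // Skip everything except the metrics under study.
		}
		await measure();
	}
}

run();
```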
I tried this once and it didn't seem to have a notable impact, but that was before the new-page-per-test change. With the reported mean difference here being 0.32 ms, less than a 1% difference, it concerns me far less from a data-quality standpoint. I'll update this PR to only track the `type` metric.
Yeah, this is probably a good idea overall. If I were spending more time on the tests, I think I would try to do this more broadly, particularly since the recent introduction of theme perf tests bumped up the runtime again. The upside to splitting is that things will move that much faster in PR flows, but the downside is that local runs of the perf tests might get more complicated; instead of running one command, we would potentially have to run a suite of commands.
@youknowriad results for only collecting `type` are in. I'll rebase this PR and upload again, but for completeness' sake, here is the distribution when only running the `type` metric.
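As a rough sketch of how such a distribution can be summarized from the collected samples (the quantile choice and the sample values below are purely illustrative, not the data from this run):

```js
// Summarize a metric's samples with mean and simple quantiles.
function summarize( samples ) {
	const sorted = [ ...samples ].sort( ( a, b ) => a - b );
	const quantile = ( q ) =>
		sorted[ Math.min( sorted.length - 1, Math.floor( q * sorted.length ) ) ];
	const mean = sorted.reduce( ( sum, v ) => sum + v, 0 ) / sorted.length;
	return { mean, p50: quantile( 0.5 ), p90: quantile( 0.9 ) };
}

// e.g. per-keypress timings (ms) from one round:
console.log( summarize( [ 11.2, 12.4, 11.9, 35.0, 12.1 ] ) );
```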
By typing 200 times instead of 20, we will examine whether this impacts the measured variability and bias in the `type` performance metric.
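A minimal sketch of the change being tested, assuming a Playwright-style `page` object; the real suite's instrumentation may differ, and this simplified timing loop is only illustrative:

```js
const { performance } = require( 'perf_hooks' );

const TYPING_ITERATIONS = 200; // previously 20

// Type repeatedly and collect one timing sample per keypress.
async function measureType( page ) {
	const samples = [];
	for ( let i = 0; i < TYPING_ITERATIONS; i++ ) {
		const start = performance.now();
		await page.keyboard.type( 'x' );
		samples.push( performance.now() - start );
	}
	return samples;
}
```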
In this commit we're storing the raw results of all the performance test runs as an artifact in GitHub Actions so that we can perform more extensive analysis outside of workflow runs. For every _round_ of testing we end up with a copy of the results in a JSON file with the branch ref, the test suite name, and the test run index in its name. This makes it possible to analyze each run separately without making it hard to analyze everything together.
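A hypothetical sketch of the per-round result files described above; the naming pattern, paths, and sample values are illustrative, not the exact ones from this PR:

```js
const fs = require( 'fs' );

// Build one filename per round from the branch ref, suite name, and run
// index, so runs can be analyzed separately or pooled together.
function resultsFilename( branchRef, suiteName, runIndex ) {
	// e.g. "trunk_post-editor_round-2.performance-results.json"
	return `${ branchRef }_${ suiteName }_round-${ runIndex }.performance-results.json`;
}

const rawResults = { type: [ 11.2, 12.4, 11.9 ] }; // illustrative samples (ms)
fs.writeFileSync(
	resultsFilename( 'trunk', 'post-editor', 2 ),
	JSON.stringify( rawResults, null, 2 )
);
```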
Force-pushed the branch from 0a6df2e to 17ea27e.
Status

Please ignore for now, this is for testing. Saw an 8% difference in same-code branches for `type`. Completed in 36 min for 1 round.

Description

By typing 200 times instead of 20, we will examine whether this impacts the measured variability and bias in the `type` performance metric.

Results