Improve the performance of the formatter instability check job (#14471)
We should probably get rid of this entirely and subsume its
functionality in the normal ecosystem checks? I don't think we're using
the black comparison tests anymore, but maybe someone wants it?

There are a few major parts to this:

1. Making the formatter script idempotent, so it can be run repeatedly
and is robust to changing commits
2. Reducing the overhead of the git operations, minimizing the data
transfer
3. Parallelizing all the git operations by repository

This reduces the setup time from 80s to 16s (locally).
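
As a rough illustration of part 3, the sketch below starts one background job per repository and then waits on each PID individually. `fetch_repo` and `run_parallel` are hypothetical names, and the job body is only a stand-in for the real git operations. Waiting per PID is a choice made for this sketch, because a bare `wait` returns zero even when a background job failed.

```shell
# Hypothetical stand-in for the per-repository git work.
# Fails for the name "broken" so the error path can be exercised.
fetch_repo() {
    echo "fetched $1"
    [ "$1" != "broken" ]
}

# Start one background job per argument, then wait on every PID
# and report failure if any job exited non-zero.
run_parallel() {
    pids=""
    for name in "$@"; do
        fetch_repo "$name" &
        pids="$pids $!"
    done

    failed=0
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

run_parallel twine django typeshed && echo "all repositories fetched"
```

The order of the `fetched` lines is not deterministic, since the jobs run concurrently.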

The initial motivation for idempotency was to include the repositories
in the GitHub Actions cache. I'm not sure it's worth it yet: they're
about 1 GB and would consume our limited cache space. Regardless, it
improves correctness for local invocations.
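
The idempotency property can be illustrated without git at all. In this simulated sketch, `ensure_checkout` is a hypothetical function, creating the directory stands in for the one-time clone, and writing a file stands in for fetch plus checkout. None of it is the script's actual code; it only shows that repeated runs, including runs with a changed pinned commit, converge on the requested state without recloning.

```shell
# Simulated idempotent checkout. No git is involved here:
# the "clone" and "checkout" steps are stand-ins for illustration.
ensure_checkout() {
    dest="$1"
    ref="$2"

    if [ ! -d "$dest/.git" ]; then
        # Stand-in for the expensive one-time clone.
        mkdir -p "$dest/.git"
        echo "clone" >> "$dest/clone.log"
    fi

    # Stand-in for fetch + checkout: always converge on the requested ref.
    printf '%s\n' "$ref" > "$dest/.git/current-ref"
}

workdir=$(mktemp -d)
ensure_checkout "$workdir/twine" "aaa111"
ensure_checkout "$workdir/twine" "aaa111"   # repeated run: no second clone
ensure_checkout "$workdir/twine" "bbb222"   # pinned ref changed: still no reclone
```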

The total runtime of the job is reduced from ~4m to ~3m.

I also made some cosmetic changes to the output paths and such.
zanieb authored Nov 20, 2024
1 parent 942d6ee commit 3c52d2d
Showing 2 changed files with 76 additions and 45 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/ci.yaml
@@ -563,12 +563,12 @@ jobs:
         run: rustup show
       - name: "Cache rust"
         uses: Swatinem/rust-cache@v2
-      - name: "Formatter progress"
+      - name: "Run checks"
         run: scripts/formatter_ecosystem_checks.sh
       - name: "Github step summary"
-        run: cat target/progress_projects_stats.txt > $GITHUB_STEP_SUMMARY
+        run: cat target/formatter-ecosystem/stats.txt > $GITHUB_STEP_SUMMARY
       - name: "Remove checkouts from cache"
-        run: rm -r target/progress_projects
+        run: rm -r target/formatter-ecosystem

   check-ruff-lsp:
     name: "test ruff-lsp"
115 changes: 73 additions & 42 deletions scripts/formatter_ecosystem_checks.sh
@@ -8,7 +8,7 @@
 # errors.
 #
 # This script will first clone a diverse set of (mostly) black formatted
-# repositories with fixed revisions to target/progress_projects. Each project
+# repositories with fixed revisions to target/formatter-ecosystem. Each project
 # gets formatted (without modifying the files on disk) to check how
 # similar our style is to black. It also catches common issues such as
 # unstable formatting, internal formatter errors and printing invalid syntax.
@@ -18,72 +18,103 @@
 set -e

 target=$(git rev-parse --show-toplevel)/target
-dir="$target/progress_projects"
+dir="$target/formatter-ecosystem"
 mkdir -p "$dir"

+# Perform an idempotent clone and checkout of a commit
+clone_commit() {
+    local repo="$1"
+    local name="$2"
+    local ref="$3"
+
+    if [ -z "$repo" ] || [ -z "$name" ] || [ -z "$ref" ]; then
+        echo "Usage: clone_commit <repo> <name> <ref>"
+        return 1
+    fi
+
+    local target="$dir/projects/$name"
+
+    if [ ! -d "$target/.git" ]; then
+        echo "Cloning $repo to $name"
+        # Perform a minimal clone, we only need a single commit
+        git clone --filter=blob:none --depth=1 --no-tags --no-checkout --single-branch "$repo" "$target"
+    fi
+
+    echo "Using $repo at $ref"
+    git -C "$target" fetch --filter=blob:none --depth=1 --no-tags origin "$ref"
+    git -C "$target" checkout -q "$ref"
+}
+
 # small util library
-if [ ! -d "$dir/twine/.git" ]; then
-    git clone --filter=tree:0 https://github.com/pypa/twine "$dir/twine"
-fi
-git -C "$dir/twine" checkout -q ae71822a3cb0478d0f6a0cccb65d6f8e6275ece5
+clone_commit \
+    "https://github.com/pypa/twine" \
+    "twine" \
+    "ae71822a3cb0478d0f6a0cccb65d6f8e6275ece5" &

 # web framework that implements a lot of magic
-if [ ! -d "$dir/django/.git" ]; then
-    git clone --filter=tree:0 https://github.com/django/django "$dir/django"
-fi
-git -C "$dir/django" checkout -q ee5147cfd7de2add74a285537a8968ec074e70cd
+clone_commit \
+    "https://github.com/django/django" \
+    "django" \
+    "ee5147cfd7de2add74a285537a8968ec074e70cd" &

 # an ML project
-if [ ! -d "$dir/transformers/.git" ]; then
-    git clone --filter=tree:0 https://github.com/huggingface/transformers "$dir/transformers"
-fi
-git -C "$dir/transformers" checkout -q ac5a0556f14dec503b064d5802da1092e0b558ea
+clone_commit \
+    "https://github.com/huggingface/transformers" \
+    "transformers" \
+    "ac5a0556f14dec503b064d5802da1092e0b558ea" &

 # type annotations
-if [ ! -d "$dir/typeshed/.git" ]; then
-    git clone --filter=tree:0 https://github.com/python/typeshed "$dir/typeshed"
-fi
-git -C "$dir/typeshed" checkout -q d34ef50754de993d01630883dbcd1d27ba507143
+clone_commit \
+    "https://github.com/python/typeshed" \
+    "typeshed" \
+    "d34ef50754de993d01630883dbcd1d27ba507143" &

 # python 3.11, typing and 100% test coverage
-if [ ! -d "$dir/warehouse/.git" ]; then
-    git clone --filter=tree:0 https://github.com/pypi/warehouse "$dir/warehouse"
-fi
-git -C "$dir/warehouse" checkout -q 5a4d2cadec641b5d6a6847d0127940e0f532f184
+clone_commit \
+    "https://github.com/pypi/warehouse" \
+    "warehouse" \
+    "5a4d2cadec641b5d6a6847d0127940e0f532f184" &

 # zulip, a django user
-if [ ! -d "$dir/zulip/.git" ]; then
-    git clone --filter=tree:0 https://github.com/zulip/zulip "$dir/zulip"
-fi
-git -C "$dir/zulip" checkout -q ccddbba7a3074283ccaac3bde35fd32b19faf042
+clone_commit \
+    "https://github.com/zulip/zulip" \
+    "zulip" \
+    "ccddbba7a3074283ccaac3bde35fd32b19faf042" &

 # home-assistant, home automation with 1ok files
-if [ ! -d "$dir/home-assistant/.git" ]; then
-    git clone --filter=tree:0 https://github.com/home-assistant/core "$dir/home-assistant"
-fi
-git -C "$dir/home-assistant" checkout -q 3601c531f400255d10b82529549e564fbe483a54
+clone_commit \
+    "https://github.com/home-assistant/core" \
+    "home-assistant" \
+    "3601c531f400255d10b82529549e564fbe483a54" &

 # poetry, a package manager that uses black preview style
-if [ ! -d "$dir/poetry/.git" ]; then
-    git clone --filter=tree:0 https://github.com/python-poetry/poetry "$dir/poetry"
-fi
-git -C "$dir/poetry" checkout -q 36fedb59b8e655252168055b536ead591068e1e4
+clone_commit \
+    "https://github.com/python-poetry/poetry" \
+    "poetry" \
+    "36fedb59b8e655252168055b536ead591068e1e4" &

 # cpython itself
-if [ ! -d "$dir/cpython/.git" ]; then
-    git clone --filter=tree:0 https://github.com/python/cpython "$dir/cpython"
-fi
-git -C "$dir/cpython" checkout -q 28aea5d07d163105b42acd81c1651397ef95ea57
+clone_commit \
+    "https://github.com/python/cpython" \
+    "cpython" \
+    "28aea5d07d163105b42acd81c1651397ef95ea57" &

+# wait for the concurrent clones to complete
+wait
+
 # Uncomment if you want to update the hashes
 #for i in "$dir"/*/; do git -C "$i" switch main && git -C "$i" pull; done
 #for i in "$dir"/*/; do echo "# $(basename "$i") $(git -C "$i" rev-parse HEAD)"; done

 time cargo run --bin ruff_dev -- format-dev --stability-check \
-    --error-file "$target/progress_projects_errors.txt" --log-file "$target/progress_projects_log.txt" --stats-file "$target/progress_projects_stats.txt" \
-    --files-with-errors 3 --multi-project "$dir" || (
+    --error-file "$dir/errors.txt" \
+    --log-file "$dir/log.txt" \
+    --stats-file "$dir/stats.txt" \
+    --files-with-errors 3 --multi-project "$dir/projects" \
+    || (
     echo "Ecosystem check failed"
-    cat "$target/progress_projects_log.txt"
+    cat "$dir/log.txt"
     exit 1
 )
-cat "$target/progress_projects_stats.txt"
+
+cat "$dir/stats.txt"
