Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Improve the git status speed. #359

Merged
merged 5 commits into from
Sep 26, 2024
Merged

Conversation

matglas
Copy link
Contributor

@matglas matglas commented Sep 24, 2024

What this PR does / why we need it

By using the native git client its possible to improve the speed significantly.
The default approach is using go-git to no rely on a git client that is installed. But if git is found it will use the git client instead which is much faster compared to the go-git library.

Which issue(s) this PR fixes (optional)

Fixes #358

Acceptance Criteria Met

  • Docs changes if needed
  • Testing changes if needed
  • All workflow checks passing (automatically enforced)
  • All review conversations resolved (automatically enforced)
  • DCO Sign-off

Special notes for your reviewer:

I have large mono repo. For posterity here are the results of a simple run that does not change anything and then with the changes. Basically: witness run --step build -- echo "hello"

Repository stats of the working tree.

$ git count-objects -vH

count: 219
size: 1.13 MiB
in-pack: 2468342
packs: 29
size-pack: 1.36 GiB
prune-packable: 25
garbage: 0
size-garbage: 0 bytes

v0.6.0: 82.82s user 96.35s system 63% cpu 4:40.58 total
change: 55.27s user 66.71s system 66% cpu 3:02.30 total

By using the native git client its possible to improve the speed
significantly.

Signed-off-by: Matthias Glastra <[email protected]>
@ChaosInTheCRD
Copy link
Collaborator

Hey @matglas!

This all looks good to me from a first look. Of course the question of switching from go-git to git is a question in of itself, one that I am sure that others will have opinions on.

What I will say is that I have seen the use of the git CLI inside go-releaser (here is the internal package inside that project). I think it's a safe assumption that git will be a present in nearly all environments, but as you said this change supports the minority use cases as well.

@colek42
Copy link
Member

colek42 commented Sep 24, 2024

Do you think it makes sense to record the hash and location of the git binary being used?

@matglas
Copy link
Contributor Author

matglas commented Sep 24, 2024

I had been thinking about that indeed but had not added it yet. It might be valuable to add the following indeed.

{
  "git-tool": "go-git+go-bin",
  "git-bin-path": "/usr/local/bin",
  "git-bin-hash": "sha256:0abc123..."
}

@matglas
Copy link
Contributor Author

matglas commented Sep 24, 2024

Besides the earlier mentioned points. Does anyone think this needs any specific tests or documentation. I could not really think of anything.

@matglas
Copy link
Contributor Author

matglas commented Sep 24, 2024

I added the output. The data looks like this now. I'm following the naming convention that is used by the other fields.

{
         ...
         "gittool": "go-git+git-bin",
         "gitbinpath": "/opt/homebrew/bin/git",
         "gitbinhash": "sha256:e4c3a12a8af0e9cced8b3a2993d18a5c8ac994cfd4713e7d73d8d2c3c1ea3eba",
         ...
}

I am in doubt about the gitbinhash. If I should use the digestset output. But it feels like a lot of overhead to add. I could be wrong and the structure would be better because then it becomes easier to write validators. Let me know your opinions on that.

When the git binary is used to for status we also include
the path to the binary and the hash of the binary files.

Also by default the git tool used to generate the attestation
data is added as contextual information.

Signed-off-by: Matthias Glastra <[email protected]>
kairoaraujo
kairoaraujo previously approved these changes Sep 25, 2024
Copy link
Collaborator

@kairoaraujo kairoaraujo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit thing

attestation/git/git_bin.go Outdated Show resolved Hide resolved
Co-authored-by: Kairo Araujo <[email protected]>
Signed-off-by: Matthias Glastra <[email protected]>
kairoaraujo
kairoaraujo previously approved these changes Sep 25, 2024
Copy link
Collaborator

@kairoaraujo kairoaraujo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I will wait for others review.

@colek42
Copy link
Member

colek42 commented Sep 25, 2024

I added the output. The data looks like this now. I'm following the naming convention that is used by the other fields.

{
         ...
         "gittool": "go-git+git-bin",
         "gitbinpath": "/opt/homebrew/bin/git",
         "gitbinhash": "sha256:e4c3a12a8af0e9cced8b3a2993d18a5c8ac994cfd4713e7d73d8d2c3c1ea3eba",
         ...
}

I am in doubt about the gitbinhash. If I should use the digestset output. But it feels like a lot of overhead to add. I could be wrong and the structure would be better because then it becomes easier to write validators. Let me know your opinions on that.

I think digest set is the way to go, mainly since it supports multiple algos and is pluggable. cc @mikhailswift

@mikhailswift
Copy link
Member

Yeah, probably prefer the digest set as the output of the git binary hash, but otherwise I think this looks good.

@matglas
Copy link
Contributor Author

matglas commented Sep 25, 2024

Output looks like this now.

{
          "gittool": "go-git+git-bin",
          "gitbinpath": "/opt/homebrew/bin/git",
          "gitbinhash": {
            "sha256": "e4c3a12a8af0e9cced8b3a2993d18a5c8ac994cfd4713e7d73d8d2c3c1ea3eba"
          },
}

mikhailswift
mikhailswift previously approved these changes Sep 25, 2024
Copy link
Member

@mikhailswift mikhailswift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, Matias!

@mikhailswift
Copy link
Member

Just saw that the verify schema CI job failed -- a quick make schema should fix it :)

@matglas
Copy link
Contributor Author

matglas commented Sep 25, 2024

Will do that tomorrow. Something went wrong in my local rebase.

@mikhailswift mikhailswift merged commit 80f9b23 into in-toto:main Sep 26, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feat]: Speed up the git status part of the git attestor
5 participants