Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of fetching git dependencies by rev #10078

Closed
dtolnay opened this issue Nov 13, 2021 · 7 comments · Fixed by #10079
Closed

Improve performance of fetching git dependencies by rev #10078

dtolnay opened this issue Nov 13, 2021 · 7 comments · Fixed by #10079
Labels
A-git Area: anything dealing with git C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-blocked-external Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix

Comments

@dtolnay
Copy link
Member

dtolnay commented Nov 13, 2021

Problem

GitHub recently enabled support for fetching individual commits by commit hash (uploadpack.allowReachableSHA1InWant on the server side).

$ GIT_TRACE_PACKET=1 git fetch | head -1
17:49:50.750968 pkt-line.c:80           packet:        fetch< 458d3459cfb0f923fbe968e6868a4893af34ba69 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed allow-tip-sha1-in-want allow-reachable-sha1-in-want symref=HEAD:refs/heads/master filter object-format=sha1 agent=git/github-g6409641ef0c2

Notice the presence of allow-reachable-sha1-in-want in the advertised protocol capabilities.

Example of using "reachable sha1 in want" on the CLI:

$ mkdir cargo

$ cd cargo

$ git fetch --depth=1 https://github.com/rust-lang/cargo 88117505b8b691e0e7892630a71a85bb5e9945de
remote: Enumerating objects: 782, done.
remote: Counting objects: 100% (782/782), done.
remote: Compressing objects: 100% (685/685), done.
remote: Total 782 (delta 136), reused 316 (delta 65), pack-reused 0
Receiving objects: 100% (782/782), 2.08 MiB | 3.34 MiB/s, done.
Resolving deltas: 100% (136/136), done.
From https://github.com/rust-lang/cargo
 * branch                88117505b8b691e0e7892630a71a85bb5e9945de -> FETCH_HEAD

$ git log -1 88117505b8b691e0e7892630a71a85bb5e9945de
commit 88117505b8b691e0e7892630a71a85bb5e9945de (grafted)
Author: Eric Huss <[email protected]>
Date:   Fri Oct 22 07:53:17 2021 -0700

    Bump to 0.59.0

Notice in the above log that only 2.08 MiB total were downloaded. This is significantly less than the 50+ MiB of the whole Cargo repo. For larger repos the difference can be even more significant.

Unfortunately today Cargo doesn't make use of "reachable sha1 in want". Instead, when you specify a git dependency like cargo = { git = "https://github.com/rust-lang/cargo", rev = "88117505b8b691e0e7892630a71a85bb5e9945de" }, Cargo fetches all branches and tags and their entire history, hoping that the requested commit id is somewhere among that potentially enormous pile of commits:

// We don't know what the rev will point to. To handle this
// situation we fetch all branches and tags, and then we pray
// it's somewhere in there.
refspecs.push(String::from("+refs/heads/*:refs/remotes/origin/*"));
refspecs.push(String::from("+HEAD:refs/remotes/origin/HEAD"));
tags = true;

Proposed Solution

I've opened a PR containing the Cargo side of the implementation:

However libgit2, which is the C library wrapped by Cargo's git2 dependency, does not yet support "reachable sha1 in want" as far as I can tell (the git cli does, which is not based on libgit2, and is why the git fetch above is able to use it).

Someone will need to send a PR to libgit2 implementing "reachable sha1 in want", then pull the changes into https://github.com/rust-lang/git2-rs, and finally land the Cargo change to use it.

Notes

As a side benefit, this change will make rev = "..." dependencies support revs which are not in the history of any upstream branch or tag. But the performance or disk usage improvement will be the more noticeable benefit to most Cargo users.

@dtolnay dtolnay added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Nov 13, 2021
@dtolnay
Copy link
Member Author

dtolnay commented Nov 13, 2021

Here is a test program to validate any change to libgit2:

// [dependencies]
// git2 = "0.13"

fn main() -> Result<(), git2::Error> {
    let _ = std::fs::remove_dir_all("cargo");
    eprintln!("done rmdir");
    let repo = git2::Repository::init("cargo")?;
    eprintln!("done init");
    let mut remote = repo.remote_anonymous("https://github.com/rust-lang/cargo")?;
    eprintln!("done remote");
    remote.fetch(&["+88117505b8b691e0e7892630a71a85bb5e9945de:refs/success"], None, None)?;
    eprintln!("done fetch");
    let thing = repo.revparse_single("refs/success")?;
    println!("{}", thing.id());
    Ok(())
}

Currently, the fetch returns Ok but does nothing. That's because libgit2 implements fetching by filtering the list of refs offered by the server:

https://github.com/libgit2/libgit2/blob/043f3123e3c63b634d9bfd7276e70d86b8accadc/src/fetch.c#L57

If the server never offers the ref you are looking for, the filter never passes, and nothing gets fetched. Implementing "reachable sha1 in want" in libgit2 will involve sending the server a want followed by the SHA-1 being requested. See where it discusses "want" in https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols.

@ehuss
Copy link
Contributor

ehuss commented Dec 6, 2021

I posted libgit2/libgit2#6135 for upstream request.

@ehuss ehuss added A-git Area: anything dealing with git S-blocked-external Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix labels Dec 6, 2021
@dtolnay
Copy link
Member Author

dtolnay commented Feb 14, 2022

libgit2 1.4.0 was just released and contains the functionality needed for this. We're now blocked on a libgit2 bump in the git2 crate: rust-lang/git2-rs#804.

@dtolnay
Copy link
Member Author

dtolnay commented Jun 20, 2022

The libgit2 functionality for this is now supported by the git2 crate in versions 0.14.0 and newer. With that, I believe this issue is unblocked and the remaining work just needs to be done in Cargo.

@dtolnay
Copy link
Member Author

dtolnay commented Jun 20, 2022

In fact a bump of cargo's git2 dependency to 0.14 has already happened (#10442 + #10479). I'll reopen #10079 and try to rebase and test.

@weihanglo
Copy link
Member

Out of curious. Does it help even without shallow fetch support?

@dtolnay
Copy link
Member Author

dtolnay commented Jun 21, 2022

Yes; see #10079.

Before After
objects downloaded 69704 21481
time 7.0 sec 2.2 sec
disk usage 41 MB 17 MB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-git Area: anything dealing with git C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-blocked-external Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants