Clone/pull on large repositories/submodules: Recursion and depth #196

Open
gerardbosch opened this issue Apr 27, 2020 · 2 comments

@gerardbosch

Thanks for your project Anders. I've just started using Homeshick and find it really useful (still WIP: https://github.com/gerardbosch/dotfiles)

I see that the homeshick clone and homeshick pull commands initialize submodules recursively:

  • In case of clone: git clone --recursive
  • In case of pull: git pull && git submodule update --recursive --init

Do you think it would make sense to perform the following with --depth=1 instead?

  • clone case: git clone && git submodule update --init --recursive --depth=1 (here clone and update are split to achieve depth=1)
  • pull case: git pull && git submodule update --init --recursive --depth=1

(I'm not sure whether other commands would be affected.)
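The split proposed above could be sketched roughly as follows. This is only an illustration, not homeshick's actual code: the function names `shallow_submodule_clone` and `shallow_submodule_pull` are hypothetical, and the sketch assumes the castle itself is cloned in full while only its submodules are fetched at depth 1.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of homeshick).

# Full clone of the castle, then shallow (depth 1) submodules.
# usage: shallow_submodule_clone <url> <destination>
shallow_submodule_clone() {
  local url=$1 dest=$2
  git clone "$url" "$dest" &&
    git -C "$dest" submodule update --init --recursive --depth=1
}

# Regular pull of the castle, then shallow update of its submodules.
# usage: shallow_submodule_pull <castle-directory>
shallow_submodule_pull() {
  local dest=$1
  git -C "$dest" pull &&
    git -C "$dest" submodule update --init --recursive --depth=1
}
```

With this split, `git rev-list --count HEAD` inside a freshly cloned submodule should report a single commit, while the castle keeps its whole history.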

Rationale:

Sometimes submodules can come with a huge history, adding unnecessary MB.

I see that, for example, the Zsh package manager Antigen clones packages (which they call bundles) with --depth=1. I thought it might make sense for Homeshick as well.

For example, I added bash-it as a submodule to my dotfiles. The bash-it README on GitHub instructs you to clone the project with --depth=1. The problem is that, now that this submodule sits in my dotfiles, it grows from 4,8MB to 44MB if I run homeshick pull (or homeshick clone on another account/machine). This is just one example, but the same will happen with any other project that has a heavy history OR heavy recursive submodules.

I could provide a PR if necessary.

--

I also have a side question (more of an aside): do you know if there is any way to configure Git to "ignore initialization" of certain submodules (like test libraries) on git submodule update? Following the same example, bash-it comes with some other submodules used for testing the project:

bash-it/test_lib/bats-assert
bash-it/test_lib/bats-core
bash-it/test_lib/bats-file
bash-it/test_lib/bats-support

I mean nested submodules in repos that are out of my control. I guess adding ignore = dirty and update = none to its .gitmodules would do it, but I'm not sure.

They are lightweight in this case (although they might not be), but I don't think I actually need these test dependencies (they are not initialized if I just do git clone .../bash-it.git). Maybe this would also be mitigated by using --depth=1 in homeshick clone/pull.

Thanks!

@andsens
Owner

andsens commented May 12, 2020

Apologies for the late answer, and thank you for the kind words.

This is a brilliant idea! I struggle with the same issues sometimes because I use prezto.

There are some challenges though:

  • We can't make this the default. git fetch --unshallow is not well known, so people would not know how to switch to full depth.
  • In order to update a repo, homeshick needs to git fetch --depth=1 && git reset --hard origin/current-branch. But first we need to detect whether we are in shallow or full mode. The current-branch part is also non-trivial: what if the user has checked out a different commit somehow? Also, reset --hard will discard any changes, so we need error messages and warnings.
  • With git submodules we repeat all of the problems above, with the added layer of complication that is, well... submodules. We would also need to handle mixed mode, i.e. a submodule is in full, but the parent is in shallow. Or vice versa. (Updating shallow submodules themselves is luckily rather easy, it's just git submodule update --init --recursive --depth=1)
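To make the challenges above concrete, here is a minimal sketch of what mode detection and a shallow update might look like. It is not homeshick code: the function names are hypothetical, the remote is assumed to be called `origin`, and `git rev-parse --is-shallow-repository` needs Git 2.15 or newer. Submodules and mixed mode are not handled at all.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of homeshick).

# Succeeds if the repository at $1 is a shallow clone (needs Git >= 2.15).
is_shallow() {
  [ "$(git -C "$1" rev-parse --is-shallow-repository)" = "true" ]
}

# Update the repository at $1, keeping it shallow if it already is.
update_repo() {
  local repo=$1 branch
  if ! is_shallow "$repo"; then
    git -C "$repo" pull
    return
  fi
  # reset --hard throws away local changes, so refuse to run on a dirty tree.
  # Note: this only catches uncommitted changes; local commits would be lost.
  if [ -n "$(git -C "$repo" status --porcelain)" ]; then
    echo "refusing to reset $repo: uncommitted changes" >&2
    return 1
  fi
  # symbolic-ref fails on a detached HEAD (user checked out some other commit).
  if ! branch=$(git -C "$repo" symbolic-ref --quiet --short HEAD); then
    echo "refusing to update $repo: HEAD is not on a branch" >&2
    return 1
  fi
  git -C "$repo" fetch --depth=1 origin &&
    git -C "$repo" reset --hard "origin/$branch"
}

# Convert a shallow clone at $1 back to full history.
unshallow() {
  git -C "$1" fetch --unshallow
}
```

Even this small sketch needs two refusal paths with their own messages, which illustrates the amount of user interaction the bullets above are concerned about.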

I'm not saying that these are insurmountable problems. But they would indicate that we'd need quite a lot of code and user interaction to handle this. If that is the case, it would be a no-go, since the strength of homeshick lies in the transparency of what is going on and the simplicity in how you set it up / configure it.

> Do you know if there is any way to configure Git to "ignore initialization" of certain submodules

Hm, if it were in the root repo I'd just go with something that consumes the git submodule output, but when talking about a sub-submodule we don't even have that info. It'd be like preventing clone --recursive from initializing some specific submodules before you have any data. Unless you split the initialization up into stages I don't see how :-/
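A staged initialization along those lines might look like the sketch below. It is an untested-in-homeshick illustration with a hypothetical function name, and it rests on two assumptions: that setting `submodule.<name>.update` to `none` in the local config of the intermediate submodule makes `git submodule update` skip that nested submodule, and that the caller already knows the nested submodules' names (often, but not necessarily, equal to their paths in that repo's .gitmodules).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of homeshick).
# usage: staged_submodule_init <castle> <submodule-path> <nested-submodule-name>...
staged_submodule_init() {
  local castle=$1 sub=$2 name
  shift 2
  # Stage 1: initialise only the castle's direct submodules (no --recursive).
  git -C "$castle" submodule update --init --depth=1 || return
  # Stage 2: now that the submodule's data exists, mark the unwanted nested
  # submodules as "do not update" in its local (uncommitted) config.
  for name in "$@"; do
    git -C "$castle/$sub" config "submodule.$name.update" none || return
  done
  # Stage 3: recurse; nested submodules marked above are assumed to be skipped.
  git -C "$castle" submodule update --init --recursive --depth=1
}
```

For the bash-it example this might be called as `staged_submodule_init ~/.homesick/repos/dotfiles bash-it test_lib/bats-core test_lib/bats-assert`, where the castle path and the submodule names are assumptions to be checked against the actual repositories.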

@gerardbosch
Author

Thanks for your reply. From your comments, I think this could be more complex than I initially guessed, and more involved than my suggestion of changing the homeshick clone/pull commands to:

git clone && git submodule update --init --recursive --depth=1
git pull && git submodule update --init --recursive --depth=1 

Just to understand better: what would be the downside of this switch? If I understand correctly, this would make a full clone of the user's dotfiles castle, but a shallow clone of the castle's submodules.
