Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pip freeze to use modern format for git repos #9822

Merged
merged 2 commits into from
Jun 4, 2021
Merged

Fix pip freeze to use modern format for git repos #9822

merged 2 commits into from
Jun 4, 2021

Conversation

bwoodsend
Copy link
Contributor

@bwoodsend bwoodsend commented Apr 21, 2021

Pip dropped support for git+ssh@ style requirements (see #7554) in favour of git+ssh:// but didn't propagate the change to pip freeze which resultantly returns invalid requirements. Fix this behaviour.

Fixes #9625.

@bwoodsend bwoodsend marked this pull request as ready for review April 21, 2021 10:14
@uranusjr
Copy link
Member

We will need to figure out first how Git displays port in the scp-like format to come up with a reasonable regex.

@bwoodsend
Copy link
Contributor Author

Quoting from here:

The SSH Protocol

A common transport protocol for Git when self-hosting is over SSH. This is because SSH access to servers is already set up in most places — and if it isn’t, it’s easy to do. SSH is also an authenticated network protocol and, because it’s ubiquitous, it’s generally easy to set up and use.

To clone a Git repository over SSH, you can specify an ssh:// URL like this:

$ git clone ssh://[user@]server/project.git

Or you can use the shorter scp-like syntax for the SSH protocol:

$ git clone [user@]server:project.git

In both cases above, if you don’t specify the optional username, Git assumes the user you’re currently logged in as.

So yes, it looks like my regex is a bit naive. It doesn't recognise that the user is optional.

I also see that both of these are valid for locally hosting repos:

$ git clone /srv/git/project.git
$ git clone file:///srv/git/project.git

Although I have no idea if they are ever likely to be used by pip users.

@uranusjr
Copy link
Member

I’m more concerned about how those different syntaxes would show up in git remote (which is the thing pip parses). If I understand correctly, git would convert a remote cloned with ssh:// to the scp-like format, but I am not sure how the conversion logic works. For example, how is ssh://mydomain.com:2323/my/repo.git converted? (scp does not accept port in the path syntax, so git must do something)

@bwoodsend
Copy link
Contributor Author

bwoodsend commented Apr 21, 2021

From what I can tell, irregardless of whatever it must do internally, git remote always matches what you put in without any conversion.

> git clone -q ssh://[email protected]/pypa/pip-test-package.git
> git -C pip-test-package/ remote -v
origin	ssh://[email protected]/pypa/pip-test-package.git (fetch)
origin	ssh://[email protected]/pypa/pip-test-package.git (push)
> git clone -q [email protected]:pypa/pip-test-package.git
> git -C pip-test-package/ remote -v
origin	[email protected]:pypa/pip-test-package.git (fetch)
origin	[email protected]:pypa/pip-test-package.git (push)

So as long as we can handle the various input formats I think git remote will behave itself?

I've absolutely no idea about ports. Could we maybe just call that bit outside the scope of this PR? If someone needs that feature then they probably know a lot more about it than I do and would be better suited to fixing it than me.

@uranusjr
Copy link
Member

From what I can tell, irregardless of whatever it must do internally, git remote always matches what you put in without any conversion.

Oh, awesome!

I've absolutely no idea about ports. Could we maybe just call that bit outside the scope of this PR? If someone needs that feature then they probably know a lot more about it than I do and would be better suited to fixing it than me.

Yeah, I think that’s reasonable. If people want ports to work they can always use the more obvious ssh:// format anyway instead of hacking it into the scp syntax. I think people know they’re going off the trail when they need to clone a repository served from a non-standard SSH port anyway (it’s even hard to just describe the scenario).

@bwoodsend bwoodsend marked this pull request as draft April 21, 2021 21:30
@bwoodsend
Copy link
Contributor Author

I've pushed another attempt at the regex (I'll squash the commit later once were happy).

It seems to me that, since git remote echos back whatever git clone receives, and pip install git+[some-remote]@commit-hash simply needs to use git clone [some-remote], pip should never need to unpack or validate a remote name. The only reason anything in this PR is needed is because pip install git+[...] seems to be too fussy. I don't see why any valid git remote url should not be accepted by pip install.

For example, why isn't:

pip install 'pip-test-package @ [email protected]:pypa/pip-test-package.git'

valid even though:

git clone -q [email protected]:pypa/pip-test-package.git /tmp/temp-dir
pip install /tmp/temp-dir

still works?

@uranusjr
Copy link
Member

I feel this may be complicating things too much; would it be easier if we just do

  1. Check for ://; if it’s in the value, this is a URL and we don’t need to parse any further.
  2. Check with os.path.exists(); no further parsing if this is a local path.
  3. OK this looks scp-like, do a simple regex replace.

@bwoodsend
Copy link
Contributor Author

Yeah actually I think that could work. I had dismissed your point 1 because I thought that the : after the server name could be used anywhere in git but it turns out it can't. i.e. These are both valid to git:

[email protected]:pypa/pip-test-package.git
ssh://[email protected]/pypa/pip-test-package.git

But this isn't and I assumed it was:

ssh://[email protected]/pypa:pip-test-package.git

In fact what I say here is total rubbish...

@bwoodsend
Copy link
Contributor Author

Ok I've tried as you said. I'm not wildly fond of the using os.path.exists() to identify local remotes. It'd silently break if you deleted the original remote but left the package installed in non-editable mode. But that's a relatively small kink...

PS: Can you give CI the prod its asking for?

@uranusjr
Copy link
Member

I’m not yet sure if I like this new “Approve and run” thing, but it’s pretty cool 🙂

@uranusjr
Copy link
Member

I'm not wildly fond of the using os.path.exists() to identify local remotes. It'd silently break if you deleted the original remote but left the package installed in non-editable mode. But that's a relatively small kink...

Good point, I didn’t consider the possibility of the original remote going away (which would make us misidentify a path as scp syntax, which sounds pretty wrong). But yeah, the practical difference is pretty small (if the original remove went away, the freeze file is broken beyond repair anyway), so we can afford to wait until someone actually comes complaining.

@uranusjr
Copy link
Member

BTW feel free to mark this as ready when you think it is.

@bwoodsend
Copy link
Contributor Author

Will do.... once Windows' CI is appeased...

@bwoodsend
Copy link
Contributor Author

bwoodsend commented Apr 24, 2021

Actually that CI failure is more interesting than it looks. It boils down to the fact that both:

file://R:/path/to/repo

and:

file:///R:/path/to/repo

and even:

file:///////////////R:/path/to/repo

are all legal to both git and pip. So really the test is being too fussy in demanding the 1st form, even though git (bizarrely) has opted for the second.

Also this looks like an exception to what I said above about git remote -v echoing back the input to git clone without normalisation - the input that caused file:///R:/path/to/repo was a plain windows path: git clone R:\path\to\repo.


Edit: I misunderstood URI's when I wrote this. Whilst it's still true that neither pip nor git care how many slashes you use, the correct syntax for absolute paths on Windows is to use 3 slashes so the file:///R:/path/to/repo generated by git actually makes sense.

@bwoodsend
Copy link
Contributor Author

OK, provided CI doesn't unearth anything else then I deem this ready (although I'd squash the commits unless you feel that the intermediate attempts are of any historical value).

@bwoodsend bwoodsend marked this pull request as ready for review April 24, 2021 14:19
@bwoodsend
Copy link
Contributor Author

bwoodsend commented Apr 24, 2021

Perhaps you could have an approve-ci label you can add onto PRs from new contributors once you've verified that the PR isn't just a spelling fix. Then on seeing that, CI can just get on with it without requiring re-approval.

@bwoodsend
Copy link
Contributor Author

Yay! Even Windows is happy. I'm going to squash commits 3 and 4 together. Do you think commits 1 and 2 are worth keeping or shall I squash the lot into 1 commit?

@bwoodsend
Copy link
Contributor Author

I suppose maybe the correct handling would be:

  1. Check for :// as before.
  2. Regex match/check for SCP-like syntax and use it to convert to ssh:// form.
  3. Check with os.path.exists() as before.
  4. Raise some kind of error saying that either this is local repo which has been deleted or some other invalid syntax.

That would handle all cases I can think of but... only if step 2 works. It all boils down to can we trust an SCP regex based on guesswork given that I can't find any formal definition of the syntax anywhere.

@uranusjr
Copy link
Member

What you have in mind looks good to me.

The scp syntax should never be present in a pip environment; it is only there for old environments (populated by old pip versions that played things too loosely), so in the worst case people can convert the environment to use the ssh:// URL manually. So I wouldn’t bother too much the logic isn’t perfect; as long as we can cover the most common cases, it’s good enough.

@bwoodsend
Copy link
Contributor Author

In the event of a case 4 (invalid/deleted repo), what should be the bail out behavior? I'm guessing some kind of logger.warning(f"Remote url '{url}' from '{dist_name}' is invalid ... lah deh dah"). But what should be returned by freeze? Just the invalid url? A series of question marks? Should the distribution be omitted entirely so that requirements.txt files are still valid?

@bwoodsend
Copy link
Contributor Author

Actually I see there is already a RemoteNotFoundError for no remote which propagates to a comment in pip freeze's output. I'll just mirror that behaviour.

@bwoodsend
Copy link
Contributor Author

bwoodsend commented May 1, 2021

Done it. And I squashed some of the older commits together. I thought I'd have to play ping pong with Window's CI so I enabled CI on my fork so that I wouldn't have to keep prodding you to run it for me. But to my great surprise it actually worked 1st go. Here's the CI run. Run it again here if you want...

I think I'm done so if both you and CI are happy then it's ready to merge.

@uranusjr uranusjr changed the title Fix pip freeze to use modern format for git repos (#9822) Fix pip freeze to use modern format for git repos May 2, 2021
@bwoodsend
Copy link
Contributor Author

@uranusjr Anything else you need me to do?

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I wasn’t able to find much time for OSS work the past weekend.

src/pip/_internal/operations/freeze.py Outdated Show resolved Hide resolved
src/pip/_internal/operations/freeze.py Outdated Show resolved Hide resolved
src/pip/_internal/vcs/git.py Show resolved Hide resolved
@bwoodsend
Copy link
Contributor Author

I've amended the last commit as per your suggestions 1 and 2. Here is another CI run.

On another note: would you mind if I added this line? Without it, the only way to use CI testing is via pull request.

@uranusjr
Copy link
Member

uranusjr commented May 8, 2021

Without it, the only way to use CI testing is via pull request.

One trick I do often is to send a PR to my own forked repository. GitHub will happily run checks in them.

src/pip/_internal/vcs/versioncontrol.py Outdated Show resolved Hide resolved
tests/unit/test_vcs.py Outdated Show resolved Hide resolved
@bwoodsend
Copy link
Contributor Author

And the latest whirl on CI...

src/pip/_internal/vcs/git.py Outdated Show resolved Hide resolved
@bwoodsend
Copy link
Contributor Author

@uranusjr ?

tests/unit/test_vcs.py Outdated Show resolved Hide resolved
tests/unit/test_vcs.py Outdated Show resolved Hide resolved
tests/unit/test_vcs.py Outdated Show resolved Hide resolved
Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming tests pass

@bwoodsend
Copy link
Contributor Author

Yippee. Got there eventually... 😛

@uranusjr uranusjr added this to the 21.2 milestone Jun 1, 2021
src/pip/_internal/vcs/git.py Outdated Show resolved Hide resolved
bwoodsend added 2 commits June 3, 2021 17:45
Pip dropped support for `git+ssh@` style requirements (see #7554)
in favour of `git+ssh://` but didn't propagate the change to
 `pip freeze` which resultantly returns invalid requirements.
Fix this behaviour.

Fixes #9625.
@uranusjr uranusjr merged commit b38b925 into pypa:main Jun 4, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 29, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'pip freeze' generates VCS pseudo URL a la git+git@
3 participants