Fix is_url from splitting the scheme incorrectly when using PEP 440's direct references #6203
Sorry for the newbie fails on the linting :(
@uranusjr hi, could you please take a look at this PR?
I feel this approach is backwards: instead of treating @ in a URL-like string as a special case, the parser function should be able to exclude that case before the is_url check is even done.
I need to think about this more in detail to figure out what the right approach is, but this is probably not it.
@uranusjr would it be a better idea to call Or maybe use a regex to strip an URL from the line? (So you won't treat just the |
src/pip/_internal/download.py
Outdated
    return scheme in ['http', 'https', 'file', 'ftp'] + vcs.all_schemes


def split_scheme_from_url(url):
I feel like generic parsing functions like this should go in misc.py with the other URL parsing functions. (Incidentally, I also think that path_to_url() and friends shouldn't be in download.py either.)
Looking at download.py, these URL-related functions should stay together, either in misc.py or in download.py. But misc.py looks really polluted to me. Maybe creating a utils package that contains a file for URL-related functions would be a better idea?
Yes, probably. :) But I don’t want to sidetrack this PR further. For functions other than new functions you’re adding here, it would need to be done as a separate PR. I’m also not sure what type of function you’ll wind up needing after your conversation with @uranusjr. (I haven’t thought about it myself.)
Great, I would really like to take this up :)
I'll add this function to misc.py then, if I still have to use it.
I’m thinking maybe the function should be reorganised somehow. Instead of checking for URL-like, path-like, and finally as a name, it should check for name first (using PEP 508’s definition; maybe |
@uranusjr do you have any idea how to do it this way? I could only think of doing it the other way around, by elimination (e.g. if something is not a file or a URL, it is probably a name).
A formal syntax definition is included in the PEP 508 document, and (I believe) implemented by
I’m not sure if that would work, but it could be worth a try. |
@uranusjr thanks for the explanation. I'll try to understand this class more in-depth. I see that there are many characters and regex that are probably used for parsing here, e.g.
but I don't understand how pyparsing works. Maybe the bug that I reported can be fixed here instead (by adding proper validation)?
I don’t believe that matters, since the rule is only used as part of the

    >>> Requirement('foo@http://[email protected]')
    <Requirement('foo@ http://[email protected]')>
    >>> Requirement('http://[email protected]')
    Traceback (most recent call last):
      [snipped]
    pip._vendor.packaging.requirements.InvalidRequirement: Parse error at "'://user@'": Expected stringEnd

So I think you can do something like

    try:
        Requirement(name)
    except InvalidRequirement:
        pass  # Maybe a nameless URL or a path
    else:
        return ...  # Create InstallRequirement from name
    if is_url(name):
        return ...  # Create InstallRequirement from URL
    return ...  # Create InstallRequirement from path
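The name-first ordering suggested above can be tried out without pip or its vendored packaging library. The sketch below is only an approximation: it swaps the full Requirement parser for PEP 508's regular expression for distribution names, ignores version specifiers, extras and markers, and omits the VCS schemes. The classify helper, its return labels and the example.com host are hypothetical.

```python
import re

# PEP 508's pattern for a valid distribution name (matched case-insensitively).
_NAME_RE = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)

# Assumption: only the plain schemes are listed; pip also accepts VCS schemes.
_URL_SCHEMES = ("http", "https", "file", "ftp")


def is_url(text):
    """Return True if the text before the first ':' is a known URL scheme."""
    scheme, sep, _ = text.partition(":")
    return bool(sep) and scheme.lower() in _URL_SCHEMES


def classify(line):
    """Hypothetical helper: label a line as 'name', 'named-url', 'url' or 'path'."""
    # Check for a name first, so 'name @ url' is recognised before any URL test.
    name, sep, rest = line.partition("@")
    if _NAME_RE.match(name.strip()):
        if not sep:
            return "name"
        if is_url(rest.strip()):
            return "named-url"
    # Only then fall back to a nameless URL, and finally to a path.
    if is_url(line):
        return "url"
    return "path"


for example in (
    "pip",
    "pip @ https://example.com/pip-1.3.1-py33-none-any.whl",
    "https://user@example.com/pip-1.3.1-py33-none-any.whl",
    "./downloads/pip-1.3.1-py33-none-any.whl",
):
    print(f"{classify(example):10} {example}")
```

A URL with credentials is not mistaken for a named requirement here, because the text before its @ ("https://user") is not a valid PEP 508 name.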
Err I read the parser code in whole, and it’s… a mess 😭 Let’s start over. So the code currently parses like this:
The problem now is that PEP 440 URL reqs should go 1b-2b-4, but currently fall into 1b-2a-3-4. So we need to find a distinctive characteristic between a path and a PEP 440 URL req (the name req variant poses no problems), and fix the condition in 2.
We can conclude: A URL req must contain at least one
Now the fix becomes clear. The condition near line 235 should be modified to something like this:

    def _looks_like_path(name):
        return (
            os.path.sep in name or
            (os.path.altsep is not None and os.path.altsep in name) or
            name.startswith('.')
        )

    if is_url(name):
        link = Link(name)
    else:
        ...
        elif is_archive_file(p):
            if os.path.isfile(p):
                link = Link(path_to_url(p))
            else:
                url_req_parts = p.split('@', 1)
                if not _looks_like_path(url_req_parts[-1]):
                    logger.warning(...)

I know, this change makes the code even more messy than before, but this is the best I can come up with without taking the whole thing apart 😞
I played with
It looks like it parses named requirements here, which is what is expected But when using the I played a little bit with some validations:
It occurs to me just now that we need another test case for URLs with authentication.
Some comments.
A couple more quick comments.
            "Directory %r is not installable. Neither 'setup.py' "
            "nor 'pyproject.toml' found." % name
        )
    if is_archive_file(path):
You can use the "early return" pattern here again to reduce indentation by doing if not is_archive_file(path): and then returning None. Then the rest doesn't need to be indented.
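As a rough illustration of the early-return suggestion, the sketch below writes the same logic twice, once nested and once flattened. Everything in it is hypothetical: the function names, the simplified is_archive_file check and the extension list are stand-ins, not pip's implementation.

```python
import os
from pathlib import Path

# Assumption: an illustrative subset of archive extensions.
ARCHIVE_EXTENSIONS = (".zip", ".whl", ".tar.gz", ".tar.bz2", ".tgz", ".tar")


def is_archive_file(path):
    """Simplified stand-in: decide by file extension only."""
    return path.lower().endswith(ARCHIVE_EXTENSIONS)


def get_url_nested(path):
    """Nested version: the main logic sits two levels deep."""
    if is_archive_file(path):
        if os.path.isfile(path):
            return Path(path).resolve().as_uri()
        else:
            return None
    else:
        return None


def get_url_early_return(path):
    """Early-return version: rule out each failing case first."""
    if not is_archive_file(path):
        return None
    if not os.path.isfile(path):
        return None
    # The remaining logic no longer needs to be indented.
    return Path(path).resolve().as_uri()
```

Both functions return the same result for any input; only the layout differs.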
tests/unit/test_req.py
Outdated
@patch('pip._internal.req.req_install.os.path.isdir')
@patch('pip._internal.req.req_install.os.path.isfile')
def test_get_path_to_archive_pep440_url(isdir_mock, isfile_mock):
Great to see you start writing these tests! A couple of comments:
First, it's a helpful convention that, when testing a function or method named my_method, the test function's name starts with the string test_my_method. That makes it easy to locate all the tests of a given function. So in this case, all of these should start with test_get_path_to_url_... (you don't need to include the leading underscore). Also, if you have more than one test function for a certain function, you can add a suffix describing the special case, like test_get_path_to_url__archive_pep440_url(). (I like to separate the function name portion from the suffix with a double underscore so someone can tell where the function name portion ends.)
Also, if you're testing multiple cases of a simple function, it helps to use @pytest.mark.parametrize to cut down on the amount of repetition. Take a look at test_make_vcs_requirement_url and the test functions following that for some examples. In this case, your inputs and outputs are strings (along with booleans to set your mocks), so it should be amenable to test parametrization.
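A minimal sketch of what such a parametrized test might look like, with the mock return values passed in as booleans next to the string input and expected output. The describe_path function and its labels are hypothetical stand-ins for the function under test, and the example assumes pytest is installed.

```python
import os
from unittest import mock

import pytest


def describe_path(path):
    """Hypothetical function under test: report what exists at path."""
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    return "missing"


# One test function covers every case; each tuple is a separate test run.
@pytest.mark.parametrize("path, isdir, isfile, expected", [
    ("/path/to/project", True, False, "directory"),
    ("/path/to/pip-1.3.1-py33-none-any.whl", False, True, "file"),
    ("/path/to/nothing", False, False, "missing"),
])
def test_describe_path(path, isdir, isfile, expected):
    # The booleans from the parameter list drive the mocked filesystem checks.
    with mock.patch("os.path.isdir", return_value=isdir), \
            mock.patch("os.path.isfile", return_value=isfile):
        assert describe_path(path) == expected
```

The test name starts with test_ followed by the name of the function it exercises, in line with the naming convention described above.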
One more comment: I like to put the test functions in the same order as the original functions appear in the module. This also makes it easier to locate test functions when you're scrolling around. The test module has a parallel structure to the module it's testing.
@cjerdonek is it ok to use a noqa on test names? Just in case they get too big.
(Turns out I didn't need it.)
I don't expect they would ever get too big. You can put the arguments on the next line if it ever started to get too long.
3e57673
There, I had some issues parametrizing the tests that had URLs though.
@vinicyusmacedo Are you still working on your changes, or were you waiting for another review? I noticed at least one (easy) comment wasn't addressed, which is why I was waiting.
@cjerdonek sorry, I forgot about some of the comments. I'm pushing them right now and I think that's it :)
@vinicyusmacedo Can you also review the pip docs to see if anything needs changing / updating? For example, there is this part from the section on Requirements Specifiers that looks like it needs to be updated:
Maybe you can add a paragraph after the "Since version 6.0," paragraph saying, "Since version 19.1," describing the change you're adding. |
@cjerdonek requirements file format and examples need changing as well. Should I use |
@vinicyusmacedo You can leave out mention of the version for now in those other sections. |
Does this mean you can also delete the parentheses here:
|
@uranusjr Now that the code and tests for this PR are more in shape, and because @vinicyusmacedo followed the approach that you suggested, can you review this carefully, and also see if any test cases are missing or should be added? Like, would it be good to have any test cases anywhere with a space missing before and/or after the |
@cjerdonek hey, sorry for bothering, is there anything that I missed on this PR?
Ping @uranusjr
This could use some squashing, but code-wise 👍
    if os.path.altsep is not None and os.path.altsep in name:
        return True
    if name.startswith('.'):
        return True
I just realised this if does not work as intended, and probably should be removed. A ./whatever string would’ve been caught in previous checks. This only matters for strings like .whatever, which I guess still does look like a path…? (but then the docstring is not accurate)
@uranusjr that's exactly it. I don't really know why some package would start with ., but I'll add a test case for it and add it to the docstring.
Packages can’t start with a dot, so this check doesn’t really matter either way :p But it’s better to remove it since its mere existence can be confusing to future readers.
Nice. I have removed it then.
@uranusjr oops, actually you can use . to install a package. You can use it to install the current directory as a package if it has a setup.py file.
I still have a problem, though: the Windows tests might fail with the == since the path separator is different. I'll go with name.startswith then (I could make separate test cases for Windows, but that doesn't sound so good).
I believe the sep and altsep parts cover the different separators (if my memory of implementation from other projects serves).
From what I understood from the docs, it appears to be only available on Windows (the altsep on Windows would be the forward-slash)
That is correct, hence the first test would detect \ on Windows and / on POSIX; the second test detects / on Windows (always false on POSIX).
You could add a simple Windows-only test like this, if you’re inclined to:

    @pytest.mark.parametrize('path', [
        '.\\path\\to\\installable',
        'relative\\path',
        'C:\\absolute\\path',
    ])
    @pytest.mark.skipif(os.path.sep != '\\', reason='Windows-only path separators')
    def test_looks_like_path_win(path):
        assert _looks_like_path(path)
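The sep and altsep behaviour discussed in this thread can be checked on any platform, since the standard library exposes both path flavours as the ntpath and posixpath modules. The sketch below passes the path module in explicitly; that extra parameter is a hypothetical variation for illustration, not the signature used in the PR.

```python
import ntpath
import posixpath


def looks_like_path(name, pathmod):
    """Path-likeness check parametrized by path flavour (hypothetical signature)."""
    if pathmod.sep in name:
        return True
    if pathmod.altsep is not None and pathmod.altsep in name:
        return True
    return name.startswith(".")


# posixpath: sep is '/' and altsep is None; ntpath: sep is '\\' and altsep is '/'.
for pathmod in (posixpath, ntpath):
    print(pathmod.__name__, repr(pathmod.sep), repr(pathmod.altsep))

# Backslash-separated paths are only recognised by the Windows flavour.
print(looks_like_path("relative\\path", ntpath))
print(looks_like_path("relative\\path", posixpath))
# Forward slashes are caught by sep on POSIX and by altsep on Windows.
print(looks_like_path("relative/path", posixpath))
print(looks_like_path("relative/path", ntpath))
```

This is why a backslash-only test case has to be skipped on POSIX, where the backslash is an ordinary filename character.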
@uranusjr I could also skip this test if not sys.platform.startswith("win")
Thanks for sticking with it @vinicyusmacedo 👍
@xavfernandez @uranusjr just added some Windows-specific tests.
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
@xavfernandez is it possible to merge this one?
Gentle bump! How can we help to get this merged?
LGTM!
I went through the existing comments and I believe all of them are addressed, so I will merge this. If I missed anything we can always address it in a followup. Thanks for sticking with it @vinicyusmacedo!
@chrahunt thank you and everyone who reviewed this PR :)
Hello,
This PR fixes #6202 and includes tests for this issue.
When installing a .whl from a remote URL following this example,

    pip @ https:///somewhere/pip-1.3.1-py33-none-any.whl

is_url was splitting the scheme incorrectly and wouldn't recognize the line as a URL. Pip would try (and fail) to reference a local .whl file instead.
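The failure can be reproduced with a simplified sketch of the scheme check. It is modelled on the return line visible in the diff above but is not pip's actual code, and it leaves out the VCS schemes. The scheme is taken to be everything before the first colon, so the "pip @ " prefix ends up inside it.

```python
# Assumption: VCS schemes (vcs.all_schemes in the diff) are omitted here.
URL_SCHEMES = ["http", "https", "file", "ftp"]


def is_url(name):
    """Simplified sketch: treat the text before the first ':' as the scheme."""
    if ":" not in name:
        return False
    scheme = name.split(":", 1)[0].lower()
    return scheme in URL_SCHEMES


line = "pip @ https:///somewhere/pip-1.3.1-py33-none-any.whl"

print(line.split(":", 1)[0])                  # pip @ https
print(is_url(line))                           # False
print(is_url(line.split("@", 1)[1].strip()))  # True
```

Only the part after the @ passes the check, which is why the whole line was not recognized as a URL.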