Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an escape_name() function #542

Open
brettcannon opened this issue May 5, 2022 · 7 comments
Open

Add an escape_name() function #542

brettcannon opened this issue May 5, 2022 · 7 comments

Comments

@brettcannon
Copy link
Member

See #527 for the motivation. Basically utils.canonicalize_name(name).replace("-", "_") to match PEP 503 and 427 for project name normalization for sdists and wheels.

@MrMino

This comment was marked as resolved.

@brettcannon
Copy link
Member Author

I think simple replace is insufficient, see Escaping section in PEP427:

Each component of the filename is escaped by replacing runs of non-alphanumeric characters with an underscore _:

The runs of symbols is taken care of by canonicalize_name().

@dstufft
Copy link
Member

dstufft commented May 10, 2022

There's one tiny bit of ambiguity here, PEP 427 escaping doesn't require canonicalizing the name, while PEP 625 does. However, in practice the practical difference between those two statements is whether or not the output is lowercase or not. However, PEP 427 doesn't treat names as case sensitive, so there's no requirement to preserve case so I think it's perfectly fine to do that.

Actually, the other practical difference is that PEP 427 normalize leaves . in a section whereas this implementation does not allow that. I still don't think that matters exactly, but it does mean that the PEP 427 normalization function can work on any segment (name, version, compat tags, whatever), whereas the above implementation can only work on names.

I think that's OK, since the name of the function is escape_name(). Though it does raise the question if it makes sense to do something like:

import re

def escape_filename_segment(segment):  # better name?
    return re.sub("[^\w\d.]+", "_", segment, re.UNICODE)

Since that would work for all cases.

If not, does it make sense to add escape_version (which may or may not normalize version numbers? I'm not sure what makes sense)?

Dunno, I think all of the options are fine, just spelling out the edge cases.

@brettcannon
Copy link
Member Author

it does mean that the PEP 427 normalization function can work on any segment (name, version, compat tags, whatever), whereas the above implementation can only work on names.

I think that's OK, since the name of the function is escape_name().

That was my thinking: only care about a portion of the file when e.g. constructing an sdist file name.

it does raise the question if it makes sense to do something like:

import re

def escape_filename_segment(segment):  # better name?
    return re.sub("[^\w\d.]+", "_", segment, re.UNICODE)

Since that would work for all cases.

Interesting idea. I just don't know how often people are escaping the entire file at the end versus as they go.

If not, does it make sense to add escape_version (which may or may not normalize version numbers? I'm not sure what makes sense)?

I don't think so? But I don't have a good reason to think otherwise.

@brettcannon
Copy link
Member Author

Thinking about this more, maybe it would be better to expand the canonicalize_name() docs to say it can be used to escape a name with canonicalize_name(name).replace('-', '_')? I'm not sure if maintaining a separate function for such a niche thing is worth it as long as it's documented clearly how to produce the results? At least with canonicalize_name() the regex can be messed up and the type annotation on the return type is useful. But the return type for an escaped name isn't really there since it isn't something you would normally pass around as much as produce and then immediately consume.

@uranusjr
Copy link
Member

Or if we’re to add an API, I’d propose having something like ensure_valid_filename_component that also performs validation (i.e. makes sure the input is a valid packaging identifier to begin with).

@srittau
Copy link

srittau commented Nov 19, 2024

Since PyPI now requires PEP 625 filenames when uploading sdists, such a function (or at least clear documentation of the workaround) would be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants