Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify status of non-identifier unpacked keyword arguments #96397

Closed
merwok opened this issue Aug 29, 2022 · 17 comments · Fixed by #96393
Closed

Clarify status of non-identifier unpacked keyword arguments #96397

merwok opened this issue Aug 29, 2022 · 17 comments · Fixed by #96393
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes docs Documentation in the Doc dir

Comments

@merwok
Copy link
Member

merwok commented Aug 29, 2022

In current CPython, unpacked keyword arguments must be strings but don’t have to be valid identifiers.

Discussion: https://discuss.python.org/t/supporting-or-not-invalid-identifiers-in-kwargs/17147

Messages from Guido (creator of the language) and core developers show that this behaviour is desired and will not be changed.

The issue is whether this is a CPython (stable) implementation detail, in which case docs could be clarified with the fix backported, or if it’s time to make it a guaranteed part of the language, binding for other implementations, starting from 3.11 or 3.12.

I have not found where the doc should be improved; arguments and parameters are discussed in glossary, library reference, etc. but I don’t know where is one complete, authoritative explanation of unpacked keyword params.

@merwok merwok added docs Documentation in the Doc dir 3.11 only security fixes 3.12 bugs and security fixes labels Aug 29, 2022
@merwok
Copy link
Member Author

merwok commented Aug 29, 2022

Docs referenced from the discussion: https://docs.python.org/3/reference/expressions.html#calls

Secondary issue that could be wrapped in this (from Guido):

positional-only arguments are still described as a CPython-only feature for builtins only.)

@gvanrossum
Copy link
Member

Note that the same issue shows up in a few other places, e.g. setattr() and friends allow any string as the attribute name (unless a specific class restricts the name in their __setattr__() method).

(There's also obj.__dict__ which usually returns a dict that does no type checking on keys, and similar for globals() and locals(). We could declare that those objects must support arbitrary strings but that limiting the key type to just strings would be acceptable.)

@JelleZijlstra
Copy link
Member

I think there's two potentially separate issues:

  1. Strings that aren't valid identifiers (e.g. keywords, strings with spaces in them). I don't see any reason not to accept these as keyword args; if anything it would be more expensive for an implementation to have to check whether each kwarg is a valid identifier.
  2. Instances of subclasses of str. Allowing this seems much riskier, since string subclasses could have all kinds of unexpected behaviors. It may be safer to specify that only instances of exactly str are required to be accepted as kwargs.

@ChrisBarker-NOAA
Copy link
Contributor

ChrisBarker-NOAA commented Aug 29, 2022

Using non-valid identifiers as attribute names via __dict__, **kwargs, getattr(), setattr() is being done in live production code right now, at least in cPython (https://pypi.org/project/netCDF4/ and I'm sure others). I have no idea what other implementations are doing, but I suspect it's the same, at least with PyPy. Given that, regardless of original intent -- I think it's better to define this as a language feature rather than an implementation detail.

As Guido pointed out, this is also relevant to __dict__, setattr(), getattr(), locals, globals (anywhere else?) -- but I just looked through the docs, and I can't see where it would fit -- it's seems it belongs in the Language Reference, but I can't find where to put it -- anyone have any suggestions?

I had suggested text something like:

"""
In all instances where a python name is specified/stored in a container or variable, the name must be a string, but may be a non-valid python identifier.

Examples:

dict and **kwargs may not have non-string keys, but may have non-valid strings as identifiers (e.g. "this-name").

getattr() and setattr() will accept only strings, but the strings to do not have to be valid identifiers. (e.g. setattr(obj, "this-name", value)
"""

But I have no idea where to put it in the docs.

@gvanrossum
Copy link
Member

  1. Instances of subclasses of str. Allowing this seems much riskier, since string subclasses could have all kinds of unexpected behaviors. It may be safer to specify that only instances of exactly str are required to be accepted as kwargs.

I agree we don't have to allow string subclasses. Right now CPython allows it, but the usage would appear niche. We can leave this unspecified if we want to.

@jeff5
Copy link
Contributor

jeff5 commented Aug 29, 2022

As Guido pointed out, this is also relevant to __dict__, setattr(), getattr(), locals, globals (anywhere else?) -- but I just looked through the docs, and I can't see where it would fit -- it's seems it belongs in the Language Reference, but I can't find where to put it -- anyone have any suggestions?

I had suggested text something like:

""" In all instances where a python name is specified/stored in a container or variable, the name must be a string, but may be a non-valid python identifier.

I found this too difficult to understand. What is a "python name" different from a "python identifier"? And what, every container of any kind?

I opted for documenting the behaviour in the specific context of keys in a mapping in a call. I offer to do the same for the name arguments in getattr(), setattr(), maybe __dict__. Naturally one would want the same conventions to apply (which they seem to) that any string will do, but you can't then expect a parameter (in a function signature) or member (in obj.member notation) to be possible with that name.

getattr() and setattr() will accept only strings, but the strings to do not have to be valid identifiers. (e.g. setattr(obj, "this-name", value) """

But I have no idea where to put it in the docs.

No indeed. And I wouldn't know where to find it, which recommended in my mind putting consistent specialisations of the rule in the docs for call, __dict__ and getattr().

@jeff5
Copy link
Contributor

jeff5 commented Aug 29, 2022

  1. Instances of subclasses of str. Allowing this seems much riskier, since string subclasses could have all kinds of unexpected behaviors. It may be safer to specify that only instances of exactly str are required to be accepted as kwargs.

I agree we don't have to allow string subclasses. Right now CPython allows it, but the usage would appear niche. We can leave this unspecified if we want to.

Yes, but can I suggest this might be important to standardise, one way or the other? In #94938 I broke CPython by violating the rules of hashable, but you can do some things with well-behaved sub-classes might be thought useful:

class C:
    pass

class IdWithUnit(str):
    def __eq__(self, other):
        s = self.split()[0]
        return s.__eq__(other)
    def __hash__(self):
        s = self.split()[0]
        return s.__hash__()

def f(a, **kwargs):
    return f'a={a}, kwargs={kwargs}'


awu = IdWithUnit("a ft")
assert(awu == "a")
assert(str(awu) == "a ft")

# An IdWithUnit functions as a keyword assignment
assert(f(**{awu: 42}) == "a=42, kwargs={}")

# Objects with __dict__ accept IdWithUnit as attribute names
c = C()
setattr(c, awu, 42)
assert(getattr(c, awu) == 42)

# The IdWithUnit attribute retains its full name
assert(IdWithUnit in map(type, c.__dict__.keys()))
assert("a ft" in map(str, c.__dict__.keys()))

# Use of an IdWithUnit establishes an attribute with the plain name
assert(c.a == 42)

@ChrisBarker-NOAA
Copy link
Contributor

I found this too difficult to understand. What is a "python name" different from a "python identifier"?

An "identifier" is what you can put in code:

https://docs.python.org/3/reference/lexical_analysis.html#identifiers

I think that's well defined.

A "name" is the thing we can use in **kwargs dict, or pass to get/setattr() or use as a key in a dict. I have no idea how to clearly define it -- maybe: "A key in a namespace mapping" -- but I can't say that's clear either.

And what, every container of any kind?

I used "container" to be generic -- in practice, it's usually a dict -- but that's certainly not part of the language definition -- maybe Mapping? But now that I think about it, it's an argument to setattr() -- not in a container at all :-(

I used those generic terms because it seems that this pops up in multiple places in the language, and currently (and I hope forever), the rule is consistent -- so it seemed better to define the consistent rule in one place, rather than, or in addition to, adding a note to the half a dozen places where it's relevant (and maybe miss some).

I'm getting way out my area of expertise here, but maybe the rules for valid identifiers only apply when parsing/evaluating code. Once the system is passed the parsing stage, they are simply strings, and any string will do. So maybe that could be the way to define this behavior?

@jeff5
Copy link
Contributor

jeff5 commented Aug 29, 2022

An "identifier" is what you can put in code:
https://docs.python.org/3/reference/lexical_analysis.html#identifiers
I think that's well defined.

Agreed. And I might say "in code without the quotes" because in the context of Python source, the scanner can pick identifiers out as lexemes.

A "name" is the thing we can use in **kwargs dict, or pass to get/setattr() or use as a key in a dict. I have no idea how to clearly define it -- maybe: "A key in a namespace mapping" -- but I can't say that's clear either.

Also agreed. ;)

I used "container" to be generic -- in practice, it's usually a dict -- but that's certainly not part of the language definition -- maybe Mapping? But now that I think about it, it's an argument to setattr() -- not in a container at all :-(

Ok, I think we're in agreement that while a single statement would be good, it is quite difficult to write and place. Being a bear of very little brain, I chose to start with the specific cases, even at risk of repetition. Maybe a common thread will emerge. It was only about 100 words for keywords. I don't think it's as much as half-a-dozen places.

@jeff5
Copy link
Contributor

jeff5 commented Aug 30, 2022

I'll offer a separate PR documenting the range of names acceptable in (get|set|del)attr. In fact, I currently think the dunder methods are the place to go into this, because that's where an object would constrain the available choices, but obj.__setattr__ doesn't.

Note that our current definition of attribute is "something you can address with a dot" so ... would pseudo-attribute be a good name for something you can't address with a dot but can with getattr(), if the object lets you.

I realised I had recently come across pseudo-attributes in a the pandas.DataFrame, where columns and rows may be addressed by dot notation, if labeled by identifiers, and by index notation ["my-name"] generally. On a quick check, these are accessible via getattr(df, "my-name").

@ChrisBarker-NOAA
Copy link
Contributor

@jeff5: Thanks, that sounds like a good plan -- though I'm not sure the __setattr__, etc docs are the place for it, as I suppose that can be overridden in any weird way anyone wants to -- IIUC, the enforcement is in the implementation of set/get/delattr -- before the dunders get called.

I'm not sure I like "pseudo-attributes", but don't have a better suggestion :-(

@merwok
Copy link
Member Author

merwok commented Aug 30, 2022

The main thing here is about unpacked keyword arguments; I think https://docs.python.org/3/reference/expressions.html#calls is the place to change.

Docs for getattr/setattr (functions, not magic methods) could be updated in the same PR.

@gvanrossum
Copy link
Member

We'll need to wait for the SC to pipe up before the **kwds PR can be merged. Once that's decided we'll use the SC guidance to decide how to document setattr() and friends.

@jeff5
Copy link
Contributor

jeff5 commented Aug 30, 2022

@merwok I was given the choice and thought separate was better because of the possibility of separately approving the PRs.

@ChrisBarker-NOAA wrote:

enforcement is in the implementation of set/get/delattr -- before the dunders get called.

Actually in PyObject_GetAttr. Let me try and write it. I'll be able to tell if I'm doing it wrong or you can say so later.

This is interesting to me because in the prototype Jython 3 the type of the __getattribute__ slot is (Object, String). This seemed safe because in the CPython code base there is no public API (I could find) by which that slot can be invoked with anything other than str. (It's only a prototype, of course.)

@ChrisBarker-NOAA
Copy link
Contributor

Actually in PyObject_GetAttr.

Thanks for the clarification, but in any case, before it gets to a user-defined __getattr__ -- so I think if someone writes a __getattr__ that can accept a non-string, there would be no way to use it with the "usual" methods.

@jeff5
Copy link
Contributor

jeff5 commented Aug 31, 2022

@ChrisBarker-NOAA

Actually in PyObject_GetAttr.

Thanks for the clarification, but in any case, before it gets to a user-defined __getattr__ -- so I think if someone writes a __getattr__ that can accept a non-string, there would be no way to use it with the "usual" methods.

I chose to document this in the functions, rather than the dunders, but __getattribute__ seemed worth a mention in the explanation.

I'm not sure I like "pseudo-attributes", but don't have a better suggestion :-(

I'm boldly suggesting in #96454 that they be considered attributes, even if dot notation cannot be used (for grammatical reasons).

@merwok
Copy link
Member Author

merwok commented Sep 20, 2022

SC ruling: python/steering-council#142 (comment)

The feature is fine, and there’s a distinction between attributes in the object model and names in the syntax that’s IMO clear and meaningful.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 22, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 22, 2022
gvanrossum pushed a commit that referenced this issue Sep 29, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 29, 2022
…honGH-96454)

Co-authored-by: C.A.M. Gerlach <[email protected]>
(cherry picked from commit 9a11ed8)

Co-authored-by: Jeff Allen <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 29, 2022
…honGH-96454)

Co-authored-by: C.A.M. Gerlach <[email protected]>
(cherry picked from commit 9a11ed8)

Co-authored-by: Jeff Allen <[email protected]>
miss-islington added a commit that referenced this issue Sep 29, 2022
Co-authored-by: C.A.M. Gerlach <[email protected]>
(cherry picked from commit 9a11ed8)

Co-authored-by: Jeff Allen <[email protected]>
miss-islington added a commit that referenced this issue Sep 29, 2022
Co-authored-by: C.A.M. Gerlach <[email protected]>
(cherry picked from commit 9a11ed8)

Co-authored-by: Jeff Allen <[email protected]>
serhiy-storchaka pushed a commit to serhiy-storchaka/cpython that referenced this issue Oct 2, 2022
pablogsal pushed a commit that referenced this issue Oct 22, 2022
Co-authored-by: C.A.M. Gerlach <[email protected]>
(cherry picked from commit 9a11ed8)

Co-authored-by: Jeff Allen <[email protected]>
pablogsal pushed a commit that referenced this issue Oct 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes docs Documentation in the Doc dir
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants