Home
Forward type references are used to refer to types that have not yet been defined. They are implemented as strings which must be evaluated later to compute the actual type declaration.
Whenever anything is `eval`ed, the evaluation happens within the context of a specific pair of global and local namespaces.
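As a minimal illustration of that point (the names `expr`, `globalns`, and `localns` below are invented for this sketch), the same expression string can produce different results depending on which namespaces are handed to `eval`:

```python
# The same expression string evaluates differently depending on the
# namespaces passed to eval().
expr = "x + y"

globalns = {"x": 1, "y": 10}
localns = {"y": 100}  # names in the local namespace shadow global ones

only_globals = eval(expr, globalns)          # looks up x and y in globalns
with_locals = eval(expr, globalns, localns)  # y is found in localns first

print(only_globals, with_locals)
```

A forward type reference is resolved in the same way: the string is evaluated against whichever namespaces are supplied at that moment.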
When computing a schema for a dataclass, `marshmallow_dataclass` needs access to the final resolved type hints for the dataclass's attributes.
In most cases, this is easy: we call `typing.get_type_hints` to compute those type hints.
The only tricky part is that, for type references to be properly resolved, we often need to provide the correct local namespace for `get_type_hints` to use when resolving those references.
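A small sketch of the difference the local namespace makes, using only the standard library (the function and variable names here are made up for illustration):

```python
import dataclasses
import typing


def make_classes():
    @dataclasses.dataclass
    class A:
        b: "B"  # forward reference, stored as the string "B"

    @dataclasses.dataclass
    class B:
        x: int

    # Without the defining scope's locals, the name "B" cannot be found.
    try:
        typing.get_type_hints(A)
        unresolved = False
    except NameError:
        unresolved = True

    # With the defining scope's locals, "B" resolves to the local class.
    hints = typing.get_type_hints(A, localns=locals())
    return unresolved, hints, B


unresolved, hints, resolved_cls = make_classes()
print(unresolved, hints["b"] is resolved_cls)
```

This assumes no module-level name `B` exists when `make_classes` runs; if one did, the first call would silently resolve to it instead of raising.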
When resolving the type references of class attributes, we must use the local namespace from the scope where the class was defined. Unfortunately, there appears to be no sure-fire way to find that scope or that namespace.
Currently, our `class_schema` defaults to using the caller's locals. Often that's the right thing: often the dataclass was created in the same scope that calls `class_schema`, or, alternatively, the dataclass was created in module scope (where the local and global namespaces are one and the same), so assuming the "wrong" locals doesn't really hurt (unless there's a name conflict, the reference will be resolved correctly through the module globals).
E.g. using the caller's local namespace does work here:

    def f():
        @dataclasses.dataclass
        class A:
            b: "B"

        @dataclasses.dataclass
        class B:
            x: int

        MySchema = marshmallow_dataclass.class_schema(A)

But using the caller's locals doesn't always work. Consider this case:
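```py
import dataclasses

import marshmallow_dataclass


def f() -> None:
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        a: "A"

    def g():
        # class_schema sees g's locals here, which do not include B
        MySchema = marshmallow_dataclass.class_schema(A)
        print(locals())

    g()
```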
This currently doesn't work (`NameError: name 'B' is not defined`). We need the locals from `f` to resolve the type references, but `class_schema` uses the locals from `g`.
We can fix it by explicitly passing the frame of the function whose locals should be used:

    def f() -> None:
        @dataclasses.dataclass
        class A:
            b: "B"

        @dataclasses.dataclass
        class B:
            a: "A"

        def g():
            MySchema = marshmallow_dataclass.class_schema(A, clazz_frame=inspect.currentframe().f_back)
            print(locals())

        g()

but that’s pretty ugly. And dealing with frames is fraught with booby traps waiting to cause large memory leaks.
But Wait, It Gets Worse!
The `marshmallow_dataclass.dataclass` decorator sticks the computed schema into the `.Schema` attribute of the decorated class.
Except that in a case with a forward reference

    @marshmallow_dataclass.dataclass
    class A:
        b: "B"

    # ... definition of class B follows

at the time class `A` is decorated, class `B` has not yet been defined. No matter what locals we use, it is, in general, not possible to resolve the forward reference at class-decoration time.
So, currently, we store the caller's frame in an attribute descriptor which waits until the `.Schema` attribute is first accessed to call `class_schema` and compute the schema.
This comes with the same problem discussed above — how to pick the correct frame for a local namespace — but, worse still, it also means we have to hold a reference to a stack frame for potentially quite a while.
So, we'd feel much better if, instead of holding onto the whole frame (which includes all the parent frames), we could just hold a reference to the frame's `f_locals` dict.

It turns out that doesn't work. It has something to do with "fast locals" (a CPython implementation detail). Not all of a frame's locals (perhaps none of them) are actually stored in the `f_locals` dict. Instead they are stored on the frame in some other, presumably more optimized, way as "fast locals". When one accesses the local dict via `frame.f_locals` (or by calling `builtins.locals()`), any fast locals in scope are copied to the local namespace dict before it is returned, so `f_locals` gives a complete view of the local namespace at the time the `f_locals` attribute was accessed. Any fast locals which are created or modified after the `f_locals` attribute was accessed, however, do not automatically make it into the locals dict until the next time `f_locals` is accessed (or `locals()` is called).

    def f():
        localns = locals()
        x = 2
        print("x" in localns)
        locals()
        print("x" in localns)

    f()

For me, this prints

    False
    True

both under CPython and PyPy.
So, it appears that if we want access to any locals which may be defined in the future, we have to hold a reference to the frame, not just the locals dict.
It appears that all of this may be changing (PEP 558) in the timeframe of Python 3.12, so that `locals()` returns a read-only dict snapshot (I think) of the function locals, but `frame.f_locals` returns a legitimate read/write proxy to the fast locals. In that case, we could just hold on to an `f_locals` proxy, but that proxy, I assume, would still have a reference to the whole frame.
Here are some API changes that could clean things up.
(At the same time, we might as well add the ability to pass an explicit `globalns` parameter. It's not so generally useful, but may be useful in edge cases.)
Passing the local namespace to use for reference resolution explicitly is significantly cleaner than passing a stack frame.
- The locals are the only part of the frame we are interested in.
- Dealing with frames is risky, memory-leak-wise. So long as we do our part right, we can implement `class_schema` without leaking any frame references, but in accepting a frame parameter as we now do, we encourage the user to engage in the fraught practice of dealing with frame references.
- The only thing we do with the local namespace is pass it to `typing.get_type_hints`.

`get_type_hints` accepts `localns` and `globalns` parameters to specify the namespaces used to resolve forward references. Matching that API exactly gives our users maximum flexibility in controlling type reference resolution.
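A rough sketch of what "matching that API" could look like. The helper `resolve_field_types` is hypothetical, not part of `marshmallow_dataclass`. It only shows namespaces being forwarded unchanged to `typing.get_type_hints`, with no frame object involved:

```python
import dataclasses
import typing
from typing import Any, Dict, Optional


def resolve_field_types(
    clazz: type,
    globalns: Optional[Dict[str, Any]] = None,
    localns: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: resolve the types of a dataclass's fields,
    forwarding both namespaces unchanged to typing.get_type_hints."""
    hints = typing.get_type_hints(clazz, globalns=globalns, localns=localns)
    return {field.name: hints[field.name] for field in dataclasses.fields(clazz)}


def demo():
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        a: "A"

    # The caller passes a plain dict; no stack frame is captured or stored.
    return resolve_field_types(A, localns=locals()), B


field_types, b_cls = demo()
print(field_types["b"] is b_cls)
```

The caller decides which namespace is correct, which is exactly the piece of information the library cannot reliably discover on its own.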
We can add those parameters in a backward-compatible way, but I would suggest dropping support for the `clazz_frame` parameter.
(I suppose it could be deprecated for a period, but I wonder if anyone is actually using it. `clazz_frame` is not currently documented except in the docstring for `class_schema`.)
The way we default to using the caller’s locals to resolve references is not always correct.
Consider

    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        x: int

    def f() -> None:
        @dataclasses.dataclass
        class B:
            y: str

        MySchema = marshmallow_dataclass.class_schema(A)

This currently produces an incorrect schema. The reference to `"B"` gets resolved incorrectly to `f.<locals>.B`, when it should be resolved to the module-level `B`.
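The same mis-resolution can be reproduced with `typing.get_type_hints` alone. The sketch below stands in for what `class_schema` does internally; passing `localns=locals()` here mimics the current default of using the caller's locals, and the names `ModuleB`, `wrong`, and `right` are invented for the example:

```python
import dataclasses
import typing


@dataclasses.dataclass
class B:
    x: int


@dataclasses.dataclass
class A:
    b: "B"  # intended to mean the module-level B


ModuleB = B  # keep a handle on the module-level class for comparison


def f():
    @dataclasses.dataclass
    class B:  # unrelated local class that happens to share the name
        y: str

    # Mimic the current default: resolve using the caller's locals.
    with_caller_locals = typing.get_type_hints(A, localns=locals())
    # Resolve using only the module globals (passed explicitly here).
    with_globals_only = typing.get_type_hints(A, globalns=globals())
    return with_caller_locals["b"], with_globals_only["b"]


wrong, right = f()
print(wrong is ModuleB, right is ModuleB)
```

With the caller's locals, the local `B` shadows the module-level one; without them, the reference resolves as the class author intended.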
I think it would be better/safer/less-surprising to default to `localns=None` unless the user explicitly specifies a value. That is sufficient (and correct) when dealing with dataclasses defined at the module level. (When passed `localns=None`, since Python 3.10 `typing.get_type_hints` uses the class's `__dict__` for the locals; prior to Python 3.10, passing `localns=None` resulted in no locals being used during reference resolution. `globalns` defaults to the class's module globals, which for module-level classes is all that is required for correct type resolution.)
In cases where a local namespace is required, we would require it to be passed explicitly:

    def f():
        @dataclasses.dataclass
        class A:
            b: "B"

        @dataclasses.dataclass
        class B:
            a: "A"

        ASchema = marshmallow_dataclass.class_schema(A, localns=locals())

This is a breaking change, however.
As a (perhaps interim) alternative, we could allow for passing an explicit `localns=None` to disable using the caller's namespace during type resolution.
We could deprecate the current practice by using a heuristic scheme to determine when users are relying on the current behavior, and issue a `DeprecationWarning` in that case. One scheme by which dependence on current behavior could be detected is: do a test call of `get_type_hints(cls, localns=None)`; if that throws a `NameError`, then issue the `DeprecationWarning`.
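One possible shape for that heuristic, as a sketch only. The function names and the warning text are hypothetical, and where such a check would be wired into `class_schema` is left open:

```python
import dataclasses
import typing
import warnings


def relies_on_caller_locals(clazz: type) -> bool:
    """Hypothetical check: True if the class's type hints cannot be
    resolved without supplying a local namespace."""
    try:
        typing.get_type_hints(clazz, localns=None)
    except NameError:
        return True
    return False


def warn_if_relying_on_caller_locals(clazz: type) -> None:
    if relies_on_caller_locals(clazz):
        warnings.warn(
            "Implicit use of the caller's locals is deprecated; "
            "pass localns explicitly.",
            DeprecationWarning,
            stacklevel=2,
        )


@dataclasses.dataclass
class ModuleLevel:
    x: int


def make_local():
    @dataclasses.dataclass
    class A:
        b: "LocalOnly"  # resolvable only through make_local's locals

    class LocalOnly:
        pass

    return A
```

Note that this heuristic does not catch the shadowing case, where resolution succeeds without locals but the caller's locals would have produced a different result.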
Our custom `dataclass` decorator really does nothing over the stock `dataclasses.dataclass` decorator other than adding the `.Schema` attribute to the decorated class. One could easily live without this convenience. It's only one extra line of code to compute the schema using `class_schema`.
As noted earlier, since forward type references in the class may refer to types that are yet to be defined, class-decoration time is generally not the correct time to be calling `class_schema`. Supporting this case requires us to hold a reference to the caller's frame in the `.Schema` attribute descriptor for a somewhat arbitrary amount of time. This is not great.
My preference would be to deprecate the `marshmallow_dataclass.dataclass` decorator altogether.
An alternative would be to deprecate its use in cases where the local namespace is required for type resolution. Then it could still be used on module-level dataclasses, and on dataclasses without type references.
More things to ponder...
How do PEP 563 (Postponed Evaluation of Annotations: `from __future__ import annotations`) and whatever postponed evaluation happens in Python 3.10 fit into this?
Should we be using `inspect.get_annotations` in Python 3.10 rather than `typing.get_type_hints`? (Ref: Annotations Best Practices)
The future: PEP 649 looks pretty slick and may, at some point, be the solution to the headache.
Here’s a deep and probably elucidating blog post which says PEP649 is not the answer: https://lukasz.langa.pl/61df599c-d9d8-4938-868b-36b67fdb4448/
In short, it is suggested:
- Add `globalns` and `localns` parameters to `class_schema` (and remove or deprecate the `clazz_frame` parameter).
- Work towards requiring explicit specification of the local namespace to `class_schema` when required: change the default behavior of `class_schema` from assuming the caller's local namespace to assuming no local namespace.
- Deprecate the `marshmallow_dataclass.dataclass` decorator entirely. Or perhaps just deprecate its use in cases where the local namespace is required for type resolution.