Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-92678: Document Py_TPFLAGS_MANAGED_DICT and friends #95440

Closed
wants to merge 4 commits into from

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Jul 29, 2022

@encukou
Copy link
Member

encukou commented Aug 1, 2022

I'll dive into the code tomorrow but I'll write down the questions that come to mind today. Hopefully they'll be easy to answer.

What does it mean for __dict__ to be “managed”? CPython will take care of allocating and creating the dict, but not of deallocating or traversing it? What else about “management” is there to to think about?

If a type with Py_TPFLAGS_MANAGED_DICT derives from another type that has Py_TPFLAGS_MANAGED_DICT, whose responsibility is it to call _PyObject_ClearManagedDict/_PyObject_VisitManagedDict?
What about types derived from a class with managed dict (and the clear/visit calls) using class – will they be handled correctly?

@Fidget-Spinner
Copy link
Member Author

I'll dive into the code tomorrow but I'll write down the questions that come to mind today. Hopefully they'll be easy to answer.

What does it mean for __dict__ to be “managed”? CPython will take care of allocating and creating the dict, but not of deallocating or traversing it? What else about “management” is there to to think about?

Yes. The managed just means previously to set __dict__ you'd have to set a slot in PyMemberDef or something similar. Allowing CPython to manage it means CPython can place the __dict__ wherever it wants. In this case it's 3 PyObject * behind the object head (it's 3 because of the GC info preceding it).

If a type with Py_TPFLAGS_MANAGED_DICT derives from another type that has Py_TPFLAGS_MANAGED_DICT, whose responsibility is it to call _PyObject_ClearManagedDict/_PyObject_VisitManagedDict?

The type that is being derived from.

What about types derived from a class with managed dict (and the clear/visit calls) using class – will they be handled correctly?

I'm not sure about inheritance. Maybe @markshannon knows. However, AFAICS, any type that sets the Py_TPFLAGS_MANAGED_DICT and constructs itself using type (I don't remember if __build_class__ is included) will automatically have the correct functions to clean up the dict.

@markshannon
Copy link
Member

On reflection, maybe MANAGED is not the best choice of adjective, although I don't know what would be.

One important thing to document is that object creation and deallocation are handled by the VM, but visiting, initialization and clearing are the responsibility of the extension.

Consequently, we can create the "managed" dictionary automatically during object creation, or lazily on first access, and we can also de-allocate it automatically, we rely on the extension to visit and/or clear it. Hence the need for #95246.

@markshannon
Copy link
Member

Regarding inheritance, semantically inheritance only requires that a class has the __dict__ attribute if one of their base classes has the __dict__ attribute. The "management" of that __dict__ is an implementation detail of the class.

Therefore is perfectly OK for a class to have a "legacy" __dict__ (tp_dictoffset != 0) even if one of its base classes has a "managed" __dict__.

It is all down to memory layout, the C code in superclasses should work correctly on subclasses, so we need to preserve the layout of extension superclasses.

For example, class M has a managed dict, and class L has the a "legacy" __dict__:

class M:
     pass
class L(types.ModuleType):
    pass

>>> M.__flags__ & Py_TPFLAGS_MANAGED_DICT
16
>>> L.__flags__ & Py_TPFLAGS_MANAGED_DICT
0

If we inherit from both, then the new class has a "legacy" __dict__.

class C(L, M):
    pass

>>> C.__flags__ & Py_TPFLAGS_MANAGED_DICT
0

Things would be much simpler if all objects with __dict__s had "managed" dicts, but that won't happen for a while yet.

Comment on lines +1930 to +1933
* CPython no longer assumes that a ``__dict__`` exists after an object. C extension
types with inheritance and ``__dict__`` may have MRO problems. To fix this, allow
CPython to manage the ``__dict__`` of instances of your type by setting
:const:`Py_TPFLAGS_MANAGED_DICT` in :c:member:`~PyTypeObject.tp_flags`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really raises more questions than it answers. In what cases did CPython assume a __dict__ exists after an object? (Didn't it put it there itself before 3.11?) What does it mean for a type to be “with inheritance”? What kind of MRO problems can you fix with the flag?

@encukou
Copy link
Member

encukou commented Aug 2, 2022

If you inherit from C and L using PyType_FromSpecWithBases, should you specify the Py_TPFLAGS_MANAGED_DICT flag, and should you call _PyObject_*ManagedDict if you override the relevant slots?

If we inherit from both, then the new class has a "legacy" dict.

Ah, and the newly exposed _PyObject_*ManagedDict functions will do nothing if Py_TPFLAGS_MANAGED_DICT is not set, right?


If you override tp_traverse to call _PyObject_VisitManagedDict as this PR suggests, but then subclass the result in Python code, it seems the subclass would get subtype_traverse which calls _PyObject_VisitManagedDict and also calls base->tp_traverse, which would visit the managed dict again. AFAIK multiple visits are bad. It seems that if base has Py_TPFLAGS_MANAGED_DICT set, subtype_traverse should leave the visiting to base->tp_traverse?
I guess I'll need to test this.

I see _PyObject_ClearManagedDict is idempotent, so this isn't a problem there.
But: shouldn't you also call _PyObject_ClearManagedDict from tp_dealloc if you override that? tp_clear is only called if it's needed for breaking cycles; when an object is DECREF'd to 0 CPython will go straight to tp_dealloc.

@pablogsal
Copy link
Member

pablogsal commented Aug 2, 2022

AFAIK multiple visits are bad.

They are: they will mess up with the gc algorithm. When the gc starts decrementing the gc references to find cycles, it will decrement the references twice if the object is visited too many times. The contract is that it has to be visited once per strong reference.

@markshannon
Copy link
Member

If you inherit from C and L using PyType_FromSpecWithBases

What does that mean? There is no such thing as inheritance in C.

There has to be an assumption that if you are doing fancy stuff with classes implemented in C, that you know what you are doing.
If you don't, then use Cython or some other tool.

I'm beginning to think that the documentation for Py_TPFLAGS_MANAGED_DICT should be "Don't use this", and no more.

If Cython, pybind11 or mypyc can figure out how to use it safely, then good on them, but we shouldn't be encouraging its use.

For 3.12 we'll add a better API, but it's way too late for 3.11.

@encukou
Copy link
Member

encukou commented Aug 2, 2022

If you inherit from C and L using PyType_FromSpecWithBases

What does that mean?

Using them as as bases.

There has to be an assumption that if you are doing fancy stuff with classes implemented in C, that you know what you are doing.
If you don't, then use Cython or some other tool.

Sorry, but it seems that there are very few people who can use Py_TPFLAGS_MANAGED_DICT correctly, and no one from them wants to document the assumptions around it. That's a pretty bad situation for CPython to be in, honestly.

@markshannon
Copy link
Member

Sorry, but it seems that there are very few people who can use Py_TPFLAGS_MANAGED_DICT correctly

So don't use it. No one needs to use it.

@encukou
Copy link
Member

encukou commented Aug 2, 2022

So don't use it. No one needs to use it.

So what's the suggestion for pybind11/mypyc? Does #92678 (comment) no longer apply, so we need to restart that discussion?

@encukou
Copy link
Member

encukou commented Aug 2, 2022

The public docs for tp_dictoffset are wrong: the value is meaningless if Py_TPFLAGS_MANAGED_DICT is set. Is that right?

@markshannon
Copy link
Member

The public docs for tp_dictoffset are wrong: the value is meaningless if Py_TPFLAGS_MANAGED_DICT is set. Is that right?

This docs are still correct for 3.11.
We've maintained semantics for 3.11 by only setting Py_TPFLAGS_MANAGED_DICT if tp_itemsize == 0 and setting tp_dictoffset to -sizeof(struct ...) - 3 * sizeof(PyObject *).
But we always check for Py_TPFLAGS_MANAGED_DICT internally.

For 3.12 I'd like to set tp_dictoffset to -1 if Py_TPFLAGS_MANAGED_DICT in order to fail fast as we have a more complex per-header and we can't safely use tp_dictoffset directly.

@markshannon
Copy link
Member

So what's the suggestion for pybind11/mypyc?

That's up to them.
If they can produce valid C API code, that fails on 3.11, then that's a bug and I want to fix it.
If they can't produce valid C API code, then that's their problem.

@encukou
Copy link
Member

encukou commented Aug 2, 2022

We've maintained semantics for 3.11 by only setting Py_TPFLAGS_MANAGED_DICT if tp_itemsize == 0 and setting tp_dictoffset to -sizeof(struct ...) - 3 * sizeof(PyObject *).

But now you can't expect to find a valid PyDictObject * at that offset, right? You now need to go call the undocumented/underscored _PyObject_GetDictPtr.

@markshannon
Copy link
Member

But now you can't expect to find a valid PyDictObject * at that offset, right? You now need to go call the
undocumented/underscored _PyObject_GetDictPtr.

You will find a valid PyDictObject * pointer, but it might be NULL even though there are values.
To get meaningful pointer, yes you'll need to use _PyObject_GetDictPtr

However, it appears that no one uses it, because what would you do with a pointer to the dictionary?
If you need an attribute on object, you'll need to call PyObject_GetAttr() to handle descriptors and such.

Cython does access tp_dictoffset but falls back on _PyObject_GetDictPtr if tp_dictoffset is negative.
But Cython mostly ignores the C-API, so there's not much we can do about that.

@markshannon markshannon removed the needs backport to 3.11 only security fixes label Aug 4, 2022
@Fidget-Spinner
Copy link
Member Author

We don't need to document this anymore right?

@encukou
Copy link
Member

encukou commented Aug 4, 2022

However, it appears that no one uses it, because what would you do with a pointer to the dictionary?

Print it out, pickle it, update() it, bypass a descriptor, ...
I'd be surprised if no one found a use for it. I wouldn't be surprised if it turned out that once upon a time this was the API to get a dict.
That's why I'd prefer if searching NEWS/What's New for tp_dictoffset gave you _PyObject_GetDictPtr or PyObject_GenericGetDict :)

We don't need to document this anymore right?

I don't think so. 3.11 will get a revert and 3.12 should get a better API. Closing, please reopen if you disagree.

@encukou encukou closed this Aug 4, 2022
@Fidget-Spinner Fidget-Spinner deleted the doc_gc branch August 4, 2022 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review docs Documentation in the Doc dir skip news
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants