Replies: 1 comment
-
Nice write-up @mbercx . As you say, this is not an easy topic especially when you have to ensure you cover all bases. I have just given it a read and wanted to quickly write down my first thought before I forget. Likewise these are not complete and are subject to change 😅 I agree that the current design leads to surpising behavior for users, although I really think this mostly applies to I think, then, that the proposed solution will most likely not be immune to these problems either, and will have quirks and pitfalls of its own that will stump users. d = Dict({'a': 1, 'b': 2})
d['a'] = 5
print(d['a']) What would this print? A user would probably expect One thing I definitely agree with is that As for using |
Beta Was this translation helpful? Give feedback.
-
The Problem ❗
The fact that the behaviour for data type nodes is dependent on whether the node is stored or not can lead to confusing and downright erroneous behaviour. An example I encountered recently can be found in #5970. Another discussion that illustrates the problems with mutability when stored or not is #5297.
This concept is also often difficult to explain to new users: "Why is the behaviour of this node suddenly different"? "Why can't I just change an input in this dictionary and run again?". We often have to explain the concept of stored nodes several times before a user is on board, and even then it often leads to errors or frustration, especially when the expected behaviour fails to hold up silently as above.
So I have the following radical suggestion: what if we simply posit that AiiDA nodes are all immutable objects, once and for all?
First thoughts 💭
The idea would be that every data node instance can be fully defined using its constructor (I'm looking at you,
KpointsData
). Once defined, there are no methods on the node that can change its contents. We basically can follow astr
-like approach: it can have methods that sort of sound like they mutate the contents, but these always return a new instance, e.g.:We should then consider which methods we want to keep. For
StructureData
, I would remove almost everything. We don't need to implement methods that are already implemented elsewhere for very similar classes. Again: an AiiDA data node is simply a data container: you put something in it, and need useful methods to obtain the data it stores. Other behaviour might be trickier, the primordial examples to consider areList
andDict
, e.g.:We could make such operation raise, or possibly return a new
Dict
node. I'm currently leaning towards simply not implementing any method that mutates the object, but I'm not sure if I can defend that approach if we start thinking about the consequences in more detail. ^^ Some might argue that we instead were planning to move close to the base Python type (e.g. #2676), but I think we all agree that the AiiDA base types will never exactly replicate the behaviour of the base Python ones. The fact that we semi-advertise the nodes as such again leads to confusion and frustration, so we should move away from that and clearly state that they are not the same. Perhaps even consider renaming them (e.g.DictData
,ListData
). That may be going too far, but I'm just spit-balling at this point (EDIT: On the other hand, it would make backwards-compatibility easier, see below).Other considerations 🤔
If we are going to look at redesigning the data nodes, which has already long been on the agenda for the materials-science types, we should consider a few other principles in the design:
We should consider making our nodes
pydantic
BaseModel
s, and do away with all these*args
and**kwargs
in the constructors and methods. Although documentation is great, users should really be able to see clearly how to use a certain node type after loading it by looking at the signatures of the methods with some good type argument names, type hinting and docstrings. There may be other reasons not to usepydantic
, I'm not set on that, just have been using it a lot lately and now am seeingpydantic
-opportunities everywhere. 😅The constructor should have no more arguments than strictly necessary to define the node instance. Notorious examples I'm not a fan of are the
ase
andpymatgen
(and even more egregious,pymatgen_structure
andpymatgen_molecule
) input arguments of the StructureData constructor. Although being able to convert instances of these classes is essential, it should in my view be done throughfrom_pymatgen()
-likeclassmethod
s and something likeas_pymatgen()
to obtain the correspondingpymatgen
Structure
instance. One aspect ofpydantic
is that it sort of enforces this. It's also much easier to get good constructor type hinting, especially for classes with a convoluted hierarchy, but I've already raved about this here so I won't repeat myself on that front.But what about backwards-compatibility? 😱
Of course, changing the way these fundamental data types work is about as backwards-incompatible as you can get, and might require a lot of code changes in plugins. So we'll have to take care in how to approach this. There is some good news though:
aiida-atomistic
(mostly, see Prepare the move of material science specific data classes to separate plugin #2686). That would be an ideal opportunity to make sure all the new versions are immutable at all times. Plugin developers can then move to the new data types at their leisure (but there are still many technical considerations for sure.Int
,Float
,Str
,Bool
obviously come to mind. The codes as well, but I'm actually a bit of a hypocrite on these (see below). Most of the others I'd frankly have to try to be sure, but this post is getting long enough as is.aiida-upgrade
. You can't fix everything with static analysis, obviously, but if you are a bit rash in making assumptions and let the code developers decide on which automated changes make sense would go a long way towards getting all plugins up to date (even still for v2.X, frankly).You're a total hypocrite! 🤬
It is true. I have been known to argue that codes and computers should be editable since so many starting users have trouble with this (see aiidateam/team-compass#12). But I just noticed that @ltalirz actually has suggested a very nice comprimise there, so maybe I can have my 🍰 and eat it too.
If you've gotten this far: kudos! For those just skimming: let me sum it up.
TL;DR
pydantic
BaseModel
s.StructureData
/ase.Atoms
/pymatgen.Structure) should be done via
from_Xand
as_X` methods.Beta Was this translation helpful? Give feedback.
All reactions