Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] make the toml encoder extensible #11

Open
miccoli opened this issue Jul 20, 2020 · 7 comments
Open

[feature request] make the toml encoder extensible #11

miccoli opened this issue Jul 20, 2020 · 7 comments

Comments

@miccoli
Copy link
Contributor

miccoli commented Jul 20, 2020

The standard library json encoder is extensible, a feature that I find very handy:

import json
from pathlib import Path
import numpy as np

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Path):
            return obj.as_posix()
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Let the base class default method raise the TypeError
        return super().default(obj)

so that

>>> json.dumps(Path("foo") / "bar", cls=MyEncoder)
'"foo/bar"'
>>> json.dumps(np.zeros((2,2,3)), cls=MyEncoder)
'[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]'

What about implementing a similar protocol also in qtoml?

@alethiophile
Copy link
Owner

alethiophile commented Jul 25, 2020

I've checked in a first version of this feature in commit 2199128. Does this look like it will fit your use cases?

@miccoli
Copy link
Contributor Author

miccoli commented Jul 27, 2020

It is not working as expected.

Running

import os
import pathlib
import qtoml

class MyEncoder(qtoml.TOMLEncoder):
    """extended encoder """

    def default(self, val):
        """local encoder"""
        print("*****", repr(val))
        if isinstance(val, os.PathLike):
            return str(val)

        return super().default(val)

x = {"path": pathlib.Path("ABC.txt")}
x1 = {"path": "ABC.txt"}

print(qtoml.dumps({"top": x}, cls=MyEncoder))
print(qtoml.dumps({"top": x1}, cls=MyEncoder))

I have

***** {'top': {'path': PosixPath('ABC.txt')}}
***** {'path': PosixPath('ABC.txt')}
***** PosixPath('ABC.txt')
path = 'ABC.txt'

***** {'top': {'path': 'ABC.txt'}}
***** {'path': 'ABC.txt'}
***** 'ABC.txt'
[top]
path = 'ABC.txt'

I would expect this output:

***** PosixPath('ABC.txt')
[top]
path = 'ABC.txt'

[top]
path = 'ABC.txt'

The problems I see are:

  • qtoml.dumps({"top": x}, cls=MyEncoder) returns a wrong value (the top section is missing).
  • default should be called only on non encodble objects and not on all step of parsing down from the top object

@alethiophile
Copy link
Owner

The first issue is a straightforward logic error, which should be fixed in 24054b1.

I'm not sure what to think about the second. The json module acts as you describe, only calling the default method in the case of a non-encodable object. On the other hand, calling it for every level seems to enable more use cases; you can imagine a situation where you want to tweak the data on a higher level even though the original data is still technically encodable. The downside, of course, is that it hurts performance to call that method more often than is needed.

I'll think about this and maybe do some profiling to determine the extent of the performance hit.

@miccoli
Copy link
Contributor Author

miccoli commented Jul 27, 2020

I'm not sure what to think about the second. The json module acts as you describe, only calling the default method in the case of a non-encodable object.

I agree that your proposed protocol is more versatile than the one of the json module, and probably you will not notice a big performance degradation.

My only concern is that users accustomed with the standard lib behaviour could be surprised by the new protocol. Maybe a different method name (like filter or map) could be a better choice to outline the different semantics. But this is only a minor point, if the method is well documented.

@miccoli
Copy link
Contributor Author

miccoli commented Jul 27, 2020

I found another path which leads to unexpected results (tested with 24054b1):

from pathlib import Path
import qtoml

class MyEncoder(qtoml.TOMLEncoder):
    """extended encoder """

    def default(self, val):
        """local encoder"""
        if isinstance(val, os.PathLike):
            return str(val)

        return super().default(val)

ex1 = {Path("path1"): Path("path2")}

print(qtoml.dumps(ex1, cls=MyEncoder))

which errors with qtoml.encoder.TOMLEncodeError: dictionary keys must all be strings, but I would expect

path1 = 'path2'

@alethiophile
Copy link
Owner

default() only gets called separately on the values, not the keys. (This is consistent with the json behavior.)

You can use the default method here, but it has to be explicit:

class MyEncoder(qtoml.TOMLEncoder):
    def default(self, val):
        if isinstance(val, dict):
            return { str(k): v for k, v in val }
        elif isinstance(val, os.PathLike):
            return str(val)
        return super().default(val)

This rectifies the keys at the stage above, when default() is called on the dict as a value. (Supporting cases like this is one reason I wanted to call default() on every level of the tree, rather than just unsupported types.)

As I think about this, though, I think it's probably a good idea to use a different protocol than default(). It's good to maintain interface compatibility with JSONEncoder, but it's also a protocol that's really designed to be used only for default objects, rather than for every level of the parse tree. I'll have to think about what design is best here.

(Also, a naive implementation of the default() protocol almost doubles time spent on small encoding calls, and I suspect a different protocol could be more amenable to optimization.)

@miccoli
Copy link
Contributor Author

miccoli commented Jul 28, 2020

default() only gets called separately on the values, not the keys. (This is consistent with the json behavior.)

you're right... also json enforces strictly the requirement that dict keys should be str (I missed this point).

I have no strong opinion on the best protocol... I proposed json's default() just because I'm using it often for aggregate types (like numpy arrays) which have a "natural" JSON encoding but are not one of the basic types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants