[feature request] make the toml encoder extensible #11

miccoli · 2020-07-20T23:13:39Z

The standard library json encoder is extensible, a feature that I find very handy:

import json
from pathlib import Path
import numpy as np

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Path):
            return obj.as_posix()
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Let the base class default method raise the TypeError
        return super().default(obj)

so that

>>> json.dumps(Path("foo") / "bar", cls=MyEncoder)
'"foo/bar"'
>>> json.dumps(np.zeros((2,2,3)), cls=MyEncoder)
'[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]'

What about implementing a similar protocol also in qtoml?

The text was updated successfully, but these errors were encountered:

alethiophile · 2020-07-25T18:54:25Z

I've checked in a first version of this feature in commit 2199128. Does this look like it will fit your use cases?

miccoli · 2020-07-27T12:38:28Z

It is not working as expected.

Running

import os
import pathlib
import qtoml

class MyEncoder(qtoml.TOMLEncoder):
    """extended encoder """

    def default(self, val):
        """local encoder"""
        print("*****", repr(val))
        if isinstance(val, os.PathLike):
            return str(val)

        return super().default(val)

x = {"path": pathlib.Path("ABC.txt")}
x1 = {"path": "ABC.txt"}

print(qtoml.dumps({"top": x}, cls=MyEncoder))
print(qtoml.dumps({"top": x1}, cls=MyEncoder))

I have

***** {'top': {'path': PosixPath('ABC.txt')}}
***** {'path': PosixPath('ABC.txt')}
***** PosixPath('ABC.txt')
path = 'ABC.txt'

***** {'top': {'path': 'ABC.txt'}}
***** {'path': 'ABC.txt'}
***** 'ABC.txt'
[top]
path = 'ABC.txt'

I would expect this output:

***** PosixPath('ABC.txt')
[top]
path = 'ABC.txt'

[top]
path = 'ABC.txt'

The problems I see are:

qtoml.dumps({"top": x}, cls=MyEncoder) returns a wrong value (the top section is missing).
default should be called only on non encodble objects and not on all step of parsing down from the top object

alethiophile · 2020-07-27T15:29:52Z

The first issue is a straightforward logic error, which should be fixed in 24054b1.

I'm not sure what to think about the second. The json module acts as you describe, only calling the default method in the case of a non-encodable object. On the other hand, calling it for every level seems to enable more use cases; you can imagine a situation where you want to tweak the data on a higher level even though the original data is still technically encodable. The downside, of course, is that it hurts performance to call that method more often than is needed.

I'll think about this and maybe do some profiling to determine the extent of the performance hit.

miccoli · 2020-07-27T17:30:08Z

I'm not sure what to think about the second. The json module acts as you describe, only calling the default method in the case of a non-encodable object.

I agree that your proposed protocol is more versatile than the one of the json module, and probably you will not notice a big performance degradation.

My only concern is that users accustomed with the standard lib behaviour could be surprised by the new protocol. Maybe a different method name (like filter or map) could be a better choice to outline the different semantics. But this is only a minor point, if the method is well documented.

miccoli · 2020-07-27T19:36:51Z

I found another path which leads to unexpected results (tested with 24054b1):

from pathlib import Path
import qtoml

class MyEncoder(qtoml.TOMLEncoder):
    """extended encoder """

    def default(self, val):
        """local encoder"""
        if isinstance(val, os.PathLike):
            return str(val)

        return super().default(val)

ex1 = {Path("path1"): Path("path2")}

print(qtoml.dumps(ex1, cls=MyEncoder))

which errors with qtoml.encoder.TOMLEncodeError: dictionary keys must all be strings, but I would expect

path1 = 'path2'

alethiophile · 2020-07-27T20:17:22Z

default() only gets called separately on the values, not the keys. (This is consistent with the json behavior.)

You can use the default method here, but it has to be explicit:

class MyEncoder(qtoml.TOMLEncoder):
    def default(self, val):
        if isinstance(val, dict):
            return { str(k): v for k, v in val }
        elif isinstance(val, os.PathLike):
            return str(val)
        return super().default(val)

This rectifies the keys at the stage above, when default() is called on the dict as a value. (Supporting cases like this is one reason I wanted to call default() on every level of the tree, rather than just unsupported types.)

As I think about this, though, I think it's probably a good idea to use a different protocol than default(). It's good to maintain interface compatibility with JSONEncoder, but it's also a protocol that's really designed to be used only for default objects, rather than for every level of the parse tree. I'll have to think about what design is best here.

(Also, a naive implementation of the default() protocol almost doubles time spent on small encoding calls, and I suspect a different protocol could be more amenable to optimization.)

miccoli · 2020-07-28T21:29:16Z

default() only gets called separately on the values, not the keys. (This is consistent with the json behavior.)

you're right... also json enforces strictly the requirement that dict keys should be str (I missed this point).

I have no strong opinion on the best protocol... I proposed json's default() just because I'm using it often for aggregate types (like numpy arrays) which have a "natural" JSON encoding but are not one of the basic types.

miccoli mentioned this issue Jul 22, 2020

Support UserDict and friends #10

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feature request] make the toml encoder extensible #11

[feature request] make the toml encoder extensible #11

miccoli commented Jul 20, 2020

alethiophile commented Jul 25, 2020 •

edited

Loading

miccoli commented Jul 27, 2020

alethiophile commented Jul 27, 2020

miccoli commented Jul 27, 2020

miccoli commented Jul 27, 2020

alethiophile commented Jul 27, 2020

miccoli commented Jul 28, 2020

[feature request] make the toml encoder extensible #11

[feature request] make the toml encoder extensible #11

Comments

miccoli commented Jul 20, 2020

alethiophile commented Jul 25, 2020 • edited Loading

miccoli commented Jul 27, 2020

alethiophile commented Jul 27, 2020

miccoli commented Jul 27, 2020

miccoli commented Jul 27, 2020

alethiophile commented Jul 27, 2020

miccoli commented Jul 28, 2020

alethiophile commented Jul 25, 2020 •

edited

Loading