How to serialize a tree-like class object structure into JSON format?

Time: 2022-08-23 08:18:56

Given the code sample below, how can I serialize these class instances with JSON using Python 3?

class TreeNode():
    def __init__(self, name):
        self.name = name
        self.children = []

When I try to do a json.dumps I get the following error:

TypeError: <TreeNode object at 0x7f6sf4276f60> is not JSON serializable

I then found that if I pass json.dumps a default function that returns the object's __dict__, I can serialize it fine, but then doing a json.loads becomes an issue: the data comes back as plain dictionaries rather than TreeNode objects.
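
A minimal sketch of the default approach described above, using the question's TreeNode class; the exact hook, default=lambda o: o.__dict__, is an assumption about what was tried. It serializes the whole tree, because json.dumps calls the hook again for every nested node it meets inside a children list, but json.loads hands back plain dictionaries:

```python
import json


class TreeNode():
    def __init__(self, name):
        self.name = name
        self.children = []


tree = TreeNode('Parent')
tree.children.append(TreeNode('Child 1'))

# default is called for any object json can't encode natively;
# returning the instance's __dict__ turns each TreeNode into a plain dict.
json_str = json.dumps(tree, default=lambda o: o.__dict__)
print(json_str)  # {"name": "Parent", "children": [{"name": "Child 1", "children": []}]}

# The type information is lost: loading gives nested dicts, not TreeNodes.
data = json.loads(json_str)
print(type(data).__name__)  # dict
```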

I can find a lot of custom encoder/decoder examples for objects with simple string attributes, but none where an attribute is a list, in this case self.children. The children list holds child nodes, which in turn hold their own child nodes, and I need a way to serialize all of them.
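
Lists need no special handling on either side: the encoder walks into children and calls default() for each nested node, and an object_hook is applied to every decoded JSON object from the innermost outward, so a node's children are already rebuilt by the time the node itself is. A minimal sketch, assuming a hypothetical TreeNodeEncoder and a hypothetical as_tree_node hook that treats any object with exactly the keys name and children as a node:

```python
import json


class TreeNode():
    def __init__(self, name):
        self.name = name
        self.children = []


class TreeNodeEncoder(json.JSONEncoder):
    """Hypothetical encoder: turns each TreeNode into a plain dict."""
    def default(self, obj):
        if isinstance(obj, TreeNode):
            # children still holds TreeNodes; the encoder calls default()
            # again for each of them as it walks the list.
            return {'name': obj.name, 'children': obj.children}
        return super().default(obj)  # raises TypeError for anything else


def as_tree_node(dct):
    """Hypothetical object_hook: rebuild a TreeNode from a decoded dict."""
    if set(dct) == {'name', 'children'}:
        node = TreeNode(dct['name'])
        # Inner objects were decoded first, so these are already TreeNodes.
        node.children = dct['children']
        return node
    return dct


tree = TreeNode('Parent')
child = TreeNode('Child')
child.children.append(TreeNode('Grand Kid'))
tree.children.append(child)

json_str = json.dumps(tree, cls=TreeNodeEncoder)
restored = json.loads(json_str, object_hook=as_tree_node)
print(type(restored).__name__)                # TreeNode
print(restored.children[0].children[0].name)  # Grand Kid
```

The key-set test in as_tree_node is a design choice for this sketch; a tagged key (as the pickle-based answer below uses) avoids accidentally converting unrelated dicts that happen to have the same keys.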

2 solutions

#1

Since you're dealing with a tree structure, it's natural to use nested dictionaries. The following creates a subclass of dict and uses the instance itself as its underlying __dict__, an interesting and useful trick I've run across in many different contexts:

     Is it preferable to return an anonymous class or an object to use as a 'struct'? (*)
     jsobject.py (PyDoc.net)
     Making Python Objects that act like Javascript Objects (James Robert's blog)
     AttrDict (ActiveState recipe)
     Dictionary with attribute-style access (ActiveState recipe)

...so often that I'd have to consider it a (less well-known) Python idiom.

class TreeNode(dict):
    def __init__(self, name, children=None):
        super().__init__()
        self.__dict__ = self
        self.name = name
        self.children = [] if not children else children
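
Because the instance's __dict__ is the dictionary itself, every attribute assignment lands in the dict's own items, which is why the stock encoder can serialize the tree with no default hook at all. A short check of that aliasing, repeating the class above unchanged so the block runs on its own:

```python
import json


class TreeNode(dict):
    def __init__(self, name, children=None):
        super().__init__()
        self.__dict__ = self  # attributes and dict items share one storage
        self.name = name
        self.children = [] if not children else children


node = TreeNode('Parent')
node.children.append(TreeNode('Child 1'))

print(node['name'])      # Parent (set as an attribute, read as a key)
node['name'] = 'Root'
print(node.name)         # Root (set as a key, read as an attribute)
print(json.dumps(node))  # {"name": "Root", "children": [{"name": "Child 1", "children": []}]}
```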

This solves half the serialization battle, but when the data produced is read back in with json.loads() it will be a regular dictionary object, not an instance of TreeNode. This is because JSONEncoder can encode dictionaries (and subclasses of them) itself.
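
The loss of type is easy to see: the decoded data compares equal to the original tree (a TreeNode is a dict, so ordinary dict equality applies), yet it is a plain dict without attribute access. A small sketch, repeating the class so the block is self-contained:

```python
import json


class TreeNode(dict):
    def __init__(self, name, children=None):
        super().__init__()
        self.__dict__ = self
        self.name = name
        self.children = [] if not children else children


tree = TreeNode('Parent')
tree.children.append(TreeNode('Child 1'))

data = json.loads(json.dumps(tree))
print(data == tree)           # True: same keys and values
print(type(data).__name__)    # dict, not TreeNode
print(hasattr(data, 'name'))  # False: a plain dict has no attribute access
```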

One way to address that is to add an alternative constructor method to the TreeNode class that reconstructs the data structure from the nested dictionary that json.loads() returns.

Here's what I mean:

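    # Add this method to the TreeNode class above.
    @staticmethod
    def from_dict(dict_):
        """ Recursively (re)construct TreeNode-based tree from dictionary. """
        root = TreeNode(dict_['name'], dict_['children'])
        # Rebuild each child dict (and, recursively, its descendants) as a TreeNode.
        root.children = list(map(TreeNode.from_dict, root.children))
        return root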

if __name__ == '__main__':
    import json

    tree = TreeNode('Parent')
    tree.children.append(TreeNode('Child 1'))
    child2 = TreeNode('Child 2')
    tree.children.append(child2)
    child2.children.append(TreeNode('Grand Kid'))
    child2.children[0].children.append(TreeNode('Great Grand Kid'))

    json_str = json.dumps(tree, sort_keys=True, indent=2)
    print(json_str)

    print()
    pyobj = TreeNode.from_dict(json.loads(json_str))  # reconstitute
    print('pyobj class: {}'.format(pyobj.__class__.__name__))  # -> TreeNode
    print(json.dumps(pyobj, sort_keys=True, indent=2))

Output:

{
  "children": [
    {
      "children": [],
      "name": "Child 1"
    },
    {
      "children": [
        {
          "children": [
            {
              "children": [],
              "name": "Great Grand Kid"
            }
          ],
          "name": "Grand Kid"
        }
      ],
      "name": "Child 2"
    }
  ],
  "name": "Parent"
}

pyobj class: TreeNode
{
  "children": [
    same as before...
  ],
  "name": "Parent"
}

#2

Here's an alternative answer, which is basically a Python 3 version of my answer to the question "Making object JSON serializable with regular encoder". It pickles any Python objects that the regular json encoder doesn't already handle.

There are a couple of differences. One is that it doesn't monkey-patch the json module, only because that's not an essential part of the solution. Another is that although the TreeNode class isn't derived from the dict class this time, it has essentially the same functionality. This was done intentionally so that the stock JSONEncoder won't encode it as a dict and will instead invoke the default() method of the JSONEncoder subclass being used.

Other than that, it is a very generic approach and will be able to handle many other Python objects, including user-defined classes, without modification.
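
The point about deliberately not subclassing dict can be checked directly: the stock encoder serializes any dict instance (subclasses included) on its own, while a collections.abc.MutableMapping that is not a dict is handed to default(). A minimal sketch with two hypothetical classes, DictNode and MappingNode, and a hypothetical RecordingEncoder that only records what reaches default():

```python
import json
from collections.abc import MutableMapping


class DictNode(dict):
    """Hypothetical dict subclass: the encoder handles it natively."""


class MappingNode(MutableMapping):
    """Hypothetical dict-like class that does not inherit from dict."""
    def __init__(self, **items):
        self._items = dict(items)
    def __getitem__(self, key):
        return self._items[key]
    def __setitem__(self, key, val):
        self._items[key] = val
    def __delitem__(self, key):
        del self._items[key]
    def __iter__(self):
        return iter(self._items)
    def __len__(self):
        return len(self._items)


class RecordingEncoder(json.JSONEncoder):
    # Class-level list, shared by every encoder instance json.dumps creates.
    seen = []

    def default(self, obj):
        self.seen.append(type(obj).__name__)
        return dict(obj)  # fall back to a plain dict copy of the mapping


print(json.dumps(DictNode(name='a'), cls=RecordingEncoder))     # {"name": "a"}
print(json.dumps(MappingNode(name='b'), cls=RecordingEncoder))  # {"name": "b"}
print(RecordingEncoder.seen)                                     # ['MappingNode']
```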

import base64
from collections.abc import MutableMapping  # collections.MutableMapping was removed in Python 3.10
import json
import pickle

class PythonObjectEncoder(json.JSONEncoder):
    def default(self, obj):
        return {'_python_object': 
                base64.b64encode(pickle.dumps(obj)).decode('utf-8') }

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(base64.b64decode(dct['_python_object']))
    return dct

# based on AttrDict -- https://code.activestate.com/recipes/576972-attrdict
class TreeNode(MutableMapping):
    """ dict-like object whose contents can be accessed as attributes. """
    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children) if children is not None else []
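    def __getitem__(self, key):
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None  # Mapping methods (in, get) expect KeyError
    def __setitem__(self, key, val):
        setattr(self, key, val)
    def __delitem__(self, key):
        try:
            delattr(self, key)
        except AttributeError:
            raise KeyError(key) from None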
    def __iter__(self):
        return iter(self.__dict__)
    def __len__(self):
        return len(self.__dict__)

tree = TreeNode('Parent')
tree.children.append(TreeNode('Child 1'))
child2 = TreeNode('Child 2')
tree.children.append(child2)
child2.children.append(TreeNode('Grand Kid'))
child2.children[0].children.append(TreeNode('Great Grand Kid'))

json_str = json.dumps(tree, cls=PythonObjectEncoder, indent=4)
print('json_str:', json_str)
pyobj = json.loads(json_str, object_hook=as_python_object)
print(type(pyobj))

Output:

json_str: {
    "_python_object": "gANjX19tYWluX18KVHJlZU5vZGUKcQApgXEBfXECKFgIAAAAY2hp"
                      "bGRyZW5xA11xBChoACmBcQV9cQYoaANdcQdYBAAAAG5hbWVxCFgH"
                      "AAAAQ2hpbGQgMXEJdWJoACmBcQp9cQsoaANdcQxoACmBcQ19cQ4o"
                      "aANdcQ9oACmBcRB9cREoaANdcRJoCFgPAAAAR3JlYXQgR3JhbmQg"
                      "S2lkcRN1YmFoCFgJAAAAR3JhbmQgS2lkcRR1YmFoCFgHAAAAQ2hp"
                      "bGQgMnEVdWJlaAhYBgAAAFBhcmVudHEWdWIu"
}
<class '__main__.TreeNode'>
