A Beginner's Tutorial for glom

1. Introduction to glom

We usually pull values out of dicts and JSON like this:

>>> data = {'a': {'b': {'c': 'd'}}}
>>> data['a']['b']['c']
'd'

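This looks simple, but it fails badly as soon as the structure of the data changes:

>>> data2 = {'a': {'b': None}}
>>> data2['a']['b']['c']
Traceback (most recent call last):
...
TypeError: 'NoneType' object is not subscriptable

The error is raised, but it does not tell us directly which key caused it: a, b, or c?

This is the problem glom was built for. It makes it convenient to pull values out of nested dicts or JSON, and it also offers output control, format control, and tolerance for structural changes.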

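Before we start, here is a quick overview of how glom is used. It has two key terms:

target: the dict, JSON, list, or other object to extract from.
spec: the output we want.

Calling output = glom(target, spec) applies the spec to the target and returns the data in the shape we asked for.

>>> target = {'galaxy': {'system': {'planet': 'jupiter'}}}
>>> spec = 'galaxy.system.planet'
>>> glom(target, spec)
'jupiter'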
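For intuition only, the effect of a dotted-string spec on nested dicts can be approximated in plain Python. This is a rough sketch and not how glom is implemented; the helper name get_path is hypothetical, and it handles only dict keys with no error reporting.

```python
from functools import reduce


def get_path(target, dotted_path):
    """Follow 'a.b.c' one key at a time: target['a']['b']['c']."""
    keys = dotted_path.split('.')
    return reduce(lambda current, key: current[key], keys, target)


target = {'galaxy': {'system': {'planet': 'jupiter'}}}
print(get_path(target, 'galaxy.system.planet'))  # jupiter
```

glom adds a great deal on top of this idea, including the clearer errors, list handling, and output shaping covered in the rest of this tutorial.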

2. Installing glom

pip install glom

from glom import glom, Path, T, Coalesce, Inspect
from pprint import pprint

3. Basic path extraction

glom offers three ways to express a path:

Strings
Path objects
T

Strings

Extract data directly by path (single path, single match):

>>> target = {'galaxy': {'system': {'planet': 'jupiter'}}}
>>> spec = 'galaxy.system.planet'
>>> glom(target, spec)
'jupiter'

Now the data structure has changed, and planets is a list:

>>> target = {'system': {'planets': [{'name': 'earth'}, {'name': 'jupiter'}]}}
>>> glom(target, ('system.planets', ['name']))
['earth', 'jupiter']

Now the requirements change: the data has gained a field, and the output needs more than one field (multiple paths, single match):

>>> target = {'system': {'planets': [{'name': 'earth', 'moons': 1}, {'name': 'jupiter', 'moons': 69}]}}
>>> spec1 = ('system.planets', ['name'])
>>> spec2 = ('system.planets', ['moons'])
>>> pprint(glom(target, spec1))
['earth', 'jupiter']
>>> pprint(glom(target, spec2))
[1, 69]

Writing it this way is tedious. glom lets us merge the two specs into a dict, which also shapes the output:

>>> target = {'system': {'planets': [{'name': 'earth', 'moons': 1}, {'name': 'jupiter', 'moons': 69}]}}
>>> spec = {'names': ('system.planets', ['name']), 'moons': ('system.planets', ['moons'])}
>>> pprint(glom(target, spec))
{'moons': [1, 69], 'names': ['earth', 'jupiter']}

Now it gets more complicated: not only are there more fields, but some of the data uses a different key (multiple paths, multiple matches). Coalesce tries each path in turn and uses the first one that works:

>>> target1 = {'system': {'dwarf_planets': [{'name': 'pluto', 'moons': 5}, {'name': 'ceres', 'moons': 0}]}}
>>> target2 = {'system': {'planets': [{'name': 'earth', 'moons': 1}, {'name': 'jupiter', 'moons': 69}]}}

>>> spec = {'names': (Coalesce('system.planets', 'system.dwarf_planets'), ['name']), 'moons': (Coalesce('system.planets', 'system.dwarf_planets'), ['moons'])}
>>> pprint(glom(target1, spec))
{'moons': [5, 0], 'names': ['pluto', 'ceres']}
>>> pprint(glom(target2, spec))
{'moons': [1, 69], 'names': ['earth', 'jupiter']}

Path objects

When a path contains keys that the 'a.b.c' string form cannot express, such as an int, a datetime, or a key that itself contains a dot, use Path instead:

>>> target = {'a': {'b': 'c', 'd.e': 'f', 2: 3}}
>>> glom(target, Path('a', 2))
3
>>> glom(target, Path('a', 'd.e'))
'f'

Path objects can be joined:

>>> Path(T['a'], T['b'])
T['a']['b']
>>> Path(Path('a', 'b'), Path('c', 'd'))
Path('a', 'b', 'c', 'd')

Path objects can be sliced:

>>> path = Path('a', 'b', 1, 2)
>>> path[0]
Path('a')
>>> path[-2:]
Path(1, 2)

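In practice, you put a Path object wherever a string path would otherwise go in the spec.

T

T is an object-oriented way to write a path using ordinary Python syntax. As far as this tutorial covers, it is used to extract data rather than to process it.

>>> spec = T['a']['b']['c']
>>> target = {'a': {'b': {'c': 'd'}}}
>>> glom(target, spec)
'd'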

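What T extracts is the corresponding Python object, so that object's methods can be called too (the exact usage still needs to be verified).

>>> from glom import T
>>> target = {'system': {'planets': [{'name': 'earth', 'moons': 1}, {'name': 'jupiter', 'moons': 69}]}}
>>> spec = (T['system']['planets'][-1].values(), list)  # list() turns the dict view into a list on Python 3
>>> glom(target, spec)
['jupiter', 69]

>>> target = {'a': {'b': {'c': 'd'}}}
>>> spec = ('a', (T['b'].items(), list))  # .items() returns a view; list turns it into a list
>>> glom(target, spec)
[('c', 'd')]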

4. Processing data

glom does more than extract data. It can also transform it, using built-in functions or your own lambda.

For example, to sum the moons of all the planets:

>>> target = {'system': {'planets': [{'name': 'earth', 'moons': 1}, {'name': 'jupiter', 'moons': 69}]}}
>>> pprint(glom(target, ('system.planets', ['moons'], sum)))
70

Or to double each moon count:

>>> target = {'system': {'planets': [{'name': 'earth', 'moons': 1}, {'name': 'jupiter', 'moons': 69}]}}
>>> pprint(glom(target, ('system.planets', ['moons'], [lambda x: x*2])))
[2, 138]

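5. Formatting the output

To make the output more meaningful, glom offers two ways to structure it:

Dictionary

{
"your name1": extraction rule 1,
"your name2": extraction rule 2,
"your name3": extraction rule 3,
}

Classes (to be added later)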

6. Debugging

If the existing error output is not enough to track down a bug, use glom.Inspect:

>>> target = {'a': {'b': {}}}
>>> val = glom(target, Inspect('a.b'))  # wrapping a spec
---
path:   ['a.b']
target: {'a': {'b': {}}}
output: {}
---
