Converting JSON to SQLite in Python - how to map the JSON keys to the database columns properly?

Date: 2021-01-20 12:34:14

I want to convert a JSON file I created to a SQLite database.

My intention is to decide later which data container and entry point is best: JSON (data entry via a text editor) or SQLite (data entry via spreadsheet-like GUIs such as SQLiteStudio).

My json file is like this (containing traffic data from some crossroads in my city):

...
"2011-12-17 16:00": {
    "local": "Av. Protásio Alves; esquina Ramiro Barcelos",
    "coord": "-30.036916,-51.208093",
    "sentido": "bairro-centro",
    "veiculos": "automotores",
    "modalidade": "semaforo 50-15",
    "regime": "típico",
    "pistas": "2+c",
    "medicoes": [
        [32, 50],
        [40, 50],
        [29, 50],
        [32, 50],
        [35, 50]
        ]
    },
"2011-12-19 08:38": {
    "local": "R. Fernandes Vieira; esquina Protásio Alves",
    "coord": "-30.035535,-51.211079",
    "sentido": "único",
    "veiculos": "automotores",
    "modalidade": "semáforo 30-70",
    "regime": "típico",
    "pistas": "3",
    "medicoes": [
        [23, 30],
        [32, 30],
        [33, 30],
        [32, 30]
        ]
    }
...

And I have created a nice database with a one-to-many relation using these lines of Python code:

import sqlite3

db = sqlite3.connect("fluxos.sqlite")
c = db.cursor()

c.execute('''create table medicoes
         (timestamp text primary key,
          local text,
          coord text,
          sentido text,
          veiculos text,
          modalidade text,
          pistas text)''')

c.execute('''create table valores
         (id integer primary key,
          quantidade integer,
          tempo integer,
          foreign key (id) references medicoes(timestamp))''')

But the problem is this: when I was preparing to insert the rows with actual data, using something like c.execute("insert into medicoes values(?,?,?,?,?,?,?)" % keys), I realized that, since the dict loaded from the JSON file has no particular order, it does not map properly to the column order of the database.

So, I ask: which strategy/method should I use to programmatically read the keys from each "block" in the JSON file (in this case, "local", "coord", "sentido", "veiculos", "modalidade", "regime", "pistas" and "medicoes"), create the database with the columns in that same order, and then insert the rows with the proper values?

I have fair experience with Python, but am just beginning with SQL, so I would like some advice on good practices, and not necessarily a ready-made recipe.

1 Solution

#1

You have this Python code:

c.execute("insert into medicoes values(?,?,?,?,?,?,?)" % keys)

which I think should be

c.execute("insert into medicoes values (?,?,?,?,?,?,?)", keys)

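since the % operator expects the string to its left to contain formatting codes (such as %s). The ? marks are not formatting codes: they are placeholders that sqlite3 fills in itself when the values are passed as the second argument to execute().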
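
As a rough illustration of the two placeholder styles sqlite3 accepts, here is a minimal sketch. It is written for Python 3 (unlike the Python 2 snippets in this answer), uses a throwaway in-memory database, and trims the medicoes table to three columns; all of that is an assumption made for the demo, not part of the original setup. The named style may be of interest here because values are looked up by key, so the order of the dict does not matter.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo
db.execute("create table medicoes (timestamp text primary key, local text, pistas text)")

# Positional ("qmark") style: values are matched to the ? marks by position.
db.execute(
    "insert into medicoes (timestamp, local, pistas) values (?, ?, ?)",
    ("2011-12-17 16:00", "Av. Protásio Alves; esquina Ramiro Barcelos", "2+c"),
)

# Named style: values are looked up by key, so the dict order is irrelevant.
row = {
    "pistas": "3",
    "timestamp": "2011-12-19 08:38",
    "local": "R. Fernandes Vieira; esquina Protásio Alves",
}
db.execute(
    "insert into medicoes (timestamp, local, pistas) values (:timestamp, :local, :pistas)",
    row,
)
db.commit()

rows = db.execute("select timestamp, pistas from medicoes order by timestamp").fetchall()
print(rows)
```

In both cases the values never pass through Python string formatting, which is the point of the correction above.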

Now all you need to make this work is for keys to be a tuple (or list) containing the values for the new row of the medicoes table in the correct order. Consider the following Python code:

import json

traffic = json.load(open('xxx.json'))

columns = ['local', 'coord', 'sentido', 'veiculos', 'modalidade', 'pistas']
for timestamp, data in traffic.iteritems():
    keys = (timestamp,) + tuple(data[c] for c in columns)
    print str(keys)

When I run this with your sample data, I get:

(u'2011-12-19 08:38', u'R. Fernandes Vieira; esquina Prot\xe1sio Alves', u'-30.035535,-51.211079', u'\xfanico', u'automotores', u'sem\xe1foro 30-70', u'3')
(u'2011-12-17 16:00', u'Av. Prot\xe1sio Alves; esquina Ramiro Barcelos', u'-30.036916,-51.208093', u'bairro-centro', u'automotores', u'semaforo 50-15', u'2+c')

which would seem to be the tuples you require.

You could add the necessary sqlite code with something like this:

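import json
import sqlite3

traffic = json.load(open('xxx.json'))
db = sqlite3.connect("fluxos.sqlite")

query = "insert into medicoes values (?,?,?,?,?,?,?)"
# listed in the same order as the columns of the medicoes table
columns = ['local', 'coord', 'sentido', 'veiculos', 'modalidade', 'pistas']
for timestamp, data in traffic.iteritems():  # Python 2; use traffic.items() on Python 3
    keys = (timestamp,) + tuple(data[c] for c in columns)
    c = db.cursor()
    c.execute(query, keys)
    c.close()
db.commit()  # without a commit the inserted rows are discarded when the connection closes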
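
The loop above fills only the medicoes table. For the nested "medicoes" lists, one possible approach is sketched below in Python 3. It rests on several assumptions that are not in the question: each inner pair is read as [quantidade, tempo], the valores table gets an extra column (here called medicao_timestamp, a hypothetical name) that points at the parent row, and the sample data and tables are trimmed and held in an in-memory database.

```python
import json
import sqlite3

# Sample shaped like the question's file, trimmed to one entry.
traffic = json.loads('''{
  "2011-12-17 16:00": {
    "local": "Av. Protásio Alves; esquina Ramiro Barcelos",
    "pistas": "2+c",
    "medicoes": [[32, 50], [40, 50], [29, 50]]
  }
}''')

db = sqlite3.connect(":memory:")
db.execute("pragma foreign_keys = on")  # SQLite does not enforce foreign keys by default
db.execute("create table medicoes (timestamp text primary key, local text, pistas text)")
# Hypothetical variant of valores: its own id, plus a column pointing at the parent row.
db.execute("""create table valores (
    id integer primary key,
    medicao_timestamp text not null references medicoes(timestamp),
    quantidade integer,
    tempo integer)""")

with db:  # commits on success, rolls back if an exception escapes the block
    for timestamp, data in traffic.items():
        db.execute("insert into medicoes values (?, ?, ?)",
                   (timestamp, data["local"], data["pistas"]))
        # One child row per inner pair, all tagged with the parent's timestamp.
        db.executemany(
            "insert into valores (medicao_timestamp, quantidade, tempo) values (?, ?, ?)",
            [(timestamp, quantidade, tempo) for quantidade, tempo in data["medicoes"]],
        )

count = db.execute("select count(*) from valores").fetchone()[0]
print(count)
```

Keeping a separate id for each child row and a distinct foreign key column is what lets many valores rows belong to one medicoes row.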

Edit: if you don't want to hard-code the list of columns, you could do something like this:

import json

traffic = json.load(open('xxx.json'))

someitem = traffic.itervalues().next()
columns = list(someitem.keys())
print columns

When I run this it prints:

[u'medicoes', u'veiculos', u'coord', u'modalidade', u'sentido', u'local', u'pistas', u'regime']

You could use it with something like this:

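import json
import sqlite3

db = sqlite3.connect('fluxos.sqlite')
traffic = json.load(open('xxx.json'))

# Python 2; on Python 3 use next(iter(traffic.values()))
someitem = traffic.itervalues().next()
columns = list(someitem.keys())
# drop the keys that have no matching column in the medicoes table
columns.remove('medicoes')
columns.remove('regime')

# name the columns explicitly so the values no longer depend on the table's column order
query = "insert into medicoes (timestamp,{0}) values (?{1})"
query = query.format(",".join(columns), ",?" * len(columns))
print query

for timestamp, data in traffic.iteritems():
    keys = (timestamp,) + tuple(data[c] for c in columns)
    c = db.cursor()
    c.execute(query, keys)  # the values must be passed along with the query
    c.close()
db.commit()  # without a commit the inserted rows are discarded when the connection closes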

The query this code prints when I try it with your sample data is something like this:

insert into medicoes (timestamp,veiculos,coord,modalidade,sentido,local,pistas) values (?,?,?,?,?,?,?)
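
Because the column names in that query come from the JSON file rather than from the code, it may be worth checking them against the table before formatting them into SQL. The following Python 3 sketch shows one way to do that with PRAGMA table_info. The helper name build_insert, the trimmed table and the sample item are hypothetical, chosen only for illustration.

```python
import sqlite3

def build_insert(db, table, json_keys):
    """Return (query, columns) using only keys that are real columns of `table`."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # `table` is interpolated here, so it must come from trusted code, not from the file.
    table_columns = [row[1] for row in db.execute("pragma table_info({0})".format(table))]
    # Keep the table's own column order and silently drop keys with no column.
    columns = [c for c in table_columns if c in json_keys]
    query = "insert into {0} ({1}) values ({2})".format(
        table, ",".join(columns), ",".join("?" * len(columns)))
    return query, columns

db = sqlite3.connect(":memory:")
db.execute("create table medicoes (timestamp text primary key, local text, coord text, pistas text)")

item = {"medicoes": [[32, 50]], "pistas": "2+c", "regime": "típico",
        "local": "Av. Protásio Alves; esquina Ramiro Barcelos",
        "coord": "-30.036916,-51.208093"}
query, columns = build_insert(db, "medicoes", set(item) | {"timestamp"})
print(query)

row = dict(item, timestamp="2011-12-17 16:00")
with db:  # commits on success
    db.execute(query, tuple(row[c] for c in columns))
```

With this approach keys such as "medicoes" and "regime" fall out automatically instead of being removed by hand, and the values are still bound through ? placeholders.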
