What is the most efficient way to insert thousands of records into a table (MySQL, Python, Django)?

Date: 2022-07-24 09:27:12

I have a database table with a unique string field and a couple of integer fields. The string field is usually 10-100 characters long.

Once every minute or so I have the following scenario: I receive a list of 2-10 thousand tuples corresponding to the table's record structure, e.g.

[("hello", 3, 4), ("cat", 5, 3), ...]

I need to insert all these tuples into the table (assume I have verified that none of these strings already appear in the database). For clarification: I'm using InnoDB, and I have an auto-incrementing primary key for this table; the string is not the PK.

My code currently iterates through this list, creates a Django model object for each tuple with the appropriate values, and calls ".save()", something like this:

from django.db import transaction

@transaction.commit_on_success  # one transaction around the whole loop
def save_data_elements(input_list):
    for (s, i1, i2) in input_list:
        entry = DataElement(string=s, number1=i1, number2=i2)
        entry.save()  # one INSERT (and one round trip) per tuple

This code is currently one of the performance bottlenecks in my system, so I'm looking for ways to optimize it.

For example, I could generate SQL statements, each containing a single INSERT command for 100 tuples (with the values "hard-coded" into the SQL), and execute them, but I don't know whether that would improve anything.

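A related, slightly safer variant of that idea is to keep the SQL parameterized and let the driver batch the rows instead of hard-coding the values. A minimal sketch using Django's raw database cursor; the table and column names (myapp_dataelement, string, number1, number2) are placeholders that would need to match the real schema:

from django.db import connection

def save_data_elements_batched(input_list, chunk_size=100):
    # One parameterized INSERT, reused for every chunk of rows. With MySQLdb,
    # executemany() typically rewrites this into a single multi-row INSERT per
    # chunk, so only len(input_list) / chunk_size statements hit the server.
    sql = ("INSERT INTO myapp_dataelement (string, number1, number2) "
           "VALUES (%s, %s, %s)")
    cursor = connection.cursor()
    for start in range(0, len(input_list), chunk_size):
        cursor.executemany(sql, input_list[start:start + chunk_size])

Wrapping the call in the same transaction decorator used above would keep everything in a single commit.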

Do you have any suggestions for optimizing such a process?

Thanks

8 Answers

#1


11  

You can write the rows to a file in the format "field1", "field2", ... and then use LOAD DATA to load them:

# Note: the naive quoting below does not escape embedded quotes, commas, or
# newlines in the strings; for real data the csv module is safer.
lines = '\n'.join(','.join('"%s"' % field for field in row) for row in data)
with open('data.txt', 'w') as f:
    f.write(lines)

Then execute this:

LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table
    FIELDS TERMINATED BY ',' ENCLOSED BY '"';

Reference

#2


12  

For MySQL specifically, the fastest way to load data is with LOAD DATA INFILE, so if you can convert the data into the format it expects, that will probably be the fastest way to get it into the table.

#3


4  

If you don't use LOAD DATA INFILE, as some of the other suggestions mention, two things you can do to speed up your inserts are:

  1. Use prepared statements - this cuts out the overhead of parsing the SQL for every insert.
  2. Do all of your inserts in a single transaction - this would require using a DB engine that supports transactions (like InnoDB); see the sketch after this list.
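
A rough sketch of both points together, using the MySQLdb driver directly. The connection settings and the table/column names are placeholders, not taken from the question, and note that MySQLdb binds parameters client-side rather than as true server-side prepared statements:

import MySQLdb

def insert_in_one_transaction(input_list):
    # Placeholder connection settings - adjust to the real database.
    conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="mydb")
    try:
        cur = conn.cursor()
        sql = ("INSERT INTO myapp_dataelement (string, number1, number2) "
               "VALUES (%s, %s, %s)")
        # The parameterized statement is reused for every row, and nothing is
        # committed until the end, so all rows land in a single transaction.
        for row in input_list:
            cur.execute(sql, row)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()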

#4


4  

If you can do a hand-rolled INSERT statement, then that's the way I'd go. A single INSERT statement with multiple value clauses is much much faster than lots of individual INSERT statements.

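A sketch of what such a hand-rolled statement could look like, built with placeholders so the driver still escapes the values; the table and column names are assumed, not taken from the question:

def multirow_insert(cursor, rows):
    # One INSERT with one "(%s, %s, %s)" group per row, e.g. 100 rows per call.
    placeholders = ", ".join(["(%s, %s, %s)"] * len(rows))
    sql = ("INSERT INTO myapp_dataelement (string, number1, number2) VALUES "
           + placeholders)
    params = [value for row in rows for value in row]  # flatten the tuples
    cursor.execute(sql, params)

Called once per chunk of, say, 100 tuples, this issues a single multi-row INSERT instead of 100 separate ones.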

#5


2  

Regardless of the insert method, you will want to use the InnoDB engine for maximum read/write concurrency. MyISAM will lock the entire table for the duration of the insert whereas InnoDB (under most circumstances) will only lock the affected rows, allowing SELECT statements to proceed.

#6


1  

What format do you receive the data in? If it is a file, you can do some sort of bulk load: http://www.classes.cs.uchicago.edu/archive/2005/fall/23500-1/mysql-load.html

#7


1  

This is unrelated to the actual load of data into the DB, but...

If providing a "The data is loading... The load will be done shortly" type of message to the user is an option, then you can run the INSERTs or LOAD DATA asynchronously in a different thread.

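A minimal sketch of that, using Python's threading module and the save_data_elements function from the question (Django opens a separate database connection for the worker thread):

import threading

def save_data_elements_async(input_list):
    # Run the existing save routine in a background thread so the caller can
    # return a "data is loading..." message immediately.
    worker = threading.Thread(target=save_data_elements, args=(input_list,))
    worker.daemon = True  # don't block interpreter shutdown on this thread
    worker.start()
    return worker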

Just something else to consider.

#8


1  

I do not know the exact details, but you can use a JSON-style data representation and load it as fixtures or something similar. I saw something like this in the Django Video Workshop by Douglas Napoleone. See the videos at http://www.linux-magazine.com/online/news/django_video_workshop and http://www.linux-magazine.com/online/features/django_reloaded_workshop_part_1. Hope this one helps.

Hope you can work it out. I just started learning Django, so I can only point you to resources.
