Uploading a 95 GB CSV file into a MySQL MyISAM table via LOAD DATA INFILE: can the CSV engine be an alternative?

Time: 2020-12-20 16:59:35

I'm trying to upload a 95 GB CSV file into a MySQL database (MySQL 5.1.36) via the following command:


CREATE TABLE MOD13Q1 (
rid INT UNSIGNED NOT NULL AUTO_INCREMENT,
gid MEDIUMINT(6) UNSIGNED NOT NULL ,
yr SMALLINT(4) UNSIGNED NOT NULL ,
dyyr SMALLINT(4) UNSIGNED NOT NULL ,
ndvi DECIMAL(7,4) NOT NULL comment 'NA value is 9',
reliability TINYINT(4)  NOT NULL comment 'NA value is 9',
ndviquality1 TINYINT(1) NOT NULL ,
ndviquality2 TINYINT(1) NOT NULL ,
primary key (rid),
key(gid)
) ENGINE = MyISAM ;

LOAD DATA INFILE 'datafile.csv' INTO TABLE MOD13Q1 FIELDS TERMINATED by ',' LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(gid, yr, dyyr, ndvi, reliability,
ndviquality1, ndviquality2
) ;

I'm running this script from the DOS command line at the moment, but the database is not responding. It works fine for smaller CSV files (1.5 GB). Will it work at this file size?


Do you have any recommendations on how to do this more efficiently/faster? Would ENGINE = CSV be an alternative? (Indexing is not available there, so queries might run extremely slowly.)

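For reference, a CSV-engine variant of the table would have to look roughly like this (table name hypothetical): the CSV storage engine supports no indexes and no AUTO_INCREMENT, and every column must be NOT NULL, so rid and both keys from the MyISAM definition would have to go:

CREATE TABLE MOD13Q1_CSV (
gid MEDIUMINT(6) UNSIGNED NOT NULL ,
yr SMALLINT(4) UNSIGNED NOT NULL ,
dyyr SMALLINT(4) UNSIGNED NOT NULL ,
ndvi DECIMAL(7,4) NOT NULL comment 'NA value is 9',
reliability TINYINT(4) NOT NULL comment 'NA value is 9',
ndviquality1 TINYINT(1) NOT NULL ,
ndviquality2 TINYINT(1) NOT NULL
) ENGINE = CSV ;

Every query against such a table is a full scan of the 95 GB file, which is why the no-index caveat matters.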

Update

Thanks for the tips, it worked!


mysql> LOAD DATA INFILE 'E:\\AAJan\\data\\data.csv' INTO TABLE MOD13Q1
    -> FIELDS TERMINATED by ','
    ->     LINES TERMINATED BY '\r\n'
    ->     IGNORE 1 LINES
    ->     (gid, yr, dyyr, ndvi, reliability,
    ->     ndviquality1, ndviquality2
    ->     ) ;
Query OK, -1923241485 rows affected (18 hours 28 min 51.26 sec)
Records: -1923241485  Deleted: 0  Skipped: 0  Warnings: 0

mysql>

Hope this helps others avoid splitting their data up into chunks. (The negative row count appears to be the client overflowing a signed 32-bit counter when displaying the total, not a failed load.)


5 Answers

#1


1  

You should disable all the constraints while you are importing. Apart from that, I think it should work properly; just note that it is going to take a while, probably hours.

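For concreteness, a minimal sketch of that idea for this MyISAM table might look like the following; unique_checks and foreign_key_checks are standard MySQL session variables, and DISABLE KEYS suspends maintenance of the non-unique gid index until the load finishes (measure the gain on your own data):

SET unique_checks = 0;
SET foreign_key_checks = 0;        -- no-op for MyISAM, but harmless
ALTER TABLE MOD13Q1 DISABLE KEYS;  -- stop updating the gid index per row

LOAD DATA INFILE 'datafile.csv' INTO TABLE MOD13Q1
FIELDS TERMINATED by ',' LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(gid, yr, dyyr, ndvi, reliability, ndviquality1, ndviquality2);

ALTER TABLE MOD13Q1 ENABLE KEYS;   -- rebuild the index in one bulk pass
SET unique_checks = 1;
SET foreign_key_checks = 1;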

#2


3  

There is no easy way; you will have to split your data into chunks and then import those...

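For example, assuming the 95 GB file has been split beforehand at the operating-system level into hypothetical part files (part_000.csv, part_001.csv, ...), the chunks can then be loaded one after another; only the first part keeps IGNORE 1 LINES for the header row:

LOAD DATA INFILE 'part_000.csv' INTO TABLE MOD13Q1
FIELDS TERMINATED by ',' LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(gid, yr, dyyr, ndvi, reliability, ndviquality1, ndviquality2);

LOAD DATA INFILE 'part_001.csv' INTO TABLE MOD13Q1
FIELDS TERMINATED by ',' LINES TERMINATED BY '\r\n'
(gid, yr, dyyr, ndvi, reliability, ndviquality1, ndviquality2);

-- ...and so on for the remaining parts.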

#3


0  

Bcp? .................................. Oh wait, it does not matter; either way it will be one big bulk transaction. You need chunks. You need them to avoid overfilling your log segment space and hitting lock count limits. Anything over a million rows at a time is too much, and the best-known batch size for BCP is 10,000 records!


#4


0  

I agree with RageZ's and Sarfraz's answers, but I have something to add.


1. Increasing the database cache and reconfiguring some MySQL options may help (RAM usage).


Take a look at this:


Mysql Database Performance tuning


I think you should focus on write_buffer, read_buffer, query_cache_size, and other RAM- and I/O-related options.

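By way of illustration, in stock MySQL 5.1 the variables closest to those names can be raised on the fly (SUPER privilege required); the sizes below are placeholders to show the syntax, not tuned recommendations, and query_cache_size is omitted because it does not speed up LOAD DATA:

SET GLOBAL key_buffer_size = 512 * 1024 * 1024;          -- MyISAM index cache
SET GLOBAL bulk_insert_buffer_size = 256 * 1024 * 1024;  -- used by LOAD DATA
SET GLOBAL myisam_sort_buffer_size = 512 * 1024 * 1024;  -- index (re)build sort buffer
SET GLOBAL read_buffer_size = 8 * 1024 * 1024;           -- sequential scan buffer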

2. You probably need a faster storage device. What are you using now?


For a database as big as this, you should use a RAID-5 array with fast, modern hard disks.


Maybe your configuration is enough for everyday tasks, but what about backups and crisis situations?


Creating a backup of, and restoring, a database this big will take far too long on a machine that needs 18 hours for a simple insert import.


I know that 95 GB is a really big text file, but... I think you should use hardware that can do a simple operation like this in at most 2-3 hours.


#5


0  

You can try MySQLTuner, a high-performance MySQL tuning script written in Perl that examines your MySQL configuration and makes recommendations for increased performance and stability.

