How to delete millions of records from a MySQL table without slowing down

Date: 2022-10-21 22:25:47

Is there a good way to delete lots of records without slowing down a website?

I need to delete millions of records from a MySQL table that has no indexes and no primary key. I read on SO and in various tutorials on the web that the basic strategy is to limit the delete query, sleep for a second or two between deletes, and repeat the process until finished. Using PDO, I also run a single commit after all the loops complete.

That worked fine last week, but every time I ran the script, the database slowed down and we received many complaints about the site being slow etc. This is on a Miva Merchant baskets table, not that it really matters.

I'm almost done trimming the table so I could just suffer through it and finish. But there must be a better way...?

Here is the relevant code:

$database->beginTransaction();
$selectLimit = 4900; // mysql will lock the entire table at 5000+.....
$loopLimit = 10;
$totalDeletes = 0; // running total; must be initialized before += is used in the loop
$date = "1456272001"; // 2016-02-24

for( $i = 0; $i < $loopLimit; $i++ ) {
    $startTime = time();
    $oldBaskets = $database->prepare("DELETE FROM s01_Baskets WHERE CAST(lastupdate AS UNSIGNED) < '" . $date . "' LIMIT " . $selectLimit . "");
    if ( $oldBaskets->execute() ) {
        $deletes = $oldBaskets->rowCount();
        $totalDeletes += $deletes;
        $duration = time() - $startTime;
        echo "\ndeleted '" . $deletes . "' entries";
        echo "\n-- took '" . $duration . "' seconds";
    }
    sleep(2);
}
$database->commit();

2 Answers

#1


Create an index on lastupdate and modify your query a little:

DELETE
FROM    s01_Baskets
WHERE   lastupdate < :date
ORDER BY
        lastupdate
LIMIT   :limit

Having an index on lastupdate will allow MySQL to use it both for ordering and filtering, so only the records which have to be deleted will be visited by the engine.

Without an index, MySQL has to scan the table and test every row it reads against the condition until it has found enough matching rows to reach the limit.
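
As a rough sketch of the batching loop this implies, the snippet below uses Python's standard-library sqlite3 module as a stand-in for MySQL and PDO, purely so that it runs without a server. The table name `baskets`, the index name, the helper `delete_in_batches` and the batch sizes are illustrative assumptions, not taken from the question. SQLite's default build has no DELETE ... ORDER BY ... LIMIT, so the sketch picks each batch of rowids through the index instead; on MySQL the DELETE statement shown above would be executed directly. It also commits after every batch rather than once at the end, which is my own assumption about what keeps each transaction short, not something stated in this answer.

```python
import sqlite3
import time


def delete_in_batches(conn, cutoff, batch_size=4900, pause=0.0):
    """Delete rows older than `cutoff` in small batches; return the total deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM baskets WHERE rowid IN ("
            "  SELECT rowid FROM baskets"
            "  WHERE lastupdate < ? ORDER BY lastupdate LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # end the transaction after each batch
        if cur.rowcount == 0:
            break  # nothing left that is older than the cutoff
        total += cur.rowcount
        time.sleep(pause)  # give other queries room between batches
    return total


# Hypothetical demo data: 1000 rows, half of them older than the cutoff.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (lastupdate INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_baskets_lastupdate ON baskets (lastupdate)")
conn.executemany(
    "INSERT INTO baskets VALUES (?, ?)",
    [(1456272001 - 500 + i, "x") for i in range(1000)],
)
conn.commit()

deleted = delete_in_batches(conn, cutoff=1456272001, batch_size=200)
remaining = conn.execute("SELECT COUNT(*) FROM baskets").fetchone()[0]
print(deleted, remaining)
```

The loop stops on the first batch that deletes nothing, so it does not need a fixed loop count like `$loopLimit`.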

Using CAST on an indexed field in MySQL makes the expression non-sargable (unable to use the index for filtering). That is why you should convert the value you are comparing against ($date) rather than the column.
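
To see the effect of wrapping the column in a function, the sketch below asks a query planner for its plan in both forms. It uses Python's standard-library sqlite3 module and SQLite's EXPLAIN QUERY PLAN as an analogue of MySQL's EXPLAIN, so the exact plan text differs from what MySQL prints, and the table name `baskets`, the index name and the helper `plan` are assumptions made for illustration. The column type is also assumed to be an integer Unix timestamp, which the question does not confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (lastupdate INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_baskets_lastupdate ON baskets (lastupdate)")


def plan(sql, params):
    """Return SQLite's query-plan description for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


date = "1456272001"  # a string, as in the question

# CAST wrapped around the column: the index cannot narrow the search.
cast_plan = plan(
    "SELECT payload FROM baskets WHERE CAST(lastupdate AS INTEGER) < ?",
    (date,),
)

# Conversion applied to the compared value instead: the column stays bare.
bare_plan = plan(
    "SELECT payload FROM baskets WHERE lastupdate < CAST(? AS INTEGER)",
    (date,),
)

print("cast on column:", cast_plan)
print("cast on value: ", bare_plan)
```

The first plan should describe a full table scan, while the second should name the index on `lastupdate`.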

#2


Since it sounds like you have no indexes and no auto-incremented IDs I would personally go for direct SQL like this:

Note: You should probably do this when there is minimal activity on the system

RENAME TABLE s01_Baskets TO s01_Baskets_to_be_deleted;

CREATE TABLE s01_Baskets LIKE s01_Baskets_to_be_deleted;

INSERT INTO s01_Baskets (col1, col2, ..., coln)
SELECT *
FROM s01_Baskets_to_be_deleted
WHERE lastupdate >= '2016-02-24 00:00:00';

DROP TABLE s01_Baskets_to_be_deleted;

The first two statements should execute relatively quickly, and your users will not notice a slowdown. All of their interaction will simply be routed to your new, empty table.

The third command will re-insert the records you wish to keep.

As for the DROP command, it might slow down the DB a little in terms of disk I/O, but since nothing is interacting with the records in the old table any more, your users should experience almost no slowdown.
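
The four steps can be tried end to end with the sketch below. It uses Python's standard-library sqlite3 module as a stand-in for MySQL, so ALTER TABLE ... RENAME TO and a repeated CREATE TABLE take the place of RENAME TABLE and CREATE TABLE ... LIKE. The table name `baskets`, its two columns and the sample rows are assumptions made for illustration, and the cutoff is compared as an integer Unix timestamp as in the question's code rather than as a datetime string.

```python
import sqlite3

CUTOFF = 1456272001  # 2016-02-24, the value used in the question

conn = sqlite3.connect(":memory:")
schema = "CREATE TABLE baskets (lastupdate INTEGER, payload TEXT)"
conn.execute(schema)
conn.executemany(
    "INSERT INTO baskets VALUES (?, ?)",
    [(CUTOFF - 3 + i, f"basket-{i}") for i in range(6)],
)
conn.commit()

# 1. Move the full table out of the way.
conn.execute("ALTER TABLE baskets RENAME TO baskets_to_be_deleted")
# 2. Recreate an empty table under the original name
#    (stands in for MySQL's CREATE TABLE ... LIKE).
conn.execute(schema)

# 3. Copy back only the rows worth keeping.
conn.execute(
    "INSERT INTO baskets (lastupdate, payload) "
    "SELECT lastupdate, payload FROM baskets_to_be_deleted "
    "WHERE lastupdate >= ?",
    (CUTOFF,),
)

# 4. Drop the old table, stale rows included, in a single statement.
conn.execute("DROP TABLE baskets_to_be_deleted")
conn.commit()

kept = [row[0] for row in conn.execute("SELECT payload FROM baskets ORDER BY lastupdate")]
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(kept, tables)
```

Between steps 1 and 2 the original table name briefly does not exist. On MySQL, one way to narrow that window would be to create the empty copy first and swap both names in a single RENAME TABLE statement, though that is a variation on this answer rather than what it describes.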

Another reason that deletion is so slow and I/O-intensive is that MySQL logs each deleted row, and if the table has any active triggers, those must be executed for each row as well.

