Performance of deleting from a large MySQL table with an index

Time: 2022-12-04 16:55:13

Let's say we have a web forum application with a MySQL 5.6 database that is accessed 24/7 by a great many users. It has a table like this for the metadata of notifications sent to users.

CREATE TABLE `notifications` (
 `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
 `user_id` bigint(20) unsigned NOT NULL,
 `message_store_id` bigint(20) unsigned NOT NULL,
 `status` varchar(10) COLLATE ascii_bin NOT NULL,
 `sent_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 PRIMARY KEY (`id`),
 KEY `user_id` (`user_id`,`sent_date`)
) ENGINE=InnoDB AUTO_INCREMENT=736601 DEFAULT CHARSET=ascii COLLATE=ascii_bin;

This table has 1 million rows. A certain message_store_id has suddenly become invalid for some reason, and I'm planning to remove all of the records with that message_store_id with a single DELETE statement like

DELETE FROM notifications WHERE message_store_id = 12345;

This single statement affects 10% of the table, since this message was sent to so many users. Meanwhile, the notifications table is accessed all the time by thousands of users, so the index must be present. Apparently index recreation is very costly when deleting records, so I'm afraid that running the statement will max out the server resources and cause downtime. However, if I drop the index, delete the records and then add the index again, I have to shut down the database for some time, which unfortunately is not possible for our service.

I hope MySQL 5.6 is not so stupid that this single statement can kill the database, but I guess it's very likely. My question is: is the index recreation really fatal in a case like this? If so, is there any good strategy for this operation that doesn't require me to halt the database for maintenance?

2 Answers

#1


3  

There are a lot of tricks/strategies you could employ, depending on the details of your application.

  1. If you plan to do these operations on a regular basis (i.e. it's not a one-time thing), AND you have few distinct values in message_store_id, you can use partitions. Partition by the value of message_store_id, create X partitions beforehand (where X is some reasonable cap on the number of values for the id), and then you can delete all the records in a partition almost instantly by truncating that partition. It is a matter of milliseconds. Downside: message_store_id will have to be part of the primary key. Note: you'll have to create the partitions beforehand, because the last time I worked with them, ALTER TABLE ... ADD PARTITION re-created the entire table, which is a disaster on large tables.

  2. Even if ALTER TABLE ... TRUNCATE PARTITION does not work for you, you can still benefit from partitioning. If you issue a DELETE whose WHERE condition confines it to a single partition, the rest of the table will not be affected/locked by this DELETE.

  3. An alternative way of deleting records without locking the DB for too long:

    while (true) {
      // Assuming autocommit mode, so each DELETE commits on its own.
      delete from notifications where {your condition} limit 10000;
      // At this point the locks are released and other transactions
      // get a chance to run.
      if (affected_rows == 0) {
        break;
      }
      // This is a good place to insert sleep(5) to give other transactions
      // more time to do their work before the next chunk gets deleted.
    }
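
The partitioning ideas in this answer might look roughly like the sketch below. It is only an illustration under assumptions that are not in the question: the table name `notifications_by_store`, the partition names, and the listed message_store_id values (12345, 12346, 12347) are hypothetical, and LIST partitioning is just one way of getting one partition per id. With LIST partitioning, an INSERT whose message_store_id is not listed in any partition is rejected, which is one more reason the partitions have to exist up front.

```sql
-- Hypothetical partitioned copy of the table; names and id values are assumptions.
CREATE TABLE `notifications_by_store` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) unsigned NOT NULL,
  `message_store_id` bigint(20) unsigned NOT NULL,
  `status` varchar(10) COLLATE ascii_bin NOT NULL,
  `sent_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  -- Every unique key, including the primary key, must contain the partitioning column.
  PRIMARY KEY (`id`, `message_store_id`),
  KEY `user_id` (`user_id`, `sent_date`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin
PARTITION BY LIST (`message_store_id`) (
  PARTITION p12345 VALUES IN (12345),
  PARTITION p12346 VALUES IN (12346),
  PARTITION p12347 VALUES IN (12347)
);

-- Remove every row of one message_store_id by emptying its partition.
ALTER TABLE `notifications_by_store` TRUNCATE PARTITION p12345;

-- Alternatively, a DELETE confined to one partition
-- (explicit partition selection, available in MySQL 5.6).
DELETE FROM `notifications_by_store` PARTITION (p12345)
 WHERE `message_store_id` = 12345;
```

Moving an existing live table into this layout is itself a table rebuild, so this is mainly worth considering when such purges are expected to recur.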
    

#2


0  

One option is to perform the delete as several smaller operations rather than one huge operation.

MySQL provides a LIMIT clause, which for a DELETE statement caps the number of rows that the statement removes.

For example, you could delete just 1000 rows:

DELETE FROM notifications WHERE message_store_id = 12345 LIMIT 1000;

You could repeat that, leaving a suitable window of time for other operations (competing for locks on the same table) to complete. To handle this in pure SQL, we can use the MySQL SLEEP() function, for example to pause for 2 seconds:

SELECT SLEEP(2);

And obviously, this can be incorporated into a loop, for example in a MySQL stored procedure that keeps looping until the DELETE statement affects zero rows.
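
A minimal sketch of such a procedure is shown below, assuming it is created through the mysql command-line client (DELIMITER is a client command, not SQL) and called outside an explicit transaction with autocommit on, so that each batch commits by itself. The procedure name `purge_notifications` and its parameter names are made up for illustration.

```sql
DELIMITER $$

-- Hypothetical helper: deletes matching rows in batches, pausing between batches.
CREATE PROCEDURE purge_notifications(
  IN p_message_store_id BIGINT UNSIGNED,
  IN p_batch_size INT,
  IN p_pause_seconds INT
)
BEGIN
  DECLARE v_deleted INT DEFAULT 1;

  WHILE v_deleted > 0 DO
    -- Delete at most one batch of rows.
    DELETE FROM notifications
     WHERE message_store_id = p_message_store_id
     LIMIT p_batch_size;

    -- ROW_COUNT() reports how many rows the DELETE just removed.
    SET v_deleted = ROW_COUNT();

    -- Pause so that competing transactions can get through.
    IF v_deleted > 0 THEN
      DO SLEEP(p_pause_seconds);
    END IF;
  END WHILE;
END$$

DELIMITER ;

-- Example call: batches of 1000 rows with a 2 second pause in between.
CALL purge_notifications(12345, 1000, 2);
```

Since the table definition in the question has no index on message_store_id, each batch still has to scan for matching rows, so the batch size and pause length are worth tuning on a copy of the data first.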
