What is the best way to delete old rows from MySQL on a rolling basis?

Date: 2022-09-16 14:30:42

I find myself wanting to delete rows older than (x)-days on a rolling basis in a lot of applications. What is the best way to do this most efficiently on a high-traffic table?

For instance, I might have a table that stores notifications that I only want to keep for 7 days, or high scores that I only want to keep for 31 days.

Right now I keep a column storing the epoch time at which each row was posted, and I run a cron job once per hour that deletes the expired rows in increments like this:

DELETE FROM my_table WHERE time_stored < 1234567890 LIMIT 100

I do that until mysql_affected_rows returns 0.

I used to do it all at once, but that caused everything in the application to hang for 30 seconds or so while INSERTs piled up. Adding the LIMIT alleviated this, but I'm wondering if there is a better way to do it.

6 Answers

#1 (24 votes)

Check out MySQL Partitioning:

Data that loses its usefulness can often be easily removed from a partitioned table by dropping the partition (or partitions) containing only that data. Conversely, the process of adding new data can in some cases be greatly facilitated by adding one or more new partitions for storing specifically that data.

See e.g. this post to get some ideas on how to apply it:

Using Partitioning and Event Scheduler to Prune Archive Tables

And this one:

Partitioning by dates: the quick how-to
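To make the idea concrete, here is a minimal sketch against the question's my_table and its epoch-time column time_stored. The partition names and boundary timestamps are illustrative, and note that MySQL requires the partitioning column to be part of every unique key on the table:

ALTER TABLE my_table
PARTITION BY RANGE (time_stored) (
    -- One partition per period; boundaries are Unix timestamps
    -- for month starts in 2012 (illustrative values)
    PARTITION p201203 VALUES LESS THAN (1330560000),
    PARTITION p201204 VALUES LESS THAN (1333238400),
    PARTITION p201205 VALUES LESS THAN (1335830400)
);

-- Expiring a whole period is then a near-instant metadata operation
-- instead of a row-by-row DELETE:
ALTER TABLE my_table DROP PARTITION p201203;

-- New partitions are added ahead of time as new periods begin:
ALTER TABLE my_table ADD PARTITION
    (PARTITION p201206 VALUES LESS THAN (1338508800));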

#2 (52 votes)

Try creating an event that will run on the database automatically at whatever interval you want.

Here is an example: suppose you want to delete entries more than 30 days old from some table 'tableName' that has a datetime column 'datetime'. Then the following event runs every day and performs the required clean-up action:

CREATE EVENT AutoDeleteOldNotifications
ON SCHEDULE EVERY 1 DAY
ON COMPLETION PRESERVE
DO
  DELETE LOW_PRIORITY FROM databaseName.tableName
  WHERE `datetime` < DATE_SUB(NOW(), INTERVAL 30 DAY);

We need ON COMPLETION PRESERVE to keep the event definition around after it runs, and the event scheduler must be enabled for events to fire at all. You can find more info here: http://www.mysqltutorial.org/mysql-triggers/working-mysql-scheduled-event/
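One gotcha worth checking before relying on this: events only fire if the scheduler is actually running (changing the global variable requires the appropriate privilege):

-- Check whether the scheduler is running
SHOW VARIABLES LIKE 'event_scheduler';

-- Enable it if it is OFF
SET GLOBAL event_scheduler = ON;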

#3 (2 votes)

Instead of executing the delete against the table alone, try gathering the matching keys first and then doing a DELETE JOIN.

Given your sample query above:

DELETE FROM my_table WHERE time_stored < 1234567890 LIMIT 100;

You can leave the LIMIT out of it.

Let's say you want to delete data that is over 31 days old.

Let's compute 31 days in seconds (86400 X 31 = 2678400)

  • Start with key gathering
  • Next, index the keys
  • Then, perform DELETE JOIN
  • Finally, drop the gathered keys

Here is the algorithm

-- Create an empty working table with the same id column
CREATE TABLE delete_keys SELECT id FROM my_table WHERE 1=2;

-- Gather the keys of up to 100 expired rows (31 days = 2678400 seconds)
INSERT INTO delete_keys
SELECT id FROM
(
    SELECT id FROM my_table
    WHERE time_stored < (UNIX_TIMESTAMP() - 2678400)
    ORDER BY time_stored
) A LIMIT 100;

-- Index the gathered keys
ALTER TABLE delete_keys ADD PRIMARY KEY (id);

-- Delete the matching rows via a join
DELETE B.* FROM delete_keys
INNER JOIN my_table B USING (id);

-- Drop the working table
DROP TABLE delete_keys;

If the key gathering takes less than 5 minutes, then run this sequence every 5 minutes.

Give it a try!!!

UPDATE 2012-02-27 16:55 EDT

Here is something that should speed up key gathering a little more. Add the following index:

ALTER TABLE my_table ADD INDEX time_stored_id_ndx (time_stored,id);

This will better support the subquery that populates the delete_keys table, because it provides a covering index so that the fields are retrieved from the index only.

UPDATE 2012-02-27 16:59 EDT

Since you have to delete often, you may want to try running this every two months:

OPTIMIZE TABLE my_table;

This will defragment the table after two months of all those annoying little deletes running every 5 minutes.

#4 (1 vote)

At my company, we have a similar situation. We have a table that contains keys that have an expiration. We have a cron that runs to clean that out:

DELETE FROM t1 WHERE expiration < UNIX_TIMESTAMP(NOW());

This ran once an hour, but we were having issues similar to what you are experiencing. We increased it to once per minute, then to 6 times per minute, by setting up a cron job with a bash script that basically runs the query, sleeps for a few seconds, and repeats until the minute is up.

The increased frequency significantly decreased the number of rows we were deleting per run, which relieved the contention. This is the route I would go.

However, if you find that you still have too many rows to delete, use the LIMIT and sleep between chunks. For example, if you have 50k rows to delete, do 10k chunks with a 2-second sleep between them. This will keep the queries from stacking up, and it will allow the server to perform some normal operations between these bulk deletes. A sketch of this chunked approach follows.
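Here is a minimal sketch of that chunking loop as a MySQL stored procedure instead of a bash script; my_table and time_stored come from the original question, and the 10k chunk size and 2-second sleep are just illustrative:

DELIMITER $$
CREATE PROCEDURE purge_old_rows()
BEGIN
  DECLARE rows_deleted INT DEFAULT 1;
  WHILE rows_deleted > 0 DO
    -- Delete one chunk of expired rows (31 days = 2678400 seconds)
    DELETE FROM my_table
    WHERE time_stored < UNIX_TIMESTAMP() - 2678400
    LIMIT 10000;
    SET rows_deleted = ROW_COUNT();
    -- Give normal traffic room to run between chunks
    DO SLEEP(2);
  END WHILE;
END$$
DELIMITER ;

Calling CALL purge_old_rows(); from cron (or an event) then replaces the bash loop.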

#5 (1 vote)

You may want to consider introducing a master/slave (replication) solution into your design. If you shift all the read traffic to the slave, you open up the master to handle 'on-the-fly' CRUD activities, which then replicate down to the slave (your read server).

And because you are deleting so many records, you may want to consider running OPTIMIZE on the table(s) from which the rows are being deleted.

#6 (0 votes)

Ended up using this to leave only the last 100 rows in place, so there is no significant lag when it is executed frequently (every minute):

DELETE a FROM tbl a LEFT JOIN (
    SELECT ID
    FROM tbl
    ORDER BY ID DESC LIMIT 100
) b ON a.ID = b.ID
WHERE b.ID IS NULL;
