MySQL delete with a subquery is very slow

Time: 2022-09-10 00:51:25

This MySQL query has been running for around 10 hours and has not finished. Something is horribly wrong.

There are two tables here, text and spam. spam stores the ids of the spam entries in text that I want to delete.

DELETE FROM tname.text WHERE old_id IN (SELECT textid FROM spam);

spam has just 2 columns, both ints. Its 800K entries take up a few MB on disk. Both ints are primary keys.

text has 3 columns: id (primary key), text, and flags. It has around 1200K entries and is around 2.1 gigabytes in size (most of it spam).

The server is a quad-core Xeon with 2 gigabytes of RAM (don't ask me why). Only Apache (why?) and mysqld are running. It's an old FreeBSD with MySQL 4.1.2 (don't ask me why).

Threads: 6 Questions: 188805 Slow queries: 318 Opens: 810 Flush tables: 1 Open tables: 157 Queries per second avg: 7.532

Mysql my.cnf:

[mysqld]
datadir=/usr/local/mysql
log-error=/usr/local/mysql/mysqld.err
pid-file=/usr/local/mysql/mysqld.pid
tmpdir=/var/tmp
innodb_data_home_dir =
innodb_log_files_in_group = 2
join_buffer_size=2M
key_buffer_size=32M
max_allowed_packet=1M
max_connections=800
myisam_sort_buffer_size=32M
query_cache_size=8M
read_buffer_size=2M
sort_buffer_size=2M
table_cache=256
skip-bdb
log-slow-queries = slow.log
long_query_time = 1

#skip-innodb
#default-table-type=innodb
innodb_data_file_path = /usr/local/mysql/ibdata1:10M:autoextend
innodb_log_group_home_dir = /usr/local/mysql/
innodb_buffer_pool_size = 128M
innodb_log_file_size = 16M
innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit=1
#innodb_additional_mem_pool_size=1M
#innodb_lock_wait_timeout=50

log-bin
server-id=201

[isamchk]
key_buffer_size=128M
read_buffer_size=128M
write_buffer_size=128M
sort_buffer_size=128M

[myisamchk]
key_buffer_size=128M
read_buffer_size=128M
write_buffer_size=128M
sort_buffer_size=128M
tmpdir=/var/tmp

[server:~] dmesg | grep memory
real memory  = 2146828288 (2047 MB)
avail memory = 2095534080 (1998 MB)

The query is using just one CPU; top says 25% CPU time (so 1 of 4).

62 processes:  2 running, 60 sleeping
CPU states: 25.2% user,  0.0% nice,  1.6% system,  0.0% interrupt, 73.2% idle
Mem: 244M Active, 1430M Inact, 221M Wired, 75M Cache, 112M Buf, 31M Free
Swap: 4096M Total, 1996K Used, 4094M Free

  PID USERNAME     THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
11536 mysql         27  20    0   239M   224M kserel 3 441:16 94.29% mysqld

Any idea how to fix it?

4 solutions

#1


12  

In my experience, subqueries are often a cause of slow execution times in SQL statements, so I try to avoid them. Try this:

DELETE tname.text FROM tname.text INNER JOIN spam ON (tname.text.old_id = spam.textid);

Disclaimer: This query is not tested, make backups first! :-)
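
Alongside a backup, one cheap sanity check is to count the rows the join would match before deleting anything. This is only a sketch: it assumes, as in the question, that spam lives in the current database and text in tname, and the alias rows_to_delete is made up for illustration.

```sql
-- Dry run: how many rows of text would the join-based DELETE remove?
-- If this is far from the roughly 800K entries in spam, stop and investigate.
SELECT COUNT(*) AS rows_to_delete
FROM tname.text
INNER JOIN spam ON tname.text.old_id = spam.textid;
```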

#2


5  

Your choice of WHERE id IN (SELECT ...) will always perform poorly.

Instead, use a normal join, which will be very efficient:

DELETE `text`
FROM spam
JOIN `text` ON `text`.old_id = spam.textid;

Note that this selects from spam first and then joins to text, which should give the best performance.
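
To see the difference before touching any data, the two forms can be compared as plain SELECT statements with EXPLAIN (which in a MySQL version this old works on SELECT only). A sketch, assuming the table and column names from the question; the alias t is just for illustration:

```sql
-- Subquery form: on old MySQL versions the select_type for the inner
-- query is typically reported as DEPENDENT SUBQUERY, meaning it is
-- re-evaluated for each row of text.
EXPLAIN SELECT t.old_id
FROM tname.text AS t
WHERE t.old_id IN (SELECT textid FROM spam);

-- Join form: both tables should show up as simple joined tables,
-- ideally with an index lookup on one side.
EXPLAIN SELECT t.old_id
FROM spam
JOIN tname.text AS t ON t.old_id = spam.textid;
```

If the second plan still shows a full scan on both tables, the join column probably lacks a usable index.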

#3


1  

Copy the rows of text that are not in spam to a new table. Then drop the text table and rename the new table to text. It is a good idea not to add any keys to the new table while copying; add the keys after renaming.
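
A rough sketch of that copy-and-swap approach follows. The names text_keep and text_old are hypothetical, and the primary key on old_id is an assumption based on the question's query. CREATE TABLE ... SELECT does not copy indexes, so the new table starts without keys, as suggested above.

```sql
-- 1. Copy only the rows that have no match in spam (anti-join).
CREATE TABLE tname.text_keep
SELECT t.*
FROM tname.text AS t
LEFT JOIN spam AS s ON s.textid = t.old_id
WHERE s.textid IS NULL;

-- 2. Swap the tables in one statement.
RENAME TABLE tname.text TO tname.text_old,
             tname.text_keep TO tname.text;

-- 3. Re-create the key afterwards (assumed to be on old_id).
ALTER TABLE tname.text ADD PRIMARY KEY (old_id);

-- 4. Only after verifying the new table:
-- DROP TABLE tname.text_old;
```

Renaming the old table instead of dropping it right away costs disk space, but it leaves a way back if the copy turns out to be wrong. Column attributes such as AUTO_INCREMENT may not carry over with CREATE TABLE ... SELECT, so it is worth comparing SHOW CREATE TABLE output for both tables.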

#4


0  

Of course it will take a lot of time, because the subquery is executed for every record, whereas with an INNER JOIN the query is executed only once. Suppose the subquery takes 10 ms: for 50,000 records the full time is 50000 * 10 ms = 500 s, or about 8.333 minutes, at the very least, and don't forget the time for the condition check and the deletion itself.

But using a join, the query will be executed only one time:

DELETE t FROM tname.text t INNER JOIN (SELECT textid FROM spam) sq ON t.old_id = sq.textid;
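
The estimate above can be rechecked, and scaled to the table size in the question, with a throwaway query. The 10 ms per row is this answer's hypothetical figure, not a measurement, and the column aliases are made up for illustration.

```sql
-- rows * 10 ms, converted to minutes and hours
SELECT 50000 * 10 / 1000 / 60     AS minutes_for_50k_rows,   -- 500 s, about 8.3 minutes
       1200000 * 10 / 1000 / 3600 AS hours_for_1200k_rows;   -- 12,000 s, about 3.3 hours
```

At that hypothetical rate the 1200K rows in text would need over three hours for the subquery evaluations alone, so a 10-hour run implies a considerably higher per-row cost.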
