How do I detect deadlocks in MySQL? What causes my application to hang when connecting?

Time: 2022-06-11 20:45:34

I have an application that is having issues with the database: it suddenly freezes when it tries to open a connection to the database (or when it executes a query; it is not clear which). There is no error message. I suspect that some query is blocking others, and I am trying to figure out which one. I used

SET profiling=1;

but when I execute:

show profiles;

I get only the queries I executed myself, not the application queries (the application and I are using the same user).

Calling

 SHOW FULL PROCESSLIST;

returns a table of all processes:

+-----+----------+---------------------+--------+---------+------+-------+-----------------------+
| Id  | User     | Host                | db     | Command | Time | State | Info                  |
+-----+----------+---------------------+--------+---------+------+-------+-----------------------+
|   8 | user     | <HOST>              | DBs    | Sleep   |    3 |       | NULL                  |
| 722 | user     | <HOST>              | DBs    | Sleep   | 8205 |       | NULL                  |
| 726 | user     | <HOST>              | DBs    | Sleep   | 8212 |       | NULL                  |
| 727 | user     | <HOST>              | DBs    | Sleep   | 8205 |       | NULL                  |
| 728 | user     | <HOST>              | DBs    | Sleep   | 8205 |       | NULL                  |
| 730 | user     | <HOST>              | DBs    | Sleep   | 7172 |       | NULL                  |
| 732 | user     | <HOST>              | DBs    | Sleep   | 8095 |       | NULL                  |
| 733 | user     | <HOST>              | DBs    | Sleep   | 8055 |       | NULL                  |
| 735 | user     | <HOST>              | DBs    | Sleep   | 8075 |       | NULL                  |
| 736 | user     | <HOST>              | DBs    | Sleep   | 8075 |       | NULL                  |
| 737 | user     | <HOST>              | DBs    | Sleep   | 8035 |       | NULL                  |
| 738 | user     | <HOST>              | DBs    | Sleep   | 8015 |       | NULL                  |
| 740 | user     | <HOST>              | DBs    | Sleep   | 7995 |       | NULL                  |
| 741 | user     | <HOST>              | DBs    | Sleep   | 7975 |       | NULL                  |
| 742 | user     | <HOST>              | DBs    | Sleep   | 7955 |       | NULL                  |
| 774 | user     | <HOST>              | DBs    | Sleep   | 5772 |       | NULL                  |
| 779 | user     | <HOST>              | DBs    | Sleep   | 6068 |       | NULL                  |
| 806 | user     | <HOST>              | DBs    | Query   |    0 | init  | SHOW FULL PROCESSLIST |
+-----+----------+---------------------+--------+---------+------+-------+-----------------------+

Calling

show engine innodb status

returns a lot of transactions, some active and some not started, but no information about locked queries.

This query, which is supposed to give me information about blocked queries, returns an empty set:

SELECT r.trx_id waiting_trx_id,
       r.trx_mysql_thread_id waiting_thread,
       r.trx_query waiting_query,
       b.trx_id blocking_trx_id,
       b.trx_mysql_thread_id blocking_thread,
       b.trx_query blocking_query
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;

With all that information, can I have some guarantee that there is NO deadlock?

Do you have any guess about what could be happening, so that I can research it?

Is there any way I can get more information about the processes?

I am new to DB administration and MySQL.

Thanks

1 solution

#1


Delays for Lock Waits

A lock wait is probably what you mean. You can monitor for lock waits by enabling the slow-query log, collecting entries for a while, and then reviewing them. Here's an example:

# Time: 140605 15:00:06
# User@Host: appuser[appuser] @  [127.0.0.1]  Id:    29
# Schema:   Last_errno: 0  Killed: 0
# Query_time: 0.011732  Lock_time: 0.000161  Rows_sent: 214  Rows_examined: 214  Rows_affected: 0
SET timestamp=1402005606;
SELECT ...blah blah blah...

You can see the field Lock_time above, which shows that the query was waiting for locks for 161 microseconds before it could begin executing. Then it took less than 12 milliseconds to execute (shown by Query_time).

It's normal for Lock_time to be very small; often it is too small to register and just shows as 0.000000. If it's getting into hundreds of milliseconds or more, that's unusual. If it's into whole seconds, you're in trouble.
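
To review a collected log in bulk, the header lines can be scanned for large Lock_time values. The following is only a rough sketch: it assumes header lines shaped like the example above, the function name `lock_waits` and the 0.1 s threshold are illustrative choices, and the second sample line is invented for the demonstration.

```python
import re

# Matches the "# Query_time: ...  Lock_time: ..." header line of a slow-log entry.
HEADER = re.compile(r"^# Query_time: (?P<query>[\d.]+)\s+Lock_time: (?P<lock>[\d.]+)")

def lock_waits(log_lines, threshold_s=0.1):
    """Yield (query_time, lock_time) for entries whose Lock_time >= threshold_s."""
    for line in log_lines:
        m = HEADER.match(line)
        if m is None:
            continue  # not a timing header line
        query_time = float(m.group("query"))
        lock_time = float(m.group("lock"))
        if lock_time >= threshold_s:
            yield query_time, lock_time

sample = [
    # Header line copied from the example entry above.
    "# Query_time: 0.011732  Lock_time: 0.000161  Rows_sent: 214  Rows_examined: 214  Rows_affected: 0",
    # Hypothetical entry that waited 1.75 s for a lock (made up for illustration).
    "# Query_time: 2.500000  Lock_time: 1.750000  Rows_sent: 1  Rows_examined: 1  Rows_affected: 0",
]
print(list(lock_waits(sample)))
```

In practice you would pass an open log file instead of `sample`, since iterating over a file yields its lines.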

Note that the slow-query log entry won't be written to the log unless Query_time exceeds your config variable long_query_time -- even if the Lock_time is large. For some more discussion on this, see http://www.mysqlperformanceblog.com/2012/11/22/get-me-some-query-logs/

Delays for Connections

You also mentioned it could be a delay caused by acquiring the connection, before you have run any query. You need to track down whether this is the case. It should be easy in any application language to read the time before and after the connection to the database, and compare them to see how long it takes. Some frameworks even provide this type of application-level profiling per query (or you can do it yourself).
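
As a minimal sketch of that kind of do-it-yourself timing, the helper below wraps any call and reports how long it took. The names `timed` and `fake_connect` are hypothetical; `fake_connect` only sleeps and stands in for whatever connect call your database driver provides.

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds), printing the elapsed time."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.6f} s")
    return result, elapsed

# Stand-in for a real driver's connect call (hypothetical; swap in your own).
def fake_connect():
    time.sleep(0.01)
    return "connection"

conn, connect_s = timed("connect", fake_connect)
```

Wrapping the connect call and each query call separately in this way should show which of the two accounts for the freeze.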

One common reason for delays on connection, for example, is that the MySQL Server is doing a reverse-DNS lookup to convert the incoming socket's IP address into a hostname. It does this so it can look up the hostname in the grant tables to figure out what privileges the user@host has. But if your DNS server is slow or overloaded, this can be slow. It's surprising that it would be more than a fraction of a second, but it's possible.
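
One way to get a feel for whether reverse DNS is slow is to time a lookup yourself. This is a sketch under assumptions: it measures the resolver of the machine it runs on, so it is only indicative if run on the database host with the client's IP address, and `time_reverse_dns` is an illustrative name. The address 127.0.0.1 is used only so that the example is self-contained.

```python
import socket
import time

def time_reverse_dns(ip):
    """Return (hostname_or_None, elapsed_seconds) for a reverse lookup of ip."""
    start = time.perf_counter()
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        hostname = None
    return hostname, time.perf_counter() - start

host, elapsed = time_reverse_dns("127.0.0.1")
print(host, f"{elapsed:.6f} s")
```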

You can speed this up by setting the config variable skip_name_resolve. This means you cannot grant privileges to users based on hostname; you have to identify users by IP address only. Many production MySQL instances in the real world set skip_name_resolve.

There may also be other causes for slow connections, but first do some application profiling to determine conclusively whether it's the connection that is slow or a query.

P.S.: Lots of people say "deadlock" when they mean "lock wait." A deadlock is when two transactions are each waiting for a lock the other holds, so neither can proceed. Deadlocks don't cause long delays, because InnoDB detects the cyclical dependency immediately and rolls back one of the transactions. You can see whether you have had a deadlock in the output of SHOW ENGINE INNODB STATUS, in the section titled "LATEST DETECTED DEADLOCK."
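
If you capture the status output as text, checking for that section can be automated. The sketch below assumes the header is printed on its own line as "LATEST DETECTED DEADLOCK", which is how InnoDB labels the section. The function name `had_deadlock` and both sample strings are hypothetical, heavily abridged stand-ins for real status output.

```python
def had_deadlock(status_text):
    """Return True if the status text contains a LATEST DETECTED DEADLOCK section."""
    return any(
        line.strip() == "LATEST DETECTED DEADLOCK"
        for line in status_text.splitlines()
    )

# Abridged, made-up status snippets for illustration only.
with_deadlock = """\
------------------------
LATEST DETECTED DEADLOCK
------------------------
*** (1) TRANSACTION:
------------
TRANSACTIONS
------------
"""
without_deadlock = """\
------------
TRANSACTIONS
------------
"""

print(had_deadlock(with_deadlock), had_deadlock(without_deadlock))
```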
