Node.js Socket.io: many connections stuck in CLOSE_WAIT and FIN_WAIT2 states are not released

Date: 2022-08-26 21:00:43

I used Ubuntu (12.04) + Node.js (v0.10.22) + socket.io (v0.9.14) to transmit messages.

There are ~300 simultaneous connections. After a few hours (about 1 to 2 hours or more; it doesn't show up immediately), some connections remain stuck in the CLOSE_WAIT or FIN_WAIT2 state.

These undead connections grow linearly with time. Users have a hard time connecting to the socket server once the connection count reaches the limit (default 1024), unless some connections are released normally.

The following was the socket service's connection status after running for about 3 hours.

netstat -anl | grep <PORT_OF_NODE_PROCESS> | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'

FIN_WAIT2 23
LISTEN 1
CLOSE_WAIT 27
TIME_WAIT 12
ESTABLISHED 333
FIN_WAIT1 12

Possible solutions

1. Touch the js file at regular intervals

Run the js file with the Nodemon package; when the file's last-modified time changes, nodemon restarts the service and releases all previous undead connections (CLOSE_WAIT or FIN_WAIT2).
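A minimal sketch of this workaround (the file path and the 2-hour schedule are assumptions, not from the original post):

```shell
# Assumed entry file; adjust the path to your own service script.
APP=./app.js

# Bump the last-modified time; nodemon watches the file and restarts the service.
touch "$APP"

# To automate it, a crontab entry could touch the file on a schedule,
# e.g. every 2 hours:
#   0 */2 * * * touch /path/to/app.js
```

Note that this is a blunt instrument: every restart drops all live connections, not just the undead ones.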

2. Increase the connection limit

sudo vim /etc/security/limits.conf

*       soft    nofile  1024
*       hard    nofile  2048
root    soft    nofile  4096
root    hard    nofile  8192
user1   soft    nofile  2048
user1   hard    nofile  2048

Try to make it harder for the connection count to reach the limit.
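To verify which limit is actually in effect for your session (limits.conf changes only apply after logging in again):

```shell
# Show the current open-file (nofile) soft limit for this shell;
# child processes such as node inherit it.
ulimit -n

# Show the hard limit, i.e. the ceiling the soft limit may be raised to.
ulimit -H -n
```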

3. Decrease the keep-alive timeout

Let the operating system close these connections automatically after a short time. I haven't tried this yet, though.
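On Linux these timers live under /proc; a sketch of how they could be inspected and shortened (the concrete values are illustrative assumptions, and this is as untried as the idea above):

```shell
# Seconds an idle connection waits before the first keep-alive probe (default 7200).
cat /proc/sys/net/ipv4/tcp_keepalive_time

# Seconds an orphaned socket is held in FIN_WAIT2 before the kernel drops it (default 60).
cat /proc/sys/net/ipv4/tcp_fin_timeout

# Shortening them requires root, e.g.:
#   sysctl -w net.ipv4.tcp_keepalive_time=600
#   sysctl -w net.ipv4.tcp_fin_timeout=30
```

Note that tcp_fin_timeout only affects orphaned sockets (ones the process has already closed); it cannot clear CLOSE_WAIT, which waits for the application itself to call close().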

Question

I found some possible ways to mitigate the problem, but none of the above really solves the issue of connections persisting in the CLOSE_WAIT or FIN_WAIT2 state. I can see that this is the result of the server (CLOSE_WAIT) or the clients (FIN_WAIT2) not closing connections correctly. I would expect socket.io to force-close these broken connections after some timeout, but that does not seem to work correctly.

I tried to reproduce the CLOSE_WAIT or FIN_WAIT2 problem in my test environment, but these connection states never showed up, even after:

  1. Connecting to the socket server and then disconnecting the network
  2. Staying connected to the socket server for a long time

I found a related question asked before (Many stale connections in state CLOSE_WAIT and FIN_WAIT2), but still can't find a solution. Does anyone know how to solve this problem?

Thanks

3 Answers

#1


3  

I tried using multiple connections to connect to the socket server at the same time, and found that some of the client sockets use the same SOCKET ID (obtained via xhr; it looks like nmXTMmCGNQp4EncrfHqj) to establish a connection. I closed the browser after all connections were established, and many CLOSE_WAIT connections were left unreleased. Only a few connections closed (based on the number of unique SOCKET IDs that had been generated). The server establishes a TCP/IP connection for each SOCKET ID, but if a connection with that SOCKET ID already exists in the connection pool, the new connection is not stored in the pool. So when the client sends a FIN packet to close a connection that does not exist in the server's connection pool, the server never sends the ACK packet needed to complete the close. These connections therefore stay in the CLOSE_WAIT state and are never released.

var host = 'http://socket.server/';
var sockets = [];
for (var i = 0; i < 200; i++) {
  var socket = io.connect(host, { "force new connection": true });
  sockets.push(socket);

  socket.on("message", function (message) {
    console.log(message);
  });
  socket.on("disconnect", function () {
    console.log("disconnect");
  });
}

Fix lib/manager.js, line 670.

Do not establish a TCP/IP connection for a SOCKET ID when a connection with that SOCKET ID already exists in the connection pool.

See also: https://github.com/kejyun/socket.io/commit/8d6c02a477d365f019530b4ec992420dfb90eb09

if (!this.connected[data.id]) {
  if (transport.open) {
    if (this.closed[data.id] && this.closed[data.id].length) {
      transport.payload(this.closed[data.id]);
      this.closed[data.id] = [];
    }

    this.onOpen(data.id);
    this.store.publish('open', data.id);
    this.transports[data.id] = transport;
  }

  this.onConnect(data.id);
  this.store.publish('connect', data.id);
  //....etc
}

The following was the socket service's connection status after running for about 6 hours.

netstat -anl | grep <PORT_OF_NODE_PROCESS> | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'

FIN_WAIT2 37
LISTEN 1
TIME_WAIT 13
ESTABLISHED 295
FIN_WAIT1 20

#2


0  

The above solution may fix CLOSE_WAIT, but it does not fix FIN_WAIT2. The latest discussion here (https://github.com/LearnBoost/socket.io/issues/1380) offers possible alternative solutions. That discussion also points out that the problem is in node.js itself and NOT in socket.io.

#3


0  

If you use the native cluster module and spawn workers, note that if a worker process is killed forcefully while clients are connected to it (e.g. due to low system memory), it will leave behind CLOSE_WAIT sockets that clog up system resources indefinitely.

The workaround is to kill your main Node.js process as soon as one of your workers is terminated by the OS. Once the master Node.js process is killed, the system destroys all CLOSE_WAIT sockets belonging to dead workers.

Also, calling socket.destroy() from worker processes on Node.js v4.9.5 appears to lead to sockets stuck in the CLOSE_WAIT state. Updating to Node.js v6.9.5 LTS fixed this for me.
