Flink cluster error (Could not resolve ResourceManager address)

Date: 2025-04-02 07:59:49

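The TaskManager and JobManager went down for no obvious reason.
The log is as follows (the message 拒绝连接 in the log means "Connection refused"):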


2021-04-04 10:03:15,058 INFO           - NettyConfig [server address: /192.168.11.132, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 1 (manual), number of client threads: 1 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2021-04-04 10:03:15,297 INFO       - Temporary file directory '/tmp': total 26 GB, usable 22 GB (84.62% usable)
2021-04-04 10:03:16,123 INFO    - Allocated 102 MB for network buffer pool (number of memory segments: 3278, bytes per segment: 32768).
2021-04-04 10:03:16,197 INFO          - Starting the network environment and its components.
2021-04-04 10:03:16,252 INFO           - Successful initialization (took 52 ms).
2021-04-04 10:03:16,309 INFO           - Successful initialization (took 56 ms). Listening on SocketAddress /192.168.11.132:37718.
2021-04-04 10:03:16,310 INFO       - Limiting managed memory to 0.7 of the currently free heap space (641 MB), memory will be allocated lazily.
2021-04-04 10:03:16,314 INFO            - I/O manager uses directory /tmp/flink-io-5cb46d08-d7bd-41bb-91d0-e67a2ca8ab47 for spill files.
2021-04-04 10:03:16,409 INFO    - Messages have a max timeout of 10000 ms
2021-04-04 10:03:16,421 INFO                - Starting RPC endpoint for  at akka://flink/user/taskmanager_0 .
2021-04-04 10:03:16,438 INFO    - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2021-04-04 10:03:16,439 INFO          - Start job leader service.
2021-04-04 10:03:16,441 INFO                    - User file cache uses directory /tmp/flink-dist-cache-9bd42cb9-9f68-419a-9381-95693ff61ac5
2021-04-04 10:03:16,452 INFO              - Connecting to ResourceManager ://flink@localhost:46715/user/resourcemanager(97844b5c0749ea747b4749fffa964081).
2021-04-04 10:03:16,570 WARN                      - Remote connection to [null] failed with : 拒绝连接: localhost/127.0.0.1:46715
2021-04-04 10:03:16,577 WARN                          - Association with remote system [://flink@localhost:46715] has failed, address is now gated for [50] ms. Reason: [Association failed with [://flink@localhost:46715]] Caused by: [拒绝连接: localhost/127.0.0.1:46715]
2021-04-04 10:03:16,583 INFO              - Could not resolve ResourceManager address ://flink@localhost:46715/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address ://flink@localhost:46715/user/resourcemanager..
2021-04-04 10:03:26,617 WARN                      - Remote connection to [null] failed with : 拒绝连接: localhost/127.0.0.1:46715
2021-04-04 10:03:26,623 WARN  

......

2021-04-04 10:08:07,454 WARN                      - Remote connection to [null] failed with : 拒绝连接: localhost/127.0.0.1:46715
2021-04-04 10:08:07,455 WARN                          - Association with remote system [://flink@localhost:46715] has failed, address is now gated for [50] ms. Reason: [Association failed with [://flink@localhost:46715]] Caused by: [拒绝连接: localhost/127.0.0.1:46715]
2021-04-04 10:08:07,456 INFO              - Could not resolve ResourceManager address ://flink@localhost:46715/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address ://flink@localhost:46715/user/resourcemanager..
2021-04-04 10:08:16,468 ERROR             - Fatal error occurred in TaskExecutor ://flink@192.168.11.132:45382/user/taskmanager_0.
: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
	at (:1034)
	at $startRegistrationTimeout$3(:1020)
	at (:392)
	at (:185)
	at (:147)
	at $$anonfun$receive$(:165)
	at (:502)
	at $(:500)
	at (:95)
	at (:526)
	at (:495)
	at (:257)
	at (:224)
	at (:234)
	at (:289)
	at $(:1056)
	at (:1692)
	at (:157)
2021-04-04 10:08:16,472 ERROR        - Fatal error occurred while executing the TaskManager. Shutting it down...
: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
	at (:1034)
	at $startRegistrationTimeout$3(:1020)
	at (:392)
	at (:185)
	at (:147)
	at $$anonfun$receive$(:165)
	at (:502)
	at $(:500)
	at (:95)
	at (:526)
	at (:495)
	at (:257)
	at (:224)
	at (:234)
	at (:289)
	at $(:1056)
	at (:1692)
	at (:157)
2021-04-04 10:08:16,478 INFO              - Stopping TaskExecutor ://flink@192.168.11.132:45382/user/taskmanager_0.
2021-04-04 10:08:16,478 INFO          - Stop job leader service.
2021-04-04 10:08:16,507 INFO    - Stopping ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2021-04-04 10:08:16,507 INFO    - Shutting down TaskExecutorLocalStateStoresManager.
2021-04-04 10:08:16,514 INFO            - I/O manager removed spill file directory /tmp/flink-io-5cb46d08-d7bd-41bb-91d0-e67a2ca8ab47
2021-04-04 10:08:16,514 INFO          - Shutting down the network environment and its components.
2021-04-04 10:08:16,515 INFO           - Successful shutdown (took 0 ms).
2021-04-04 10:08:16,518 INFO           - Successful shutdown (took 1 ms).
2021-04-04 10:08:16,532 INFO          - Stop job leader service.
2021-04-04 10:08:16,532 INFO                    - removed file cache directory /tmp/flink-dist-cache-9bd42cb9-9f68-419a-9381-95693ff61ac5
2021-04-04 10:08:16,539 INFO              - Stopped TaskExecutor ://flink@192.168.11.132:45382/user/taskmanager_0.
2021-04-04 10:08:16,540 INFO                - Shutting down BLOB cache
2021-04-04 10:08:16,540 INFO                - Shutting down BLOB cache
2021-04-04 10:08:16,553 INFO    - backgroundOperationsLoop exiting
2021-04-04 10:08:16,565 INFO    - Session: 0x10000007e9d0008 closed
2021-04-04 10:08:16,565 INFO                - Stopping Akka RPC service.
2021-04-04 10:08:16,583 INFO  $RemotingTerminator         - Shutting down remote daemon.
2021-04-04 10:08:16,594 INFO    - EventThread shut down for session: 0x10000007e9d0008
2021-04-04 10:08:16,597 INFO  $RemotingTerminator         - Shutting down remote daemon.
2021-04-04 10:08:16,601 INFO  $RemotingTerminator         - Remote daemon shut down; proceeding with flushing remote transports.
2021-04-04 10:08:16,611 INFO  $RemotingTerminator         - Remote daemon shut down; proceeding with flushing remote transports.
2021-04-04 10:08:16,640 INFO  $RemotingTerminator         - Remoting shut down.
2021-04-04 10:08:16,641 INFO  $RemotingTerminator         - Remoting shut down.
2021-04-04 10:08:16,661 INFO                - Stopped Akka RPC service.

Cause: the ZooKeeper configuration was wrong. It was corrected to:

: node1:2181,node2:2181,node3:2181
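The configuration key on the line above did not survive in this post. In a standard flink-conf.yaml the ZooKeeper quorum is normally set through `high-availability.zookeeper.quorum`, so that is presumably the setting meant here, but treat it as an assumption. Before restarting the cluster, it can help to confirm that every server in the quorum string is actually reachable from the TaskManager host. The following is only a minimal sketch of such a check; the function names are made up for illustration and are not part of Flink.

```python
import socket


def parse_quorum(quorum):
    """Split 'host1:port1,host2:port2' into a list of (host, port) tuples."""
    servers = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        if not host:
            # No port given: fall back to ZooKeeper's default client port.
            host, port = entry, "2181"
        servers.append((host, int(port)))
    return servers


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


# Example usage with the quorum string from this post:
#   for host, port in parse_quorum("node1:2181,node2:2181,node3:2181"):
#       print(host, port, "ok" if is_reachable(host, port) else "UNREACHABLE")
```

A successful TCP connection only shows that something is listening on the port. It does not prove that ZooKeeper is healthy or that the JobManager has registered a usable address there.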

In addition, the permissions of the jars in lib were changed to 755; after that, everything worked correctly.
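To see which jars still have different permissions before changing anything, a small script can list them. This is a sketch under assumptions: the helper name is hypothetical, and the path to the lib directory depends on where Flink is installed.

```python
import os
import stat


def jars_with_wrong_mode(lib_dir, expected=0o755):
    """Return (path, mode) pairs for *.jar files whose permission bits differ from `expected`."""
    mismatches = []
    for name in sorted(os.listdir(lib_dir)):
        if not name.endswith(".jar"):
            continue
        path = os.path.join(lib_dir, name)
        # S_IMODE keeps only the permission bits (e.g. 0o755) of the file mode.
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != expected:
            mismatches.append((path, mode))
    return mismatches


# Example usage (the path is an assumption, adjust it to your installation):
#   for path, mode in jars_with_wrong_mode("/opt/flink/lib"):
#       print(f"{path}: {oct(mode)}")
```

Each reported file could then be fixed with `os.chmod(path, 0o755)` or the equivalent `chmod 755` on the shell.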

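Also, I found that rebooting the virtual machines directly, or starting the TaskManagers on all 3 machines at the same time, can cause the error above as well. My guess is that it happens when several TaskManagers start too close together.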
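If simultaneous startup really is the trigger (which is only a guess), one workaround is to start the TaskManagers one at a time with a pause in between. The sketch below assumes a standalone cluster, passwordless SSH to each node, and a Flink installation at the same path on every machine. The host names, the `/opt/flink` path and the delay are placeholders, and the function itself is hypothetical.

```python
import subprocess
import time


def start_taskmanagers_staggered(hosts, flink_home, delay_s=15.0,
                                 run=subprocess.run, sleep=time.sleep):
    """Start one TaskManager per host over SSH, waiting `delay_s` seconds between hosts."""
    for i, host in enumerate(hosts):
        if i > 0:
            # Pause before every host except the first one.
            sleep(delay_s)
        # bin/taskmanager.sh is the standalone start script shipped with Flink.
        cmd = ["ssh", host, f"{flink_home}/bin/taskmanager.sh", "start"]
        run(cmd, check=True)


# Example usage (placeholder values):
#   start_taskmanagers_staggered(["node1", "node2", "node3"], "/opt/flink")
```

The `run` and `sleep` parameters exist only so the function can be tried out without touching real machines. In normal use the defaults are enough.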