Troubleshooting Zabbix Monitoring

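On Linux, almost every running service produces a corresponding log, and most programs print an error message when they fail. Even when nothing is printed, you can run "echo $?" right after a command to see whether it succeeded (0 means success). I have been using Zabbix for a while now, so here is a summary of the problems I ran into and how I solved them.

Zabbix stores its logs under /tmp: the server log is zabbix_server.log, and the log on the monitored host is zabbix_agentd.log.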
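As a small illustration of checking the exit status, here is a minimal sketch. The helper name check_status is made up for this example and is not part of Zabbix.

```shell
# check_status CMD [ARGS...]
# Runs CMD silently and reports whether it succeeded, based on its exit status.
check_status() {
    if "$@" >/dev/null 2>&1; then
        echo "ok"
    else
        # Inside the else branch, $? still holds the exit status of CMD.
        echo "failed with exit status $?"
    fi
}

check_status true    # prints "ok"
check_status false   # prints "failed with exit status 1"
```

The same idea works interactively: run the command, then type echo $? before running anything else.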

I. Check whether the Zabbix services started successfully

Check whether any Zabbix processes are running:
# ps aux | grep zabbix
Check whether the system is listening on the Zabbix ports, 10050 (zabbix agent) and 10051 (zabbix server):
# netstat -nplut | grep zabbix
If not, start the services:
# /etc/init.d/zabbix_server_ctl start
# /etc/init.d/zabbix_agent_ctl start
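The port check can also be scripted by port number rather than by process name, which does not need the -p option. The following is only a sketch: has_listener is a hypothetical helper, and it assumes the usual Linux netstat column layout (local address in column 4, state in column 6).

```shell
# has_listener PORT
# Reads `netstat -nlt`-style lines on stdin and succeeds if a TCP socket
# is in LISTEN state on PORT.
has_listener() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") && $6 == "LISTEN" { found = 1 }
        END { exit !found }
    '
}

# Only runs where netstat is installed.
if command -v netstat >/dev/null 2>&1; then
    for port in 10050 10051; do
        if netstat -nlt 2>/dev/null | has_listener "$port"; then
            echo "port $port: listening"
        else
            echo "port $port: not listening"
        fi
    done
fi
```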

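Important: after every change to a configuration file, you must restart the corresponding zabbix server or zabbix agentd.

Some init scripts fail to stop Zabbix during a restart, so the service cannot come back up. In that case, kill the Zabbix processes with the kill command and then start the service again.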

II. Messages in zabbix_server.log

2009:20121023:193549.354 Sending list of active checks to [192.168.30.3] failed: host [CentOS-3] not found

This happens because the Hostname in the zabbix_agentd.conf configuration file does not match the host name configured in the web interface.
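A quick way to compare the two names is to read Hostname out of the agent configuration and check it against the name used in the web interface. This is a rough sketch; agent_hostname and check_hostname are hypothetical helpers, and the configuration path in the usage comment is the one used later in this article.

```shell
# agent_hostname FILE
# Prints the value of the last uncommented "Hostname=" line in FILE.
agent_hostname() {
    sed -n 's/^[[:space:]]*Hostname[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# check_hostname FILE EXPECTED
# Compares the agent's Hostname with the host name from the web interface.
check_hostname() {
    actual=$(agent_hostname "$1")
    if [ "$actual" = "$2" ]; then
        echo "match: $actual"
    else
        echo "mismatch: agent has '$actual', web interface has '$2'"
    fi
}

# Usage on the monitored host:
#   check_hostname /etc/zabbix/zabbix_agentd.conf CentOS-3
```

The comparison is case-sensitive on purpose, since the name in the log message must match exactly.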

III. Errors shown in the web interface

1.

Get value from agent failed: cannot connect to [[192.168.30.2]:10050]: [111] Connection refused

192.168.30.2 is my Zabbix server, which also runs an agent to monitor itself. This error appeared because I forgot to start zabbix_agentd on the Zabbix server. The Last 20 issues panel shows it as well:

Last 20 issues

Host | Issue | Last change | Age | Ack | Actions
Zabbix server | Server Zabbix server is unreachable | 23 Oct 2012 18:42:14 | 6m 57s | |

Solution: start zabbix_agentd.

2.

Get value from agent failed: cannot connect to [[192.168.30.3]:10050]: [113] No route to host

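The message "No route to host" points to a network connectivity problem. Troubleshoot as follows:

a) Check whether the machine 192.168.30.3 is powered on.

b) Ping it from the zabbix server to see whether the network is reachable.

c) Use telnet to connect to ports 10050 and 10051 to see whether the host allows traffic on them.

d) Check whether the iptables firewall rules block ports 10050 and 10051.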
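The two agent errors covered so far can be told apart by the text after the error number. The sketch below maps each message to the checks described above; diagnose is a hypothetical helper, and the hints simply restate this article's advice.

```shell
# diagnose "ERROR TEXT"
# Maps a "Get value from agent failed" message to a first troubleshooting hint.
diagnose() {
    case "$1" in
        *"Connection refused"*)
            echo "agent not listening: start zabbix_agentd on the target host" ;;
        *"No route to host"*)
            echo "network problem: check power, ping, telnet to the ports, iptables" ;;
        *)
            echo "unrecognized error: check the server and agent logs" ;;
    esac
}

diagnose "cannot connect to [[192.168.30.3]:10050]: [113] No route to host"
```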

3.

The web interface keeps showing the following message in red:

zabbix server is not running: the information displayed may not be current.

zabbix server is running | No.

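Neither /tmp/zabbix_server.log nor /tmp/zabbix_agentd.log showed anything abnormal, and the zabbix_server and zabbix_agent processes and ports all looked fine. After several rounds of Googling and experimenting, I finally solved it!

http://www.zabbix.com/forum/showthread.php?t=23878&page=3 This thread mentions that SELinux can cause Zabbix to show this error.

http://www.zabbix.com/forum/showthread.php?t=25321 This thread describes changing the hostname to an IP address.

What I actually did:

① Check the log produced by SELinux, which did contain errors:

# tail -f /var/log/audit/audit.log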

type=AVC msg=audit(1351863204.990:32): avc: denied { name_connect } for pid=1575 comm="httpd" dest=10051 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:port_t:s0 tclass=tcp_socket

type=SYSCALL msg=audit(1351863204.990:32): arch=40000003 syscall=102 success=no exit=-13 a0=3 a1=bfd494b0 a2=b76b0ad8 a3=d items=0 ppid=1434 pid=1575 auid=4294967295 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=4294967295 comm="httpd" exe="/usr/sbin/httpd" subj=system_u:system_r:httpd_t:s0 key=(null)

② Then tell SELinux to allow the connection:

setsebool -P httpd_can_network_connect on

③ Edit zabbix.conf.php and change the value of $ZBX_SERVER to this machine's IP address:

$ZBX_SERVER = '192.168.30.2'; # use the IP address instead of the hostname

④OK
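To confirm step ① without reading the audit log by eye, the denials can be counted. This is a sketch that assumes the audit log format shown above; httpd_denials is a hypothetical helper.

```shell
# httpd_denials PORT
# Reads audit.log lines on stdin and prints how many AVC denials were logged
# for httpd connecting to PORT.
httpd_denials() {
    awk -v want="dest=$1" '
        /avc: +denied/ && /comm="httpd"/ {
            for (i = 1; i <= NF; i++) if ($i == want) { n++; break }
        }
        END { print n + 0 }
    '
}

# Reading the audit log normally requires root.
if [ -r /var/log/audit/audit.log ]; then
    echo "httpd denials for port 10051: $(httpd_denials 10051 < /var/log/audit/audit.log)"
fi
```

After step ②, running getsebool httpd_can_network_connect should report the boolean as on, and the count should stop growing.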

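Monitoring with user-defined scripts:

A user-defined script can sometimes take a long time to run, for example more than 10 or 20 seconds. In that case, zabbix_agentd -p or zabbix_agentd -t may print "Alarm clock" and you do not get the expected result. This is because Timeout in the zabbix agentd configuration file defaults to 3 seconds, and the error occurs whenever the script takes longer than that to return its result.

Solution: edit the configuration file /etc/zabbix/zabbix_agentd.conf, find "Timeout", and set it to 30 seconds or less.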
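Before picking a Timeout value, it helps to know whether the command finishes within a given limit. Here is a minimal sketch, assuming GNU coreutils timeout is installed; fits_timeout is a hypothetical helper.

```shell
# fits_timeout LIMIT CMD [ARGS...]
# Succeeds if CMD finishes within LIMIT seconds, fails if it had to be killed.
# Relies on GNU coreutils `timeout`, which exits with status 124 on a timeout.
fits_timeout() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1 || [ "$?" -ne 124 ]
}

# Example: a 2-second command does not fit into a 1-second limit.
if fits_timeout 1 sleep 2; then
    echo "fits"
else
    echo "too slow"
fi
```

Running the same check with the limit set to the current Timeout and the real script as CMD shows whether the value needs raising.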

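In this case the Zabbix server configuration also needs attention. Take my own script as an example:

UserParameter=ping.avgtime,ping 192.168.30.2 -c 10 -w 29 |grep 'avg' |awk -F "/" '{print $5}'
This pings 192.168.30.2 ten times and returns the average round-trip time; the -w option limits ping to 29 seconds.

This script takes roughly 10 seconds to run. On the agent side zabbix_agentd -t returns a result, but the Zabbix server log keeps showing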
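To see why field 5 is the average, the same grep and awk pipeline can be run against a single summary line. The sample line below uses made-up values in the format printed by Linux ping; avg_rtt is a hypothetical wrapper around the pipeline from the UserParameter.

```shell
# avg_rtt
# Reads ping output on stdin and prints the average round-trip time.
# Splitting the summary line on "/" gives:
#   $1="rtt min"  $2="avg"  $3="max"  $4="mdev = <min>"  $5=<avg>  ...
avg_rtt() {
    grep 'avg' | awk -F "/" '{print $5}'
}

# Sample summary line (illustrative values, not taken from a real run).
summary='rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms'
echo "$summary" | avg_rtt    # prints 0.052
```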

1762:20121023:191941.360 resuming Zabbix agent checks on host [Zabbix server]: connection restored

1761:20121023:191952.149 Zabbix agent item [ping.avgtime] on host [CentOS-3] failed: first network error, wait for 15 seconds

1762:20121023:192010.610 Zabbix agent item [ping.avgtime] on host [CentOS-3] failed: another network error, wait for 15 seconds

1762:20121023:192028.628 Zabbix agent item [ping.avgtime] on host [CentOS-3] failed: another network error, wait for 15 seconds

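error entries like these, and no graph is drawn in the web interface either.

Solution:

① Edit the Zabbix server configuration file /etc/zabbix/zabbix_server.conf, find "Timeout", and set it to 30 seconds or less.

② If similar messages still appear, the memory configured for the Zabbix server is probably too small; increasing the server's memory should fix it.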
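To check whether the messages really stop after the change, the "network error" entries can be counted per host and item. This sketch assumes the log line format quoted above; network_errors_by_item is a hypothetical helper.

```shell
# network_errors_by_item
# Reads zabbix_server.log lines on stdin and prints "COUNT HOST ITEM" for
# every host/item pair that logged a "network error".
network_errors_by_item() {
    sed -n '/network error/s/.*Zabbix agent item \[\([^]]*\)] on host \[\([^]]*\)] failed.*/\2 \1/p' \
        | sort | uniq -c | sed 's/^ *//'
}

# Usage on the Zabbix server:
#   network_errors_by_item < /tmp/zabbix_server.log
```

Running it before and after restarting the server with the new Timeout gives a simple comparison.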

Reposted from: https://blog.51cto.com/fengzhige/1034485
