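Deploying One redis exporter to Monitor All Redis Instances

Date: 2023-01-30 18:00:39

Overview

I previously wrote about deploying redis exporter on k8s to monitor all Redis instances, with a brief look at how to deploy and configure Redis monitoring and alerting.

This article combines that setup with ConsulManager to deploy one redis exporter that monitors all Redis instances.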

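Deploying redis exporter

Two deployment methods are described here. I use the k8s method; pick whichever suits your environment:

  • Deploy with docker-compose

  • Deploy with k8s

Deploying the exporter with docker-compose

Create a docker-compose.yml with the following content: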

version: "3.2"
services:
  redis-exporter:
    image: oliver006/redis_exporter
    container_name: redis-exporter
    restart: unless-stopped
    command:
      - "-redis.password-file=/redis_passwd.json"
    volumes:
      - /usr/share/zoneinfo/PRC:/etc/localtime
      - /data/redis-exporter/redis_passwd.json:/redis_passwd.json
    expose:
      - 9121
    network_mode: "host"

Create a file that holds the Redis instance addresses and passwords, /data/redis-exporter/redis_passwd.json:

{
  "redis://xxxxxxxxxxx.dcs.huaweicloud.com:6379":"",
  "redis://aaaaaaaa.cn-south-1.dcs.myhuaweicloud.com:6379":"q1azw2sx"
}
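  • Adjust the host path of the configuration file mounted in docker-compose to match your environment.

  • The configuration file is JSON, with one instance per line in the form "redis://instance-address:port":"redis-password".

  • For the instance address and port, see the instance field in the cloud Redis list or in the self-hosted Redis management page.

  • If a Redis instance has no password, keep the empty double quotes "".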
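
The format rules above are easy to get wrong by hand, so a quick check before starting the exporter can help. The following is a minimal sketch, not part of redis exporter itself: the function name validate_password_file is hypothetical, and accepting the rediss:// scheme alongside redis:// is an assumption that goes beyond what this article uses.

```python
import json


def validate_password_file(text):
    """Return a list of problems found in a redis_passwd.json document."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(entries, dict):
        return ["top level must be a JSON object"]

    problems = []
    for uri, password in entries.items():
        # Keys are expected to look like redis://host:port
        # (rediss:// is accepted here as an assumption).
        if not uri.startswith(("redis://", "rediss://")):
            problems.append(f"{uri!r}: missing redis:// scheme")
        elif ":" not in uri.split("://", 1)[1]:
            problems.append(f"{uri!r}: missing port")
        # A password-less instance must still carry an empty string.
        if not isinstance(password, str):
            problems.append(f'{uri!r}: password must be a string (use "" for none)')
    return problems


sample = """{
  "redis://192.168.10.2:6379": "test@2000",
  "redis://192.168.10.4:6379": ""
}"""
print(validate_password_file(sample))  # prints []
```

An empty list means the file matches the format described above; each string in a non-empty list names the offending key.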

Start it:

docker-compose up -d

For more details, see the official site.

Deploying the exporter with k8s

Create a redis-exporter.yaml file with the following content:

cat > redis-exporter.yaml <<'EOF'
---  
apiVersion: v1
data:
  redis_passwd.json: |
    {
      "redis://192.168.10.2:6379":"test@2000",
      "redis://192.168.10.3:6379":"test@2000",
      "redis://192.168.10.4:6379":""
    }
kind: ConfigMap
metadata:
  name: redis-passwd-cm
  namespace: kubesphere-monitoring-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: redis-exporter-prod
  name: redis-exporter-prod
  namespace: kubesphere-monitoring-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter-prod
  template:
    metadata:
      labels:
        app: redis-exporter-prod
    spec:
      containers:
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        env:
        - name: TZ
          value: "Asia/Shanghai"
        args:
        - "-redis.password-file=/opt/redis_passwd.json"
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - name: http-metrics
          containerPort: 9121
          protocol: TCP
        volumeMounts:
        - name: redis-passwd-conf-map
          mountPath: "/opt"
      volumes:
      - name: redis-passwd-conf-map
        configMap:
          name: redis-passwd-cm
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: redis-exporter-prod
  name: redis-exporter-prod
  namespace: kubesphere-monitoring-system
spec:
  ports:
  - name: http-metrics
    protocol: TCP
    port: 9121
    targetPort: 9121
  selector:
    app: redis-exporter-prod
EOF

Deploy the exporter with the following command:

kubectl apply -f redis-exporter.yaml

Prometheus service discovery configuration

A sample is given below; the configuration can also be generated in ConsulManager:

cat > prometheus-additional.yaml << 'EOF'
- job_name: redis_exporter
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /scrape
  consul_sd_configs:
    - server: '192.168.10.60:8500'  ## Consul server address and port
      token: 'fe48c9a4-364e-af23-81df-9f28303012af'
      refresh_interval: 30s
      services: ['selfredis_exporter']
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: .*OFF.*
      action: drop
    - source_labels: [__meta_consul_service_address,__meta_consul_service_port]
      regex: ([^:]+)(?::\d+)?;(\d+)
      target_label: __param_target
      replacement: $1:$2
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: redis-exporter-prod.kubesphere-monitoring-system.svc:9121 ## redis exporter service address and port
    - source_labels: ['__meta_consul_service_metadata_vendor']
      target_label: vendor
    - source_labels: ['__meta_consul_service_metadata_region']
      target_label: region
    - source_labels: ['__meta_consul_service_metadata_group']
      target_label: group
    - source_labels: ['__meta_consul_service_metadata_account']
      target_label: account
    - source_labels: ['__meta_consul_service_metadata_name']
      target_label: name
    - source_labels: ['__meta_consul_service_metadata_iid']
      target_label: iid
    - source_labels: ['__meta_consul_service_metadata_mem']
      target_label: mem
    - source_labels: ['__meta_consul_service_metadata_itype']
      target_label: itype
    - source_labels: ['__meta_consul_service_metadata_ver']
      target_label: ver
    - source_labels: ['__meta_consul_service_metadata_exp']
      target_label: exp
EOF
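
To make the relabeling in this sample easier to follow, here is a rough Python sketch of what the second relabel rule does and the URL the exporter ends up being asked for. It is an illustration under stated assumptions, not how Prometheus is implemented: the helper names relabel_target and scrape_url are hypothetical, and Python's re module stands in for Prometheus's RE2 engine (fullmatch mirrors the fact that relabel regexes are anchored at both ends).

```python
import re
from urllib.parse import urlencode

# Same pattern as the relabel rule; Prometheus joins source labels with ";".
RELABEL_REGEX = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def relabel_target(service_address, service_port):
    """Model the rule that builds __param_target from the Consul address and port."""
    joined = f"{service_address};{service_port}"
    match = RELABEL_REGEX.fullmatch(joined)
    if match is None:
        # In Prometheus a non-matching replace rule leaves the label untouched;
        # None is used here to signal "no change".
        return None
    # Equivalent of replacement: $1:$2
    return f"{match.group(1)}:{match.group(2)}"


def scrape_url(exporter_address, target):
    """Build the request sent to the exporter once __address__ is rewritten."""
    return f"http://{exporter_address}/scrape?{urlencode({'target': target})}"


target = relabel_target("192.168.10.2", "6379")
print(target)
print(scrape_url("redis-exporter-prod.kubesphere-monitoring-system.svc:9121", target))
```

The optional `(?::\d+)?` group means a port already embedded in the Consul service address is discarded in favor of the registered service port, so relabel_target("192.168.10.2:6380", "6379") also yields 192.168.10.2:6379.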

Load the discovery configuration above. In my setup Prometheus picks up the change without a manual reload:

kubectl delete secret additional-configs  -n kubesphere-monitoring-system
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n kubesphere-monitoring-system

Grafana dashboard

A sample Grafana dashboard is shown below:

[Image: sample Grafana dashboard]

Alerting rules

A sample is shown below:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: consul-redis-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.3.1
    prometheus: k8s
    role: alert-rules
  name: consul-redis-exporter-rules
  namespace: kubesphere-monitoring-system
spec:
  groups:
  - name: REDIS-Alert
    rules:
    - alert: RedisDown
      expr: redis_up == 0
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis down (instance {{ $labels.instance }})
        description: "Redis instance is down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

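    # Note: count(...) by (...) returns no series for an empty group, so the "< 1" comparison below never matches as written.
    - alert: RedisMissingMaster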
      expr: (count(redis_instance_info{role="master"}) by (name,region,vendor,instance)) < 1
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis missing master (instance {{ $labels.instance }})
        description: "Redis cluster has no node marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisTooManyMasters
      expr: count(redis_instance_info{role="master"}) by (name,region,vendor,instance) > 1
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis too many masters (instance {{ $labels.instance }})
        description: "Redis cluster has too many nodes marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisDisconnectedSlaves
      expr: count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 1
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis disconnected slaves (instance {{ $labels.instance }})
        description: "Redis not replicating for all slaves. Consider reviewing the redis replication status.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisReplicationBroken
      expr: delta(redis_connected_slaves[2m]) < 0
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis replication broken (instance {{ $labels.instance }})
        description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisClusterFlapping
      expr: changes(redis_connected_slaves[1m]) > 1
      for: 2m
      labels:
        severity: 紧急
      annotations:
        summary: Redis cluster flapping (instance {{ $labels.instance }})
        description: "Changes have been detected in Redis replica connection. This can occur when replica nodes lose connection to the master and reconnect (a.k.a flapping).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisMissingBackup
      expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis missing backup (instance {{ $labels.instance }})
        description: "Redis has not been backed up for 24 hours\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    # The exporter must be started with --include-system-metrics flag or REDIS_EXPORTER_INCL_SYSTEM_METRICS=true environment variable.
    - alert: RedisOutOfSystemMemory
      expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
      for: 2m
      labels:
        severity: 警告
      annotations:
        summary: Redis out of system memory (instance {{ $labels.instance }})
        description: "Redis is running out of system memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    #- alert: RedisOutOfConfiguredMaxmemory
    #  expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90
    #  for: 2m
    #  labels:
    #    severity: 警告
    #  annotations:
    #    summary: Redis out of configured maxmemory (instance {{ $labels.instance }})
    #    description: "Redis is running out of configured maxmemory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisTooManyConnections
      expr: redis_connected_clients > 100
      for: 2m
      labels:
        severity: 警告
      annotations:
        summary: Redis too many connections (instance {{ $labels.instance }})
        description: "Redis instance has too many connections\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisNotEnoughConnections
      expr: redis_connected_clients < 1
      for: 2m
      labels:
        severity: 警告
      annotations:
        summary: Redis not enough connections (instance {{ $labels.instance }})
        description: "Redis instance has no connected clients (< 1)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - alert: RedisRejectedConnections
      expr: increase(redis_rejected_connections_total[2m]) > 0
      for: 0m
      labels:
        severity: 紧急
      annotations:
        summary: Redis rejected connections (instance {{ $labels.instance }})
        description: "Some connections to Redis have been rejected\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
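
One property worth keeping in mind when adapting the RedisMissingMaster and RedisTooManyMasters rules is how count(...) by (...) treats groups. The toy model below is only a sketch of that aggregation in plain Python, not PromQL itself; the function count_by and the instance names cache-a and cache-b are made up for illustration.

```python
from collections import Counter


def count_by(series, labels):
    """Toy model of PromQL count(...) by (labels): one result per non-empty group."""
    return dict(Counter(tuple(s.get(label, "") for label in labels) for s in series))


# Hypothetical redis_instance_info series, one per scraped instance.
series = [
    {"instance": "192.168.10.2:6379", "name": "cache-a", "role": "master"},
    {"instance": "192.168.10.3:6379", "name": "cache-a", "role": "slave"},
    {"instance": "192.168.10.4:6379", "name": "cache-b", "role": "slave"},
]

# Equivalent of the selector redis_instance_info{role="master"}
masters = [s for s in series if s["role"] == "master"]

by_instance = count_by(masters, ["name", "instance"])
by_name = count_by(masters, ["name"])

print(by_instance)
print(by_name)
# cache-b has no master, yet it simply has no entry: nothing is ever below 1.
print(any(value < 1 for value in by_instance.values()))  # prints False
```

Because a group without matching series produces no result at all, a threshold of `< 1` has nothing to compare against, and grouping by instance keeps every count at 1. If these two alerts matter in your environment, it may be worth testing the expressions against real data before relying on them.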

DingTalk alerts

Samples of an alert and of its recovery notification are shown below.

Alert sample:

[Image: sample DingTalk alert notification]

Recovery sample:

[Image: sample DingTalk recovery notification]
