Prometheus rules configuration

Prometheus supports two kinds of rules:

recording rules

alerting rules

After editing a rules file, you can check its syntax with the promtool utility, without restarting Prometheus:

go get /prometheus/prometheus/cmd/promtool
promtool check rules /path/to/
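Rules files only take effect once the main Prometheus configuration references them through its rule_files section. A minimal sketch, assuming the rules live in a hypothetical rules/ directory next to prometheus.yml:

```yaml
# prometheus.yml (excerpt)
global:
  evaluation_interval: 1m   # how often rule groups are evaluated (1m is the default)

rule_files:
  - "rules/*.yml"           # hypothetical location; glob patterns are allowed
```

Once the files pass promtool, Prometheus re-reads them on SIGHUP, or on an HTTP POST to /-/reload when it was started with --web.enable-lifecycle.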

Recording rules

Recording rules perform calculations or aggregations on scraped metrics and save the results as new metrics (time series).

groups:
  - name: example
    rules:
    - record: job:http_inprogress_requests:sum
      expr: sum(http_inprogress_requests) by (job)
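Recording rule names usually follow the level:metric:operations convention, as job:http_inprogress_requests:sum does above. A sketch of a rate-based recording rule under that convention, assuming a hypothetical counter named http_requests_total; the optional interval field overrides the global evaluation interval for this group only:

```yaml
groups:
  - name: http_aggregations        # hypothetical group name
    interval: 30s                  # optional; defaults to global.evaluation_interval
    rules:
      # Per-job request rate over the last 5 minutes.
      # The _total suffix is dropped from the name because rate() is applied.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```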

Alerting rules

Alerting rules define alert conditions as PromQL expressions. They are configured the same way as recording rules, inside rule groups.

groups:
- name: example
  rules:
  - alert: HighErrorRate
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency
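The alert above reads the precomputed series job:request_latency_seconds:mean5m, which would itself come from a recording rule. One plausible definition, assuming request_latency_seconds is a summary or histogram that exposes _sum and _count series (the group name is hypothetical):

```yaml
groups:
  - name: latency_recording        # hypothetical group name
    rules:
      # Mean latency per job over 5 minutes:
      # observed seconds per second divided by observations per second.
      - record: job:request_latency_seconds:mean5m
        expr: |
          sum by (job) (rate(request_latency_seconds_sum[5m]))
            /
          sum by (job) (rate(request_latency_seconds_count[5m]))
```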

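for: how long the expression must keep returning results before the alert fires. During this period the alert is pending; if the condition stops holding before the duration elapses, the pending alert is reset.

labels: additional labels attached to the alert; an existing label with the same name is overwritten.

annotations: informational key/value pairs, such as a summary or a longer description, stored with the alert.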
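The same alerting rule with each field commented, as a sketch of how the three clauses behave; the runbook_url annotation and its URL are hypothetical additions for illustration:

```yaml
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        # Must stay true on every evaluation for 10 minutes:
        # pending first, then firing. A false result resets the timer.
        for: 10m
        labels:
          # Merged into the alert's label set (e.g. for Alertmanager routing).
          severity: page
        annotations:
          # Descriptive text only; annotations do not identify the alert.
          summary: High request latency
          runbook_url: https://example.com/runbooks/high-latency   # hypothetical
```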

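Templates

Label and annotation values can use template variables: $labels holds the label set of the firing series and $value holds its evaluated value.

Usage: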

# To insert a firing element's label values:
{{ $labels.<labelname> }}
# To insert the numeric expression value of the firing element:
{{ $value }}

Example:

groups:
- name: example
  rules:

  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
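Template values can also be passed through formatting functions. A sketch, assuming the job:request_latency_seconds:mean5m series from earlier and a hypothetical alert name; humanizeDuration renders a number of seconds as a readable duration, and printf is the standard Go template builtin:

```yaml
groups:
  - name: example
    rules:
      - alert: HighLatencyFormatted   # hypothetical alert name
        expr: job:request_latency_seconds:mean5m > 0.5
        for: 10m
        annotations:
          summary: "High latency for job {{ $labels.job }}"
          # Single-quoted YAML so the template's double quotes need no escaping.
          description: 'Mean latency is {{ $value | humanizeDuration }} (raw: {{ printf "%.3f" $value }}s)'
```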


Iterating with range (here over the results of the query "up"):

{{ range query "up" }}
  {{ .Labels.instance }} {{ .Value }}
{{ end }}
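The same range construct can be used inside an alert annotation. A sketch, assuming the query function is available in the alert template context and using a hypothetical alert name; each element returned by query exposes .Labels and .Value:

```yaml
groups:
  - name: example
    rules:
      - alert: SomeInstancesDown      # hypothetical alert name
        expr: count(up == 0) > 0
        for: 5m
        annotations:
          # Lists every instance whose up value is currently 0.
          description: |
            Instances currently down:
            {{ range query "up == 0" }}{{ .Labels.instance }} {{ end }}
```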

For more examples, see: /docs/prometheus/latest/configuration/template_examples/

秒客网
