Spring Boot 微服务应用集成Prometheus + Grafana 实现监控告警

时间:2023-03-08 23:37:33
Spring Boot 微服务应用集成Prometheus + Grafana 实现监控告警

Spring Boot 微服务应用集成Prometheus + Grafana 实现监控告警



部分内容原文地址:

Richard_Yi:Spring Boot 微服务应用集成Prometheus + Grafana 实现监控告警



一、添加依赖

Spring Boot 应用和Prometheus 集成,你需要增加micrometer-registry-prometheus依赖。

<!-- Micrometer Prometheus registry  -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

添加上述依赖项之后,Spring Boot 将会自动配置 PrometheusMeterRegistry 和 CollectorRegistry来以Prometheus 可以抓取的格式(即上文提到的 Metrics 格式)收集和导出指标数据。

所有的相关数据,都会在Actuator 的 /prometheus端点暴露出来。Prometheus 可以抓取该端点以定期获取度量标准数据。

1.1 Actuator 的 /prometheus端点

加micrometer-registry-prometheus依赖后,我们访问http://localhost:8080/actuator/prometheus地址,可以看到一下内容:

# HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
# TYPE jvm_buffer_total_capacity_bytes gauge
jvm_buffer_total_capacity_bytes{id="direct",} 90112.0
jvm_buffer_total_capacity_bytes{id="mapped",} 0.0
# HELP tomcat_sessions_expired_sessions_total
# TYPE tomcat_sessions_expired_sessions_total counter
tomcat_sessions_expired_sessions_total 0.0
# HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
# TYPE jvm_classes_unloaded_classes_total counter
jvm_classes_unloaded_classes_total 1.0
# HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
# TYPE jvm_buffer_count_buffers gauge
jvm_buffer_count_buffers{id="direct",} 11.0
jvm_buffer_count_buffers{id="mapped",} 0.0
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage 0.0939447637893599
# HELP jvm_gc_max_data_size_bytes Max size of old generation memory pool
# TYPE jvm_gc_max_data_size_bytes gauge
jvm_gc_max_data_size_bytes 2.841116672E9 # 此处省略超多字...

这些都是按照上文提到的 Metrics 格式组织起来的程序监控指标数据。

metric name>{<label name>=<label value>, ...}

二、Prometheus 配置

配置Prometheus 去收集/actuator/prometheus的指标数据。

# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s). # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml" # A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus' # metrics_path defaults to '/metrics'
# scheme defaults to 'http'. static_configs:
- targets: ['localhost:9090']
# demo job
- job_name: 'springboot-actuator-prometheus-test' # job name
metrics_path: '/actuator/prometheus' # 指标获取路径
scrape_interval: 5s # 间隔
basic_auth: # Spring Security basic auth
username: 'actuator'
password: 'actuator'
static_configs:
- targets: ['10.60.45.113:8080'] # 实例的地址,默认的协议是http