Setting Up Highly Available kube-prometheus in 5 Minutes

Date: 2022-12-25 14:59:12

Project repository

prometheus-operator/kube-prometheus: Use Prometheus to monitor Kubernetes and applications running on Kubernetes (github.com)

1. Getting to Know Prometheus

1.1 Introduction to Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project's governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.

In short, Prometheus is an open-source monitoring and alerting toolkit that is now widely used by many companies. Driven by the popularity of Kubernetes, it has been adopted by more and more enterprises as a new-generation monitoring system, and it was the second project to graduate from the CNCF.

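Prometheus features

  • A multi-dimensional data model, with time series identified by a metric name and key/value pairs
  • PromQL, a convenient query language for querying across those dimensions
  • No dependency on distributed storage; a single server node is autonomous
  • Time series are collected with a pull model over HTTP
  • Pushing time series is supported through an intermediary gateway
  • Targets are found through service discovery or static configuration
  • Built-in multi-dimensional graphing, plus integration with Grafana for dashboards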
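
To make the data model in the list above concrete, here is a minimal Python sketch. It is not Prometheus code; it only illustrates that a series is identified by its metric name together with its full label set, so samples that share a name but differ in labels belong to different series. The helper names `series_key` and `append`, the metric name `http_requests_total`, and the label values are all hypothetical.

```python
from collections import defaultdict


def series_key(name, labels):
    """Return a hashable identity for a series: metric name plus sorted labels."""
    return (name, tuple(sorted(labels.items())))


# Maps a series identity to its list of (timestamp, value) samples.
store = defaultdict(list)


def append(name, labels, timestamp, value):
    """Append one sample to the series identified by name and labels."""
    store[series_key(name, labels)].append((timestamp, value))


# Hypothetical samples: the first two share a label set (label order does not
# matter), the third has different labels and therefore forms its own series.
append("http_requests_total", {"method": "GET", "code": "200"}, 1000, 10.0)
append("http_requests_total", {"code": "200", "method": "GET"}, 1015, 12.0)
append("http_requests_total", {"method": "POST", "code": "500"}, 1000, 1.0)

print(len(store))  # number of distinct series
```

Sorting the labels before building the key is what makes the identity independent of label order.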

1.2 Prometheus Architecture

[Figure: Prometheus architecture]

The Prometheus architecture consists of the following parts:

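  • prometheus-server: the main Prometheus server. It scrapes and stores data from exporters and provides the PromQL query language.
      • Retrieval: the scraping module. It collects data from exporters and the Pushgateway and processes it according to the configured rules.
      • TSDB: the time series database. It stores the data collected by Retrieval, on local disk by default.
      • HTTP server: provides the HTTP query interface and a dashboard, on port 9090 by default, where you can query metrics and draw graphs.
      • PromQL: a convenient query language used for aggregating data, producing output, and integrating with display interfaces.
  • Data collection: there are two collection modes, pull and push.
      • Jobs/exporters: collect performance metrics from hosts and containers and are scraped over HTTP. Exporters exist for many different kinds of data.
      • Short-lived jobs: transient tasks that may already have exited by the time the server scrapes them, so they report their metrics actively instead.
      • Pushgateway: short-lived jobs push their data to the gateway, and the server then scrapes the gateway.
  • Data visualization: built on PromQL, and includes the Prometheus Web UI, Grafana, and API clients.
      • Prometheus Web UI: the built-in UI for querying data and drawing graphs, served over HTTP on port 9090.
      • Grafana: an excellent open-source visualization framework that reads data from Prometheus and renders it through dashboard templates.
      • API clients: client SDKs are available in several languages, including Go, Python, and Java, which makes it easier to build monitoring systems.
  • Alerting: the server pushes alerts to Alertmanager, which deduplicates and groups them before sending notifications. Notification channels include:
      • PagerDuty
      • Email, through SMTP
      • Others, such as webhooks
  • Service discovery: relies on third-party interfaces such as DNS, Consul, and Kubernetes. For example, when combined with the Kubernetes apiserver, Prometheus obtains the list of targets and polls them periodically for monitoring data.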
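
The pull model described above depends on each target exposing its metrics as plain text over HTTP, conventionally at `/metrics`. The following Python sketch renders one metric family in the Prometheus text exposition format. It is an illustration rather than a real exporter: `render_metric` is a hypothetical helper, the `node_load1` sample is made up, and escaping of special characters in label values is deliberately left out.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            # Labels are rendered as key="value" pairs inside curly braces.
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical gauge with a single labelled sample.
text = render_metric(
    "node_load1", "1m load average.", "gauge", [({"instance": "node1"}, 0.5)]
)
print(text, end="")
```

An exporter would serve this text from an HTTP endpoint, and the Retrieval module would scrape it on each interval.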

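2. Prometheus with Kubernetes

2.1 Installation Overview

Installing Prometheus involves many components, which makes a manual installation difficult. kube-prometheus, originally published by CoreOS and now maintained under the prometheus-operator organization, automates the installation of Prometheus on Kubernetes. It bundles the following components: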

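  • The Prometheus Operator: the core component, which deploys and manages Prometheus and Alertmanager through custom resources
  • Highly available Prometheus: the Prometheus server, run with multiple replicas
  • Highly available Alertmanager: the alert manager, run with multiple replicas
  • Prometheus node-exporter: the data collection component for host metrics
  • Prometheus Adapter for Kubernetes Metrics APIs: the adapter that integrates Prometheus with the Kubernetes metrics APIs
  • kube-state-metrics: exposes the state of Kubernetes objects (Deployments, Pods, and so on) as Prometheus metrics
  • Grafana: data visualization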

Installation environment:

1. Kubernetes version

[root@k8s-master1 20221225-monitoring]#  kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6", GitCommit:"8a62859e515889f07e3e3be6a1080413f17cf2c3", GitTreeState:"clean", BuildDate:"2021-04-15T03:28:42Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6", GitCommit:"8a62859e515889f07e3e3be6a1080413f17cf2c3", GitTreeState:"clean", BuildDate:"2021-04-15T03:19:55Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}

2. kube-prometheus version: this walkthrough uses the release-0.8 branch (see section 2.2)

[Figure: screenshots showing the installed version]

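2.2 Installing Prometheus

The kube-state-metrics image is hosted on a registry that may be unreachable from some networks. If it cannot be pulled, replace it with: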

image: bitnami/kube-state-metrics:2.0.0

1. Get the kube-prometheus source

git clone -b release-0.8 https://github.com/prometheus-operator/kube-prometheus.git

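2. Install the prerequisites. The YAML files are in the manifests/setup directory and include the namespace, a number of custom resource definitions (CRDs), and the Prometheus Operator itself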

[root@k8s-master1 kube-prometheus]# kubectl create -f manifests/setup
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
# Verify the CRDs: Prometheus, Alertmanager, rules, and ServiceMonitors are all deployed as custom resources
[root@k8s-master1 kube-prometheus]# kubectl get customresourcedefinitions.apiextensions.k8s.io |grep monitoring
alertmanagerconfigs.monitoring.coreos.com 2022-12-25T05:01:08Z
alertmanagers.monitoring.coreos.com 2022-12-25T05:01:08Z
podmonitors.monitoring.coreos.com 2022-12-25T05:01:08Z
probes.monitoring.coreos.com 2022-12-25T05:01:08Z
prometheuses.monitoring.coreos.com 2022-12-25T05:01:09Z
prometheusrules.monitoring.coreos.com 2022-12-25T05:01:09Z
servicemonitors.monitoring.coreos.com 2022-12-25T05:01:09Z
thanosrulers.monitoring.coreos.com 2022-12-25T05:01:09Z
# This step also creates the prometheus-operator Deployment and Service
[root@k8s-master1 kube-prometheus]# kubectl get deployments -n monitoring
NAME READY UP-TO-DATE AVAILABLE AGE
prometheus-operator 1/1 1 1 17s
[root@k8s-master1 kube-prometheus]# kubectl get services -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus-operator ClusterIP None <none> 8443/TCP 21s

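3. Deploy the remaining components, including kube-state-metrics, grafana, node-exporter, alertmanager, prometheus-adapter, and prometheus. Their manifests are in the manifests directory. The role of each component is:

  • prometheus: the core Prometheus server
  • prometheus-adapter: the adapter that exposes Prometheus metrics through the Kubernetes metrics APIs
  • kube-state-metrics: generates metrics about the state of Kubernetes objects by watching the apiserver
  • alertmanager: the alert manager, which handles alerts raised when metrics cross their thresholds
  • node-exporter: the agent on each node that exposes host metrics for scraping
  • grafana: dashboards for displaying the data
  • configmaps: the Grafana dashboard templates, packaged in ConfigMaps
  • clusterrole, clusterrolebinding: the RBAC authorization that lets Prometheus access Kubernetes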
[root@k8s-master1 kube-prometheus]# kubectl apply -f manifests/
alertmanager.monitoring.coreos.com/main created
poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
poddisruptionbudget.policy/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
servicemonitor.monitoring.coreos.com/prometheus-operator created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created

4. Verify the installation, including node-exporter, kube-state-metrics, prometheus-adapter, alertmanager, and grafana

# node-exporter, the per-node agent, is deployed as a DaemonSet
[root@k8s-master1 kube-prometheus]# kubectl get daemonsets -n monitoring
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
node-exporter 3 3 3 3 3 kubernetes.io/os=linux 32s
[root@k8s-master1 kube-prometheus]# kubectl get pods -n monitoring |grep node-exporter
node-exporter-7rhjl 2/2 Running 0 37s
node-exporter-bxcsr 2/2 Running 0 37s
node-exporter-j86gg 2/2 Running 0 37s

# prometheus-adapter, grafana, and kube-state-metrics are deployed as Deployments
[root@k8s-master1 kube-prometheus]# kubectl get deployments -n monitoring
NAME READY UP-TO-DATE AVAILABLE AGE
blackbox-exporter 1/1 1 1 46s
grafana 1/1 1 1 45s
kube-state-metrics 1/1 1 1 45s
prometheus-adapter 2/2 2 2 44s
prometheus-operator 1/1 1 1 2m27s

# The Prometheus server and Alertmanager are deployed as StatefulSets
[root@k8s-master1 kube-prometheus]# kubectl get statefulsets.apps -n monitoring
NAME READY AGE
alertmanager-main 3/3 52s
prometheus-k8s 2/2 48s

# Services exposed, including grafana and prometheus
[root@k8s-master1 kube-prometheus]# kubectl get services -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.10.45.62 <none> 9093/TCP 59s
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 60s
blackbox-exporter ClusterIP 10.10.69.230 <none> 9115/TCP,19115/TCP 59s
grafana ClusterIP 10.10.85.194 <none> 3000/TCP 58s
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 58s
node-exporter ClusterIP None <none> 9100/TCP 57s
prometheus-adapter ClusterIP 10.10.6.18 <none> 443/TCP 57s
prometheus-k8s ClusterIP 10.10.141.208 <none> 9090/TCP 55s
prometheus-operated ClusterIP None <none> 9090/TCP 55s
prometheus-operator ClusterIP None <none> 8443/TCP 2m40s
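
The readiness checks above can also be scripted. The sketch below assumes the plain-text column layout that `kubectl get deployments` and `kubectl get statefulsets` print in the transcripts above, and reports every workload whose READY column shows fewer ready replicas than desired. `not_ready` is a hypothetical helper. The sample text is copied from the output above; in a real script it would come from running kubectl, for example through `subprocess.run`.

```python
def not_ready(kubectl_output):
    """Return the names whose READY column (ready/desired) is not fully ready."""
    lines = kubectl_output.strip().splitlines()
    # Locate the READY column from the header so the same code works for
    # both Deployments and StatefulSets.
    ready_col = lines[0].split().index("READY")
    pending = []
    for line in lines[1:]:
        fields = line.split()
        ready, desired = fields[ready_col].split("/")
        if int(ready) < int(desired):
            pending.append(fields[0])
    return pending


sample = """\
NAME READY UP-TO-DATE AVAILABLE AGE
blackbox-exporter 1/1 1 1 46s
grafana 1/1 1 1 45s
kube-state-metrics 1/1 1 1 45s
prometheus-adapter 2/2 2 2 44s
prometheus-operator 1/1 1 1 2m27s
"""

print(not_ready(sample))
```

An empty list means every listed workload has reached its desired replica count.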

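3. Using Prometheus

3.1 Native Prometheus Metrics

By default the prometheus-k8s Service is of type ClusterIP and exposes port 9090 only inside the cluster. To reach it from outside the cluster, change the Service type to NodePort, as follows: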

[root@k8s-master1 kube-prometheus]# kubectl patch -p '{"spec":{"type": "NodePort"}}' services -n monitoring prometheus-k8s
service/prometheus-k8s patched
[root@k8s-master1 kube-prometheus]# kubectl get services -n monitoring prometheus-k8s -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"prometheus","app.kubernetes.io/name":"prometheus","app.kubernetes.io/part-of":"kube-prometheus","app.kubernetes.io/version":"2.26.0","prometheus":"k8s"},"name":"prometheus-k8s","namespace":"monitoring"},"spec":{"ports":[{"name":"web","port":9090,"targetPort":"web"}],"selector":{"app":"prometheus","app.kubernetes.io/component":"prometheus","app.kubernetes.io/name":"prometheus","app.kubernetes.io/part-of":"kube-prometheus","prometheus":"k8s"},"sessionAffinity":"ClientIP"}}
  creationTimestamp: "2022-12-25T05:02:54Z"
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.26.0
    prometheus: k8s
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/component: {}
          f:app.kubernetes.io/name: {}
          f:app.kubernetes.io/part-of: {}
          f:app.kubernetes.io/version: {}
          f:prometheus: {}
      f:spec:
        f:ports:
          .: {}
          k:{"port":9090,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:selector:
          .: {}
          f:app: {}
          f:app.kubernetes.io/component: {}
          f:app.kubernetes.io/name: {}
          f:app.kubernetes.io/part-of: {}
          f:prometheus: {}
        f:sessionAffinity: {}
        f:sessionAffinityConfig:
          .: {}
          f:clientIP:
            .: {}
            f:timeoutSeconds: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2022-12-25T05:02:54Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:externalTrafficPolicy: {}
        f:type: {}
    manager: kubectl-patch
    operation: Update
    time: "2022-12-25T05:07:38Z"
  name: prometheus-k8s
  namespace: monitoring
  resourceVersion: "102720"
  uid: 65ad4b4a-2f7a-435e-a1c2-cae010f5b84b
spec:
  clusterIP: 10.10.141.208
  clusterIPs:
  - 10.10.141.208
  externalTrafficPolicy: Cluster
  ports:
  - name: web
    nodePort: 31751
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    app: prometheus
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  type: NodePort
status:
  loadBalancer: {}
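
With the Service switched to NodePort, the Prometheus UI and HTTP API should be reachable on any node's IP at the assigned port (31751 in the output above). As a rough sketch, the Python snippet below builds the URL for an instant query against the `/api/v1/query` endpoint. The node IP `192.168.1.10` is a placeholder, `instant_query_url` is a hypothetical helper, and the `job="node-exporter"` label value is an assumption about how the targets are labelled in this installation.

```python
from urllib.parse import urlencode


def instant_query_url(node_ip, node_port, promql):
    """Build the URL for a Prometheus instant query via /api/v1/query."""
    return f"http://{node_ip}:{node_port}/api/v1/query?" + urlencode({"query": promql})


# 192.168.1.10 is a placeholder node IP; 31751 is the nodePort shown above.
# The job label value is an assumption about this installation.
url = instant_query_url("192.168.1.10", 31751, 'up{job="node-exporter"}')
print(url)
```

Fetching that URL, for example with `urllib.request.urlopen`, returns a JSON document whose `status` and `data` fields carry the query result. The PromQL expression must be URL-encoded because it contains braces and quotes.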

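3.2 Grafana Dashboards

Compared with the Prometheus web UI, Grafana offers much richer data visualization. It queries the data with PromQL and renders it through dashboard templates. Grafana's default port 3000 is not exposed outside the cluster, so to access Grafana from outside, change the type of the grafana Service to NodePort. In the output below, the assigned NodePort is 30858.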

[root@k8s-master1 kube-prometheus]# kubectl patch -p '{"spec": {"type": "NodePort"}}' services grafana -n monitoring
service/grafana patched
[root@k8s-master1 kube-prometheus]# kubectl get services grafana -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana NodePort 10.10.85.194 <none> 3000:30858/TCP 5m56s

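From outside the cluster, access Grafana through the NodePort (30858 in the output above). The initial username and password are both admin. On first login Grafana prompts you to change the password, and the new password must meet the complexity requirements. The Grafana dashboards after login look like this: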

[Figure: Grafana dashboard screenshots]

Uninstall

kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup

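Closing Remarks

This article covered how to use Prometheus in Kubernetes to build a complete custom monitoring system, with Grafana providing richer charts. Compared with metrics-server, which supplies only the core resource metrics, Prometheus provides a far richer set of metrics, and these custom metrics can be used with HPA v2 (see the HPA documentation in the references) for more flexible autoscaling. Prometheus makes Kubernetes monitoring both simpler and more capable.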

References

Prometheus website: https://prometheus.io

kube-prometheus installation documentation: https://github.com/coreos/kube-prometheus

TKE autoscaling metrics reference: https://cloud.tencent.com/document/product/457/38929

HPA usage guide: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/