K8s: Creating Init Containers and Health Checks in a Pod

Date: 2024-04-22 21:50:07
  • Create a liveness.yaml file:

    apiVersion: v1
    kind: Pod
    metadata:
      name: liveness-http
      labels:
        test: liveness
    spec:
      containers:
      - name: liveness
        image: mirrorgooglecontainers/liveness
        args:
        - /server
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            httpHeaders:
            - name: Custom-Header
              value: Awesome
          initialDelaySeconds: 3
          periodSeconds: 3
    
  • This is an example that the official Kubernetes documentation once provided

  • In this configuration file, you can see that the Pod again has only a single container

  • The periodSeconds field specifies that the kubelet should perform a liveness probe every 3 seconds

  • The initialDelaySeconds field tells the kubelet to wait 3 seconds before running the first probe; the remaining tuning fields are sketched below
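
  • The describe output further below shows the defaults for the other probe fields (timeout=1s #success=1 #failure=3); a sketch of setting them explicitly, using the standard Kubernetes Probe field names:

    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
      timeoutSeconds: 1      # probe request must answer within 1s (the default)
      failureThreshold: 3    # 3 consecutive failures trigger a restart (the default)
      successThreshold: 1    # must be 1 for liveness probes
    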

  • To perform a probe, the kubelet sends an HTTP GET request to the service running inside the container, which listens on port 8080

  • If the handler for the /healthz path on the server returns a success code, the kubelet considers the container healthy and alive

  • If the handler returns a failure code, the kubelet kills the container and restarts it

  • Any return code greater than or equal to 200 and less than 400 indicates success; any other code indicates failure
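
  • httpGet is not the only handler type: the same livenessProbe block can instead use an exec or tcpSocket handler (both are standard fields of the Probe API). As a sketch, an exec-based probe that checks for a marker file might look like this (the /tmp/healthy path is purely illustrative):

    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 3
      periodSeconds: 3
    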

  • For reference, here is the relevant excerpt from the source code of the service running in the container, server.go:

    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        duration := time.Now().Sub(started)
        if duration.Seconds() > 10 {
            w.WriteHeader(500)
            w.Write([]byte(fmt.Sprintf("error: %v", duration.Seconds())))
        } else {
            w.WriteHeader(200)
            w.Write([]byte("ok"))
        }
    })
    
  • For the first 10 seconds of the container's life, the /healthz handler returns a 200 status code

  • After that, the handler returns a 500 status code; you can verify this by hand, as sketched below
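
  • Assuming you wrap the excerpt above into a complete program (package main, the fmt/net/http/time imports, a started := time.Now() at startup, and http.ListenAndServe(":8080", nil)) and run it locally, the behavior can be checked with curl:

    $ curl -i http://localhost:8080/healthz    # within the first 10s: HTTP/1.1 200 OK, body "ok"
    $ sleep 10
    $ curl -i http://localhost:8080/healthz    # afterwards: HTTP/1.1 500 Internal Server Error
    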

  • $ kubectl apply -f liveness.yaml creates the Pod:

    pod/liveness-http created
    
  • $ kubectl get po -w | grep live watches the Pod:

    liveness-http    0/1     ContainerCreating   0          2s
    liveness-http    1/1     Running             0          5s
    liveness-http    1/1     Running             1 (4s ago)   26s
    liveness-http    1/1     Running             2 (3s ago)   46s
    liveness-http    1/1     Running             3 (3s ago)   67s
    liveness-http    1/1     Running             4 (3s ago)   88s
    liveness-http    0/1     CrashLoopBackOff    4 (0s ago)   106s
    liveness-http    1/1     Running             5 (56s ago)   2m42s
    liveness-http    0/1     CrashLoopBackOff    5 (0s ago)    2m58s
    liveness-http    1/1     Running             6 (95s ago)   4m33s
    liveness-http    0/1     CrashLoopBackOff    6 (0s ago)    4m52s
    
    • As you can see, the container is restarted over and over; the growing intervals between restarts come from the kubelet's exponential CrashLoopBackOff delay
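  • To read the restart count directly instead of watching, one option is a jsonpath query against the standard Pod status field:

    $ kubectl get pod liveness-http -o jsonpath='{.status.containerStatuses[0].restartCount}'
    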
  • $ kubectl describe pod liveness-http shows the Pod's details and the source of the problem:

    Name:         liveness-http
    Namespace:    default
    Priority:     0
    Node:         node1.k8s/10.211.55.11
    Start Time:   Thu, 18 Apr 2024 16:45:57 +0800
    Labels:       test=liveness
    Annotations:  <none>
    Status:       Running
    IP:           10.244.1.27
    IPs:
      IP:  10.244.1.27
    Containers:
      liveness:
        Container ID:  docker://0bde3a8c2ab79d2fd389659354771da10edbdfd29e3bb3d5c87fcbcb44f918b1
        Image:         mirrorgooglecontainers/liveness
        Image ID:      docker-pullable://mirrorgooglecontainers/liveness@sha256:854458862be990608ad916980f9d3c552ac978ff70ceb0f90508858ec8fc4a62
        Port:          <none>
        Host Port:     <none>
        Args:
          /server
        State:          Waiting
          Reason:       CrashLoopBackOff
        Last State:     Terminated
          Reason:       Error
          Exit Code:    2
          Started:      Thu, 18 Apr 2024 16:47:19 +0800
          Finished:     Thu, 18 Apr 2024 16:47:36 +0800
        Ready:          False
        Restart Count:  4
        Liveness:       http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=3
        Environment:    <none>
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nv6lw (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      kube-api-access-nv6lw:
        Type:                    Projected (a volume that contains injected data from multiple sources)
        TokenExpirationSeconds:  3607
        ConfigMapName:           kube-root-ca.crt
        ConfigMapOptional:       <nil>
        DownwardAPI:             true
    QoS Class:                   BestEffort
    Node-Selectors:              <none>
    Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type     Reason     Age                 From               Message
      ----     ------     ----                ----               -------
      Normal   Scheduled  114s                default-scheduler  Successfully assigned default/liveness-http to node1.k8s
      Normal   Pulled     111s                kubelet            Successfully pulled image "mirrorgooglecontainers/liveness" in 2.298827654s
      Normal   Pulled     92s                 kubelet            Successfully pulled image "mirrorgooglecontainers/liveness" in 1.281083837s
      Normal   Created    74s (x3 over 111s)  kubelet            Created container liveness
      Normal   Started    74s (x3 over 111s)  kubelet            Started container liveness
      Normal   Pulled     74s                 kubelet            Successfully pulled image "mirrorgooglecontainers/liveness" in 1.048779521s
      Warning  Unhealthy  58s (x9 over 100s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
      Normal   Killing    58s (x3 over 94s)   kubelet            Container liveness failed liveness probe, will be restarted
      Normal   Pulling    57s (x4 over 114s)  kubelet            Pulling image "mirrorgooglecontainers/liveness"
    
    • Here you can see:
      • Liveness probe failed: HTTP probe failed with statuscode: 500
      • Container liveness failed liveness probe, will be restarted
    • After /healthz began returning 500, the container was restarted
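    • To list just these events on their own, one option is kubectl's field selector (a standard flag on kubectl get):

      $ kubectl get events --field-selector involvedObject.name=liveness-http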
  • In this way, probe-based health checks make it possible to judge the running health of a Pod
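
  • A closing note: a livenessProbe restarts an unhealthy container, while its sibling readinessProbe instead marks the Pod NotReady and removes it from Service endpoints without restarting it. The field layout is identical; a minimal sketch reusing the same endpoint:

    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3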