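Why Pods Are Needed
Here is why the Kubernetes project is designed this way:
Kubernetes was built by Google on its experience with the Borg project. Google's engineers found that the applications they deployed often related to one another the way processes relate within a process group. More specifically, these applications cooperate so closely that they must be deployed on the same machine.
Without the notion of a group defined up front, such operational relationships are very hard to handle. Take an example: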
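rsyslogd consists of three processes: an imklog module, an imuxsock module, and the main process running rsyslogd's own main function. These three processes must run on the same machine; otherwise their Socket-based communication and file exchange will break. Now suppose I want to containerize rsyslogd. Constrained by the container's single-process model, the three modules have to be built into three separate containers, each with a memory quota of 1 GB.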
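Note: the container's "single-process model" does not mean that only one process can run inside a container; it means the container has no ability to manage multiple processes. The process with PID=1 in a container is the application itself, and every other process is a child of that PID=1 process. An application written by a user, however, does not have the process-management abilities of the init process or systemd in a normal operating system. For example, say your application is a Java web program (PID=1), and you run docker exec to start an nginx process (PID=3) in the background. When that nginx process exits abnormally, how would you find out? And who cleans up after it once it has exited?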
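The question of who notices and cleans up an exited process can be made concrete with a small supervisor. The sketch below only illustrates the kind of bookkeeping an init-like PID 1 has to do; it is not how Docker or Kubernetes implement it, and the `supervise` function and the child names are hypothetical.

```python
import subprocess
import sys
import time


def supervise(commands, poll_interval=0.05):
    """Start every command, then watch the children until all have exited.

    Returns a dict mapping each child's name to its exit code.
    """
    children = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    exit_codes = {}
    while children:
        for name, proc in list(children.items()):
            code = proc.poll()  # non-blocking check; reaps the child if it has exited
            if code is not None:
                exit_codes[name] = code
                del children[name]
        if children:
            time.sleep(poll_interval)
    return exit_codes


if __name__ == "__main__":
    result = supervise({
        "worker": [sys.executable, "-c", "pass"],
        "crasher": [sys.executable, "-c", "import sys; sys.exit(3)"],
    })
    print(result)
```

An ordinary application running as PID 1 contains no loop of this kind, which is why a process started next to it can exit without anyone noticing.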
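Suppose our Kubernetes cluster has two nodes: node-1 has 3 GB of available memory and node-2 has 2.5 GB.
Now suppose I use Docker Swarm to run this rsyslogd application. To make all three containers run on the same machine, I have to set an affinity=main constraint (affinity with the main container) on the other two containers; that is, those two must run on the same machine as the main container.
Then I run "docker run main", "docker run imklog", and "docker run imuxsock" in turn to create the three containers. All three enter Swarm's pending scheduling queue. The main container and the imklog container then leave the queue one after the other and are scheduled onto node-2 (which is entirely possible).
But when the imuxsock container leaves the queue to be scheduled, Swarm is stuck: node-2 has only 0.5 GB of available resources left, not enough to run the imuxsock container, yet the affinity=main constraint means imuxsock can only run on node-2. This is a typical example of gang scheduling that has not been handled properly.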
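The failure described above can be replayed with a toy simulation. This is a minimal sketch, not Swarm's actual scheduler: the function name `schedule_one_by_one` and the "first node with enough free memory" policy are assumptions made for illustration, and the node order is chosen so that main lands on node-2 as in the text.

```python
def schedule_one_by_one(nodes, containers, anchor="main"):
    """Place containers in queue order, one at a time.

    nodes:      dict of node name -> free memory in GB (insertion order = preference)
    containers: list of (name, memory_gb) in queue order; the anchor must come first
    Returns (placement, stuck): the placements made so far and the name of the
    first container that could not be placed (None if all were placed).
    """
    free = dict(nodes)
    placement = {}
    for name, mem in containers:
        if name == anchor:
            # Assumed policy: the anchor takes the first node with enough room.
            candidates = [n for n in free if free[n] >= mem]
        else:
            # affinity=main: only the node that already hosts the anchor is allowed.
            target = placement[anchor]
            candidates = [target] if free[target] >= mem else []
        if not candidates:
            return placement, name
        node = candidates[0]
        free[node] -= mem
        placement[name] = node
    return placement, None


if __name__ == "__main__":
    nodes = {"node-2": 2.5, "node-1": 3.0}
    queue = [("main", 1.0), ("imklog", 1.0), ("imuxsock", 1.0)]
    print(schedule_one_by_one(nodes, queue))
```

Because each container is placed without looking at the rest of its group, the first two placements consume node-2 and leave the third with nowhere to go.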
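Industry and academia have debated this problem for a long time and have produced many candidate solutions.
For example, Mesos has a resource hoarding mechanism: it starts scheduling only once all tasks with Affinity constraints have arrived, and then schedules them together. Google's Omega paper, in turn, proposes optimistic scheduling to deal with conflicts: ignore the conflicts up front, and resolve them afterwards through a carefully designed rollback mechanism.
None of these approaches is perfect. Resource hoarding brings unavoidable losses in scheduling efficiency and the possibility of deadlock, while optimistic scheduling is more complex than an ordinary engineering team can manage.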
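In Kubernetes, however, the problem disappears: the Pod is the smallest scheduling unit in Kubernetes, and the scheduler computes resource requirements per Pod rather than per container. The three containers above form one Pod that needs 3 GB in total, so when scheduling it Kubernetes naturally binds it to node-1, which has 3 GB of available memory, and never considers node-2 at all.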
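Scheduling the group as one unit can be sketched in a few lines. This is a simplified model, assuming memory is the only resource and that the first fitting node is chosen; `schedule_pod` is a hypothetical helper, not the real Kubernetes scheduler.

```python
def schedule_pod(nodes, containers):
    """Pick a node for the whole group of containers at once.

    nodes:      dict of node name -> free memory in GB
    containers: list of (name, memory_gb)
    Returns the first node whose free memory covers the sum of all requests,
    or None if no node fits.
    """
    total = sum(mem for _, mem in containers)
    for node, free in nodes.items():
        if free >= total:
            return node
    return None


if __name__ == "__main__":
    nodes = {"node-2": 2.5, "node-1": 3.0}
    pod = [("main", 1.0), ("imklog", 1.0), ("imuxsock", 1.0)]
    print(schedule_pod(nodes, pod))
```

Since the 3 GB total is compared against each node up front, node-2 is ruled out before anything is placed on it.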
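We call this kind of tight cooperation between containers a "super-intimate relationship". Typical characteristics of containers in such a relationship include, but are not limited to: exchanging files directly with each other; communicating locally over localhost or Socket files; making very frequent remote calls to each other; and needing to share certain Linux Namespaces (for example, one container joining another container's Linux Namespace).
This also means that not all related containers belong in the same Pod. For example, a PHP container and a MySQL container do access each other, but they do not need to be, and should not be, deployed in the same Pod; they are better made into two Pods.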
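If the goal were only to handle scheduling for this kind of super-intimate relationship, then with the Borg and Omega papers as precedent, the Kubernetes project could certainly have solved it at the scheduler level.
However, the Pod has an even more important meaning in Kubernetes: the container design pattern.
To understand this, we first have to look at how a Pod is implemented.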
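The most important fact about a Pod is that it is only a logical concept. What Kubernetes actually works with are still the Namespaces and Cgroups of Linux containers on the host operating system; there is no such thing as a Pod boundary or a Pod isolation environment. A Pod is really just a group of containers that share certain resources.
Specifically, all containers in a Pod share the same Network Namespace and can declare mounts of the same Volume.
Seen this way, isn't a Pod with two containers A and B just one container (container A) sharing the network and Volumes of another (container B)? That looks achievable with a command such as docker run --net --volumes-from: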
docker run --net=container:B --volumes-from=B --name=A image-A ...
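But have you considered that, if things really worked this way, container B would have to start before container A? The containers in a Pod would then no longer be peers but would form a topology of dependencies.
So in Kubernetes, implementing a Pod requires an intermediate container, called the Infra container. Within a Pod, the Infra container is always the first container created, and the user-defined containers are associated with it by joining its Network Namespace.
Consider a Pod with two user containers, A and B, plus an Infra container. As you would expect, the Infra container must consume very few resources, so it uses a very special image: k8s.gcr.io/pause. This image is a container written in assembly language that stays permanently in the paused state, and it is only about 100 to 200 KB in size once unpacked.
Once the Infra container holds the Network Namespace, the user containers can join it.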
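The relationship between the Infra container and the user containers can be pictured with a toy data model. This is only a sketch of the idea, assuming a namespace can be represented by a string identifier; the `Pod` and `Container` classes are hypothetical and do not correspond to any Kubernetes or Docker API.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    image: str
    netns: str  # identifier of the Network Namespace the container lives in


@dataclass
class Pod:
    name: str
    containers: list = field(default_factory=list)

    def __post_init__(self):
        # The Infra container is created first and owns the Network Namespace.
        self.infra = Container("infra", "k8s.gcr.io/pause", f"netns-{self.name}")

    def add(self, name, image):
        # User containers join the Infra container's namespace,
        # so their order relative to each other does not matter.
        container = Container(name, image, self.infra.netns)
        self.containers.append(container)
        return container


if __name__ == "__main__":
    pod = Pod("demo")
    b = pod.add("B", "image-B")
    a = pod.add("A", "image-A")
    print(a.netns, b.netns, pod.infra.netns)
```

In this model A and B depend only on the Infra container, never on each other, which is what makes them peers.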
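Pod Controllers
ReplicaSet: creates the specified number of Pod replicas on the user's behalf and keeps the replica count at the number the user expects, adding Pods when there are too few and removing them when there are too many. It supports scaling the replica count up and down. Rolling updates are provided by Deployment, which manages ReplicaSets, so using ReplicaSet directly is not recommended.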
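The "add when too few, remove when too many" behaviour amounts to comparing a desired count with what is currently running. The function below is a minimal sketch of that comparison, not the controller's actual code; `reconcile` and the Pod names are hypothetical, and which surplus Pods get removed is an arbitrary choice here.

```python
def reconcile(desired, running):
    """Compare the desired replica count with the list of running Pod names.

    Returns (number_to_create, names_to_delete).
    """
    diff = desired - len(running)
    if diff > 0:
        return diff, []
    # Zero or negative difference: delete any surplus beyond the desired count.
    return 0, running[desired:]


if __name__ == "__main__":
    print(reconcile(3, ["pod-a"]))
    print(reconcile(2, ["pod-a", "pod-b", "pod-c"]))
```

A real controller runs this kind of comparison repeatedly, so the cluster converges on the desired count even after Pods fail.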
* Help
[root@master manifests]# kubectl explain rs
KIND: ReplicaSet
VERSION: extensions/v1beta1

DESCRIPTION:
DEPRECATED - This group version of ReplicaSet is deprecated by
apps/v1beta2/ReplicaSet. See the release notes for more information.
ReplicaSet ensures that a specified number of pod replicas are running at
any given time.

FIELDS:
apiVersion <string>
APIVersion defines the versioned schema of this representation of an
object. Servers should convert recognized schemas to the latest internal
value, and may reject unrecognized values. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#resources

kind <string>
Kind is a string value representing the REST resource this object
represents. Servers may infer this from the endpoint the client submits
requests to. Cannot be updated. In CamelCase. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds

metadata <Object>  # controller metadata
If the Labels of a ReplicaSet are empty, they are defaulted to be the same
as the Pod(s) that the ReplicaSet manages. Standard object's metadata. More
info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

spec <Object>  # controller definition
Spec defines the specification of the desired behavior of the ReplicaSet.
More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

status <Object>
Status is the most recently observed status of the ReplicaSet. This data
may be out of date by some window of time. Populated by the system.
Read-only. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
Controller metadata fields
[root@master manifests]# kubectl explain rs.metadata
KIND: ReplicaSet
VERSION: extensions/v1beta1

RESOURCE: metadata <Object>

DESCRIPTION:
If the Labels of a ReplicaSet are empty, they are defaulted to be the same
as the Pod(s) that the ReplicaSet manages. Standard object's metadata. More
info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

ObjectMeta is metadata that all persisted resources must have, which
includes all objects users must create.

FIELDS:
annotations <map[string]string>
Annotations is an unstructured key value map stored with a resource that
may be set by external tools to store and retrieve arbitrary metadata. They
are not queryable and should be preserved when modifying objects. More
info: http://kubernetes.io/docs/user-guide/annotations

clusterName <string>
The name of the cluster which the object belongs to. This is used to
distinguish resources with same name and namespace in different clusters.
This field is not set anywhere right now and apiserver is going to ignore
it if set in create or update request.

creationTimestamp <string>
CreationTimestamp is a timestamp representing the server time when this
object was created. It is not guaranteed to be set in happens-before order
across separate operations. Clients may not set this value. It is
represented in RFC3339 form and is in UTC. Populated by the system.
Read-only. Null for lists. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

deletionGracePeriodSeconds <integer>
Number of seconds allowed for this object to gracefully terminate before it
will be removed from the system. Only set when deletionTimestamp is also
set. May only be shortened. Read-only.

deletionTimestamp <string>
DeletionTimestamp is RFC 3339 date and time at which this resource will be
deleted. This field is set by the server when a graceful deletion is
requested by the user, and is not directly settable by a client. The
resource is expected to be deleted (no longer visible from resource lists,
and not reachable by name) after the time in this field, once the
finalizers list is empty. As long as the finalizers list contains items,
deletion is blocked. Once the deletionTimestamp is set, this value may not
be unset or be set further into the future, although it may be shortened or
the resource may be deleted prior to this time. For example, a user may
request that a pod is deleted in 30 seconds. The Kubelet will react by
sending a graceful termination signal to the containers in the pod. After
that 30 seconds, the Kubelet will send a hard termination signal (SIGKILL)
to the container and after cleanup, remove the pod from the API. In the
presence of network partitions, this object may still exist after this
timestamp, until an administrator or automated process can determine the
resource is fully terminated. If not set, graceful deletion of the object
has not been requested. Populated by the system when a graceful deletion is
requested. Read-only. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

finalizers <[]string>
Must be empty before the object is deleted from the registry. Each entry is
an identifier for the responsible component that will remove the entry from
the list. If the deletionTimestamp of the object is non-nil, entries in
this list can only be removed.

generateName <string>
GenerateName is an optional prefix, used by the server, to generate a
unique name ONLY IF the Name field has not been provided. If this field is
used, the name returned to the client will be different than the name
passed. This value will also be combined with a unique suffix. The provided
value has the same validation rules as the Name field, and may be truncated
by the length of the suffix required to make the value unique on the
server. If this field is specified and the generated name exists, the
server will NOT return a 409 - instead, it will either return 201 Created
or 500 with Reason ServerTimeout indicating a unique name could not be
found in the time allotted, and the client should retry (optionally after
the time indicated in the Retry-After header). Applied only if Name is not
specified. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#idempotency

generation <integer>
A sequence number representing a specific generation of the desired state.
Populated by the system. Read-only.

initializers <Object>
An initializer is a controller which enforces some system invariant at
object creation time. This field is a list of initializers that have not
yet acted on this object. If nil or empty, this object has been completely
initialized. Otherwise, the object is considered uninitialized and is
hidden (in list/watch and get calls) from clients that haven't explicitly
asked to observe uninitialized objects. When an object is created, the
system will populate this list with the current set of initializers. Only
privileged users may set or modify this list. Once it is empty, it may not
be modified further by any user. DEPRECATED - initializers are an alpha
field and will be removed in v1.15.

labels <map[string]string>
Map of string keys and values that can be used to organize and categorize
(scope and select) objects. May match selectors of replication controllers
and services. More info: http://kubernetes.io/docs/user-guide/labels

managedFields <[]Object>
ManagedFields maps workflow-id and version to the set of fields that are
managed by that workflow. This is mostly for internal housekeeping, and
users typically shouldn't need to set or understand this field. A workflow
can be the user's name, a controller's name, or the name of a specific
apply path like "ci-cd". The set of fields is always in the version that
the workflow used when modifying the object. This field is alpha and can be
changed or removed without notice.

name <string>  # name
Name must be unique within a namespace. Is required when creating
resources, although some resources may allow a client to request the
generation of an appropriate name automatically. Name is primarily intended
for creation idempotence and configuration definition. Cannot be updated.
More info: http://kubernetes.io/docs/user-guide/identifiers#names

namespace <string>  # the namespace it belongs to
Namespace defines the space within each name must be unique. An empty
namespace is equivalent to the "default" namespace, but "default" is the
canonical representation. Not all objects are required to be scoped to a
namespace - the value of this field for those objects will be empty. Must
be a DNS_LABEL. Cannot be updated. More info:
http://kubernetes.io/docs/user-guide/namespaces

ownerReferences <[]Object>
List of objects depended by this object. If ALL objects in the list have
been deleted, this object will be garbage collected. If this object is
managed by a controller, then an entry in this list will point to this
controller, with the controller field set to true. There cannot be more
than one managing controller.

resourceVersion <string>
An opaque value that represents the internal version of this object that
can be used by clients to determine when objects have changed. May be used
for optimistic concurrency, change detection, and the watch operation on a
resource or set of resources. Clients must treat these values as opaque and
passed unmodified back to the server. They may only be valid for a
particular resource or set of resources. Populated by the system.
Read-only. Value must be treated as opaque by clients and . More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#concurrency-control-and-consistency

selfLink <string>
SelfLink is a URL representing this object. Populated by the system.
Read-only.

uid <string>
UID is the unique in time and space value for this object. It is typically
generated by the server on successful creation of a resource and is not
allowed to change on PUT operations. Populated by the system. Read-only.
More info: http://kubernetes.io/docs/user-guide/identifiers#uids
Controller spec (desired state) fields
[root@master manifests]# kubectl explain rs.spec
KIND: ReplicaSet
VERSION: extensions/v1beta1

RESOURCE: spec <Object>

DESCRIPTION:
Spec defines the specification of the desired behavior of the ReplicaSet.
More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

ReplicaSetSpec is the specification of a ReplicaSet.

FIELDS:
minReadySeconds <integer>
Minimum number of seconds for which a newly created pod should be ready
without any of its container crashing, for it to be considered available.
Defaults to 0 (pod will be considered available as soon as it is ready)

replicas <integer>  # number of replicas
Replicas is the number of desired replicas. This is a pointer to
distinguish between explicit zero and unspecified. Defaults to 1. More
info:
https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/#what-is-a-replicationcontroller

selector <Object>  # label selector
Selector is a label query over pods that should match the replica count. If
the selector is empty, it is defaulted to the labels present on the pod
template. Label keys and values that must match in order to be controlled
by this replica set. More info:
https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

template <Object>  # Pod definition
Template is the object that describes the pod that will be created if
insufficient replicas are detected. More info:
https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller#pod-template
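The three spec fields above (replicas, selector, template) fit together as in the following sketch, which builds a ReplicaSet manifest as a Python dictionary and prints it as JSON, a format kubectl accepts for manifests. The names `myapp` and `myapp-container`, the `nginx` image, and the label values are placeholders chosen for illustration. The example uses apiVersion apps/v1 on the assumption that a current cluster is targeted, since the output above marks extensions/v1beta1 as deprecated.

```python
import json

# Hypothetical manifest; every name and label value here is a placeholder.
replicaset = {
    "apiVersion": "apps/v1",
    "kind": "ReplicaSet",
    "metadata": {"name": "myapp", "namespace": "default"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "myapp"}},
        "template": {
            "metadata": {"labels": {"app": "myapp", "release": "canary"}},
            "spec": {
                "containers": [
                    {
                        "name": "myapp-container",
                        "image": "nginx",
                        "ports": [{"name": "http", "containerPort": 80}],
                    }
                ]
            },
        },
    },
}


def selector_matches_template(rs):
    """Check that every selector label is present on the Pod template."""
    wanted = rs["spec"]["selector"]["matchLabels"]
    labels = rs["spec"]["template"]["metadata"].get("labels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


if __name__ == "__main__":
    assert selector_matches_template(replicaset)
    print(json.dumps(replicaset, indent=2))
```

The check reflects the requirement that the selector must match the labels on the Pod template; otherwise the controller would not recognize the Pods it creates as its own.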
Pod template fields
[root@master manifests]# kubectl explain rs.spec.template
KIND: ReplicaSet
VERSION: extensions/v1beta1

RESOURCE: template <Object>

DESCRIPTION:
Template is the object that describes the pod that will be created if
insufficient replicas are detected. More info:
https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller#pod-template

PodTemplateSpec describes the data a pod should have when created from a
template

FIELDS:
metadata <Object>  # metadata
Standard object's metadata. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

spec <Object>  # desired state of the Pod
Specification of the desired behavior of the pod. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status
Pod metadata fields
[root@master manifests]# kubectl explain rs.spec.template.metadata
KIND: ReplicaSet
VERSION: extensions/v1beta1

RESOURCE: metadata <Object>

DESCRIPTION:
Standard object's metadata. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

ObjectMeta is metadata that all persisted resources must have, which
includes all objects users must create.

FIELDS:
annotations <map[string]string>
Annotations is an unstructured key value map stored with a resource that
may be set by external tools to store and retrieve arbitrary metadata. They
are not queryable and should be preserved when modifying objects. More
info: http://kubernetes.io/docs/user-guide/annotations

clusterName <string>
The name of the cluster which the object belongs to. This is used to
distinguish resources with same name and namespace in different clusters.
This field is not set anywhere right now and apiserver is going to ignore
it if set in create or update request.

creationTimestamp <string>
CreationTimestamp is a timestamp representing the server time when this
object was created. It is not guaranteed to be set in happens-before order
across separate operations. Clients may not set this value. It is
represented in RFC3339 form and is in UTC. Populated by the system.
Read-only. Null for lists. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

deletionGracePeriodSeconds <integer>
Number of seconds allowed for this object to gracefully terminate before it
will be removed from the system. Only set when deletionTimestamp is also
set. May only be shortened. Read-only.

deletionTimestamp <string>
DeletionTimestamp is RFC 3339 date and time at which this resource will be
deleted. This field is set by the server when a graceful deletion is
requested by the user, and is not directly settable by a client. The
resource is expected to be deleted (no longer visible from resource lists,
and not reachable by name) after the time in this field, once the
finalizers list is empty. As long as the finalizers list contains items,
deletion is blocked. Once the deletionTimestamp is set, this value may not
be unset or be set further into the future, although it may be shortened or
the resource may be deleted prior to this time. For example, a user may
request that a pod is deleted in 30 seconds. The Kubelet will react by
sending a graceful termination signal to the containers in the pod. After
that 30 seconds, the Kubelet will send a hard termination signal (SIGKILL)
to the container and after cleanup, remove the pod from the API. In the
presence of network partitions, this object may still exist after this
timestamp, until an administrator or automated process can determine the
resource is fully terminated. If not set, graceful deletion of the object
has not been requested. Populated by the system when a graceful deletion is
requested. Read-only. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

finalizers <[]string>
Must be empty before the object is deleted from the registry. Each entry is
an identifier for the responsible component that will remove the entry from
the list. If the deletionTimestamp of the object is non-nil, entries in
this list can only be removed.

generateName <string>
GenerateName is an optional prefix, used by the server, to generate a
unique name ONLY IF the Name field has not been provided. If this field is
used, the name returned to the client will be different than the name
passed. This value will also be combined with a unique suffix. The provided
value has the same validation rules as the Name field, and may be truncated
by the length of the suffix required to make the value unique on the
server. If this field is specified and the generated name exists, the
server will NOT return a 409 - instead, it will either return 201 Created
or 500 with Reason ServerTimeout indicating a unique name could not be
found in the time allotted, and the client should retry (optionally after
the time indicated in the Retry-After header). Applied only if Name is not
specified. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#idempotency

generation <integer>
A sequence number representing a specific generation of the desired state.
Populated by the system. Read-only.

initializers <Object>
An initializer is a controller which enforces some system invariant at
object creation time. This field is a list of initializers that have not
yet acted on this object. If nil or empty, this object has been completely
initialized. Otherwise, the object is considered uninitialized and is
hidden (in list/watch and get calls) from clients that haven't explicitly
asked to observe uninitialized objects. When an object is created, the
system will populate this list with the current set of initializers. Only
privileged users may set or modify this list. Once it is empty, it may not
be modified further by any user. DEPRECATED - initializers are an alpha
field and will be removed in v1.15.

labels <map[string]string>  # label definitions
Map of string keys and values that can be used to organize and categorize
(scope and select) objects. May match selectors of replication controllers
and services. More info: http://kubernetes.io/docs/user-guide/labels

managedFields <[]Object>
ManagedFields maps workflow-id and version to the set of fields that are
managed by that workflow. This is mostly for internal housekeeping, and
users typically shouldn't need to set or understand this field. A workflow
can be the user's name, a controller's name, or the name of a specific
apply path like "ci-cd". The set of fields is always in the version that
the workflow used when modifying the object. This field is alpha and can be
changed or removed without notice.

name <string>
Name must be unique within a namespace. Is required when creating
resources, although some resources may allow a client to request the
generation of an appropriate name automatically. Name is primarily intended
for creation idempotence and configuration definition. Cannot be updated.
More info: http://kubernetes.io/docs/user-guide/identifiers#names

namespace <string>
Namespace defines the space within each name must be unique. An empty
namespace is equivalent to the "default" namespace, but "default" is the
canonical representation. Not all objects are required to be scoped to a
namespace - the value of this field for those objects will be empty. Must
be a DNS_LABEL. Cannot be updated. More info:
http://kubernetes.io/docs/user-guide/namespaces

ownerReferences <[]Object>
List of objects depended by this object. If ALL objects in the list have
been deleted, this object will be garbage collected. If this object is
managed by a controller, then an entry in this list will point to this
controller, with the controller field set to true. There cannot be more
than one managing controller.

resourceVersion <string>
An opaque value that represents the internal version of this object that
can be used by clients to determine when objects have changed. May be used
for optimistic concurrency, change detection, and the watch operation on a
resource or set of resources. Clients must treat these values as opaque and
passed unmodified back to the server. They may only be valid for a
particular resource or set of resources. Populated by the system.
Read-only. Value must be treated as opaque by clients and . More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#concurrency-control-and-consistency

selfLink <string>
SelfLink is a URL representing this object. Populated by the system.
Read-only.

uid <string>
UID is the unique in time and space value for this object. It is typically
generated by the server on successful creation of a resource and is not
allowed to change on PUT operations. Populated by the system. Read-only.
More info: http://kubernetes.io/docs/user-guide/identifiers#uids
Pod spec (desired state) fields
[root@master manifests]# kubectl explain rs.spec.template.spec
KIND: ReplicaSet
VERSION: extensions/v1beta1

RESOURCE: spec <Object>

DESCRIPTION:
Specification of the desired behavior of the pod. More info:
https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status

PodSpec is a description of a pod.

FIELDS:
activeDeadlineSeconds <integer>
Optional duration in seconds the pod may be active on the node relative to
StartTime before the system will actively try to mark it failed and kill
associated containers. Value must be a positive integer.

affinity <Object>
If specified, the pod's scheduling constraints

automountServiceAccountToken <boolean>
AutomountServiceAccountToken indicates whether a service account token
should be automatically mounted.

containers <[]Object> -required-  # container definitions
List of containers belonging to the pod. Containers cannot currently be
added or removed. There must be at least one container in a Pod. Cannot be
updated.

dnsConfig <Object>
Specifies the DNS parameters of a pod. Parameters specified here will be
merged to the generated DNS configuration based on DNSPolicy.

dnsPolicy <string>
Set DNS policy for the pod. Defaults to "ClusterFirst". Valid values are
'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'. DNS
parameters given in DNSConfig will be merged with the policy selected with
DNSPolicy. To have DNS options set along with hostNetwork, you have to
specify DNS policy explicitly to 'ClusterFirstWithHostNet'.

enableServiceLinks <boolean>
EnableServiceLinks indicates whether information about services should be
injected into pod's environment variables, matching the syntax of Docker
links. Optional: Defaults to true.

hostAliases <[]Object>
HostAliases is an optional list of hosts and IPs that will be injected into
the pod's hosts file if specified. This is only valid for non-hostNetwork
pods.

hostIPC <boolean>
Use the host's ipc namespace. Optional: Default to false.

hostNetwork <boolean>
Host networking requested for this pod. Use the host's network namespace.
If this option is set, the ports that will be used must be specified.
Default to false.

hostPID <boolean>
Use the host's pid namespace. Optional: Default to false.

hostname <string>
Specifies the hostname of the Pod If not specified, the pod's hostname will
be set to a system-defined value.

imagePullSecrets <[]Object>
ImagePullSecrets is an optional list of references to secrets in the same
namespace to use for pulling any of the images used by this PodSpec. If
specified, these secrets will be passed to individual puller
implementations for them to use. For example, in the case of docker, only
DockerConfig type secrets are honored. More info:
https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod

initContainers <[]Object>
List of initialization containers belonging to the pod. Init containers are
executed in order prior to containers being started. If any init container
fails, the pod is considered to have failed and is handled according to its
restartPolicy. The name for an init container or normal container must be
unique among all containers. Init containers may not have Lifecycle
actions, Readiness probes, or Liveness probes. The resourceRequirements of
an init container are taken into account during scheduling by finding the
highest request/limit for each resource type, and then using the max of of
that value or the sum of the normal containers. Limits are applied to init
containers in a similar fashion. Init containers cannot currently be added
or removed. Cannot be updated. More info:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/

nodeName <string>
NodeName is a request to schedule this pod onto a specific node. If it is
non-empty, the scheduler simply schedules this pod onto that node, assuming
that it fits resource requirements.

nodeSelector <map[string]string>
NodeSelector is a selector which must be true for the pod to fit on a node.
Selector which must match a node's labels for the pod to be scheduled on
that node. More info:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/

preemptionPolicy <string>
PreemptionPolicy is the Policy for preempting pods with lower priority. One
of Never, PreemptLowerPriority. Defaults to PreemptLowerPriority if unset.
This field is alpha-level and is only honored by servers that enable the
NonPreemptingPriority feature.

priority <integer>
The priority value. Various system components use this field to find the
priority of the pod. When Priority Admission Controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName. The higher the value, the higher the
priority.

priorityClassName <string>
If specified, indicates the pod's priority. "system-node-critical" and
"system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name. If
not specified, the pod priority will be default or zero if there is no
default.

readinessGates <[]Object>
If specified, all readiness gates will be evaluated for pod readiness. A
pod is ready when all its containers are ready AND all conditions specified
in the readiness gates have status equal to "True" More info:
https://git.k8s.io/enhancements/keps/sig-network/0007-pod-ready%2B%2B.md

restartPolicy <string>
Restart policy for all containers within the pod. One of Always, OnFailure,
Never. Default to Always. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy

runtimeClassName <string>
RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group,
which should be used to run this pod. If no RuntimeClass resource matches
the named class, the pod will not be run. If unset or empty, the "legacy"
RuntimeClass will be used, which is an implicit class with an empty
definition that uses the default runtime handler. More info:
https://git.k8s.io/enhancements/keps/sig-node/runtime-class.md This is a
beta feature as of Kubernetes v1.14.

schedulerName <string>
If specified, the pod will be dispatched by specified scheduler. If not
specified, the pod will be dispatched by default scheduler.

securityContext <Object>
SecurityContext holds pod-level security attributes and common container
settings. Optional: Defaults to empty. See type description for default
values of each field.

serviceAccount <string>
DeprecatedServiceAccount is a depreciated alias for ServiceAccountName.
Deprecated: Use serviceAccountName instead.

serviceAccountName <string>
ServiceAccountName is the name of the ServiceAccount to use to run this
pod. More info:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/

shareProcessNamespace <boolean>
Share a single process namespace between all of the containers in a pod.
When this is set containers will be able to view and signal processes from
other containers in the same pod, and the first process in each container
will not be assigned PID 1. HostPID and ShareProcessNamespace cannot both
be set. Optional: Default to false. This field is beta-level and may be
disabled with the PodShareProcessNamespace feature.

subdomain <string>
If specified, the fully qualified Pod hostname will be
"<hostname>.<subdomain>.<pod namespace>.svc.<cluster domain>". If not
specified, the pod will not have a domainname at all.

terminationGracePeriodSeconds <integer>
Optional duration in seconds the pod needs to terminate gracefully. May be
decreased in delete request. Value must be non-negative integer. The value
zero indicates delete immediately. If this value is nil, the default grace
period will be used instead. The grace period is the duration in seconds
after the processes running in the pod are sent a termination signal and
the time when the processes are forcibly halted with a kill signal. Set
this value longer than the expected cleanup time for your process. Defaults
to 30 seconds.

tolerations <[]Object>
If specified, the pod's tolerations.

volumes <[]Object>
List of volumes that can be mounted by containers belonging to the pod.
More info: https://kubernetes.io/docs/concepts/storage/volumes
Pod container fields
[root@master manifests]# kubectl explain rs.spec.template.spec.containers
KIND: ReplicaSet
VERSION: extensions/v1beta1

RESOURCE: containers <[]Object>

DESCRIPTION:
List of containers belonging to the pod. Containers cannot currently be
added or removed. There must be at least one container in a Pod. Cannot be
updated.

A single application container that you want to run within a pod.

FIELDS:
args <[]string>
Arguments to the entrypoint. The docker image's CMD is used if this is not
provided. Variable references $(VAR_NAME) are expanded using the
container's environment. If a variable cannot be resolved, the reference in
the input string will be unchanged. The $(VAR_NAME) syntax can be escaped
with a double $$, ie: $$(VAR_NAME). Escaped references will never be
expanded, regardless of whether the variable exists or not. Cannot be
updated. More info:
https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#running-a-command-in-a-shell

command <[]string>
Entrypoint array. Not executed within a shell. The docker image's
ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME)
are expanded using the container's environment. If a variable cannot be
resolved, the reference in the input string will be unchanged. The
$(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME).
Escaped references will never be expanded, regardless of whether the
variable exists or not. Cannot be updated. More info:
https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#running-a-command-in-a-shell env <[]Object> 容器里变量定义
List of environment variables to set in the container. Cannot be updated. envFrom <[]Object>
List of sources to populate environment variables in the container. The
keys defined within a source must be a C_IDENTIFIER. All invalid keys will
be reported as an event when the container is starting. When a key exists
in multiple sources, the value associated with the last source will take
precedence. Values defined by an Env with a duplicate key will take
precedence. Cannot be updated.

image <string>    # container image to use
Docker image name. More info:
https://kubernetes.io/docs/concepts/containers/images This field is
optional to allow higher level config management to default or override
container images in workload controllers like Deployments and StatefulSets.

imagePullPolicy <string>    # policy for pulling the image
Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always
if :latest tag is specified, or IfNotPresent otherwise. Cannot be updated.
More info:
https://kubernetes.io/docs/concepts/containers/images#updating-images

lifecycle <Object>
Actions that the management system should take in response to container
lifecycle events. Cannot be updated.

livenessProbe <Object>
Periodic probe of container liveness. Container will be restarted if the
probe fails. Cannot be updated. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

name <string> -required-    # container name
Name of the container specified as a DNS_LABEL. Each container in a pod
must have a unique name (DNS_LABEL). Cannot be updated.

ports <[]Object>    # parameters for the exposed ports
List of ports to expose from the container. Exposing a port here gives the
system additional information about the network connections a container
uses, but is primarily informational. Not specifying a port here DOES NOT
prevent that port from being exposed. Any port which is listening on the
default "0.0.0.0" address inside a container will be accessible from the
network. Cannot be updated.

readinessProbe <Object>
Periodic probe of container service readiness. Container will be removed
from service endpoints if the probe fails. Cannot be updated. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

resources <Object>
Compute Resources required by this container. Cannot be updated. More info:
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

securityContext <Object>
Security options the pod should run with. More info:
https://kubernetes.io/docs/concepts/policy/security-context/ More info:
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

stdin <boolean>
Whether this container should allocate a buffer for stdin in the container
runtime. If this is not set, reads from stdin in the container will always
result in EOF. Default is false.

stdinOnce <boolean>
Whether the container runtime should close the stdin channel after it has
been opened by a single attach. When stdin is true the stdin stream will
remain open across multiple attach sessions. If stdinOnce is set to true,
stdin is opened on container start, is empty until the first client
attaches to stdin, and then remains open and accepts data until the client
disconnects, at which time stdin is closed and remains closed until the
container is restarted. If this flag is false, a container processes that
reads from stdin will never receive an EOF. Default is false

terminationMessagePath <string>
Optional: Path at which the file to which the container's termination
message will be written is mounted into the container's filesystem. Message
written is intended to be brief final status, such as an assertion failure
message. Will be truncated by the node if greater than 4096 bytes. The
total message length across all containers will be limited to 12kb.
Defaults to /dev/termination-log. Cannot be updated.

terminationMessagePolicy <string>
Indicate how the termination message should be populated. File will use the
contents of terminationMessagePath to populate the container status message
on both success and failure. FallbackToLogsOnError will use the last chunk
of container log output if the termination message file is empty and the
container exited with an error. The log output is limited to 2048 bytes or
80 lines, whichever is smaller. Defaults to File. Cannot be updated.

tty <boolean>
Whether this container should allocate a TTY for itself, also requires
'stdin' to be true. Default is false.

volumeDevices <[]Object>
volumeDevices is the list of block devices to be used by the container.
This is a beta feature.

volumeMounts <[]Object>
Pod volumes to mount into the container's filesystem. Cannot be updated.

workingDir <string>
Container's working directory. If not specified, the container runtime's
default will be used, which might be configured in the container image.
Cannot be updated.
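To make the command, args, and env descriptions above more concrete, here is a minimal sketch of one container entry. The container name demo, the busybox:1.36 image, and the GREETING variable are assumptions made up for this example; only the field names come from the listing above.

```yaml
# One entry of a pod's containers list; names and values are illustrative.
containers:
- name: demo                      # hypothetical container name (must be a DNS_LABEL)
  image: busybox:1.36             # assumed image that ships an echo binary
  imagePullPolicy: IfNotPresent   # pull only when the image is not already on the node
  env:
  - name: GREETING                # hypothetical environment variable
    value: hello
  command: ["echo"]               # replaces the image's ENTRYPOINT; not run inside a shell
  args:
  - "$(GREETING)"                 # expanded by Kubernetes from the container's environment
  - "$$(GREETING)"                # escaped reference, never expanded
  workingDir: /tmp
```

Because command is not executed within a shell, shell features such as pipes or globbing need an explicit shell in command.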
Parameters for exposing container ports in a pod
[root@master manifests]# kubectl explain rs.spec.template.spec.containers.ports
KIND: ReplicaSet
VERSION: extensions/v1beta1

RESOURCE: ports <[]Object>

DESCRIPTION:
List of ports to expose from the container. Exposing a port here gives the
system additional information about the network connections a container
uses, but is primarily informational. Not specifying a port here DOES NOT
prevent that port from being exposed. Any port which is listening on the
default "0.0.0.0" address inside a container will be accessible from the
network. Cannot be updated.

ContainerPort represents a network port in a single container.

FIELDS:
containerPort <integer> -required-    # port inside the container
Number of port to expose on the pod's IP address. This must be a valid port
number, 0 < x < 65536.

hostIP <string>
What host IP to bind the external port to.

hostPort <integer>
Number of port to expose on the host. If specified, this must be a valid
port number, 0 < x < 65536. If HostNetwork is specified, this must match
ContainerPort. Most containers do not need this.

name <string>    # port name
If specified, this must be an IANA_SVC_NAME and unique within the pod. Each
named port in a pod must have a unique name. Name for the port that can be
referred to by services.

protocol <string>    # protocol
Protocol for port. Must be UDP, TCP, or SCTP. Defaults to "TCP".
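The port fields listed above could be combined as in the following sketch. The second port (metrics on 9100) and its hostPort binding are hypothetical and only show how the optional fields fit together.

```yaml
# ports list of a single container; the metrics entry is an illustrative assumption.
ports:
- name: http              # IANA_SVC_NAME, unique within the pod; services can refer to it by name
  containerPort: 80       # port exposed on the pod's IP address
  protocol: TCP           # TCP is the default
- name: metrics           # hypothetical second port
  containerPort: 9100
  protocol: TCP
  hostPort: 9100          # also bind on the node; most containers do not need this
```

As the description notes, the list is primarily informational: a port that listens on 0.0.0.0 inside the container is reachable even if it is not declared here.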
Definition of container health probes in a pod
[root@master manifests]# kubectl explain rs.spec.template.spec.containers.livenessProbe
KIND: ReplicaSet
VERSION: extensions/v1beta1

RESOURCE: livenessProbe <Object>

DESCRIPTION:
Periodic probe of container liveness. Container will be restarted if the
probe fails. Cannot be updated. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

Probe describes a health check to be performed against a container to
determine whether it is alive or ready to receive traffic.

FIELDS:
exec <Object>    # probe by running a command
One and only one of the following should be specified. Exec specifies the
action to take.

failureThreshold <integer>
Minimum consecutive failures for the probe to be considered failed after
having succeeded. Defaults to 3. Minimum value is 1.

httpGet <Object>    # probe with an HTTP request
HTTPGet specifies the http request to perform.

initialDelaySeconds <integer>
Number of seconds after the container has started before liveness probes
are initiated. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

periodSeconds <integer>
How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
value is 1.

successThreshold <integer>
Minimum consecutive successes for the probe to be considered successful
after having failed. Defaults to 1. Must be 1 for liveness. Minimum value
is 1.

tcpSocket <Object>    # probe with a TCP connection
TCPSocket specifies an action involving a TCP port. TCP hooks not yet
supported

timeoutSeconds <integer>
Number of seconds after which the probe times out. Defaults to 1 second.
Minimum value is 1. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
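A liveness probe built from the fields above might look like the sketch below. It assumes the container serves HTTP on a port named http and answers on the path /; both are assumptions about the application, and the 5-second initial delay is an arbitrary example value.

```yaml
# livenessProbe of a single container; path, port name and delay are assumptions.
livenessProbe:
  httpGet:                  # exactly one of exec, httpGet, tcpSocket is specified
    path: /                 # assumed health path
    port: http              # named container port; a number such as 80 also works
  initialDelaySeconds: 5    # wait 5s after the container starts before the first probe
  periodSeconds: 10         # probe every 10s (the default)
  timeoutSeconds: 1         # each probe times out after 1s (the default)
  failureThreshold: 3       # treated as failed after 3 consecutive failures (the default)
  successThreshold: 1       # must be 1 for liveness
```

A readinessProbe accepts the same fields; on failure the container is removed from service endpoints instead of being restarted.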
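Write a YAML file for a ReplicaSet (rs) controller and start the pods
[root@master manifests]# cat rs-01.yaml
apiVersion: apps/v1              # API version
kind: ReplicaSet                 # controller type
metadata:                        # controller metadata
  name: rs-myapp                 # controller name
  namespace: default             # controller namespace
spec:                            # desired state
  replicas: 3                    # desired number of replicas
  selector:                      # label selector
    matchLabels:                 # select pods by these labels
      app: rs-cx                 # label
      rs: cx                     # label
  template:                      # pod template
    metadata:                    # pod metadata
      labels:                    # pod labels; they must match the selector above
        app: rs-cx
        rs: cx
    spec:                        # desired state of the pod
      containers:                # container definitions
      - name: myapp-rs           # container name
        image: ikubernetes/myapp:v1   # image
        ports:                   # exposed ports
        - name: http             # port name
          containerPort: 80      # port exposed in the container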
Create the controller and its pods
kubectl create -f rs-01.yaml
View the created pods
kubectl get pods -o wide -l rs=cx
NAME             READY   STATUS    RESTARTS   AGE    IP            NODE     NOMINATED NODE   READINESS GATES
rs-myapp-f5pg7   1/1     Running   0          168m   10.244.1.43   node01   <none>           <none>
rs-myapp-wzjkz   1/1     Running   0          168m   10.244.1.45   node01   <none>           <none>
rs-myapp-z6kx4   1/1     Running   0          168m   10.244.2.21   node02   <none>           <none>
Delete one pod; the controller automatically creates a replacement
[root@master manifests]# kubectl delete pods rs-myapp-z6kx4
pod "rs-myapp-z6kx4" deleted
[root@master manifests]# kubectl get pods -o wide -l rs=cx
NAME             READY   STATUS    RESTARTS   AGE    IP            NODE     NOMINATED NODE   READINESS GATES
rs-myapp-f5pg7   1/1     Running   0          170m   10.244.1.43   node01   <none>           <none>
rs-myapp-pmvw8   1/1     Running   0          9s     10.244.2.23   node02   <none>           <none>
rs-myapp-wzjkz   1/1     Running   0          170m   10.244.1.45   node01   <none>           <none>
View the details of a created pod
[root@master manifests]# kubectl describe pods rs-myapp-wzjkz
Name:           rs-myapp-wzjkz
Namespace:      default
Priority:       0
Node:           node01/192.168.183.12    # node the pod runs on
Start Time:     Sat, 10 Aug 2019 13:01:49 +0800
Labels:         app=rs-cx    # labels
                rs=cx
Annotations:    <none>
Status:         Running    # status
IP:             10.244.1.45    # pod IP address
Controlled By:  ReplicaSet/rs-myapp
Containers:
  myapp-rs:
    Container ID:   docker://42b4318ab99e8d36aa5716ae8fa459ceb70adf4b68f4d1ebbbbcd79527457175
    Image:          ikubernetes/myapp:v1    # image
    Image ID:       docker-pullable://ikubernetes/myapp@sha256:9c3dc30b5219788b2b8a4b065f548b922a34479577befb54b03330999d30d513
    Port:           80/TCP    # container port
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 10 Aug 2019 13:04:47 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-2m2ts (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-2m2ts:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-2m2ts
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
Example of a surplus pod being removed: relabel the existing pod pod-demo so that it matches the controller's selector, which leaves four matching pods for three desired replicas
[root@master manifests]# kubectl label pods pod-demo app=rs-cx --overwrite
pod/pod-demo labeled
[root@master manifests]# kubectl label pods pod-demo rs=cx
pod/pod-demo labeled
[root@master manifests]# kubectl get pods --show-labels
NAME                     READY   STATUS        RESTARTS   AGE     LABELS
myapp-84cd4b7f95-g6ldp   1/1     Running       6          16d     pod-template-hash=84cd4b7f95,run=myapp
nginx-5896f46c8-zblcs    1/1     Running       6          16d     chenxi=cx,pod-template-hash=5896f46c8,run=nginx
pod-demo                 2/2     Running       8          5d16h   app=rs-cx,rs=cx,tier=frontend
rs-myapp-f5pg7           1/1     Running       0          3h5m    app=rs-cx,rs=cx
rs-myapp-pmvw8           0/1     Terminating   0          15m     app=rs-cx,rs=cx
rs-myapp-wzjkz           1/1     Running       0          3h5m    app=rs-cx,rs=cx
[root@master manifests]# kubectl get pods --show-labels    # the controller has automatically deleted one surplus pod (here rs-myapp-pmvw8, the newest)
NAME                     READY   STATUS    RESTARTS   AGE     LABELS
myapp-84cd4b7f95-g6ldp   1/1     Running   6          16d     pod-template-hash=84cd4b7f95,run=myapp
nginx-5896f46c8-zblcs    1/1     Running   6          16d     chenxi=cx,pod-template-hash=5896f46c8,run=nginx
pod-demo                 2/2     Running   8          5d16h   app=rs-cx,rs=cx,tier=frontend
rs-myapp-f5pg7           1/1     Running   0          3h5m    app=rs-cx,rs=cx
rs-myapp-wzjkz           1/1     Running   0          3h5m    app=rs-cx,rs=cx
Change the number of pods dynamically (edit spec.replicas, here from 3 to 5)
[root@master manifests]# kubectl edit rs rs-myapp
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  creationTimestamp: "2019-08-10T05:01:49Z"
  generation: 1
  name: rs-myapp
  namespace: default
  resourceVersion: "587241"
  selfLink: /apis/extensions/v1beta1/namespaces/default/replicasets/rs-myapp
  uid: c3a57f4b-dde9-4b0c-804e-5026271b70f9
spec:
  replicas: 5
  selector:
    matchLabels:
      app: rs-cx
      rs: cx
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rs-cx
        rs: cx
    spec:
      containers:
      - image: ikubernetes/myapp:v1
        imagePullPolicy: IfNotPresent
        name: myapp-rs
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 3
  fullyLabeledReplicas: 3
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
[root@master manifests]# kubectl get pods -o wide -l rs=cx    # scaled up to 5 pods
NAME             READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
rs-myapp-5z6hp   1/1     Running   0          71s     10.244.2.26   node02   <none>           <none>
rs-myapp-f5pg7   1/1     Running   0          3h17m   10.244.1.43   node01   <none>           <none>
rs-myapp-j75tx   1/1     Running   0          8m9s    10.244.2.24   node02   <none>           <none>
rs-myapp-kh659   1/1     Running   0          71s     10.244.2.25   node02   <none>           <none>
rs-myapp-wzjkz   1/1     Running   0          3h17m   10.244.1.45   node01   <none>           <none>
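Deployment: works on top of ReplicaSet; supports rolling updates and rollbacks, and declarative configuration
DaemonSet: ensures that every node, or every node of a specified type, runs one copy of a particular pod
Job: pod controller for one-off tasks
CronJob: pod controller for periodic tasks
StatefulSet: pod controller for stateful applications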
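For comparison with the ReplicaSet manifest used earlier, a Deployment wrapping the same container could be sketched as follows. The name deploy-myapp, the label app: deploy-cx, and the rolling-update settings are assumptions chosen for illustration; the label is deliberately different from the ReplicaSet example so the two controllers would not select each other's pods.

```yaml
# Minimal Deployment sketch; names, labels and strategy values are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-myapp            # hypothetical controller name
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: deploy-cx            # hypothetical label
  strategy:
    type: RollingUpdate         # replace pods gradually instead of all at once
    rollingUpdate:
      maxSurge: 1               # at most one extra pod during an update
      maxUnavailable: 0         # keep all desired replicas available while updating
  template:
    metadata:
      labels:
        app: deploy-cx          # must match the selector above
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
```

The Deployment creates and manages a ReplicaSet on its own; changing the pod template, for example the image tag, triggers a rolling update to a new ReplicaSet.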