Working through the GeekTime (极客时间) course 《深入剖析Kubernetes》.
In the spirit of "reading something a thousand times is no match for doing it once", I work through the examples by hand and record the results.
Corresponding chapter: 21 | 容器化守护进程的意义:DaemonSet
nodeAffinity
The nodeSelector/nodeAffinity YAML from the article stayed Pending after kubectl apply. Clever me took a look at the flannel pod's spec and noticed that the article uses matchExpressions, while flannel's pods use matchFields.
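The distinction matters: matchExpressions matches against node labels, whereas matchFields matches node fields such as metadata.name. The article's selector presumably never matched because it keeps the metadata.name key under matchExpressions, and metadata.name is a field, not a label. If you prefer the matchExpressions form, a sketch that should work is to match the default kubernetes.io/hostname label instead (assuming the label value on my node really is node2):

```yaml
# Hypothetical matchExpressions variant: select by the node's
# kubernetes.io/hostname label rather than the metadata.name field.
# This fragment replaces the affinity block in the pod below.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node2
```

The matchFields version I actually went with is below: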
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - node2
  containers:
  - name: busybox
    image: busybox
    imagePullPolicy: IfNotPresent
    stdin: true
    tty: true
```
With this, I have pinned the pod's scheduling to node2:
```
$ kubectl get pod node-affinity-pod -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
node-affinity-pod   1/1     Running   0          23s   172.1.1.100   node2   <none>           <none>
```
Of course, this only pins the pod to a node; it is not exclusive. For example, if I change the name in the YAML above and create a second pod, it is created just as successfully:
```
$ kubectl get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
node-affinity-pod    1/1     Running   0          23s   172.1.1.100   node2   <none>           <none>
node-affinity-pod2   1/1     Running   0          14s   172.1.1.101   node2   <none>           <none>
```
But what if I set the node to node1 (node1 is the master node in my environment)?
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - node1
  containers:
  - name: busybox
    image: busybox
    imagePullPolicy: IfNotPresent
    stdin: true
    tty: true
```
```
$ kubectl get pod node-affinity-pod -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
node-affinity-pod   0/1     Pending   0          96s   <none>   <none>   <none>           <none>
```
```
$ kubectl describe pod node-affinity-pod
Name:           node-affinity-pod
Namespace:      default
Priority:       0
Node:           <none>
Labels:         <none>
Annotations:    <none>
Status:         Pending
...
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  36s (x3 over 2m)  default-scheduler  0/4 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) didn't match node selector.
```
The key message is 1 node(s) had taint {node-role.kubernetes.io/master: }: the master node does not allow ordinary pods to be scheduled onto it, so the pod stays Pending.
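To confirm where that taint comes from, you can list the master node's taints directly (on a kubeadm-provisioned cluster the output typically looks like this):

```
$ kubectl describe node node1 | grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule
```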
Taints
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: toleration-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - node1
  tolerations:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
  containers:
  - name: busybox
    image: busybox
    imagePullPolicy: IfNotPresent
    stdin: true
    tty: true
```
Here I reworked the pod from above, adding a toleration for node-role.kubernetes.io/master:
```
$ kubectl get pod toleration-pod -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP           NODE    NOMINATED NODE   READINESS GATES
toleration-pod   1/1     Running   0          1s    172.1.0.89   node1   <none>           <none>
```
Now the pod has been scheduled onto node1.
This solves the problem of not being able to schedule onto the master node. The course likewise mentions tolerating the unschedulable taint.
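For reference, a minimal sketch of that toleration, which covers nodes that have been cordoned (kubectl cordon sets the node.kubernetes.io/unschedulable taint):

```yaml
tolerations:
- key: node.kubernetes.io/unschedulable
  operator: Exists
  effect: NoSchedule
```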
As the course puts it: "The DaemonSet automatically adds this special Toleration to the Pods it manages, letting these Pods ignore that restriction, which in turn guarantees that a Pod gets scheduled onto every node."
DaemonSet
Creating the DaemonSet
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: test-ds
spec:
  selector:
    matchLabels:
      name: my-test
  template:
    metadata:
      labels:
        name: my-test
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: my-test-busybox
        image: busybox
        imagePullPolicy: IfNotPresent
        stdin: true
        tty: true
```
This creates a DaemonSet named test-ds; its tolerations section tolerates the master taint, so its pods can be scheduled onto the master node as well.
Checking the result
```
$ kubectl get ds
NAME      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
test-ds   4         4         4       4            4           <none>          3m7s
```
```
$ kubectl get pods -l name=my-test -o wide
NAME            READY   STATUS    RESTARTS   AGE     IP            NODE            NOMINATED NODE   READINESS GATES
test-ds-5nxj9   1/1     Running   0          4m33s   172.1.1.103   node2           <none>           <none>
test-ds-bc9jx   1/1     Running   0          4m33s   172.1.2.54    bqi-k8s-node3   <none>           <none>
test-ds-kgxm5   1/1     Running   0          4m33s   172.1.3.9     k8s-node4       <none>           <none>
test-ds-wvhm2   1/1     Running   0          4m33s   172.1.0.90    node1           <none>           <none>
```
```
$ kubectl describe pod test-ds-kgxm5
Name:           test-ds-kgxm5
Namespace:      default
Priority:       0
Node:           k8s-node4/10.160.18.184
Start Time:     Fri, 31 Jul 2020 11:50:29 +0800
Labels:         controller-revision-hash=7cdb9f7c5c
                name=my-test
                pod-template-generation=1
Annotations:    <none>
Status:         Running
IP:             172.1.3.9
IPs:
  IP:           172.1.3.9
Controlled By:  DaemonSet/test-ds
...
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type    Reason     Age    From                Message
  ----    ------     ----   ----                -------
  Normal  Scheduled  4m55s  default-scheduler   Successfully assigned default/test-ds-kgxm5 to k8s-node4
  Normal  Pulling    4m54s  kubelet, k8s-node4  Pulling image "busybox"
  Normal  Pulled     4m53s  kubelet, k8s-node4  Successfully pulled image "busybox"
  Normal  Created    4m53s  kubelet, k8s-node4  Created container my-test-busybox
  Normal  Started    4m52s  kubelet, k8s-node4  Started container my-test-busybox
```
As you can see, a pod was created on every node, and in the Tolerations field, besides node-role.kubernetes.io/master:NoSchedule, quite a few tolerations were added automatically.
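According to the course, the DaemonSet controller also injects a nodeAffinity that binds each pod to its own node, which you can inspect with kubectl get pod test-ds-5nxj9 -o yaml. The injected section should look roughly like this (shape taken from the course, not copied from my cluster):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name   # each DaemonSet pod is pinned to exactly one node
          operator: In
          values:
          - node2
```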
Killing a pod
```
$ kubectl delete pod test-ds-kgxm5
pod "test-ds-kgxm5" deleted
$ kubectl get pods -l name=my-test -o wide
NAME            READY   STATUS    RESTARTS   AGE    IP            NODE            NOMINATED NODE   READINESS GATES
test-ds-5nxj9   1/1     Running   0          9m9s   172.1.1.103   node2           <none>           <none>
test-ds-bc9jx   1/1     Running   0          9m9s   172.1.2.54    bqi-k8s-node3   <none>           <none>
test-ds-dckg2   1/1     Running   0          5s     172.1.3.10    k8s-node4       <none>           <none>
test-ds-wvhm2   1/1     Running   0          9m9s   172.1.0.90    node1           <none>           <none>
```
The DaemonSet controller immediately created a replacement pod (test-ds-dckg2) on k8s-node4.
Updating
Next, update the image in the pod template to one that cannot be pulled.
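One way to trigger such an update, for anyone reproducing this (the tag below is deliberately a made-up, non-existent one):

```
$ kubectl set image daemonset/test-ds my-test-busybox=busybox:no-such-tag
```

The rollout then proceeds as follows: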
```
$ kubectl get pods -l name=my-test -o wide
NAME            READY   STATUS              RESTARTS   AGE     IP            NODE            NOMINATED NODE   READINESS GATES
test-ds-5nxj9   1/1     Running             0          5h27m   172.1.1.103   node2           <none>           <none>
test-ds-dckg2   1/1     Running             0          5h18m   172.1.3.10    k8s-node4       <none>           <none>
test-ds-sbsbq   0/1     ContainerCreating   0          23s     <none>        bqi-k8s-node3   <none>           <none>
test-ds-wvhm2   1/1     Running             0          5h27m   172.1.0.90    node1           <none>           <none>
$ kubectl get pods -l name=my-test -o wide
NAME            READY   STATUS             RESTARTS   AGE     IP            NODE            NOMINATED NODE   READINESS GATES
test-ds-5nxj9   1/1     Running            0          5h28m   172.1.1.103   node2           <none>           <none>
test-ds-dckg2   1/1     Running            0          5h19m   172.1.3.10    k8s-node4       <none>           <none>
test-ds-sbsbq   0/1     ImagePullBackOff   0          42s     172.1.2.55    bqi-k8s-node3   <none>           <none>
test-ds-wvhm2   1/1     Running            0          5h28m   172.1.0.90    node1           <none>           <none>
```
As you can see, the DaemonSet controller picks one pod at a time to update, and when that update fails, the rollout stops.
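This one-pod-at-a-time behavior is governed by the DaemonSet's update strategy; for apps/v1 the default is RollingUpdate with maxUnavailable: 1, which could be spelled out explicitly in the spec:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # take down at most one pod at a time during a rollout
```

Describing the pods shows which template revision each one belongs to: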
```
$ kubectl describe pod test-ds-5nxj9
Name:         test-ds-5nxj9
Namespace:    default
Priority:     0
Node:         node2/10.160.18.181
Start Time:   Fri, 31 Jul 2020 11:50:29 +0800
Labels:       controller-revision-hash=7cdb9f7c5c
              name=my-test
              pod-template-generation=1
...
$ kubectl describe pod test-ds-dckg2
Name:         test-ds-dckg2
Namespace:    default
Priority:     0
Node:         k8s-node4/10.160.18.184
Start Time:   Fri, 31 Jul 2020 11:59:33 +0800
Labels:       controller-revision-hash=7cdb9f7c5c
              name=my-test
              pod-template-generation=1
...
$ kubectl describe pod test-ds-sbsbq
Name:         test-ds-sbsbq
Namespace:    default
Priority:     0
Node:         bqi-k8s-node3/10.160.18.183
Start Time:   Fri, 31 Jul 2020 17:18:02 +0800
Labels:       controller-revision-hash=6755d9c956
              name=my-test
              pod-template-generation=2
```
You can also see in the new pod's labels that:
- controller-revision-hash has been updated to a new value
- pod-template-generation has been bumped to 2
Now, change the image back to one that can actually be pulled:
```
$ kubectl get pods -l name=my-test -o wide
NAME            READY   STATUS        RESTARTS   AGE     IP            NODE            NOMINATED NODE   READINESS GATES
test-ds-dckg2   1/1     Terminating   0          5h25m   172.1.3.10    k8s-node4       <none>           <none>
test-ds-jd99w   1/1     Running       0          47s     172.1.2.56    bqi-k8s-node3   <none>           <none>
test-ds-nw5lk   1/1     Running       0          9s      172.1.1.104   node2           <none>           <none>
test-ds-wvhm2   1/1     Running       0          5h34m   172.1.0.90    node1           <none>           <none>
$ kubectl get pods -l name=my-test -o wide
NAME            READY   STATUS        RESTARTS   AGE     IP            NODE            NOMINATED NODE   READINESS GATES
test-ds-72z5v   1/1     Running       0          40s     172.1.3.11    k8s-node4       <none>           <none>
test-ds-jd99w   1/1     Running       0          118s    172.1.2.56    bqi-k8s-node3   <none>           <none>
test-ds-nw5lk   1/1     Running       0          80s     172.1.1.104   node2           <none>           <none>
test-ds-wvhm2   1/1     Terminating   0          5h35m   172.1.0.90    node1           <none>           <none>
```
As you can see, once the update succeeds, the corresponding pods are replaced one by one.
```
$ kubectl describe pod test-ds-72z5v
Name:         test-ds-72z5v
Namespace:    default
Priority:     0
Node:         k8s-node4/10.160.18.184
Start Time:   Fri, 31 Jul 2020 17:25:23 +0800
Labels:       controller-revision-hash=86b8bf4df4
              name=my-test
              pod-template-generation=3
```
For the updated pod:
- controller-revision-hash has again been updated to a new value
- pod-template-generation has increased to 3
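These labels are backed by ControllerRevision objects, which snapshot every version of the pod template. They can be listed directly (the output below is only illustrative; hashes and ages will differ):

```
$ kubectl get controllerrevision -l name=my-test
NAME                 CONTROLLER               REVISION   AGE
test-ds-7cdb9f7c5c   daemonset.apps/test-ds   1          5h40m
test-ds-6755d9c956   daemonset.apps/test-ds   2          12m
test-ds-86b8bf4df4   daemonset.apps/test-ds   3          5m
```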
Summary
The DaemonSet controller walks the node list and creates a pod for each node, combining this with measures such as tolerations to guarantee that its pod is created on every node.
Through nodeAffinity and Toleration, two small scheduler features, it ensures that each node runs exactly one such Pod.
Versioning, meanwhile, is managed through controller-revisions.
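For completeness (I did not run these here), the standard rollout commands work on DaemonSets too, and they operate on exactly these ControllerRevisions:

```
$ kubectl rollout history daemonset test-ds
$ kubectl rollout undo daemonset test-ds --to-revision=1
```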