Environment: Ubuntu 20.04, Aliyun package mirrors, two nodes (one master + one slave).
Install Docker
Install docker-ce on both nodes:
```shell
$ apt-get update
$ apt-get -y install apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
$ add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
$ apt update
$ apt-get -y install docker-ce
```
Install kubeadm, kubectl, kubelet
Install them on both nodes:
```shell
$ apt-get update && apt-get install -y apt-transport-https
$ curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
$ cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
$ apt-get update
$ apt-get install -y kubelet kubeadm kubectl
```
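The kubeadm install docs also suggest pinning these packages so a routine apt upgrade doesn't move the cluster components to an untested version. A minimal sketch (the function name is ours):

```shell
# Hold kubelet/kubeadm/kubectl at their installed versions;
# unattended "apt upgrade" runs will then skip them.
hold_k8s_packages() {
  apt-mark hold kubelet kubeadm kubectl
}
```

Use `apt-mark unhold` later when you deliberately upgrade the cluster.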
Initialize the master node
```shell
$ kubeadm init --pod-network-cidr=172.172.0.0/16 --image-repository registry.aliyuncs.com/google_containers
```
On success, kubeadm prints something like:
```
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.160.18.180:6443 --token 5xxosi.3du1z15pevcvnyyx \
    --discovery-token-ca-cert-hash sha256:4cc4977482e04ac0ca845bf3520a6a5fa8a0cf6ac8233e734a47e0250c259f73
```
Following the instructions, run:
```shell
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
Install flannel
Reference: https://github.com/coreos/flannel
```shell
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
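One caveat: kube-flannel.yml ships with its `Network` set to `10.244.0.0/16`, while the cluster above was initialized with `--pod-network-cidr=172.172.0.0/16`. If you keep the custom CIDR, edit the `net-conf.json` in the manifest's `kube-flannel-cfg` ConfigMap to match before applying; roughly (verify against the manifest version you actually download):

```yaml
# Fragment of the kube-flannel-cfg ConfigMap in kube-flannel.yml;
# Network must match the --pod-network-cidr passed to kubeadm init.
net-conf.json: |
  {
    "Network": "172.172.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
```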
Install dashboard
Reference: https://github.com/kubernetes/dashboard
```shell
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
```
Edit the dashboard configuration
Change the `type` field under `spec` to `NodePort`:
```shell
$ kubectl -n kubernetes-dashboard edit service kubernetes-dashboard
.......
spec:
  clusterIP: 10.101.212.193
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 32609
    port: 443
    protocol: TCP
    targetPort: 8443
  selector:
    k8s-app: kubernetes-dashboard
  sessionAffinity: None
  type: NodePort
```
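The same change can also be made non-interactively with `kubectl patch`, which is easier to script than `kubectl edit`. A sketch (the function name is ours; run it where kubectl is configured):

```shell
# Switch the dashboard service to NodePort without opening an editor.
dashboard_to_nodeport() {
  kubectl -n kubernetes-dashboard patch service kubernetes-dashboard \
    -p '{"spec": {"type": "NodePort"}}'
}
```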
After the change, check the port info:
```shell
$ kubectl -n kubernetes-dashboard get service kubernetes-dashboard
NAME                   TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
kubernetes-dashboard   NodePort   10.101.212.193   <none>        443:32609/TCP   27m
```
The dashboard is now reachable at https://<master-ip>:<NodePort> (the port here is 32609). The page loads, but logging in requires a token or a Kubeconfig.
Create a sample user
Create the ServiceAccount
Create dashboard-adminuser.yaml with the following content:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
```
Apply it:
```shell
$ kubectl apply -f dashboard-adminuser.yaml
```
Create a ClusterRoleBinding
Create cluster-role-binding.yaml with the following content:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
```
Apply it:

```shell
$ kubectl apply -f cluster-role-binding.yaml
```
Get the token
```shell
$ kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep admin-user | awk '{print $1}')
Name:         admin-user-token-jmggp
Namespace:    kubernetes-dashboard
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: admin-user
              kubernetes.io/service-account.uid: 58210c16-0fac-438c-8867-d0a3e7b950b9

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1025 bytes
namespace:  20 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IlhTSnlXMUhXTlNnUmd4MlVMTzdtbm14YVdiSzNUdjk4UnVoZ3RRbUFXZGsifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyLXRva2VuLWptZ2dwIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI1ODIxMGMxNi0wZmFjLTQzOGMtODg2Ny1kMGEzZTdiOTUwYjkiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQ6YWRtaW4tdXNlciJ9.F4TKNO_6Guu-vcLUtELUOhRI2dGMcZ3V1et2evono_a6f-TvCR9c4pbyYCnRdCG6_MumTmyE5W1g3zHioVnb5TgnGwfmAfIWLltwwLEOxOdLfO7oqM8zrYfzZnIH16SoOZQYMU7xIk5MhE5WN265n8Q2kpDMraf0L06_nqNy1pq8h9eaX0QIntosl4fmf9KVew0geLCKbknEwpnzGGfSCcKLLgE7a45ACWwStJiL29t69gcKJ6ze33MXpA5_irk2nKkavXbKEk7ejapgYK66nOxJnDKgbNVDcBP47xHrPjGeeupB6bw6uUMWxA6z4kJUTVRepk6yTMGVDPzB9Muicw
```
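The grep/awk pipeline can also be replaced with jsonpath. A sketch (the function name is ours; this relies on the pre-1.24 behavior where a token Secret is auto-created for each ServiceAccount, which holds for the v1.18 cluster here):

```shell
# Print only the admin-user bearer token, resolving the secret name
# from the ServiceAccount instead of grepping the secret list.
get_admin_token() {
  secret=$(kubectl -n kubernetes-dashboard get serviceaccount admin-user \
    -o jsonpath='{.secrets[0].name}')
  kubectl -n kubernetes-dashboard get secret "$secret" \
    -o jsonpath='{.data.token}' | base64 --decode
}
```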
You can now log in to the dashboard with this token.
Join the slave node to the cluster
```shell
$ kubeadm join 10.160.18.180:6443 --token 5xxosi.3du1z15pevcvnyyx \
    --discovery-token-ca-cert-hash sha256:4cc4977482e04ac0ca845bf3520a6a5fa8a0cf6ac8233e734a47e0250c259f73
```
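The token printed by kubeadm init expires after 24 hours by default. If it has expired, a fresh join command can be generated on the master. A sketch (the function name is ours):

```shell
# Create a new bootstrap token and print a complete "kubeadm join ..."
# line to run on the worker node. Run this on the master.
print_join_command() {
  kubeadm token create --print-join-command
}
```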
Problems and solutions
Docker cgroup driver
Error log:
```
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
```
Solution
Create daemon.json under /etc/docker/:
```shell
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
```
Restart the Docker daemon:
```shell
$ systemctl restart docker
$ systemctl status docker
```
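To confirm the new driver is actually active, Docker's own info output can be checked. A sketch (the function name is ours; it should print `systemd` after the change above):

```shell
# Print the cgroup driver Docker is currently using.
docker_cgroup_driver() {
  docker info --format '{{.CgroupDriver}}'
}
```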
Swap
Error log:
```
[ERROR Swap]: running with swap on is not supported. Please disable swap
```
Solution
Run `swapoff -a` to turn swap off. But this only disables swap temporarily; it comes back after the node reboots. To disable it permanently, edit /etc/fstab and comment out the swap line with a `#`.
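The /etc/fstab edit can be scripted. A minimal sketch, assuming a standard fstab layout (both function names are ours, and the commands need root):

```shell
# Turn swap off until the next reboot.
disable_swap_now() {
  swapoff -a
}

# Make it permanent: prefix every non-comment line whose filesystem
# type field is "swap" with '#'. Pass /etc/fstab on a real node.
comment_out_swap_entries() {
  sed -i '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}
```

Try `comment_out_swap_entries` on a copy of /etc/fstab first to verify only the swap line is touched.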
Node stuck in NotReady
A node can be NotReady for many reasons; work through it step by step.
```shell
$ kubectl get nodes
NAME    STATUS     ROLES    AGE     VERSION
node1   Ready      master   4h56m   v1.18.2
node2   NotReady   <none>   4h6m    v1.18.2
```
First, look for the failing pod:
```shell
$ kubectl get pod -n kube-system
NAME                            READY   STATUS                  RESTARTS   AGE
coredns-7ff77c879f-2k7rw        1/1     Running                 1          4h47m
coredns-7ff77c879f-q76jr        1/1     Running                 1          4h47m
etcd-node1                      1/1     Running                 2          4h47m
kube-apiserver-node1            1/1     Running                 2          4h47m
kube-controller-manager-node1   1/1     Running                 2          4h47m
kube-flannel-ds-amd64-2jn8n     0/1     Init:ImagePullBackOff   0          3h49m
kube-flannel-ds-amd64-ftpxl     1/1     Running                 1          3h49m
kube-proxy-5q8wp                1/1     Running                 2          4h47m
kube-proxy-wfcjq                0/1     ContainerCreating       0          5m46s
kube-scheduler-node1            1/1     Running                 2          4h47m
```
Some Kubernetes components run on every node, such as kube-proxy and flannel here.
```shell
$ kubectl describe pod -n kube-system kube-flannel-ds-amd64-2jn8n
.....
Events:
  Type     Reason   Age                    From            Message
  ----     ------   ----                   ----            -------
  Normal   Pulling  6m51s                  kubelet, node2  Pulling image "quay.io/coreos/flannel:v0.12.0-amd64"
  Warning  Failed   5m48s                  kubelet, node2  Failed to pull image "quay.io/coreos/flannel:v0.12.0-amd64": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/coreos/flannel/manifests/v0.12.0-amd64: received unexpected HTTP status: 500 Internal Server Error
  Warning  Failed   5m48s                  kubelet, node2  Error: ErrImagePull
  Normal   BackOff  5m47s                  kubelet, node2  Back-off pulling image "quay.io/coreos/flannel:v0.12.0-amd64"
  Warning  Failed   5m47s                  kubelet, node2  Error: ImagePullBackOff
  Normal   Pulling  5m36s (x2 over 5m51s)  kubelet, node2  Pulling image "quay.io/coreos/flannel:v0.12.0-amd64"
```
The most common cause is a failed image pull, e.g. the image pulls fine on the master node but fails on the other nodes.
Solution
1. Pull the image manually on the slave node:
```shell
$ docker pull quay.io/coreos/flannel:v0.12.0-amd64
```
2. Or export the image from the master node and import it on the slave:
```shell
(master)$ docker images
REPOSITORY                                                        TAG             IMAGE ID       CREATED        SIZE
kubernetesui/dashboard                                            v2.0.0          8b32422733b3   3 weeks ago    222MB
registry.aliyuncs.com/google_containers/kube-proxy                v1.18.2         0d40868643c6   4 weeks ago    117MB
registry.aliyuncs.com/google_containers/kube-scheduler            v1.18.2         a3099161e137   4 weeks ago    95.3MB
registry.aliyuncs.com/google_containers/kube-apiserver            v1.18.2         6ed75ad404bd   4 weeks ago    173MB
registry.aliyuncs.com/google_containers/kube-controller-manager   v1.18.2         ace0a8c17ba9   4 weeks ago    162MB
kubernetesui/metrics-scraper                                      v1.0.4          86262685d9ab   7 weeks ago    36.9MB
quay.io/coreos/flannel                                            v0.12.0-amd64   4e9f801d2217   2 months ago   52.8MB
registry.aliyuncs.com/google_containers/pause                     3.2             80d28bedfe5d   3 months ago   683kB
registry.aliyuncs.com/google_containers/coredns                   1.6.7           67da37a9a360   3 months ago   43.8MB
registry.aliyuncs.com/google_containers/etcd                      3.4.3-0         303ce5db0e90   6 months ago   288MB
```
```shell
(master)$ docker save quay.io/coreos/flannel > flannel.tar
```
Copy the file to the slave node, then import the image there:
```shell
(slave)$ docker load < flannel.tar
256a7af3acb1: Loading layer [==================================================>]  5.844MB/5.844MB
d572e5d9d39b: Loading layer [==================================================>]  10.37MB/10.37MB
57c10be5852f: Loading layer [==================================================>]  2.249MB/2.249MB
7412f8eefb77: Loading layer [==================================================>]  35.26MB/35.26MB
05116c9ff7bf: Loading layer [==================================================>]  5.12kB/5.12kB
Loaded image: quay.io/coreos/flannel:v0.12.0-amd64
```
With the image in place, the flannel pod can start and the node becomes Ready.