Building a highly available k8s cluster with kubeadm


2023-08-09 21:20 | Source: compiled from the web

Environment preparation

Server overview

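I use five CentOS 7.7 virtual machines, detailed in the table below:

| OS version | IP address      | Node role | CPU | Memory | Hostname |
|------------|-----------------|-----------|-----|--------|----------|
| CentOS-7.7 | 192.168.243.138 | master    | >=2 | >=2G   | m1       |
| CentOS-7.7 | 192.168.243.136 | master    | >=2 | >=2G   | m2       |
| CentOS-7.7 | 192.168.243.141 | master    | >=2 | >=2G   | m3       |
| CentOS-7.7 | 192.168.243.139 | worker    | >=2 | >=2G   | s1       |
| CentOS-7.7 | 192.168.243.140 | worker    | >=2 | >=2G   | s2       |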

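Docker must be installed on all five machines beforehand. The installation is straightforward and is not covered here; see the official documentation:

https://docs.docker.com/engine/install/centos/

System settings (all nodes)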

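1. Every node must have a unique hostname, and all nodes must be able to reach one another by hostname. Set the hostname:

# Show the hostname
$ hostname
# Change the hostname
$ hostnamectl set-hostname <hostname>

Configure hosts so that all nodes can reach one another by hostname:

$ vim /etc/hosts
192.168.243.138 m1
192.168.243.136 m2
192.168.243.141 m3
192.168.243.139 s1
192.168.243.140 s2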
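Editing /etc/hosts by hand on five machines is easy to get wrong. The following is a minimal sketch of an idempotent alternative; the helper name add_host_entry is hypothetical and not part of the original article.

```shell
# add_host_entry FILE IP NAME
# Appends "IP NAME" to FILE unless some line already lists NAME as a hostname.
add_host_entry() {
    file=$1; ip=$2; name=$3
    # Look for NAME as a whole field after the address column
    if ! awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Usage on a real node (as root):
#   add_host_entry /etc/hosts 192.168.243.138 m1
#   add_host_entry /etc/hosts 192.168.243.136 m2
```

Because existing names are skipped, the same calls can be repeated on every node without creating duplicate lines.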

2. Install the dependencies:

# Update yum
$ yum update
# Install the dependencies
$ yum install -y conntrack ipvsadm ipset jq sysstat curl iptables libseccomp

3. Disable the firewall and swap, and reset iptables:

# Disable the firewall
$ systemctl stop firewalld && systemctl disable firewalld
# Reset iptables
$ iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat && iptables -P FORWARD ACCEPT
# Disable swap
$ swapoff -a
$ sed -i '/swap/s/^\(.*\)$/#\1/g' /etc/fstab
# Disable selinux
$ setenforce 0
# Stop dnsmasq (otherwise docker containers may fail to resolve domain names)
$ service dnsmasq stop && systemctl disable dnsmasq
# Restart the docker service
$ systemctl restart docker
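The sed command in step 3 prefixes every line containing "swap" with "#", including lines that are already commented, so running it twice stacks the "#" characters. A small sketch of a rerunnable variant follows; it assumes GNU sed (as shipped with CentOS), and the function name comment_swap is hypothetical.

```shell
# comment_swap FILE
# Comments out every line mentioning "swap" that is not already a comment.
comment_swap() {
    sed -i '/swap/{/^#/!s/^/#/;}' "$1"
}

# Usage on a real node (as root):
#   swapoff -a
#   comment_swap /etc/fstab
```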

4. Set the system parameters:
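As a rough illustration of what such a file usually contains for a kubeadm cluster, here is a minimal sketch. The keys and values below are assumptions based on commonly used settings, not the author's exact configuration, and the function name write_k8s_sysctl is hypothetical.

```shell
# write_k8s_sysctl FILE
# Writes kernel parameters commonly set for kubeadm clusters (assumed values).
write_k8s_sysctl() {
    cat > "$1" <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
EOF
}

# Usage on a real node (as root):
#   modprobe br_netfilter
#   write_k8s_sysctl /etc/sysctl.d/kubernetes.conf
#   sysctl -p /etc/sysctl.d/kubernetes.conf
```

The bridge settings make bridged pod traffic visible to iptables, and ip_forward lets the node route packets between pods and other hosts.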

# Create the configuration file
$ cat > /etc/sysctl.d/kubernetes.conf > $HOME/.bash_profile
[root@m1 ~]# source $HOME/.bash_profile

High-availability cluster deployment

Deploying keepalived - apiserver high availability (any two master nodes)

1. Install keepalived on two of the master nodes (one master, one backup). I chose m1 and m2:

$ yum install -y keepalived

2. On both machines, create the directory that holds the keepalived configuration file:

$ mkdir -p /etc/keepalived

3. On m1 (role: master), create the following configuration file:

[root@m1 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id keepalive-master
}

vrrp_script check_apiserver {
    # Path to the check script
    script "/etc/keepalived/check-apiserver.sh"
    # Interval between checks, in seconds
    interval 3
    # Lower the priority by 2 when the check fails
    weight -2
}

vrrp_instance VI-kube-master {
    state MASTER      # Node role
    interface ens32   # Network interface name
    virtual_router_id 68
    priority 100
    dont_track_primary
    advert_int 3
    virtual_ipaddress {
        # Custom virtual IP
        192.168.243.100
    }
    track_script {
        check_apiserver
    }
}

4. On m2 (role: backup), create the following configuration file:

[root@m2 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id keepalive-backup
}

vrrp_script check_apiserver {
    script "/etc/keepalived/check-apiserver.sh"
    interval 3
    weight -2
}

vrrp_instance VI-kube-master {
    state BACKUP
    interface ens32
    virtual_router_id 68
    priority 99
    dont_track_primary
    advert_int 3
    virtual_ipaddress {
        192.168.243.100
    }
    track_script {
        check_apiserver
    }
}
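The priorities 100 and 99 combined with weight -2 are what make the failover work: while the check script fails on m1, its priority drops below that of m2. The arithmetic can be sketched as follows; effective_priority is a hypothetical helper written for illustration, not a keepalived command.

```shell
# effective_priority BASE WEIGHT CHECK_OK
# With a negative weight, keepalived lowers the priority by |WEIGHT| while
# the tracked script fails. CHECK_OK is 1 (passing) or 0 (failing).
effective_priority() {
    base=$1; weight=$2; ok=$3
    if [ "$ok" -eq 1 ]; then
        echo "$base"
    else
        echo $((base + weight))
    fi
}

m1_failed=$(effective_priority 100 -2 0)    # 98
m2_healthy=$(effective_priority 99 -2 1)    # 99
if [ "$m1_failed" -lt "$m2_healthy" ]; then
    echo "m2 takes over the virtual IP"
fi
```

If the two base priorities differed by 2 or more, a failing check on m1 would not be enough to move the virtual IP, so the gap has to stay smaller than the absolute weight.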

5. On both m1 and m2, create the keepalived check script. This script is deliberately simple and can be extended as needed:

$ vim /etc/keepalived/check-apiserver.sh
#!/bin/sh
netstat -ntlp |grep 6443 || exit 1
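One way to tighten the check: a plain grep 6443 also matches a PID or a port such as 16443. The sketch below only accepts a listening socket whose local address ends in the given port; the function name port_listening is hypothetical, and it relies on the local address being the fourth column in both netstat -ntl and ss -ntl output.

```shell
# port_listening PORT
# Reads `netstat -ntl` or `ss -ntl` output on stdin and succeeds only if a
# LISTEN socket has a local address (column 4) ending in ":PORT".
port_listening() {
    awk -v p="$1" '/LISTEN/ && $4 ~ (":" p "$") { found = 1 } END { exit !found }'
}

# Usage inside /etc/keepalived/check-apiserver.sh:
#   netstat -ntl | port_listening 6443 || exit 1
```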

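6. After completing the steps above, start keepalived:

# Start the keepalived service on both the master and the backup
$ systemctl enable keepalived && service keepalived start
# Check the status
$ service keepalived status
# View the logs
$ journalctl -f -u keepalived
# View the virtual IP
$ ip a

Deploying the first k8s master node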

In a k8s cluster created with kubeadm, most components run as docker containers, so kubeadm has to pull the corresponding component images when it initializes a master node. By default kubeadm pulls from Google's k8s.gcr.io, which cannot be reached from mainland China, so the pull fails.

There are two ways around this: bypass the network restriction, or manually pull the equivalent images from a domestic mirror and retag them. I chose the latter. First, list the images kubeadm needs:

[root@m1 ~]# kubeadm config images list
W0830 19:17:13.056761 81487 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
k8s.gcr.io/kube-apiserver:v1.19.0
k8s.gcr.io/kube-controller-manager:v1.19.0
k8s.gcr.io/kube-scheduler:v1.19.0
k8s.gcr.io/kube-proxy:v1.19.0
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.9-1
k8s.gcr.io/coredns:1.7.0
[root@m1 ~]#

I pull from the Alibaba Cloud container registry. One catch is that the version numbers there may not match the ones kubeadm expects, so they have to be looked up in the registry:

https://cr.console.aliyun.com/cn-hangzhou/instances/images

For example, kubeadm lists v1.19.0 here, but the Alibaba Cloud registry has v1.19.0-rc.1. Once the matching versions are known, a shell script saves repeating the pull and retag steps by hand:

[root@m1 ~]# vim pullk8s.sh
#!/bin/bash
ALIYUN_KUBE_VERSION=v1.19.0-rc.1
KUBE_VERSION=v1.19.0
KUBE_PAUSE_VERSION=3.2
ETCD_VERSION=3.4.9-1
DNS_VERSION=1.7.0
username=registry.cn-hangzhou.aliyuncs.com/google_containers

images=(
    kube-proxy-amd64:${ALIYUN_KUBE_VERSION}
    kube-scheduler-amd64:${ALIYUN_KUBE_VERSION}
    kube-controller-manager-amd64:${ALIYUN_KUBE_VERSION}
    kube-apiserver-amd64:${ALIYUN_KUBE_VERSION}
    pause:${KUBE_PAUSE_VERSION}
    etcd-amd64:${ETCD_VERSION}
    coredns:${DNS_VERSION}
)

for image in "${images[@]}"
do
    docker pull "${username}/${image}"
    # Strip "-amd64"; otherwise kubeadm still does not recognize the local image
    new_image=$(echo "$image" | sed 's/-amd64//g')
    if [[ $new_image == *$ALIYUN_KUBE_VERSION* ]]
    then
        # Replace the mirror's version with the version kubeadm expects
        new_kube_image=$(echo "$new_image" | sed "s/$ALIYUN_KUBE_VERSION//g")
        docker tag "${username}/${image}" "k8s.gcr.io/${new_kube_image}${KUBE_VERSION}"
    else
        docker tag "${username}/${image}" "k8s.gcr.io/${new_image}"
    fi
    docker rmi "${username}/${image}"
done
[root@m1 ~]# bash pullk8s.sh
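Before pulling anything, it can help to preview which k8s.gcr.io tag each mirror image ends up with. The following is a dry-run sketch of the same renaming rule using only sed; the function name target_tag is hypothetical and no docker command is run.

```shell
ALIYUN_KUBE_VERSION=v1.19.0-rc.1
KUBE_VERSION=v1.19.0

# target_tag IMAGE
# Prints the k8s.gcr.io name that the mirror image IMAGE would be retagged to:
# drop "-amd64", map the mirror version to the kubeadm version, add the prefix.
target_tag() {
    echo "$1" | sed -e 's/-amd64//' \
        -e "s/:${ALIYUN_KUBE_VERSION}\$/:${KUBE_VERSION}/" \
        -e 's#^#k8s.gcr.io/#'
}

target_tag kube-apiserver-amd64:v1.19.0-rc.1
target_tag pause:3.2
```

Comparing this output with the list from kubeadm config images list shows quickly whether a version variable in the script needs adjusting.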

After the script finishes, the Docker image list should look like this:

[root@m1 ~]# docker images
REPOSITORY                           TAG       IMAGE ID       CREATED        SIZE
k8s.gcr.io/kube-proxy                v1.19.0   b2d80fe68e4f   6 weeks ago    120MB
k8s.gcr.io/kube-controller-manager   v1.19.0   a7cd7b6717e8   6 weeks ago    116MB
k8s.gcr.io/kube-apiserver            v1.19.0   1861e5423d80   6 weeks ago    126MB
k8s.gcr.io/kube-scheduler            v1.19.0   6d4fe43fdd0d   6 weeks ago    48.4MB
k8s.gcr.io/etcd                      3.4.9-1   d4ca8726196c   2 months ago   253MB
k8s.gcr.io/coredns                   1.7.0     bfe3a36ebd25   2 months ago   45.2MB
k8s.gcr.io/pause                     3.2       80d28bedfe5d   6 months ago   683kB
[root@m1 ~]#

Create the configuration file that kubeadm uses to initialize the master node:

[root@m1 ~]# vim kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.19.0
# Access endpoint of the control plane; the IP here is the keepalived virtual IP
controlPlaneEndpoint: "192.168.243.100:6443"
networking:
  # This CIDR is a Calico default. Substitute or remove for your CNI provider.
  podSubnet: "172.22.0.0/16"  # Subnet used by pods

Then run the following command to initialize the node:

[root@m1 ~]# kubeadm init --config=kubeadm-config.yaml --upload-certs
W0830 20:05:29.447773 88394 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.0
[preflight] Running pre-flight checks
[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local m1] and IPs [10.96.0.1 192.168.243.138 192.168.243.100]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost m1] and IPs [192.168.243.138 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost m1] and IPs [192.168.243.138 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 173.517640 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.19" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key: a455fb8227dd15882b57b11f3587187181b972d95524bb3ef43e78f76360121e
[mark-control-plane] Marking the node m1 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node m1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 5l7pv5.5iiq4atzlazq0b7x
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 192.168.243.100:6443 --token 5l7pv5.5iiq4atzlazq0b7x \
    --discovery-token-ca-cert-hash sha256:0fdc9947984a1c655861349dbd251d581bd6ec336c1ab8d9013cf302412b2140 \
    --control-plane --certificate-key a455fb8227dd15882b57b11f3587187181b972d95524bb3ef43e78f76360121e

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.243.100:6443 --token 5l7pv5.5iiq4atzlazq0b7x \
    --discovery-token-ca-cert-hash sha256:0fdc9947984a1c655861349dbd251d581bd6ec336c1ab8d9013cf302412b2140
[root@m1 ~]#

Copy the two kubeadm join commands printed here; they are needed later when adding the other master nodes and the worker nodes.

Then run the following commands on the master node to copy the configuration file:

[root@m1 ~]# mkdir -p $HOME/.kube
[root@m1 ~]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@m1 ~]# chown $(id -u):$(id -g) $HOME/.kube/config

View the current Pod information:

[root@m1 ~]# kubectl get pod --all-namespaces
NAMESPACE     NAME                         READY   STATUS    RESTARTS   AGE
kube-system   coredns-f9fd979d6-kg4lf      0/1     Pending   0          9m9s
kube-system   coredns-f9fd979d6-t8xzj      0/1     Pending   0          9m9s
kube-system   etcd-m1                      1/1     Running   0          9m22s
kube-system   kube-apiserver-m1            1/1     Running   1          9m22s
kube-system   kube-controller-manager-m1   1/1     Running   1          9m22s
kube-system   kube-proxy-rjgnw             1/1     Running   0          9m9s
kube-system   kube-scheduler-m1            1/1     Running   1          9m22s
[root@m1 ~]#

Use curl to call the health-check endpoint; a response of ok means the apiserver is healthy:

[root@m1 ~]# curl -k https://192.168.243.100:6443/healthz
ok
[root@m1 ~]#

Deploying the network plugin - calico

Create the directory that holds the configuration files:

[root@m1 ~]# mkdir -p /etc/kubernetes/addons

Create the calico-rbac-kdd.yaml configuration file in that directory:

[root@m1 ~]# vi /etc/kubernetes/addons/calico-rbac-kdd.yaml
# Calico Version v3.1.3
# https://docs.projectcalico.org/v3.1/releases#v3.1.3
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-node
rules:
  - apiGroups: [""]
    resources:
      - namespaces
    verbs:
      - get
      - list
      - watch
  - apiGroups: [""]
    resources:
      - pods/status
    verbs:
      - update
  - apiGroups: [""]
    resources:
      - pods
    verbs:
      - get
      - list
      - watch
      - patch
  - apiGroups: [""]
    resources:
      - services
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - endpoints
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - nodes
    verbs:
      - get
      - list
      - update
      - watch
  - apiGroups: ["extensions"]
    resources:
      - networkpolicies
    verbs:
      - get
      - list
      - watch
  - apiGroups: ["networking.k8s.io"]
    resources:
      - networkpolicies
    verbs:
      - watch
      - list
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - globalfelixconfigs
      - felixconfigurations
      - bgppeers
      - globalbgpconfigs
      - bgpconfigurations
      - ippools
      - globalnetworkpolicies
      - globalnetworksets
      - networkpolicies
      - clusterinformations
      - hostendpoints
    verbs:
      - create
      - get
      - list
      - update
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-node
subjects:
  - kind: ServiceAccount
    name: calico-node
    namespace: kube-system

Then run the following two commands to install calico:

[root@m1 ~]# kubectl apply -f /etc/kubernetes/addons/calico-rbac-kdd.yaml
[root@m1 ~]# kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

Check the status:

[root@m1 ~]# kubectl get pod --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-5bc4fc6f5f-pdjls   1/1     Running   0          2m47s
kube-system   calico-node-tkdmv                          1/1     Running   0          2m47s
kube-system   coredns-f9fd979d6-kg4lf                    1/1     Running   0          23h
kube-system   coredns-f9fd979d6-t8xzj                    1/1     Running   0          23h
kube-system   etcd-m1                                    1/1     Running   1          23h
kube-system   kube-apiserver-m1                          1/1     Running   2          23h
kube-system   kube-controller-manager-m1                 1/1     Running   2          23h
kube-system   kube-proxy-rjgnw                           1/1     Running   1          23h
kube-system   kube-scheduler-m1                          1/1     Running   2          23h
[root@m1 ~]#

Adding the other master nodes to the cluster

Use the kubeadm join command saved earlier to join the cluster. Note that the join commands for master and worker nodes are different, so take care not to mix them up. Run the following on m2 and m3:

$ kubeadm join 192.168.243.100:6443 --token 5l7pv5.5iiq4atzlazq0b7x \
    --discovery-token-ca-cert-hash sha256:0fdc9947984a1c655861349dbd251d581bd6ec336c1ab8d9013cf302412b2140 \
    --control-plane --certificate-key a455fb8227dd15882b57b11f3587187181b972d95524bb3ef43e78f76360121e

Tips: the join command for master nodes includes the --control-plane and --certificate-key parameters.
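The only difference between the two join commands is the pair of trailing flags, which can be made explicit with a small sketch. The function name join_command is hypothetical; it only prints the command text and does not contact the cluster.

```shell
# join_command ENDPOINT TOKEN CA_HASH [CERT_KEY]
# Prints a worker join command, or a control-plane join command when
# CERT_KEY is given.
join_command() {
    cmd="kubeadm join $1 --token $2 --discovery-token-ca-cert-hash sha256:$3"
    if [ -n "${4:-}" ]; then
        cmd="$cmd --control-plane --certificate-key $4"
    fi
    echo "$cmd"
}

# If the saved commands are lost or the token has expired, a fresh worker
# join command can be printed on an existing master with:
#   kubeadm token create --print-join-command
# and a new certificate key with:
#   kubeadm init phase upload-certs --upload-certs
```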

Wait a moment. When the command succeeds, it prints the following:

[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local m3] and IPs [10.96.0.1 192.168.243.141 192.168.243.100]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost m3] and IPs [192.168.243.141 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost m3] and IPs [192.168.243.141 127.0.0.1 ::1]
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node m3 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node m3 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.

Then follow the prompt and copy the kubectl configuration file:

$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

Port 6443 should now be listening:

[root@m2 ~]# netstat -lntp |grep 6443
tcp6       0      0 :::6443       :::*       LISTEN      31910/kube-apiserve
[root@m2 ~]#

A successful join command does not guarantee that the node is usable, however. Go back to m1 and check whether the nodes are in the Ready state:

[root@m1 ~]# kubectl get nodes
NAME   STATUS     ROLES    AGE     VERSION
m1     Ready      master   24h     v1.19.0
m2     NotReady   master   3m47s   v1.19.0
m3     NotReady   master   3m31s   v1.19.0
[root@m1 ~]#

Both m2 and m3 are NotReady, so they are not yet working members of the cluster. I used the following command to view the logs:

$ journalctl -f

The cause turned out to be the network restriction again (the firewall): the pause image could not be pulled:

8月 31 20:09:11 m2 kubelet[10122]: W0831 20:09:11.713935 10122 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
8月 31 20:09:12 m2 kubelet[10122]: E0831 20:09:12.442430 10122 kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
8月 31 20:09:17 m2 kubelet[10122]: E0831 20:09:17.657880 10122 kuberuntime_manager.go:730] createPodSandbox for pod "calico-node-jksvg_kube-system(5b76b6d7-0bd9-4454-a674-2d2fa4f6f35e)" failed: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.2": Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

So on m2 and m3, copy the image-pulling script created earlier on m1 and run it:

$ scp -r m1:/root/pullk8s.sh /root/pullk8s.sh
$ sh /root/pullk8s.sh

After it finishes, wait a few minutes and check the nodes again from m1. This time they are all Ready:

[root@m1 ~]# kubectl get nodes
NAME   STATUS   ROLES    AGE   VERSION
m1     Ready    master   24h   v1.19.0
m2     Ready    master   14m   v1.19.0
m3     Ready    master   13m   v1.19.0
[root@m1 ~]#

Adding the worker nodes to the cluster

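The steps are essentially the same as in the previous section, except that they run on s1 and s2 and use the worker version of the kubeadm join command, so they are only summarized here:

# Join the cluster with the join command saved earlier
$ kubeadm join 192.168.243.100:6443 --token 5l7pv5.5iiq4atzlazq0b7x \
    --discovery-token-ca-cert-hash sha256:0fdc9947984a1c655861349dbd251d581bd6ec336c1ab8d9013cf302412b2140
# Wait patiently; the logs show the progress
$ journalctl -f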
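Rather than reading the node table by eye after each join, the nodes that are still not Ready can be filtered out of the kubectl output. A minimal sketch, assuming the default column layout of kubectl get nodes; the function name not_ready_nodes is hypothetical.

```shell
# not_ready_nodes
# Reads `kubectl get nodes` output on stdin and prints the names of nodes
# whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage on a master node:
#   kubectl get nodes | not_ready_nodes
```

An empty result means every node has reached the Ready state.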

Once all worker nodes have joined, the highly available k8s cluster is complete. The node information now looks like this:

[root@m1 ~]# kubectl get nodes
NAME   STATUS   ROLES    AGE     VERSION
m1     Ready    master   24h     v1.19.0
m2     Ready    master   60m     v1.19.0
m3     Ready    master   60m     v1.19.0
s1     Ready    <none>   9m45s   v1.19.0
s2     Ready    <none>   119s    v1.19.0
[root@m1 ~]#

The pod information is as follows:

[root@m1 ~]# kubectl get pod --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-5bc4fc6f5f-pdjls   1/1     Running   0          73m
kube-system   calico-node-8m8lz                          1/1     Running   0          9m43s
kube-system   calico-node-99xps                          1/1     Running   0          60m
kube-system   calico-node-f48zw                          1/1     Running   0          117s
kube-system   calico-node-jksvg                          1/1     Running   0          60m
kube-system   calico-node-tkdmv                          1/1     Running   0          73m
kube-system   coredns-f9fd979d6-kg4lf                    1/1     Running   0          24h
kube-system   coredns-f9fd979d6-t8xzj                    1/1     Running   0          24h
kube-system   etcd-m1                                    1/1     Running   1          24h
kube-system   kube-apiserver-m1                          1/1     Running   2          24h
kube-system   kube-controller-manager-m1                 1/1     Running   2          24h
kube-system   kube-proxy-22h6p                           1/1     Running   0          9m43s
kube-system   kube-proxy-khskm                           1/1     Running   0          60m
kube-system   kube-proxy-pkrgm                           1/1     Running   0          60m
kube-system   kube-proxy-rjgnw                           1/1     Running   1          24h
kube-system   kube-proxy-t4pxl                           1/1     Running   0          117s
kube-system   kube-scheduler-m1                          1/1     Running   2          24h
[root@m1 ~]#

Cluster availability test

Creating the nginx ds

On m1, create the nginx-ds.yml configuration file with the following content:

apiVersion: v1
kind: Service
metadata:
  name: nginx-ds
  labels:
    app: nginx-ds
spec:
  type: NodePort
  selector:
    app: nginx-ds
  ports:
  - name: http
    port: 80
    targetPort: 80
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-ds
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      app: nginx-ds
  template:
    metadata:
      labels:
        app: nginx-ds
    spec:
      containers:
      - name: my-nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

Then run the following command to create the nginx ds:

[root@m1 ~]# kubectl create -f nginx-ds.yml
service/nginx-ds created
daemonset.apps/nginx-ds created
[root@m1 ~]#

Checking IP connectivity

After a short wait, check that the Pods are running normally:

[root@m1 ~]# kubectl get pods -o wide
NAME             READY   STATUS    RESTARTS   AGE     IP               NODE   NOMINATED NODE   READINESS GATES
nginx-ds-6nnpm   1/1     Running   0          2m32s   172.22.152.193   s1     <none>           <none>
nginx-ds-bvpqj   1/1     Running   0          2m32s   172.22.78.129    s2     <none>           <none>
[root@m1 ~]#

On every node, try to ping the Pod IPs:

[root@s1 ~]# ping 172.22.152.193
PING 172.22.152.193 (172.22.152.193) 56(84) bytes of data.
64 bytes from 172.22.152.193: icmp_seq=1 ttl=63 time=0.269 ms
64 bytes from 172.22.152.193: icmp_seq=2 ttl=63 time=0.240 ms
64 bytes from 172.22.152.193: icmp_seq=3 ttl=63 time=0.228 ms
64 bytes from 172.22.152.193: icmp_seq=4 ttl=63 time=0.229 ms
^C
--- 172.22.152.193 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.228/0.241/0.269/0.022 ms
[root@s1 ~]#

Then check the state of the Service:

[root@m1 ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP        2d1h
nginx-ds     NodePort    10.105.139.228   <none>        80:31145/TCP   3m21s
[root@m1 ~]#
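In the PORT(S) column, 80:31145/TCP means service port 80 is exposed on NodePort 31145 of every node. A short sketch of extracting that number for scripted checks follows; the function name node_port is hypothetical.

```shell
# node_port
# Reads a PORT(S) value such as "80:31145/TCP" on stdin and prints the
# NodePort; prints nothing for values without a NodePort (e.g. "443/TCP").
node_port() {
    sed -n 's/^[0-9]*:\([0-9]*\)\/.*$/\1/p'
}

# Usage on a master node (the jsonpath form asks the API for the same value):
#   kubectl get svc nginx-ds --no-headers | awk '{ print $5 }' | node_port
#   kubectl get svc nginx-ds -o jsonpath='{.spec.ports[0].nodePort}'
```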

On every node, try to access the service. A normal response means the Service IP is reachable as well:

[root@m1 ~]# curl 10.105.139.228:80
Welcome to nginx!
body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; }
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and working. Further configuration is required.
For online documentation and support please refer to nginx.org. Commercial support is available at nginx.com.
Thank you for using nginx.
[root@m1 ~]#

Then check the NodePort on every node. The NodePort of nginx-ds is 31145; a normal response like the one below means the NodePort works too:

[root@m3 ~]# curl 192.168.243.140:31145
Welcome to nginx!
body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; }
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and working. Further configuration is required.
For online documentation and support please refer to nginx.org. Commercial support is available at nginx.com.
Thank you for using nginx.
[root@m3 ~]#

Checking DNS availability

This requires an Nginx Pod. First define a pod-nginx.yaml configuration file with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80

Then create the Pod from that configuration:

[root@m1 ~]# kubectl create -f pod-nginx.yaml
pod/nginx created
[root@m1 ~]#

Enter the Pod with the following command:

[root@m1 ~]# kubectl exec nginx -i -t -- /bin/bash

View the DNS configuration:

root@nginx:/# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local localdomain
options ndots:5
root@nginx:/#
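The search list and ndots:5 explain why a short name like nginx-ds resolves inside a Pod: a name with fewer than five dots is tried with each search domain appended before it is tried as-is. The sketch below is a simplified model of that ordering, not the resolver's actual code, and the function name search_candidates is hypothetical.

```shell
# search_candidates NAME NDOTS DOMAIN...
# Prints, in order, the names a resolver would try for NAME.
search_candidates() {
    name=$1; ndots=$2; shift 2
    # Count the dots in NAME
    dots=$(printf '%s' "$name" | tr -cd '.' | wc -c | tr -d ' ')
    if [ "$dots" -ge "$ndots" ]; then
        echo "$name"    # enough dots: the name is tried as-is first
    fi
    for domain in "$@"; do
        echo "$name.$domain"
    done
    if [ "$dots" -lt "$ndots" ]; then
        echo "$name"    # too few dots: the bare name is tried last
    fi
}

search_candidates nginx-ds 5 default.svc.cluster.local svc.cluster.local cluster.local localdomain
```

The first candidate, nginx-ds.default.svc.cluster.local, is the name under which a Service in the default namespace is registered in the cluster DNS.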

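Next, test whether the Service name resolves correctly. Below, the name nginx-ds resolves to the corresponding IP, 10.105.139.228, which shows that DNS is working:

root@nginx:/# ping nginx-ds
PING nginx-ds.default.svc.cluster.local (10.105.139.228): 48 data bytes

High-availability test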

Shut down m1 by running the following command on it:

[root@m1 ~]# init 0

Then check whether the virtual IP has moved to m2:

[root@m2 ~]# ip a |grep 192.168.243.100
    inet 192.168.243.100/32 scope global ens32
[root@m2 ~]#
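For a scripted version of this check, the address can be compared exactly instead of with a substring match (a grep for 192.168.243.100 would also match 192.168.243.1000-style text or treat the dots as wildcards). A minimal sketch; the function name holds_vip is hypothetical.

```shell
# holds_vip VIP
# Reads `ip a` output on stdin and succeeds if VIP is configured as an
# IPv4 address on some interface.
holds_vip() {
    awk -v vip="$1" '$1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 } END { exit !found }'
}

# Usage on m2:
#   ip a | holds_vip 192.168.243.100 && echo "virtual IP is on this node"
```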

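Next, test whether kubectl on m2 or m3 can still talk to the cluster. If it can, the cluster has a reasonable degree of high availability:

[root@m2 ~]# kubectl get nodes
NAME   STATUS     ROLES    AGE   VERSION
m1     NotReady   master   3d    v1.19.0
m2     Ready      master   16m   v1.19.0
m3     Ready      master   13m   v1.19.0
s1     Ready      <none>   2d    v1.19.0
s2     Ready      <none>   47h   v1.19.0
[root@m2 ~]#

Deploying the dashboard

The dashboard is a web UI provided by k8s that simplifies operating and managing the cluster. It makes it easy to view all kinds of information, operate on resources such as Pods and Services, and create new resources. Its repository is at:

https://github.com/kubernetes/dashboard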

Deploying the dashboard is also fairly simple. First define the dashboard-all.yaml configuration file with the following content:

apiVersion: v1
kind: Namespace
metadata:
  name: kubernetes-dashboard
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
---
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30005
  type: NodePort
  selector:
    k8s-app: kubernetes-dashboard
---
apiVersion: v1
kind: Secret
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard-certs
  namespace: kubernetes-dashboard
type: Opaque
---
apiVersion: v1
kind: Secret
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard-csrf
  namespace: kubernetes-dashboard
type: Opaque
data:
  csrf: ""
---
apiVersion: v1
kind: Secret
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard-key-holder
  namespace: kubernetes-dashboard
type: Opaque
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard-settings
  namespace: kubernetes-dashboard
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
rules:
  # Allow Dashboard to get, update and delete Dashboard exclusive secrets.
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["kubernetes-dashboard-key-holder", "kubernetes-dashboard-certs", "kubernetes-dashboard-csrf"]
    verbs: ["get", "update", "delete"]
    # Allow Dashboard to get and update 'kubernetes-dashboard-settings' config map.
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["kubernetes-dashboard-settings"]
    verbs: ["get", "update"]
    # Allow Dashboard to get metrics.
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["heapster", "dashboard-metrics-scraper"]
    verbs: ["proxy"]
  - apiGroups: [""]
    resources: ["services/proxy"]
    resourceNames: ["heapster", "http:heapster:", "https:heapster:", "dashboard-metrics-scraper", "http:dashboard-metrics-scraper"]
    verbs: ["get"]
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
rules:
  # Allow Metrics Scraper to get metrics from the Metrics server
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods", "nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kubernetes-dashboard
subjects:
  - kind: ServiceAccount
    name: kubernetes-dashboard
    namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubernetes-dashboard
subjects:
  - kind: ServiceAccount
    name: kubernetes-dashboard
    namespace: kubernetes-dashboard
---
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kubernetes-dashboard
  template:
    metadata:
      labels:
        k8s-app: kubernetes-dashboard
    spec:
      containers:
        - name: kubernetes-dashboard
          image: kubernetesui/dashboard:v2.0.3
          imagePullPolicy: Always
          ports:
            - containerPort: 8443
              protocol: TCP
          args:
            - --auto-generate-certificates
            - --namespace=kubernetes-dashboard
            # Uncomment the following line to manually specify Kubernetes API server Host
            # If not specified, Dashboard will attempt to auto discover the API server and connect
            # to it. Uncomment only if the default does not work.
            # - --apiserver-host=http://my-address:port
          volumeMounts:
            - name: kubernetes-dashboard-certs
              mountPath: /certs
              # Create on-disk volume to store exec logs
            - mountPath: /tmp
              name: tmp-volume
          livenessProbe:
            httpGet:
              scheme: HTTPS
              path: /
              port: 8443
            initialDelaySeconds: 30
            timeoutSeconds: 30
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsUser: 1001
            runAsGroup: 2001
      volumes:
        - name: kubernetes-dashboard-certs
          secret:
            secretName: kubernetes-dashboard-certs
        - name: tmp-volume
          emptyDir: {}
      serviceAccountName: kubernetes-dashboard
      nodeSelector:
        "kubernetes.io/os": linux
      # Comment the following tolerations if Dashboard must not be deployed on master
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
---
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: dashboard-metrics-scraper
  name: dashboard-metrics-scraper
  namespace: kubernetes-dashboard
spec:
  ports:
    - port: 8000
      targetPort: 8000
  selector:
    k8s-app: dashboard-metrics-scraper
---
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    k8s-app: dashboard-metrics-scraper
  name: dashboard-metrics-scraper
  namespace: kubernetes-dashboard
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: dashboard-metrics-scraper
  template:
    metadata:
      labels:
        k8s-app: dashboard-metrics-scraper
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: 'runtime/default'
    spec:
      containers:
        - name: dashboard-metrics-scraper
          image: kubernetesui/metrics-scraper:v1.0.4
          ports:
            - containerPort: 8000
              protocol: TCP
          livenessProbe:
            httpGet:
              scheme: HTTP
              path: /
              port: 8000
            initialDelaySeconds: 30
            timeoutSeconds: 30
          volumeMounts:
            - mountPath: /tmp
              name: tmp-volume
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsUser: 1001
            runAsGroup: 2001
      serviceAccountName: kubernetes-dashboard
      nodeSelector:
        "kubernetes.io/os": linux
      # Comment the following tolerations if Dashboard must not be deployed on master
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      volumes:
        - name: tmp-volume
          emptyDir: {}

Create the dashboard resources:

[root@m1 ~]# kubectl create -f dashboard-all.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
[root@m1 ~]#

Check the status of the deployment:

[root@m1 ~]# kubectl get deployment kubernetes-dashboard -n kubernetes-dashboard
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
kubernetes-dashboard   1/1     1            1           29s
[root@m1 ~]#
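Instead of re-running `kubectl get deployment` until READY shows 1/1, you can let kubectl wait for the rollout. The following is only a minimal sketch: the function name `wait_for_dashboard` and the 120s default are my own choices and are not part of the original steps, while `kubectl rollout status` and its `--timeout` flag are standard kubectl.

```shell
# Hypothetical helper (name and default timeout are assumptions):
# block until the kubernetes-dashboard Deployment has finished rolling out,
# or fail once the timeout expires.
wait_for_dashboard() {
  local timeout="${1:-120s}"
  kubectl rollout status deployment/kubernetes-dashboard \
    --namespace kubernetes-dashboard \
    --timeout="$timeout"
}
```

Calling `wait_for_dashboard 60s` returns a non-zero exit code if the deployment is not ready within 60 seconds, which makes it easy to use in a setup script.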

Check the status of the dashboard pods:

[root@m1 ~]# kubectl --namespace kubernetes-dashboard get pods -o wide |grep dashboard
dashboard-metrics-scraper-7b59f7d4df-q4jqj   1/1   Running   0   5m27s   172.22.152.198   s1
kubernetes-dashboard-5dbf55bd9d-nqvjz        1/1   Running   0   5m27s   172.22.202.17    m1
[root@m1 ~]#

Check the status of the dashboard service:

[root@m1 ~]# kubectl get services kubernetes-dashboard -n kubernetes-dashboard
NAME                   TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
kubernetes-dashboard   NodePort   10.104.217.178   <none>        443:30005/TCP   5m57s
[root@m1 ~]#
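If you want the NodePort in a script rather than reading it from the PORT(S) column, a jsonpath query is one option. This is a sketch under the assumption that the service has a single port entry, as in the manifest used here; the function name `dashboard_node_port` is made up for illustration.

```shell
# Hypothetical helper: print the NodePort of the kubernetes-dashboard service.
# Assumes the first (and only) entry in .spec.ports is the HTTPS port.
dashboard_node_port() {
  kubectl get service kubernetes-dashboard \
    --namespace kubernetes-dashboard \
    --output jsonpath='{.spec.ports[0].nodePort}'
}
```

With the service shown above, `dashboard_node_port` should print the port that appears after the colon in the PORT(S) column.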

Check that port 30005 is being listened on:

[root@m1 ~]# netstat -ntlp |grep 30005
tcp        0      0 0.0.0.0:30005           0.0.0.0:*               LISTEN      4085/kube-proxy
[root@m1 ~]#

Accessing the dashboard

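For cluster security, since version 1.7 the dashboard only allows access over HTTPS. We exposed the service as a NodePort, so it can be reached at https://NodeIP:NodePort. For example, using curl:

[root@m1 ~]# curl https://192.168.243.138:30005 -k
Kubernetes Dashboard
[root@m1 ~]#

Because the dashboard certificate is self-signed, the -k flag is needed here so that curl makes the HTTPS request without verifying the certificate.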
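For a scripted check it can be handier to look at the HTTP status code than at the page body. A minimal sketch, assuming curl is installed on the node; the function name `dashboard_http_status` is hypothetical, and `--insecure` is simply the long form of the -k flag used above.

```shell
# Hypothetical helper: print only the HTTP status code returned by the
# dashboard at https://<host>:<port>/, skipping certificate verification
# because the certificate is self-signed.
dashboard_http_status() {
  local host="$1" port="$2"
  curl --insecure --silent --output /dev/null \
    --write-out '%{http_code}' \
    "https://${host}:${port}/"
}
```

For example, `dashboard_http_status 192.168.243.138 30005` prints the status code of the response, or 000 if curl could not connect at all.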

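About custom certificates

By default the dashboard certificate is generated automatically and is self-signed, so clients will not trust it. If you have a domain name and a matching trusted certificate, you can replace the default one and access the dashboard securely through that domain.

To do so, add startup arguments to the dashboard container in dashboard-all.yaml that specify the certificate files. The certificate files themselves are injected through a secret.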

- --tls-cert-file
- dashboard.cer
- --tls-key-file
- dashboard.key
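The manifest mounts the secret `kubernetes-dashboard-certs` at /certs, so one way to inject your own files is to recreate that secret with keys matching the file names passed in the arguments above. The sketch below is an assumption about how you might do this, not a step from the original walkthrough: the function name and the two path parameters are hypothetical, and it deletes the existing secret before recreating it.

```shell
# Hypothetical helper: replace the kubernetes-dashboard-certs secret with
# your own certificate and key, then restart the dashboard to pick them up.
# $1 = path to the certificate file, $2 = path to the private key file.
replace_dashboard_certs() {
  local cert_path="$1" key_path="$2"
  local ns="kubernetes-dashboard"
  # Remove the (initially empty) secret created by dashboard-all.yaml.
  kubectl delete secret kubernetes-dashboard-certs \
    --namespace "$ns" --ignore-not-found &&
  # Key names must match the --tls-cert-file / --tls-key-file values.
  kubectl create secret generic kubernetes-dashboard-certs \
    --namespace "$ns" \
    --from-file=dashboard.cer="$cert_path" \
    --from-file=dashboard.key="$key_path" &&
  # Restart the pod so the new files are mounted.
  kubectl rollout restart deployment/kubernetes-dashboard --namespace "$ns"
}
```

A call would look like `replace_dashboard_certs ./my-domain.cer ./my-domain.key`, where both paths are placeholders for your own files.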

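Logging in to the dashboard

By default the dashboard only supports token authentication, so if you log in with a kubeconfig file, the token must be specified in that file. Here we log in with the token directly.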

First, create a service account:

[root@m1 ~]# kubectl create sa dashboard-admin -n kube-system
serviceaccount/dashboard-admin created
[root@m1 ~]#

Create the cluster role binding:

[root@m1 ~]# kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
clusterrolebinding.rbac.authorization.k8s.io/dashboard-admin created
[root@m1 ~]#

Look up the name of the dashboard-admin secret:

[root@m1 ~]# kubectl get secrets -n kube-system | grep dashboard-admin | awk '{print $1}'
dashboard-admin-token-ph7h2
[root@m1 ~]#

Print the token stored in the secret:

[root@m1 ~]# ADMIN_SECRET=$(kubectl get secrets -n kube-system | grep dashboard-admin | awk '{print $1}')
[root@m1 ~]# kubectl describe secret -n kube-system ${ADMIN_SECRET} | grep -E '^token' | awk '{print $2}'
eyJhbGciOiJSUzI1NiIsImtpZCI6IkVnaDRYQXgySkFDOGdDMnhXYXJWbkY2WVczSDVKeVJRaE5vQ0ozOG5PanMifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tcGg3aDIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNjA1ZWY3OTAtOWY3OC00NDQzLTgwMDgtOWRiMjU1MjU0MThkIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.xAO3njShhTRkgNdq45nO7XNy242f8XVs-W4WBMui-Ts6ahdZECoNegvWjLDCEamB0UW72JeG67f2yjcWohANwfDCHobRYPkOhzrVghkdULbrCCGai_fe60Svwf_apSmlKP3UUdu16M4GxopaTlINZpJY_z5KJ4kLq66Y1rjAA6j9TI4Ue4EazJKKv0dciv6NsP28l7-nvUmhj93QZpKqY3PQ7vvcPXk_sB-jjSSNJ5ObWuGeDBGHgQMRI4F1XTWXJBYClIucsbu6MzDA8yop9S7Ci8D00QSa0u3M_rqw-3UHtSxQee41uVVjIASfnCEVayKDIbJzG3gc2AjqGqJhkQ
[root@m1 ~]#
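The token is a JWT: three base64url-encoded segments separated by dots, the second of which is a JSON payload naming the service account. If you want to confirm which account a token belongs to before pasting it into the browser, you can decode that segment. This is a minimal sketch that assumes GNU coreutils (`base64 -d`, `tr`, `cut`); the function name `jwt_payload` is my own, and the signature is not verified.

```shell
# Hypothetical helper: print the JSON payload (second segment) of a JWT.
# It only decodes the payload; it does NOT verify the signature.
jwt_payload() {
  local token="$1" payload pad
  # Take the second dot-separated segment.
  payload="$(printf '%s' "$token" | cut -d '.' -f 2)"
  # Convert the base64url alphabet back to standard base64.
  payload="$(printf '%s' "$payload" | tr '_-' '/+')"
  # base64url drops the '=' padding, so restore it to a multiple of 4.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
}
```

Passing the token printed above to `jwt_payload` should show a JSON object whose `sub` field is `system:serviceaccount:kube-system:dashboard-admin`.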

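Once you have the token, open https://192.168.243.138:30005 in a browser. Because the dashboard uses a self-signed certificate, the browser shows a warning at this point. Ignore it and click "Advanced" -> "Proceed" to continue.

Then enter the token.

After a successful login, the dashboard home page is displayed.

There is not much more to say about the web UI itself, so I will not go into further detail here. Feel free to explore it on your own.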
