A Kubernetes cluster keeps the services and applications running on it highly available by rescheduling them across physical nodes. The high availability of Kubernetes itself, however, needs special attention in production. Following the official guide, Building High-Availability Clusters, we can build an HA K8s environment. Below we set up a cluster with three master nodes.
1. Preparing the kubelet. Install kubelet on each master node. The kubelet runs etcd, kube-apiserver, kube-controller-manager, and friends as static pods, giving these components restart-on-failure and resource management for free. Download kubelet from the official releases, extract it, and copy it to /usr/bin. For the detailed steps see the earlier article: install docker-ce, then deploy kubelet and run it under systemd.
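A minimal systemd unit for the kubelet might look like the sketch below. The unit layout and the EnvironmentFile path (/etc/kubernetes/kubelet) are assumptions chosen to match the KUBELET_ARGS variable used in the configuration that follows; adjust them to your conventions.

```ini
# /etc/systemd/system/kubelet.service — a minimal sketch, not a
# production-hardened unit; paths here are assumptions.
[Unit]
Description=Kubernetes Kubelet
After=docker.service
Requires=docker.service

[Service]
# KUBELET_ARGS is defined in the environment file shown below
EnvironmentFile=/etc/kubernetes/kubelet
ExecStart=/usr/bin/kubelet $KUBELET_ARGS
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

After placing the file, `systemctl daemon-reload && systemctl enable --now kubelet` starts the kubelet and keeps it running across reboots.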
kubelet configuration:
```shell
KUBELET_ARGS="--pod-manifest-path=/etc/kubernetes/staticPods --allow-privileged=true --cluster-dns=10.172.0.2 --cluster-domain=wxk8s.local --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google-containers/pause-amd64:3.0 --logtostderr=false --log-dir=/var/log/kubernetes --v=2"
```
`--pod-manifest-path` specifies the static pod manifest directory; the kubelet watches it and creates or removes pods as manifest files appear and disappear.
2. The etcd cluster. K8s stores its state in etcd, so clustering etcd provides data safety, redundancy, and high availability. We deploy one etcd member on each of the three master nodes; because pod IPs are assigned dynamically, we use the etcd discovery service to bootstrap the cluster.
```shell
curl -w "\n" 'https://discovery.etcd.io/new?size=3'
```
Hitting this endpoint returns a discovery token URL, which is referenced later in the etcd configuration. Put the following YAML into /etc/kubernetes/staticPods and the kubelet will create the pod. etcd.yaml:
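The response body is itself a URL whose last path segment is the cluster token; etcd consumes the whole URL via `--discovery`. A quick sanity sketch, using the illustrative token that appears in the manifest below:

```shell
# Example discovery URL of the form returned by the curl above
# (the token here is the illustrative one used in etcd.yaml)
DISCOVERY_URL='https://discovery.etcd.io/540c9ade1637bf4f24947ab1db14de7e'
# The token is simply the last path segment of the URL
TOKEN=$(basename "$DISCOVERY_URL")
echo "$TOKEN"   # 540c9ade1637bf4f24947ab1db14de7e
```

Each new cluster needs a fresh token; reusing one from an old bootstrap will fail.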
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-server1
spec:
  hostNetwork: true
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/google_containers/etcd-amd64:3.2.18
    name: etcd-container
    command:
    - /usr/local/bin/etcd
    - --name
    - etcd-server1
    - --initial-advertise-peer-urls
    - https://${master_node_ip}:2380
    - --listen-peer-urls
    - https://${master_node_ip}:2380
    - --advertise-client-urls
    - https://${master_node_ip}:4001
    - --listen-client-urls
    - https://0.0.0.0:4001
    - --data-dir
    - /data/etcd
    - --discovery
    - https://discovery.etcd.io/540c9ade1637bf4f24947ab1db14de7e
    - --client-cert-auth
    - --trusted-ca-file=/etc/etcd/ssl/ca.pem
    - --cert-file=/etc/etcd/ssl/etcdserver.pem
    - --key-file=/etc/etcd/ssl/etcdserver-key.pem
    - --peer-client-cert-auth
    - --peer-trusted-ca-file=/etc/etcd/ssl/ca.pem
    - --peer-cert-file=/etc/etcd/ssl/etcdserver.pem
    - --peer-key-file=/etc/etcd/ssl/etcdserver-key.pem
    ports:
    - containerPort: 2380
      hostPort: 2380
      name: serverport
    - containerPort: 4001
      hostPort: 4001
      name: clientport
    volumeMounts:
    - mountPath: /data/etcd
      name: dataetcd
    - mountPath: /etc/etcd/ssl
      name: etcdssl
      readOnly: true
    - mountPath: /usr/lib/ssl
      name: usrlibssl
      readOnly: true
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
  volumes:
  - hostPath:
      path: /data/etcd
    name: dataetcd
  - hostPath:
      path: /etc/etcd/ssl
    name: etcdssl
  - hostPath:
      path: /usr/lib/ssl
    name: usrlibssl
  - hostPath:
      path: /etc/ssl
    name: etcssl
```
`--name` is the etcd member name.
`${master_node_ip}` must be replaced with the master node's IP.
`--discovery` takes the etcd token URL obtained above.
`--data-dir` is the etcd data directory, mounted from the node via a hostPath volume.
`--client-cert-auth` enables client TLS authentication; certificates can be generated with the cfssl tool.
`--peer-client-cert-auth` enables TLS authentication between cluster peers; certificates can likewise be generated with cfssl.
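As a sketch of the cfssl workflow, a hypothetical CSR file for the combined etcd server/peer certificate could look like this. The field values below are illustrative assumptions (only the host IPs come from this setup); the hosts list must include every master node IP plus 127.0.0.1, or peer and local client connections will fail certificate verification.

```json
{
  "CN": "etcdserver",
  "hosts": [
    "127.0.0.1",
    "172.31.32.2",
    "172.31.32.4",
    "172.31.32.15"
  ],
  "key": { "algo": "rsa", "size": 2048 },
  "names": [{ "O": "etcd" }]
}
```

With a CA already created (`cfssl gencert -initca ca-csr.json | cfssljson -bare ca`), running `cfssl gencert -ca=ca.pem -ca-key=ca-key.pem etcdserver-csr.json | cfssljson -bare etcdserver` produces the etcdserver.pem / etcdserver-key.pem pair mounted into the pod above.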
Once all three members are up, exec into one of the containers to verify:
```shell
etcdctl --ca-file /etc/etcd/ssl/ca.pem \
  --cert-file /etc/etcd/ssl/etcdclient.pem \
  --key-file /etc/etcd/ssl/etcdclient-key.pem \
  --endpoints https://localhost:4001 cluster-health
member b134bae85b06bb33 is healthy: got healthy result from https://172.31.32.15:4001
member f22cd359aefb3870 is healthy: got healthy result from https://172.31.32.4:4001
member f7f2a252878f0d60 is healthy: got healthy result from https://172.31.32.2:4001
cluster is healthy
```
3. kube-apiserver. The apiserver is comparatively simple. It requires TLS; easyrsa is a convenient tool for creating a self-signed CA, and the server certificate must include the load balancer IP (see the earlier article). What remains is to adapt the YAML to your environment and deploy it on each of the three master nodes. apiserver.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver-amd64:v1.9.7
    command:
    - /bin/sh
    - -c
    - /usr/local/bin/kube-apiserver --bind-address=0.0.0.0 --storage-backend=etcd3
      --etcd-servers=https://172.31.32.2:4001,https://172.31.32.4:4001,https://172.31.32.15:4001
      --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds
      --service-cluster-ip-range=10.172.0.0/16 --client-ca-file=/etc/kubernetes/pki/ca.crt
      --tls-cert-file=/etc/kubernetes/pki/server.crt --tls-private-key-file=/etc/kubernetes/pki/server.key
      --secure-port=443 --service-node-port-range=1-65535 --v=2 --allow-privileged=true
      --endpoint-reconciler-type=lease --etcd-cafile=/etc/kubernetes/pki/ca.pem
      --etcd-certfile=/etc/kubernetes/pki/etcdclient.pem --etcd-keyfile=/etc/kubernetes/pki/etcdclient-key.pem
      1>>/var/log/kube-apiserver.log 2>&1
    ports:
    - containerPort: 443
      hostPort: 443
      name: https
    - containerPort: 7080
      hostPort: 7080
      name: http
    - containerPort: 8080
      hostPort: 8080
      name: local
    volumeMounts:
    - mountPath: /etc/kubernetes/pki
      name: kubepki
      readOnly: true
    - mountPath: /var/log/kube-apiserver.log
      name: logfile
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
    - mountPath: /usr/lib/ssl
      name: usrlibssl
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki
    name: kubepki
  - hostPath:
      path: /var/log/kubernetes/kube-apiserver.log
    name: logfile
  - hostPath:
      path: /etc/ssl
    name: etcssl
  - hostPath:
      path: /usr/lib/ssl
    name: usrlibssl
```
`--endpoint-reconciler-type=lease` is needed when multiple apiservers run concurrently: each instance maintains a lease so the apiservers do not overwrite each other's entries in the `kubernetes` service endpoints, and dead instances are dropped from the list.
Load balancing. With three apiservers, some form of load balancing is needed in front of them for high availability. In a cloud environment this is easy: use the provider's load balancer. On self-managed physical nodes, consider an haproxy + keepalived setup. Either way, the TLS certificate must include the load balancer IP.
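A minimal haproxy front end for the three apiservers could look like the sketch below. The VIP 172.31.32.100 and all timeouts are assumptions; keepalived would float that VIP between the haproxy hosts via VRRP.

```
# /etc/haproxy/haproxy.cfg — sketch only; tune timeouts for production.
# TCP mode: TLS is terminated by the apiservers themselves, which is why
# the certificate only needs to include the VIP.
defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend k8s-api
    bind 172.31.32.100:443        # keepalived-managed VIP (assumed address)
    default_backend k8s-api-backend

backend k8s-api-backend
    balance roundrobin
    server master1 172.31.32.2:443  check
    server master2 172.31.32.4:443  check
    server master3 172.31.32.15:443 check
```

TCP-mode pass-through keeps client certificate authentication working end to end, since haproxy never has to see inside the TLS session.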
4. Other components. kube-scheduler and kube-controller-manager remain; both are simple to deploy, also as static pods, on all master nodes. kube-controller-manager.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
spec:
  hostNetwork: true
  containers:
  - name: kube-controller-manager
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager-amd64:v1.9.7
    command:
    - /bin/sh
    - -c
    - /usr/local/bin/kube-controller-manager --master=127.0.0.1:8080
      --service_account_private_key_file=/etc/kubernetes/pki/server.key
      --root-ca-file=/etc/kubernetes/pki/ca.crt --v=2 --leader-elect=true
      1>>/var/log/kube-controller-manager.log 2>&1
    livenessProbe:
      httpGet:
        path: /healthz
        port: 10252
      initialDelaySeconds: 15
      timeoutSeconds: 1
    volumeMounts:
    - mountPath: /etc/kubernetes
      name: kubeconf
      readOnly: true
    - mountPath: /var/log/kube-controller-manager.log
      name: logfile
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
    - mountPath: /usr/lib/ssl
      name: usrlibssl
      readOnly: true
  volumes:
  - hostPath:
      path: /var/log/kubernetes/kube-controller-manager.log
    name: logfile
  - hostPath:
      path: /etc/kubernetes
    name: kubeconf
  - hostPath:
      path: /etc/ssl
    name: etcssl
  - hostPath:
      path: /usr/lib/ssl
    name: usrlibssl
```
`--service_account_private_key_file` supplies the key used to sign service account tokens; omitting it leads to service account permission errors.
`--leader-elect` acts like an exclusive lock, ensuring that only one controller-manager instance is active against the cluster at a time.
kube-scheduler.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler-amd64:v1.9.7
    command:
    - /bin/sh
    - -c
    - /usr/local/bin/kube-scheduler --master=127.0.0.1:8080 --v=2 --leader-elect=true
      1>>/var/log/kube-scheduler.log 2>&1
    livenessProbe:
      httpGet:
        path: /healthz
        port: 10251
      initialDelaySeconds: 15
      timeoutSeconds: 1
    volumeMounts:
    - mountPath: /var/log/kube-scheduler.log
      name: logfile
    - mountPath: /etc/kubernetes
      name: kubeconf
      readOnly: true
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
    - mountPath: /usr/lib/ssl
      name: usrlibssl
      readOnly: true
  volumes:
  - hostPath:
      path: /var/log/kubernetes/kube-scheduler.log
    name: logfile
  - hostPath:
      path: /etc/kubernetes
    name: kubeconf
  - hostPath:
      path: /etc/ssl
    name: etcssl
  - hostPath:
      path: /usr/lib/ssl
    name: usrlibssl
```