Contents
  1. 1. Server planning
  2. 2. Container runtime environment
    1. 2.1. containerd
    2. 2.2. crictl
  3. 3. Masters
    1. 3.1. kubelet standalone
    2. 3.2. Certificates
    3. 3.3. etcd cluster
    4. 3.4. kube-apiserver
    5. 3.5. kubectl
    6. 3.6. kube-controller-manager
    7. 3.7. kube-scheduler
  4. 4. Worker nodes
    1. 4.1. Server tuning and base software
      1. 4.1.1. ulimit
      2. 4.1.2. sysctl.conf
    2. 4.2. kubelet
    3. 4.3. kube-proxy
  5. 5. Networking and base add-ons
    1. 5.1. calico
    2. 5.2. coredns
    3. 5.3. local dns cache
    4. 5.4. kubelet-csr-approver
    5. 5.5. metrics server

I recently needed to set up a new Kubernetes cluster. The binary approach is arguably the hardest way to do it; the process feels like assembling building blocks, but the upside is a much clearer picture of the internal components. This time I picked the fairly recent 1.29 release. Also, for well-known reasons, the container images below use Chinese mirrors or images I tagged and pushed myself wherever possible.

Server planning

Since this is a production environment I use 3 masters. Kubernetes needs etcd, kube-apiserver, kube-controller-manager and kube-scheduler; I also chose to run these components as containers under a standalone kubelet, which means containerd has to be installed on the masters too. If you are not deploying on a public cloud you will probably need haproxy and keepalived to load-balance the apiserver. Worker nodes use ipvs for Services, and every server runs Ubuntu 22.04 upgraded to the 5.19 kernel.
The server plan is as follows:

hosts ip component
master1 10.1.1.1 containerd,crictl,kubelet,etcd,kube-apiserver,kube-controller-manager,kube-scheduler
master2 10.1.1.2 containerd,crictl,kubelet,etcd,kube-apiserver,kube-controller-manager,kube-scheduler
master3 10.1.1.3 containerd,crictl,kubelet,etcd,kube-apiserver,kube-controller-manager,kube-scheduler
node1 10.1.1.4 containerd,crictl,kubelet,kube-proxy,ipvs
node2 10.1.1.5 containerd,crictl,kubelet,kube-proxy,ipvs

Container runtime environment

This cluster uses containerd as the container runtime and crictl as its client; both are installed on every node.

containerd

Kubernetes introduced the CRI concept around the 1.1x releases and stopped supporting Docker as a runtime from 1.24, so the current recommendation is containerd or CRI-O. We install from the binary release; just pay attention to the configuration file.
First read containerd's "Versioning and release" notes to understand the version mapping between Kubernetes and containerd, then download the binary releases of containerd and runc from their official GitHub release pages.
For containerd configuration you can refer to the official Kubernetes docs: Container Runtimes.

Extract the downloaded archives, put the containerd binaries in /usr/local/bin and runc in /usr/local/sbin.
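A minimal sketch of those steps; the versions and URLs below are only examples, pick releases that match your Kubernetes version per the compatibility notes above:

wget https://github.com/containerd/containerd/releases/download/v1.7.20/containerd-1.7.20-linux-amd64.tar.gz
sudo tar -C /usr/local -xzf containerd-1.7.20-linux-amd64.tar.gz   # binaries land in /usr/local/bin

wget https://github.com/opencontainers/runc/releases/download/v1.1.13/runc.amd64
sudo install -m 755 runc.amd64 /usr/local/sbin/runc

sudo mkdir -p /etc/containerd   # containerd reads /etc/containerd/config.toml by default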
Then configure containerd:

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
plugin_dir = ""
disabled_plugins = []
required_plugins = []
oom_score = 0

[grpc]
address = "/run/containerd/containerd.sock"
tcp_address = ""
tcp_tls_cert = ""
tcp_tls_key = ""
uid = 0
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216

[ttrpc]
address = ""
uid = 0
gid = 0

[debug]
address = ""
uid = 0
gid = 0
level = ""

[metrics]
address = ""
grpc_histogram = false

[cgroup]
path = ""

[timeouts]
"io.containerd.timeout.shim.cleanup" = "5s"
"io.containerd.timeout.shim.load" = "5s"
"io.containerd.timeout.shim.shutdown" = "3s"
"io.containerd.timeout.task.state" = "2s"

[plugins]
[plugins."io.containerd.gc.v1.scheduler"]
pause_threshold = 0.02
deletion_threshold = 0
mutation_threshold = 100
schedule_delay = "0s"
startup_delay = "100ms"
[plugins."io.containerd.grpc.v1.cri"]
disable_tcp_service = true
stream_server_address = "127.0.0.1"
stream_server_port = "0"
stream_idle_timeout = "4h0m0s"
enable_selinux = false
sandbox_image = "ccr.ccs.tencentyun.com/google_container/pause-amd64:3.1"
stats_collect_period = 10
systemd_cgroup = false
enable_tls_streaming = false
max_container_log_line_size = 16384
disable_cgroup = false
disable_apparmor = false
restrict_oom_score_adj = false
max_concurrent_downloads = 3
disable_proc_mount = false
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "overlayfs"
default_runtime_name = "runc"
no_pivot = false
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
runtime_type = ""
runtime_engine = ""
runtime_root = ""
privileged_without_host_devices = false
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
runtime_type = ""
runtime_engine = ""
runtime_root = ""
privileged_without_host_devices = false
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
runtime_engine = ""
runtime_root = ""
privileged_without_host_devices = false
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"
max_conf_num = 1
conf_template = ""
[plugins."io.containerd.grpc.v1.cri".registry]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://mirror.ccs.tencentyun.com","https://m.daocloud.io/docker.io"]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."quay.io"]
endpoint = ["https://m.daocloud.io/quay.io"]
[plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
tls_cert_file = ""
tls_key_file = ""
[plugins."io.containerd.internal.v1.opt"]
path = "/opt/containerd"
[plugins."io.containerd.internal.v1.restart"]
interval = "10s"
[plugins."io.containerd.metadata.v1.bolt"]
content_sharing_policy = "shared"
[plugins."io.containerd.monitor.v1.cgroups"]
no_prometheus = false
[plugins."io.containerd.runtime.v1.linux"]
shim = "containerd-shim"
runtime = "runc"
runtime_root = ""
no_shim = false
shim_debug = false
[plugins."io.containerd.runtime.v2.task"]
platforms = ["linux/amd64"]
[plugins."io.containerd.service.v1.diff-service"]
default = ["walking"]
[plugins."io.containerd.snapshotter.v1.devmapper"]
root_path = ""
pool_name = ""
base_image_size = ""
  • sandbox_image must be configured properly here, because kubelet 1.27 removed the --pod-infra-container-image flag; the pause image is now taken from containerd's config.
  • Ubuntu 22.04 uses cgroup v2 by default (see the Kubernetes link above), so add SystemdCgroup = true under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options], and set the kubelet to the systemd cgroup driver as well.

Create a systemd unit:

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd

Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=1048576
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload && sudo systemctl start containerd
You should now see containerd running.
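To confirm, enable the service and query the daemon (ctr ships in the same containerd release tarball):

sudo systemctl enable --now containerd
sudo ctr version   # should print both client and server versions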

crictl

crictl is also installed from the binary release: download it from the official GitHub releases, extract it into /usr/local/bin, then write the config:

cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
EOF

Once done, run a command to test it:
sudo crictl images

Masters

kubelet standalone

The master kubelet mainly serves as a base for running the control-plane containers; kubelet itself is stable enough and keeps the setup consistent, though docker compose would probably work too.
If I remember correctly, running a standalone kubelet like this was the deployment style recommended by the official docs in older releases (I may be misremembering). Many online tutorials run etcd, apiserver and the other components directly under systemd, which I find more tedious to configure.
Standalone is simpler to run. Download the binary release from the Kubernetes GitHub; since this kubelet runs independently its version does not have to match the cluster, but I still used 1.29 here. Extract it and put kubelet into /usr/local/bin.
Create the systemd unit kubelet.service; this file can be reused on worker nodes.

[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=containerd.service
Requires=containerd.service

[Service]
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuset/system.slice/kubelet.service
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/hugetlb/system.slice/kubelet.service
EnvironmentFile=/etc/kubernetes/kubelet
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/local/bin/kubelet $KUBELET_ARGS
Restart=on-failure

[Install]
WantedBy=multi-user.target

/etc/kubernetes/kubelet is the environment file holding the binary's startup arguments; many flags are deprecated now and have moved into the KubeletConfiguration.

KUBELET_ARGS="--config=/etc/kubernetes/kubeletConfig \
--cert-dir=/etc/kubernetes/pki \
--v=2"
  • On older versions with containerd you also had to set --container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock here.

kubeletConfig is the main runtime configuration file; most parameters live here, and the standalone configuration is fairly simple:

kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
port: 10250
readOnlyPort: 10255
cgroupDriver: systemd
failSwapOn: true
staticPodPath: /etc/kubernetes/staticPods
imageGCHighThresholdPercent: 70
imageGCLowThresholdPercent: 50
runtimeRequestTimeout: 10m
enforceNodeAllocatable: ["pods"]
evictionHard:
  memory.available: 300M
authentication:
  anonymous:
    enabled: true
  webhook:
    enabled: false
authorization:
  mode: AlwaysAllow

After that, start kubelet via systemd and you can drop pod yaml files into /etc/kubernetes/staticPods.
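A short sketch of those steps, assuming the paths used above:

sudo mkdir -p /etc/kubernetes/staticPods /etc/kubernetes/pki /var/lib/kubelet
sudo systemctl daemon-reload
sudo systemctl enable --now kubelet
# any pod yaml dropped into /etc/kubernetes/staticPods is picked up as a static pod;
# the resulting containers show up in the output of: sudo crictl ps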

Certificates

Many of the communication and authentication paths between Kubernetes components require x509 certificates. The purpose of each certificate is covered in the Kubernetes documentation; we use the cfssl tools to generate them.
Download cfssl and cfssljson and put them in /usr/local/bin.
Roughly the following certificates are needed:

ca cert kind usage
etcd-ca.pem,etcd-ca-key.pem etcd-peer.pem,etcd-peer-key.pem server,client encrypts apiserver <-> etcd traffic
k8s-ca.pem,k8s-ca-key.pem apiserver.pem,apiserver-key.pem server apiserver serving certificate
controller-manager.pem,controller-manager-key.pem server,client controller-manager certificate, authenticates controller-manager
kube-scheduler.pem,kube-scheduler-key.pem server,client scheduler certificate, authenticates scheduler
admin.pem, admin-key.pem client kubectl cluster administration
kubelet-server.pem,kubelet-server-key.pem server kubelet serving certificate, specified manually
kubelet-ca.pem,kubelet-ca-key.pem kubelet-client.pem,kubelet-client-key.pem client apiserver <-> kubelet authentication
agg-ca.pem, agg-ca-key.pem agg.pem,agg-key.pem client aggregation layer authentication

Both RSA and ECDSA can be used for the certificates; the generation process is covered in an earlier post, and a short cfssl sketch follows the CA example below.
CA certificate example:

{
"CA":{
"expiry":"876000h"
},
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 4096
},
"names": [
{
"C": "CN",
"ST": "GD",
"L": "Shenzhen",
"O": "dingding",
"OU": "System"
}
]
}
  • If expiry is not set on the CA it defaults to 5 years, which will bite you 5 years later if you are not aware of it.
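A short cfssl sketch under assumed file names (the CA csr above saved as etcd-ca-csr.json, a signing profile in ca-config.json, and a peer csr in etcd-peer-csr.json):

# self-signed CA: produces etcd-ca.pem and etcd-ca-key.pem
cfssl gencert -initca etcd-ca-csr.json | cfssljson -bare etcd-ca

# sign a server/client cert with that CA: produces etcd-peer.pem and etcd-peer-key.pem
cfssl gencert -ca=etcd-ca.pem -ca-key=etcd-ca-key.pem \
  -config=ca-config.json -profile=peer etcd-peer-csr.json | cfssljson -bare etcd-peer

The same pattern generates the k8s-ca, kubelet-ca and agg-ca families listed in the table.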

etcd cluster

etcd is the storage engine of Kubernetes, so backups matter later on. Building the cluster with containers is simple: copy the manifest into /etc/kubernetes/staticPods.
etcd yaml:

apiVersion: v1
kind: Pod
metadata:
  name: etcd-server1
spec:
  hostNetwork: true
  containers:
  - image: quay.io/coreos/etcd:v3.5.15
    name: etcd-container
    command:
    - etcd
    - --name
    - etcd1
    - --data-dir
    - /data/etcd
    - --listen-peer-urls
    - https://10.1.1.1:2380
    - --listen-client-urls
    - https://0.0.0.0:4001
    - --initial-advertise-peer-urls
    - https://10.1.1.1:2380
    - --advertise-client-urls
    - https://10.1.1.1:4001
    - --initial-cluster
    - etcd1=https://10.1.1.1:2380,etcd2=https://10.1.1.2:2380,etcd3=https://10.1.1.3:2380
    - --initial-cluster-token
    - dtalk-k8s-etcd
    - --client-cert-auth
    - --trusted-ca-file=/etc/etcdssl/etcd-ca.pem
    - --cert-file=/etc/etcdssl/etcd-peer.pem
    - --key-file=/etc/etcdssl/etcd-peer-key.pem
    - --peer-client-cert-auth
    - --peer-trusted-ca-file=/etc/etcdssl/etcd-ca.pem
    - --peer-cert-file=/etc/etcdssl/etcd-peer.pem
    - --peer-key-file=/etc/etcdssl/etcd-peer-key.pem
    ports:
    - containerPort: 2380
      hostPort: 2380
      name: serverport
    - containerPort: 4001
      hostPort: 4001
      name: clientport
    volumeMounts:
    - mountPath: /data/etcd
      name: dataetcd
    - mountPath: /etc/etcdssl
      name: etcdssl
      readOnly: true
    - mountPath: /usr/lib/ssl
      name: usrlibssl
      readOnly: true
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
  volumes:
  - hostPath:
      path: /data/etcd
    name: dataetcd
  - hostPath:
      path: /data/etcdssl
    name: etcdssl
  - hostPath:
      path: /usr/lib/ssl
    name: usrlibssl
  - hostPath:
      path: /etc/ssl
    name: etcssl
  • Adjust the IPs and the --name argument, and put a copy on every master.
  • ETCD_NAME: node name, unique within the cluster
  • ETCD_DATA_DIR: data directory
  • ETCD_LISTEN_PEER_URLS: listen address for cluster (peer) traffic
  • ETCD_LISTEN_CLIENT_URLS: listen address for client access
  • ETCD_INITIAL_ADVERTISE_PEER_URLS: advertised peer address
  • ETCD_ADVERTISE_CLIENT_URLS: advertised client address
  • ETCD_INITIAL_CLUSTER: cluster node addresses
  • ETCD_INITIAL_CLUSTER_TOKEN: cluster token
  • ETCD_INITIAL_CLUSTER_STATE: state when joining; new means a new cluster, existing means joining an existing one

After all nodes are up, check the logs with crictl logs; if there are no errors, run the health command:

etcdctl --cacert=/etc/etcdssl/etcd-ca.pem --cert=/etc/etcdssl/etcd-peer.pem --key=/etc/etcdssl/etcd-peer-key.pem \
--endpoints=https://10.1.1.1:4001,https://10.1.1.2:4001,https://10.1.1.3:4001 \
--write-out=table endpoint health

kube-apiserver

apiserver certificate CSR:

{
"CN": "kubernetes",
"hosts": [
"127.0.0.1",
"10.1.1.1",
"10.1.1.2",
"10.1.1.3",
"10.1.1.10",
"192.168.0.1",
"localhost",
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "GD",
"L": "Shenzhen",
"O": "dingding",
"OU": "System"
}
]
}
  • hosts must include the three master IPs and the load-balancer IP; 192.168.0.1 is the first IP of the cluster Service range, which is also the kubernetes.default Service.

Create a token.csv for the initial kubelet node authentication:

cat > token.csv << EOF
$(head -c 16 /dev/urandom | od -An -t x | tr -d ' '),kubelet-bootstrap,10001,"system:kubelet-bootstrap"
EOF

The apiserver yaml is as follows:

apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver-amd64:v1.29.7
    command:
    - kube-apiserver
    args:
    - --bind-address=0.0.0.0
    - --advertise-address=10.1.1.1
    - --secure-port=7443
    - --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction
    - --service-cluster-ip-range=192.168.0.0/16
    - --enable-bootstrap-token-auth
    - --authorization-mode=RBAC
    - --token-auth-file=/etc/kubernetes/pki/token.csv
    - --client-ca-file=/etc/kubernetes/pki/k8s-ca.pem
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.pem
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem
    - --service-account-key-file=/etc/kubernetes/pki/k8s-ca-key.pem
    - --service-account-signing-key-file=/etc/kubernetes/pki/k8s-ca-key.pem
    - --service-account-issuer=https://kubernetes.default.svc
    - --service-node-port-range=1-65535
    - --allow-privileged=true
    - --endpoint-reconciler-type=lease
    - --storage-backend=etcd3
    - --etcd-servers=https://10.1.1.1:4001,https://10.1.1.2:4001,https://10.1.1.3:4001
    - --etcd-cafile=/etc/kubernetes/pki/etcd-ca.pem
    - --etcd-certfile=/etc/kubernetes/pki/etcd-peer.pem
    - --etcd-keyfile=/etc/kubernetes/pki/etcd-peer-key.pem
    - --requestheader-client-ca-file=/etc/kubernetes/pki/agg-ca.pem
    - --proxy-client-cert-file=/etc/kubernetes/pki/agg.pem
    - --proxy-client-key-file=/etc/kubernetes/pki/agg-key.pem
    - --kubelet-client-certificate=/etc/kubernetes/pki/kubelet-client.pem
    - --kubelet-client-key=/etc/kubernetes/pki/kubelet-client-key.pem
    - --requestheader-allowed-names=aggregator
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --enable-aggregator-routing=true
    - --v=2
    ports:
    - containerPort: 7443
      hostPort: 7443
      name: https
    volumeMounts:
    - mountPath: /etc/kubernetes/pki
      name: kubepki
      readOnly: true
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
    - mountPath: /usr/lib/ssl
      name: usrlibssl
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki
    name: kubepki
  - hostPath:
      path: /etc/ssl
    name: etcssl
  - hostPath:
      path: /usr/lib/ssl
    name: usrlibssl
  • Remember to change the --advertise-address IP on each master.
  • --enable-bootstrap-token-auth and --token-auth-file are used for kubelet node authentication; see kubelet TLS bootstrapping for details.
  • --requestheader-client-ca-file and the --proxy-client-* certificates are used for extension API (aggregation layer) authentication; --requestheader-allowed-names lists the allowed certificate CNs. See the official docs on configuring the aggregation layer.
  • The --kubelet-client-* certificates are what the apiserver uses to authenticate its requests to kubelets.
  • Older apiservers could also bind an insecure localhost:8080 listener for local use, which made controller-manager and scheduler configuration easier (no certificates needed); recent versions no longer allow this.
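Before moving on to kubectl, a quick sanity check against the secure port; with anonymous auth left at its default, the built-in RBAC bindings allow unauthenticated access to /healthz:

curl -k https://127.0.0.1:7443/healthz
# expected output: ok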

kubectl

Configure kubectl first so we can operate the apiserver in the following steps.
Put the kubectl binary into /usr/local/bin.
Generate the kubeconfig using the admin.pem certificate:

kubectl config set-cluster kubernetes --certificate-authority=k8s-ca.pem --embed-certs=true --server=https://10.1.1.10:7443 --kubeconfig=kube.config
kubectl config set-credentials admin --client-certificate=admin.pem --client-key=admin-key.pem --embed-certs=true --kubeconfig=kube.config
kubectl config set-context kubernetes --cluster=kubernetes --user=admin --kubeconfig=kube.config
kubectl config use-context kubernetes --kubeconfig=kube.config
mv kube.config ~/.kube/config

Once configured, run kubectl version and check that the server version is returned correctly.

kube-controller-manager

controller-manager certificate CSR:

{
"CN": "system:kube-controller-manager",
"key": {
"algo": "rsa",
"size": 2048
},
"hosts": [
"127.0.0.1",
"10.1.1.1",
"10.1.1.2",
"10.1.1.3"
],
"names": [
{
"C": "CN",
"ST": "GD",
"L": "Shenzhen",
"O": "system:kube-controller-manager",
"OU": "system"
}
]
}
  • Both CN and O are system:kube-controller-manager. In the Kubernetes RBAC model the certificate CN is the user and O is the group; system:kube-controller-manager is the built-in role with the permissions the controller-manager needs.

Generate kube-controller-manager.kubeconfig, used to talk to the apiserver:

kubectl config set-cluster kubernetes --certificate-authority=k8s-ca.pem --embed-certs=true --server=https://10.1.1.10:7443 --kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-credentials system:kube-controller-manager --client-certificate=controller-manager.pem --client-key=controller-manager-key.pem --embed-certs=true --kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-context system:kube-controller-manager --cluster=kubernetes --user=system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig
kubectl config use-context system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig

controller-manager.yaml

apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
spec:
  containers:
  - command:
    - kube-controller-manager
    args:
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig
    - --service-account-private-key-file=/etc/kubernetes/pki/k8s-ca-key.pem
    - --root-ca-file=/etc/kubernetes/pki/k8s-ca.pem
    - --feature-gates=RotateKubeletServerCertificate=true
    - --controllers=*,bootstrapsigner,tokencleaner
    - --cluster-signing-key-file=/etc/kubernetes/pki/k8s-ca-key.pem
    - --cluster-signing-cert-file=/etc/kubernetes/pki/k8s-ca.pem
    - --tls-cert-file=/etc/kubernetes/pki/controller-manager.pem
    - --tls-private-key-file=/etc/kubernetes/pki/controller-manager-key.pem
    - --use-service-account-credentials=true
    - --v=2
    - --leader-elect=true
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager-amd64:v1.29.7
    livenessProbe:
      tcpSocket:
        port: 10257
      initialDelaySeconds: 15
      timeoutSeconds: 1
    name: kube-controller-manager
    volumeMounts:
    - mountPath: /etc/kubernetes
      name: kubeconf
      readOnly: true
    - mountPath: /var/log/kubernetes
      name: logfile
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
    - mountPath: /usr/lib/ssl
      name: usrlibssl
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /var/log/kubernetes
    name: logfile
  - hostPath:
      path: /etc/kubernetes
    name: kubeconf
  - hostPath:
      path: /etc/ssl
    name: etcssl
  - hostPath:
      path: /usr/lib/ssl
    name: usrlibssl
  • The two --cluster-signing-* files are the CA used to sign kubelet certificates; kubelets rotate their certificates automatically, see kubelet TLS bootstrapping for details.

kube-scheduler

scheduler certificate CSR:

{
"CN": "system:kube-scheduler",
"key": {
"algo": "rsa",
"size": 2048
},
"hosts": [
"127.0.0.1",
"10.1.1.1",
"10.1.1.2",
"10.1.1.3"
],
"names": [
{
"C": "CN",
"ST": "GD",
"L": "Shenzhen",
"O": "system:kube-scheduler",
"OU": "system"
}
]
}
  • system:kube-scheduler is the built-in scheduler role.

Generate kube-scheduler.kubeconfig:

kubectl config set-cluster kubernetes --certificate-authority=k8s-ca.pem --embed-certs=true --server=https://10.1.1.10:7443 --kubeconfig=kube-scheduler.kubeconfig
kubectl config set-credentials system:kube-scheduler --client-certificate=kube-scheduler.pem --client-key=kube-scheduler-key.pem --embed-certs=true --kubeconfig=kube-scheduler.kubeconfig
kubectl config set-context system:kube-scheduler --cluster=kubernetes --user=system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig
kubectl config use-context system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig

kube-scheduler.yaml

apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler-amd64:v1.29.7
    command:
    - kube-scheduler
    args:
    - --kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig
    - --v=2
    - --leader-elect=true
    livenessProbe:
      tcpSocket:
        port: 10259
      initialDelaySeconds: 15
      timeoutSeconds: 1
    volumeMounts:
    - mountPath: /var/log/kubernetes
      name: logfile
    - mountPath: /etc/kubernetes
      name: kubeconf
      readOnly: true
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
    - mountPath: /usr/lib/ssl
      name: usrlibssl
      readOnly: true
  volumes:
  - hostPath:
      path: /var/log/kubernetes
    name: logfile
  - hostPath:
      path: /etc/kubernetes
    name: kubeconf
  - hostPath:
      path: /etc/ssl
    name: etcssl
  - hostPath:
      path: /usr/lib/ssl
    name: usrlibssl

Verify that the control plane is up with kubectl get cs.

Worker nodes

Server tuning and base software

Worker nodes need some tuning: sysctl, time synchronization, ulimit and so on, plus ipvs for Service load balancing; the traditional iptables-based Service implementation performs worse once there are many rules, so a fresh install may as well follow the current recommendation.
Time sync should already be configured on cloud servers; if not, systemd-timesyncd is the simplest way to set it up.
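A minimal systemd-timesyncd setup, with the NTP server as a placeholder to replace with your own:

sudo sed -i 's/^#\?NTP=.*/NTP=ntp.aliyun.com/' /etc/systemd/timesyncd.conf
sudo systemctl restart systemd-timesyncd
sudo timedatectl set-ntp true
timedatectl status   # "System clock synchronized: yes" once it has synced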

ulimit

Add to /etc/security/limits.conf:

* soft nofile 655360
* hard nofile 655360
* soft nproc 655350
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited

sysctl.conf

This config comes from Ubuntu 20.04, but 22.04 should be much the same:

net.core.netdev_max_backlog=10000
net.core.somaxconn=32768
net.ipv4.conf.all.rp_filter=1
net.ipv4.tcp_max_syn_backlog=8096
fs.inotify.max_user_instances=8192
fs.file-max=2097152
fs.inotify.max_user_watches=524288
net.core.bpf_jit_enable=1
net.core.bpf_jit_harden=1
net.core.bpf_jit_kallsyms=1
net.core.dev_weight_tx_bias=1
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 12582912 16777216
net.ipv4.tcp_wmem=4096 12582912 16777216
net.core.rps_sock_flow_entries=8192
net.ipv4.neigh.default.gc_thresh1=2048
net.ipv4.neigh.default.gc_thresh2=4096
net.ipv4.neigh.default.gc_thresh3=8192
net.ipv4.tcp_max_orphans=32768
net.ipv4.tcp_max_tw_buckets=32768
net.ipv4.tcp_fastopen = 3
vm.max_map_count=262144
kernel.threads-max=30058
net.ipv4.ip_forward=1
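Apply the settings and load the kernel modules that ipvs mode needs (module names assume a 5.x kernel, where nf_conntrack replaced nf_conntrack_ipv4), then install the userspace tools:

sudo sysctl -p

cat <<EOF | sudo tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
sudo systemctl restart systemd-modules-load
sudo apt-get install -y ipset ipvsadm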

kubelet

The binary installation follows the steps above, but the configuration on worker nodes is more involved.
First create token.csv and the bootstrap kubeconfig for the initial kubelet/apiserver exchange: a low-privilege token is used to automatically obtain the certificate and kubeconfig used for the real communication.

BOOTSTRAP_TOKEN=$(awk -F "," '{print $1}' /etc/kubernetes/token.csv)
kubectl config set-cluster kubernetes --certificate-authority=k8s-ca.pem --embed-certs=true --server=https://10.1.1.10:7443 --kubeconfig=kubelet-bootstrap.kubeconfig
kubectl config set-credentials kubelet-bootstrap --token=${BOOTSTRAP_TOKEN} --kubeconfig=kubelet-bootstrap.kubeconfig
kubectl config set-context default --cluster=kubernetes --user=kubelet-bootstrap --kubeconfig=kubelet-bootstrap.kubeconfig
kubectl config use-context default --kubeconfig=kubelet-bootstrap.kubeconfig

Use the systemd unit from the earlier section; the kubelet argument file:

KUBELET_ARGS="--config=/etc/kubernetes/kubeletConfig \
--bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig \
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
--cert-dir=/etc/kubernetes/pki \
--v=2"
  • kubelet.kubeconfig is generated automatically; you only need to configure --bootstrap-kubeconfig.
  • --rotate-certificates is deprecated and will be removed in some future release; set it in kubeletConfig instead.

kubeletConfig is the key configuration file; most settings live here:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 10250
readOnlyPort: 10255
serializeImagePulls: false
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/kubelet-ca.pem
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: systemd
cgroupsPerQOS: true
staticPodPath: /etc/kubernetes/staticPods
imageGCHighThresholdPercent: 70
imageGCLowThresholdPercent: 50
featureGates:
  RotateKubeletServerCertificate: true
rotateCertificates: true
serverTLSBootstrap: true
runtimeRequestTimeout: 10m
enforceNodeAllocatable: ["pods","system-reserved","kube-reserved"]
systemReservedCgroup: /system.slice
kubeReservedCgroup: /system.slice/kubelet.service
systemReserved:
  cpu: 500m
  memory: 512Mi
kubeReserved:
  cpu: 500m
  memory: 512Mi
evictionHard:
  memory.available: 300Mi
clusterDomain: "cluster.local"
clusterDNS:
- "169.254.20.10"
  • clientCAFile is the CA used to authenticate requests to the kubelet; apiserver-to-kubelet traffic uses its own separate CA.
  • cgroupDriver: systemd, because Ubuntu 22.04 defaults to cgroup v2.
  • tlsCertFile and tlsPrivateKeyFile manually specify the kubelet serving certificate; note its CA is k8s-ca.pem.
  • serverTLSBootstrap makes the kubelet send a CSR to the apiserver for its serving certificate, but for security reasons the CSR-approving controller built into Kubernetes does not auto-approve node serving certificates. To use the RotateKubeletServerCertificate feature, cluster operators need to run a custom controller or approve the serving certificate requests manually; see the certificate rotation notes in the kubelet TLS docs.
  • If neither of the two options above is set, the kubelet self-signs its own kubelet.crt/kubelet.key pair.
  • The clusterDNS IP here is the local DNS cache IP configured later; if you do not use a local DNS cache, use the second IP of the cluster Service range instead, 192.168.0.2.

With x509 enabled on the kubelet and a kubelet client certificate configured on the apiserver, you need clusterrolebindings for authorization; the user is the certificate CN:

kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap
kubectl create clusterrolebinding kubelet-admin --clusterrole=system:kubelet-api-admin --user=kubeletadmin

Set up RBAC so that the bootstrap CSRs are auto-approved and renewed:

kubectl create clusterrolebinding auto-approve-csrs-for-group --clusterrole=system:certificates.k8s.io:certificatesigningrequests:nodeclient --group=system:kubelet-bootstrap --user=kubelet-bootstrap
kubectl create clusterrolebinding auto-approve-renewals-for-nodes --clusterrole=system:certificates.k8s.io:certificatesigningrequests:selfnodeclient --group=system:nodes

Manually approve the kubelet serving CSR:

# kubectl -n kube-system get csr
NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-brvcz 85s kubernetes.io/kubelet-serving system:node:<node name> <none> Pending

# kubectl certificate approve csr-brvcz

Once the CONDITION becomes Approved,Issued you are done; a date-stamped kubelet-server-*.pem appears in the cert dir.

kube-proxy

kube-proxy is deployed as a DaemonSet and authenticates directly with its ServiceAccount token, so no certificates need to be issued and there is nothing to expire.
This references a manifest from bookstack.cn.

APISERVER="https://10.1.1.10:7443"
CLUSTER_CIDR="172.18.0.0/16"

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-proxy
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: system:kube-proxy
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:node-proxier
subjects:
- kind: ServiceAccount
name: kube-proxy
namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-proxy
namespace: kube-system
labels:
app: kube-proxy
data:
kubeconfig.conf: |-
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server: ${APISERVER}
name: default
contexts:
- context:
cluster: default
namespace: default
user: default
name: default
current-context: default
users:
- name: default
user:
tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
config.conf: |-
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
bindAddress: 0.0.0.0
clientConnection:
acceptContentTypes: ""
burst: 10
contentType: application/vnd.kubernetes.protobuf
kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
qps: 5
# CIDR range of the cluster's Pod IPs
clusterCIDR: ${CLUSTER_CIDR}
configSyncPeriod: 15m0s
conntrack:
# max NAT connections tracked per CPU core, default 32768
maxPerCore: 32768
min: 131072
tcpCloseWaitTimeout: 1h0m0s
tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
healthzBindAddress: 0.0.0.0:10256
iptables:
# whether to SNAT all traffic destined for Service cluster IPs
masqueradeAll: false
masqueradeBit: 14
minSyncPeriod: 0s
syncPeriod: 30s
ipvs:
minSyncPeriod: 0s
# ipvs scheduler type, default rr; all supported types:
# rr: round-robin
# lc: least connection
# dh: destination hashing
# sh: source hashing
# sed: shortest expected delay
# nq: never queue
scheduler: rr
syncPeriod: 30s
metricsBindAddress: 0.0.0.0:10249
# use ipvs mode to forward Service traffic
mode: ipvs
# set the oom-score-adj of the kube-proxy process, range [-1000,1000]
# lower values are less likely to be killed; -999 keeps kube-proxy alive during a system OOM
oomScoreAdj: -999
EOF
  • CLUSTER_CIDR is the cluster Pod IP range; use a private range, the same as the IP pool in the calico config below.

Create the DaemonSet:

ARCH="amd64"
VERSION="v1.29.7"
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
k8s-app: kube-proxy-ds-${ARCH}
name: kube-proxy-ds-${ARCH}
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: kube-proxy-ds-${ARCH}
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
k8s-app: kube-proxy-ds-${ARCH}
spec:
priorityClassName: system-node-critical
containers:
- name: kube-proxy
image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy-${ARCH}:${VERSION}
imagePullPolicy: IfNotPresent
command:
- /usr/local/bin/kube-proxy
- --config=/var/lib/kube-proxy/config.conf
- --hostname-override=\$(NODE_NAME)
securityContext:
privileged: true
volumeMounts:
- mountPath: /var/lib/kube-proxy
name: kube-proxy
- mountPath: /run/xtables.lock
name: xtables-lock
readOnly: false
- mountPath: /lib/modules
name: lib-modules
readOnly: true
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
hostNetwork: true
serviceAccountName: kube-proxy
volumes:
- name: kube-proxy
configMap:
name: kube-proxy
- name: xtables-lock
hostPath:
path: /run/xtables.lock
type: FileOrCreate
- name: lib-modules
hostPath:
path: /lib/modules
tolerations:
- key: CriticalAddonsOnly
operator: Exists
- operator: Exists
nodeSelector:
kubernetes.io/arch: ${ARCH}
EOF

At this point the kube components are basically all installed. Check each component's logs for obvious errors; if there are none, move on to the network setup.
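A few commands I would use for the check (ipvsadm comes from the package installed earlier):

kubectl get nodes -o wide                 # workers should have registered (Ready only after the CNI is up)
kubectl -n kube-system get pods -o wide   # kube-proxy DaemonSet pods should be Running
sudo ipvsadm -Ln | head                   # ipvs virtual servers programmed by kube-proxy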

If you see a lot of permission errors for system:node:<nodeName>, check whether the system:node clusterrolebinding is set up correctly:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes

Networking and base add-ons

calico

calico has been quite stable for me over a long time. On Tencent Cloud it needs the IPIP Always mode; the manifest can be downloaded from the official site.
We use the Kubernetes API datastore and the manifest for clusters with fewer than 50 nodes:

curl https://raw.githubusercontent.com/projectcalico/calico/v3.27.4/manifests/calico.yaml -O

The file is long; the main things to change are:

  • CALICO_IPV4POOL_CIDR changed to 172.18.0.0/16
  • CALICO_IPV4POOL_IPIP changed to Always (CrossSubnet only uses IPIP encapsulation across subnets)

After the edits just kubectl apply it, then confirm pod-to-pod connectivity by pinging pod IPs.
If something is wrong, inspect the routes with commands such as ip route and ip addr.
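A quick way to check, assuming the default manifest labels:

kubectl -n kube-system get pods -l k8s-app=calico-node -o wide   # one calico-node per node, Running
kubectl get nodes                                                # nodes should turn Ready
ip route | grep tunl0                                            # with IPIP Always, per-node pod routes go via tunl0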

coredns

CoreDNS is a key Kubernetes component responsible for in-cluster DNS resolution. CoreDNS has many plugins: besides kubernetes, the hosts and rewrite plugins are also very useful, see the official docs.

apiVersion: v1
kind: ServiceAccount
metadata:
name: coredns
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:coredns
rules:
- apiGroups:
- ""
resources:
- endpoints
- services
- pods
- namespaces
verbs:
- list
- watch
- apiGroups:
- discovery.k8s.io
resources:
- endpointslices
verbs:
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:coredns
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:coredns
subjects:
- kind: ServiceAccount
name: coredns
namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . /etc/resolv.conf {
max_concurrent 1000
}
cache 30
loop
reload
loadbalance
ready :8181
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: coredns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/name: "CoreDNS"
app.kubernetes.io/name: coredns
spec:
# replicas: not specified here:
# 1. Default is 1.
# 2. Will be tuned in real time if DNS horizontal auto-scaling is turned on.
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
k8s-app: kube-dns
app.kubernetes.io/name: coredns
template:
metadata:
labels:
k8s-app: kube-dns
app.kubernetes.io/name: coredns
spec:
priorityClassName: system-cluster-critical
serviceAccountName: coredns
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
nodeSelector:
kubernetes.io/os: linux
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: k8s-app
operator: In
values: ["kube-dns"]
topologyKey: kubernetes.io/hostname
containers:
- name: coredns
image: ccr.ccs.tencentyun.com/bxrapp/coredns:1.11.1
imagePullPolicy: IfNotPresent
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
args: [ "-conf", "/etc/coredns/Corefile" ]
volumeMounts:
- name: config-volume
mountPath: /etc/coredns
readOnly: true
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- containerPort: 9153
name: metrics
protocol: TCP
securityContext:
allowPrivilegeEscalation: false
capabilities:
add:
- NET_BIND_SERVICE
drop:
- all
readOnlyRootFilesystem: true
livenessProbe:
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /ready
port: 8181
scheme: HTTP
dnsPolicy: Default
volumes:
- name: config-volume
configMap:
name: coredns
items:
- key: Corefile
path: Corefile
---
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
annotations:
prometheus.io/port: "9153"
prometheus.io/scrape: "true"
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "CoreDNS"
app.kubernetes.io/name: coredns
spec:
selector:
k8s-app: kube-dns
app.kubernetes.io/name: coredns
clusterIP: 192.168.0.2
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
- name: metrics
port: 9153
protocol: TCP
  • ready :8181 needs to be added to the Corefile, otherwise the deployment's readiness probe will fail.
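After applying the manifest, a quick resolution test from inside a pod (busybox 1.28 is used because nslookup in newer busybox images is unreliable):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
# should resolve to the first Service IP, 192.168.0.1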

local dns cache

The local DNS cache mainly offloads the primary DNS and avoids the 5-second DNS delays caused by conntrack conflicts when netfilter does DNAT.
The idea: a DaemonSet runs a CoreDNS instance on every worker node, and pods are pointed at this local DNS either via the kubelet config or by injecting dnsConfig into the Pod. Lookups hit the local cache first and only fall back to the primary DNS on a cache miss.

References:
Local DNS cache, imroc.cc
Tencent's use of NodeLocal DNS Cache in TKE clusters

By default the DNS resolvers in image base libraries send the A and AAAA queries concurrently over UDP on the same socket. Because UDP is stateless, the two requests may race to create conntrack entries; if they get DNAT'ed to the same cluster DNS Pod IP the entries conflict, and since conntrack creation and insertion are not locked, the later entry is dropped and that request times out, retrying after the default 5 seconds, which shows up as the 5-second DNS delay. Images whose base library is glibc can tune resolv.conf options to change the behavior (use TCP, or avoid identical 5-tuples by resolving A and AAAA serially or on different sockets/source ports), but images based on alpine use musl libc, which does not support those resolv.conf options and cannot work around it, so the best solution is still a local DNS cache.
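For glibc-based images there is also a per-Pod mitigation that skips the local cache: inject a resolv.conf option through dnsConfig so A and AAAA are sent serially. A sketch (the surrounding pod spec is omitted; this does nothing for musl/alpine images):

spec:
  dnsConfig:
    options:
    - name: single-request-reopen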

Create the ServiceAccount and Service:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-local-dns
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns-upstream
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "KubeDNSUpstream"
spec:
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  selector:
    k8s-app: kube-dns
EOF

Create the DaemonSet:

UPSTREAM_CLUSTER_IP=$(kubectl -n kube-system get services kube-dns-upstream -o jsonpath="{.spec.clusterIP}")
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: node-local-dns
namespace: kube-system
labels:
addonmanager.kubernetes.io/mode: Reconcile
data:
Corefile: |
cluster.local:53 {
errors
cache {
success 9984 30
denial 9984 5
}
reload
loop
bind 169.254.20.10
forward . ${UPSTREAM_CLUSTER_IP} {
force_tcp
}
prometheus :9253
health 169.254.20.10:8080
}
in-addr.arpa:53 {
errors
cache 30
reload
loop
bind 169.254.20.10
forward . ${UPSTREAM_CLUSTER_IP} {
force_tcp
}
prometheus :9253
}
ip6.arpa:53 {
errors
cache 30
reload
loop
bind 169.254.20.10
forward . ${UPSTREAM_CLUSTER_IP} {
force_tcp
}
prometheus :9253
}
.:53 {
errors
cache 30
reload
loop
bind 169.254.20.10
forward . /etc/resolv.conf {
force_tcp
}
prometheus :9253
}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-local-dns
namespace: kube-system
labels:
k8s-app: node-local-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
updateStrategy:
rollingUpdate:
maxUnavailable: 10%
selector:
matchLabels:
k8s-app: node-local-dns
template:
metadata:
labels:
k8s-app: node-local-dns
spec:
priorityClassName: system-node-critical
serviceAccountName: node-local-dns
hostNetwork: true
dnsPolicy: Default # Don't use cluster DNS.
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
containers:
- name: node-cache
image: ccr.ccs.tencentyun.com/bxrapp/k8s-dns-node-cache:1.23.1
resources:
requests:
cpu: 25m
memory: 5Mi
args: [ "-localip", "169.254.20.10", "-conf", "/etc/Corefile", "-upstreamsvc", "kube-dns-upstream" ]
securityContext:
privileged: true
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- containerPort: 9253
name: metrics
protocol: TCP
livenessProbe:
httpGet:
host: 169.254.20.10
path: /health
port: 8080
initialDelaySeconds: 60
timeoutSeconds: 5
volumeMounts:
- mountPath: /run/xtables.lock
name: xtables-lock
readOnly: false
- name: config-volume
mountPath: /etc/coredns
- name: kube-dns-config
mountPath: /etc/kube-dns
volumes:
- name: xtables-lock
hostPath:
path: /run/xtables.lock
type: FileOrCreate
- name: kube-dns-config
configMap:
name: kube-dns
optional: true
- name: config-volume
configMap:
name: node-local-dns
items:
- key: Corefile
path: Corefile.base
EOF

On every worker, change clusterDNS in kubeletConfig and restart kubelet:

sed -i 's/192.168.0.2/169.254.20.10/g' /etc/kubernetes/kubeletConfig
sudo systemctl restart kubelet

Our earlier kubeletConfig already uses this value, so nothing needs to change here.

Once this is done, nslookup inside a pod shows that the DNS server has changed to 169.254.20.10.
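For example:

kubectl run -it --rm dns-check --image=busybox:1.28 --restart=Never -- cat /etc/resolv.conf
# nameserver 169.254.20.10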

kubelet-csr-approver

This component automatically approves kubelet serving certificate CSRs; with it in place, kubelet server certificates are approved and rotated automatically. See its GitHub page for deployment and configuration.
The main settings are:

  • PROVIDER_REGEX: regex that node hostnames must match
  • PROVIDER_IP_PREFIXES: IP prefixes of the worker nodes; hostnames need to resolve correctly in DNS

A successful approval shows up in the logs like this:

{"level":"INFO","ts":"2024-08-19T09:44:04.648Z","caller":"controller/csr_controller.go:169","msg":"CSR approved","controller":"certificatesigningrequest","controllerGroup":"certificates.k8s.io","controllerKind":"CertificateSigningRequest","CertificateSigningRequest":{"name":"csr-w9x65"},"namespace":"","name":"csr-w9x65","reconcileID":"f0deda4b-9a4f-4bae-8491-f0ff019cdf3d"}

metrics server

metrics-server is an important aggregated API behind the apiserver; kubectl top and HPA depend on it, and the aggregation-layer x509 certificates configured on the apiserver are exactly what authenticates such extension APIs.
Install from the manifest:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

After it starts you may hit two problems:

  1. The apiserver cannot reach metrics-server, because our apiserver cannot reach pod IPs. Add hostNetwork: true and change --secure-port=10251 to avoid a port clash; alternatively install kube-proxy and calico on the masters, but that is more hassle.
  2. kubelet x509 certificate problems: set serverTLSBootstrap: true in kubeletConfig and approve the certificates properly, or simply add --kubelet-insecure-tls to the metrics-server arguments. See Zeng Xu's blog. A sketch of that quick fix follows below.
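One way to apply the quick fix from item 2, appending the flag to the deployment created by the manifest (kubectl top should return data once the pod is Ready):

kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
kubectl top nodes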