
问题背景

这个问题[1]定位到VMWare虚拟化层面后就一直搁置下来了,但考虑到测试环境存在大量虚拟机部署的情况,并且支持虚拟化部署也是早晚的事,所以后续一直在关注VMWare和Redhat的相关资料。

原因分析

幸运的是,从Redhat的官网里还真的找到了相关问题[2]。通过官网说明了解到,该问题出现的环境是Redhat 8.3和8.4,使用vmxnet3类型的适配器,并且使用了UDP隧道协议(比如vxlan或GRE)。

官方给出的解决方案是:

  1. 升级Redhat到8.5 (kernel-4.18.0-348.el8)及以后版本。
  2. 升级VMware ESXi到6.7 P07或7.0 U3 (7.0.3)及以后版本。

上述更新包含了针对vmxnet3 NIC不支持的隧道类型禁用 tx-checksum-ip-generic 的逻辑,因此最终效果与下面的临时解决方法相同:

ethtool -K DEVNAME tx-checksum-ip-generic off

但是,实测结果显示,升级Redhat和VMware均无法解决,而临时禁用的命令是可以的。考虑到临时命令还涉及持久化的问题,还是需要另找方法。
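
如果仍想沿用禁用校验和卸载的思路,持久化可以借助systemd的oneshot服务在开机时自动执行该命令。下面是一个示意(DEVNAME需替换为实际网卡名,服务名可自行调整),仅供参考:

# /etc/systemd/system/disable-tx-checksum.service
[Unit]
Description=Workaround: disable tx-checksum-ip-generic on vmxnet3 NIC
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -K DEVNAME tx-checksum-ip-generic off

[Install]
WantedBy=multi-user.target

# 使能并立即执行一次
systemctl daemon-reload
systemctl enable --now disable-tx-checksum.service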

既然使用vmxnet3网卡不行,是不是能换网卡类型?果然,根据这个思路,继续查找资料[3],发现使用E1000类型的网卡可以解决该问题,实测结果也符合预期。

解决方案

临时方案:在创建虚拟机时,把网络适配器的类型改为E1000或E1000e。

永久方案:依然需要等待VMWare和Redhat的官方修复。

参考资料

  1. https://lyyao09.github.io/2022/06/05/k8s/K8S%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5-VMWare%E8%99%9A%E6%8B%9F%E5%8C%96%E7%8E%AF%E5%A2%83%E4%B8%8BPod%E8%B7%A8VXLAN%E9%80%9A%E4%BF%A1%E5%BC%82%E5%B8%B8/
  2. https://access.redhat.com/solutions/5881451
  3. https://zhangguanzhang.github.io/2022/07/28/redhat84-vxlan-esxi

问题背景

为了验证最新版本的k8s是否已修复某个bug,需要快速搭建一个k8s环境,本文选取资料[1]中的kubeasz工具,并记录部署过程及相关问题。

部署过程

先下载工具脚本、kubeasz代码、二进制、默认容器镜像。
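
下载方式可以参考资料[1]中的quickStart文档,大致如下(版本号以实际需要为准,这里以3.5.0为例):

export release=3.5.0
wget https://github.com/easzlab/kubeasz/releases/download/${release}/ezdown
chmod +x ./ezdown
# 下载kubeasz代码、二进制、默认容器镜像
./ezdown -D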

使用如下命令开始安装:

[root@node01 k8s]# ./ezdown -S
2023-03-22 13:39:40 INFO Action begin: start_kubeasz_docker
2023-03-22 13:39:41 INFO try to run kubeasz in a container
2023-03-22 13:39:41 DEBUG get host IP: 10.10.11.49
2023-03-22 13:39:41 DEBUG generate ssh key pair
# 10.10.11.49 SSH-2.0-OpenSSH_6.6.1
f1b442b7fdaf757c7787536b17d12d76208a2dd7884d56fbd1d35817dc2e94ca
2023-03-22 13:39:41 INFO Action successed: start_kubeasz_docker

[root@node01 k8s]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f1b442b7fdaf easzlab/kubeasz:3.5.0 "sleep 36000" 15 seconds ago Up 14 seconds kubeasz

执行后看不出是成功还是失败。根据文档说明,进入容器内手动执行命令:

[root@node01 ~]# docker exec -it kubeasz ezctl start-aio
2023-03-22 06:15:05 INFO get local host ipadd: 10.10.11.49
2023-03-22 06:15:05 DEBUG generate custom cluster files in /etc/kubeasz/clusters/default
2023-03-22 06:15:05 DEBUG set versions
2023-03-22 06:15:05 DEBUG disable registry mirrors
2023-03-22 06:15:05 DEBUG cluster default: files successfully created.
2023-03-22 06:15:05 INFO next steps 1: to config '/etc/kubeasz/clusters/default/hosts'
2023-03-22 06:15:05 INFO next steps 2: to config '/etc/kubeasz/clusters/default/config.yml'
ansible-playbook -i clusters/default/hosts -e @clusters/default/config.yml playbooks/90.setup.yml
2023-03-22 06:15:05 INFO cluster:default setup step:all begins in 5s, press any key to abort:

PLAY [kube_master,kube_node,etcd,ex_lb,chrony] **********************************************************************************************************************************************************

TASK [Gathering Facts] **********************************************************************************************************************************************************************************
fatal: [10.10.11.49]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: root@10.10.11.49: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).", "unreachable": true}

PLAY RECAP **********************************************************************************************************************************************************************************************
10.10.11.49 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0

从日志看,提示权限有问题。实际测试可以正常的ssh免密登录:

bash-5.1# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)?
bash-5.1# ssh-copy-id root@10.10.11.49
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
expr: warning: '^ERROR: ': using '^' as the first character
of a basic regular expression is not portable; it is ignored
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@10.10.11.49's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'root@10.10.11.49'"
and check to make sure that only the key(s) you wanted were added.

bash-5.1# ssh root@10.10.11.49
root@10.10.11.49's password:

查看相关配置文件,权限正常:

[root@node01 kubeasz]# ll ~/.ssh
total 16
-rw------- 1 root root 1752 Mar 22 14:25 authorized_keys
-rw------- 1 root root 2602 Mar 22 14:25 id_rsa
-rw-r--r-- 1 root root 567 Mar 22 14:25 id_rsa.pub
-rw-r--r-- 1 root root 1295 Mar 22 13:39 known_hosts

不清楚具体哪里有问题,参考资料[2],尝试改为使用用户名密码的方式执行。

在容器内配置用户密码,检查通过:

bash-5.1# vi /etc/ansible/hosts
[webservers]
10.10.11.49

[webservers:vars]
ansible_ssh_pass='******'
ansible_ssh_user='root'

bash-5.1# ansible webservers -m ping
10.10.11.49 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python"
},
"changed": false,
"ping": "pong"
}

修改安装集群依赖的clusters/default/hosts文件,同样增加用户密码配置:

[etcd]
10.10.11.49

[etcd:vars]
ansible_ssh_pass='******'
ansible_ssh_user='root'

# master node(s)
[kube_master]
10.10.11.49

[kube_master:vars]
ansible_ssh_pass='******'
ansible_ssh_user='root'


# work node(s)
[kube_node]
10.10.11.49

[kube_node:vars]
ansible_ssh_pass='******'
ansible_ssh_user='root'

执行命令,提示缺少sshpass工具:

[root@node01 kubeasz]# docker exec -it kubeasz ezctl setup default all
ansible-playbook -i clusters/default/hosts -e @clusters/default/config.yml playbooks/90.setup.yml
2023-03-22 07:35:46 INFO cluster:default setup step:all begins in 5s, press any key to abort:

PLAY [kube_master,kube_node,etcd,ex_lb,chrony] **********************************************************************************************************************************************************

TASK [Gathering Facts] **********************************************************************************************************************************************************************************
fatal: [10.10.11.49]: FAILED! => {"msg": "to use the 'ssh' connection type with passwords, you must install the sshpass program"}

PLAY RECAP **********************************************************************************************************************************************************************************************
10.10.11.49 : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

安装sshpass依赖包:

bash-5.1# apk add sshpass
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/community/x86_64/APKINDEX.tar.gz
(1/1) Installing sshpass (1.09-r0)
Executing busybox-1.35.0-r17.trigger
OK: 21 MiB in 47 packages

重复执行命令:

[root@node01 kubeasz]# docker exec -it kubeasz ezctl setup default all
ansible-playbook -i clusters/default/hosts -e @clusters/default/config.yml playbooks/90.setup.yml
2023-03-22 07:36:37 INFO cluster:default setup step:all begins in 5s, press any key to abort:

...

TASK [kube-node : 轮询等待kube-proxy启动] *********************************************************************************************************************************************************************
changed: [10.10.11.49]
FAILED - RETRYING: 轮询等待kubelet启动 (4 retries left).
FAILED - RETRYING: 轮询等待kubelet启动 (3 retries left).
FAILED - RETRYING: 轮询等待kubelet启动 (2 retries left).
FAILED - RETRYING: 轮询等待kubelet启动 (1 retries left).

TASK [kube-node : 轮询等待kubelet启动] ************************************************************************************************************************************************************************
fatal: [10.10.11.49]: FAILED! => {"attempts": 4, "changed": true, "cmd": "systemctl is-active kubelet.service", "delta": "0:00:00.014621", "end": "2023-03-22 15:42:07.230186", "msg": "non-zero return code", "rc": 3, "start": "2023-03-22 15:42:07.215565", "stderr": "", "stderr_lines": [], "stdout": "activating", "stdout_lines": ["activating"]}

PLAY RECAP **********************************************************************************************************************************************************************************************
10.10.11.49 : ok=85 changed=78 unreachable=0 failed=1 skipped=123 rescued=0 ignored=0
localhost : ok=33 changed=30 unreachable=0 failed=0 skipped=11 rescued=0 ignored=0

kubelet阶段失败,查看kubelet服务:

[root@node01 log]# service kubelet status -l
Redirecting to /bin/systemctl status -l kubelet.service
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2023-03-22 15:56:31 CST; 1s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 147581 ExecStart=/opt/kube/bin/kubelet --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --hostname-override=10.10.11.49 --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --root-dir=/var/lib/kubelet --v=2 (code=exited, status=1/FAILURE)
Main PID: 147581 (code=exited, status=1/FAILURE)

Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.719832 147581 manager.go:228] Version: {KernelVersion:3.10.0-862.11.6.el7.x86_64 ContainerOsVersion:CentOS Linux 7 (Core) DockerVersion: DockerAPIVersion: CadvisorVersion: CadvisorRevision:}
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.720896 147581 server.go:659] "--cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /"
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.721939 147581 container_manager_linux.go:267] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722392 147581 container_manager_linux.go:272] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName:
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722503 147581 topology_manager.go:134] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722609 147581 container_manager_linux.go:308] "Creating device plugin manager"
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722689 147581 manager.go:125] "Creating Device Plugin manager" path="/var/lib/kubelet/device-plugins/kubelet.sock"
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722763 147581 server.go:66] "Creating device plugin registration server" version="v1beta1" socket="/var/lib/kubelet/device-plugins/kubelet.sock"
Mar 22 15:56:31 node01 kubelet[147581]: I0322 15:56:31.722905 147581 state_mem.go:36] "Initialized new in-memory state store"
Mar 22 15:56:31 node01 kubelet[147581]: E0322 15:56:31.726502 147581 run.go:74] "command failed" err="failed to run Kubelet: validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"

根据日志报错,参考资料[3],删除 /etc/containerd/config.toml 文件并重启 containerd 即可:

mv /etc/containerd/config.toml /root/config.toml.bak
systemctl restart containerd

重复执行命令,后台查看发现calico-node启动失败,describe查看到的事件如下:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 41s default-scheduler Successfully assigned kube-system/calico-node-rqpjm to 10.10.11.49
Normal Pulling 20s (x2 over 31s) kubelet Pulling image "easzlab.io.local:5000/calico/cni:v3.23.5"
Warning Failed 19s (x2 over 31s) kubelet Failed to pull image "easzlab.io.local:5000/calico/cni:v3.23.5": rpc error: code = Unknown desc = failed to pull and unpack image "easzlab.io.local:5000/calico/cni:v3.23.5": failed to resolve reference "easzlab.io.local:5000/calico/cni:v3.23.5": failed to do request: Head "https://easzlab.io.local:5000/v2/calico/cni/manifests/v3.23.5": http: server gave HTTP response to HTTPS client
Warning Failed 19s (x2 over 31s) kubelet Error: ErrImagePull
Normal BackOff 5s (x2 over 30s) kubelet Back-off pulling image "easzlab.io.local:5000/calico/cni:v3.23.5"
Warning Failed 5s (x2 over 30s) kubelet Error: ImagePullBackOff

查看docker层面配置,并测试拉取镜像正常:

[root@node01 ~]# cat /etc/docker/daemon.json
{
"max-concurrent-downloads": 10,
"insecure-registries": ["easzlab.io.local:5000"],
"log-driver": "json-file",
"log-level": "warn",
"log-opts": {
"max-size": "10m",
"max-file": "3"
},
"data-root":"/var/lib/docker"
}

[root@node01 log]# docker pull easzlab.io.local:5000/calico/cni:v3.23.5
v3.23.5: Pulling from calico/cni
Digest: sha256:9c5055a2b5bc0237ab160aee058135ca9f2a8f3c3eee313747a02edcec482f29
Status: Image is up to date for easzlab.io.local:5000/calico/cni:v3.23.5
easzlab.io.local:5000/calico/cni:v3.23.5

查看containerd层面配置,并测试拉取镜像也正常:

[root@node01 log]# ctr image pull --plain-http=true easzlab.io.local:5000/calico/cni:v3.23.5
easzlab.io.local:5000/calico/cni:v3.23.5: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:9c5055a2b5bc0237ab160aee058135ca9f2a8f3c3eee313747a02edcec482f29: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:cc0e45adf05a30a90384ba7024dbabdad9ae0bcd7b5a535c28dede741298fea3: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:47c5dbbec31222325790ebad8c07d270a63689bd10dc8f54115c65db7c30ad1f: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8efc3d73e2741a93be09f68c859da466f525b9d0bddb1cd2b2b633f14f232941: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:1c979d623de9aef043cb4ff489da5636d61c39e30676224af0055240e1816382: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4c98a4f67c5a7b1058111d463051c98b23e46b75fc943fc2535899a73fc0c9f1: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:51729c6e2acda05a05e203289f5956954814d878f67feb1a03f9941ec5b4008b: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:050b055d5078c5c6ad085d106c232561b0c705aa2173edafd5e7a94a1e908fc5: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:7430548aa23e56c14da929bbe5e9a2af0f9fd0beca3bd95e8925244058b83748: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 3.1 s total: 103.0 (33.2 MiB/s)
unpacking linux/amd64 sha256:9c5055a2b5bc0237ab160aee058135ca9f2a8f3c3eee313747a02edcec482f29...
done: 6.82968396s

根据资料[4],查看containerd配置,并新增私有仓库的配置:

[root@node01 ~]# containerd config default > /etc/containerd/config.toml

[root@node01 ~]# vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = ""

[plugins."io.containerd.grpc.v1.cri".registry.auths]

[plugins."io.containerd.grpc.v1.cri".registry.configs]

[plugins."io.containerd.grpc.v1.cri".registry.headers]

[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."easzlab.io.local:5000"]
endpoint = ["http://easzlab.io.local:5000"]

[root@node01 ~]# service containerd restart

查看pod状态,又卡在了ContainerCreating状态:

[root@node01 ~]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-89b744d6c-klzwh 1/1 Running 0 5m35s
kube-system calico-node-wmvff 1/1 Running 0 5m35s
kube-system coredns-6665999d97-mp7xc 0/1 ContainerCreating 0 5m35s
kube-system dashboard-metrics-scraper-57566685b4-8q5fm 0/1 ContainerCreating 0 5m35s
kube-system kubernetes-dashboard-57db9bfd5b-h6jp4 0/1 ContainerCreating 0 5m35s
kube-system metrics-server-6bd9f986fc-njpnj 0/1 ContainerCreating 0 5m35s
kube-system node-local-dns-wz9bg 1/1 Running 0 5m31s

选择一个describe查看:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 6m7s default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
Normal Scheduled 5m47s default-scheduler Successfully assigned kube-system/coredns-6665999d97-mp7xc to 10.10.11.49
Warning FailedCreatePodSandBox 5m46s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "072c164d79f4874a8d851d36115ea04b75a2155dae3cecdc764e923c9f38f86b": plugin type="calico" failed (add): failed to find plugin "calico" in path [/opt/cni/bin]
Normal SandboxChanged 33s (x25 over 5m46s) kubelet Pod sandbox changed, it will be killed and re-created.

从事件看,是/opt/cni/bin下缺少cni插件的问题,手动拷贝(思路见下面的示意)并赋予可执行权限之后,查看pod状态:
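
拷贝的思路大致如下(cni插件在宿主机上的实际存放目录请以find结果为准,下面的目录仅为占位示意):

# 先确认cni插件二进制在宿主机上的位置
find /etc/kubeasz /opt -name 'calico-ipam' 2>/dev/null
# 再把该目录下的插件整体拷贝到kubelet默认的cni目录
cp -a <实际插件目录>/* /opt/cni/bin/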

[root@node01 bin]# cd /opt/cni/bin/
[root@node01 bin]# chmod +x *
[root@node01 bin]# ll -h
total 186M
-rwxr-xr-x 1 root root 3.7M Mar 22 17:46 bandwidth
-rwxr-xr-x 1 root root 56M Mar 22 17:46 calico
-rwxr-xr-x 1 root root 56M Mar 22 17:46 calico-ipam
-rwxr-xr-x 1 root root 2.4M Mar 22 17:46 flannel
-rwxr-xr-x 1 root root 3.1M Mar 22 17:46 host-local
-rwxr-xr-x 1 root root 56M Mar 22 17:46 install
-rwxr-xr-x 1 root root 3.2M Mar 22 17:46 loopback
-rwxr-xr-x 1 root root 3.6M Mar 22 17:46 portmap
-rwxr-xr-x 1 root root 3.3M Mar 22 17:46 tuning

[root@node01 bin]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-89b744d6c-mpfgq 1/1 Running 0 37m
kube-system calico-node-h9sm2 1/1 Running 0 37m
kube-system coredns-6665999d97-8pdbd 1/1 Running 0 37m
kube-system dashboard-metrics-scraper-57566685b4-c2l8w 1/1 Running 0 37m
kube-system kubernetes-dashboard-57db9bfd5b-74lmb 1/1 Running 0 37m
kube-system metrics-server-6bd9f986fc-d9crl 1/1 Running 0 37m
kube-system node-local-dns-kvgv6 1/1 Running 0 37m

部署完成。

参考资料

  1. https://github.com/easzlab/kubeasz/blob/master/docs/setup/quickStart.md
  2. https://www.jianshu.com/p/c48b4a24c7d4
  3. https://www.cnblogs.com/immaxfang/p/16721407.html
  4. https://github.com/containerd/containerd/issues/4938

背景

说起开发工具Goland,做Go语言开发的同学应该都不陌生,但由于大部分同学的电脑资源有限,尤其是公司里配备的电脑,在本地使用Goland多多少少有些不够顺畅。

如果公司内服务器资源充足,再加上容器化技术的加持,把Goland以容器的形式部署在服务器上运行是一个不错的解决方法。带着这个想法搜索资料[1]发现,Goland 官方还真的开发了容器版,并且提供网页和客户端两种登录方式。下面给出内网环境下的配置步骤及踩坑记录:

注:

  1. 以下操作同样适用于Jetbrains旗下的其他开发工具,比如IDEA
  2. Goland的注册方法不在本次教程讨论范围之内,请自行解决。

镜像获取

Goland网页版是基于jetbrains官方[2]提供的Docker镜像运行的,所以内网配置的前提是先从外网拉取到需要的镜像,然后导出镜像包并拷贝到内网中:

docker pull registry.jetbrains.team/p/prj/containers/projector-goland
docker save -o projector-goland.tar registry.jetbrains.team/p/prj/containers/projector-goland

注:如果无法拉取官方的镜像,可以在公众号后台回复 docker goland 获取goland网页版镜像。

服务启动

拿到镜像后,找一个安装了docker的服务器或虚机,使用docker run命令启动:

docker run -itd \
-u root \
-p 8887:8887 \
--net=host \
--privileged \
-v /home/admin/goland-dir:/root \
-v /etc/localtime:/etc/localtime \
-v /home/admin/goland-dir/sources.list:/etc/apt/sources.list \
--name goland \
--restart always \
registry.jetbrains.team/p/prj/containers/projector-goland

(重要)部分参数说明:

  1. 指定用户:可选,默认不指定用户,容器启动时会使用一个非root用户projector-user,这里使用root用户启动是为了避免后续操作的权限问题;
  2. 指定主机网络:可选,方便使用代理拉取代码,没有代理的话先从外网下载也可以;
  3. 指定特权模式:可选,方便调试,没有开启的话直接使用GoLand调试会提示权限问题;
  4. 挂载点1:必选,默认用户下,将/home/projector-user挂载到本地,root用户下直接将root目录挂载到本地;
  5. 挂载点2:可选,保持容器时间与主机时间一致;
  6. 挂载点3:可选,配置内网依赖源,方便下载gcc等编译所需的依赖;

浏览器访问

容器正常启动后,在浏览器中通过http://x.x.x.x:8887地址登录网页版GoLand

客户端访问

如果不习惯使用浏览器,官方还提供了原生客户端,我们通过地址[3]下载,打开后输入地址即可。

导入项目示例

以导入K8S源码为例,登录到容器内,使用git命令拉取kubernetes源码:

projector-user@storage:~/go/src/github.com$ git clone https://github.com/kubernetes/kubernetes.git
Cloning into 'kubernetes'...
fatal: unable to access 'https://github.com/kubernetes/kubernetes.git/': server certificate verification failed. CAfile: none CRLfile: none

拉取失败,提示CA证书问题,通过以下命令解决:

git config --global http.sslVerify false

又拉取失败:

projector-user@storage:~/go/src/github.com$ git clone https://github.com/kubernetes/kubernetes.git
Cloning into 'kubernetes'...
fatal: unable to update url base from redirection:
asked for: https://github.com/kubernetes/kubernetes.git/info/refs?service=git-upload-pack
redirect: http://x.x.x.x/proxy.html?template=default&tabs=pwd&vlanid=0&url=https://github.com%2Fkubernetes%2Fkubernetes.git%2Finfo%2Frefs%3Fservice%3Dgit-upload-pack

因为未配置代理,通过以下命令解决:

设置:git config --global http.proxy http://user:password@x.x.x.xx:8080
查看:git config --get --global http.proxy

注:密码中如果存在特殊字符,请先转义。
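
举个例子(密码仅为示意):如果密码是p@ss#word,需要把其中的@、#等字符按URL编码转义后再写入代理配置:

# @对应%40,#对应%23
git config --global http.proxy http://user:p%40ss%23word@x.x.x.xx:8080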

再次尝试拉取,拉取成功:

projector-user@storage:~/go/src/github.com$ git clone https://github.com/kubernetes/kubernetes.git
Cloning into 'kubernetes'...
remote: Enumerating objects: 1258541, done.
remote: Counting objects: 100% (316/316), done.
remote: Compressing objects: 100% (201/201), done.
remote: Total 1258541 (delta 131), reused 150 (delta 111), pack-reused 1258225
Receiving objects: 100% (1258541/1258541), 773.55 MiB | 805.00 KiB/s, done.
Resolving deltas: 100% (906256/906256), done.
Checking out files: 100% (23196/23196), done.

常见问题

1. 复制粘贴问题

根据参考资料[4],可以通过设置环境变量ORG_JETBRAINS_PROJECTOR_SERVER_SSL_PROPERTIES_PATH解决:

docker run -e ORG_JETBRAINS_PROJECTOR_SERVER_SSL_PROPERTIES_PATH=/root/ssl/ssl.properties ...

ssl的配置文件举例:

STORE_TYPE=JKS
FILE_PATH=/root/ssl/keystore
STORE_PASSWORD=xxx
KEY_PASSWORD=xxx

通过查看启动日志确认ssl是否配置成功,如下日志所示,WebSocket SSL is enabled: /root/ssl/ssl.properties表示配置成功,此时在浏览器用https://xxx:8887/?wss访问即可。

Found IDE: goland
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[DEBUG] :: IdeState :: Starting attempts to Init ProjectorClassLoader
[DEBUG] :: IdeState :: Starting attempts to attach IJ injector agent
[DEBUG] :: IdeState :: Starting attempts to initialize IDEA: fix AA and disable smooth scrolling (at start)
[DEBUG] :: IdeState :: Starting attempts to Getting IDE colors
[DEBUG] :: ProjectorServer :: Daemon thread starts
[DEBUG] :: IdeState :: Starting attempts to search for editors
[INFO] :: ProjectorServer :: ProjectorServer is starting on host 0.0.0.0/0.0.0.0 and port 8887
[INFO] :: HttpWsServerBuilder :: WebSocket SSL is enabled: /root/ssl/ssl.properties
[INFO] :: HttpWsServer :: Server started on host 0.0.0.0/0.0.0.0 and port 8887
[DEBUG] :: IdeState :: "Init ProjectorClassLoader" is done
[DEBUG] :: IdeState :: "search for editors" is done

登录后再次尝试,又可以快乐地Ctrl+C、Ctrl+V了。

2. 自定义Keymap被重置问题

根据参考资料[4],可以通过设置环境变量ORG_JETBRAINS_PROJECTOR_SERVER_AUTO_KEYMAP=false解决:

docker run -e ORG_JETBRAINS_PROJECTOR_SERVER_AUTO_KEYMAP=false ...

登录后再观察,发现自定义的keymap不会再被神奇地重置了。

3. 原生客户端全局搜索结果模糊问题

模糊部分刚好是搜索的字符串,使用起来问题也不大,如果忍不了,也可以暂时使用浏览器开心玩耍(浏览器下没有该问题)。

更新:v1.0.2版本已修复该问题

参考资料

  1. https://fuckcloudnative.io/posts/run-jetbrains-ide-in-docker/
  2. https://github.com/JetBrains/projector-docker
  3. https://github.com/JetBrains/projector-client/releases
  4. https://jetbrains.github.io/projector-client/mkdocs/latest/ij_user_guide/server_customization/

问题背景

继《Karaf框架升级Log4j历程》之后,今天又接到通知,需要将版本再升级到2.18.0,据说还是因为漏洞问题。网上查找,未发现有爆出什么漏洞,只找到了一个腾讯发布的相关通知《Apache Log4j官网普通更新》。

分析过程

有了前文的分析,我们知道,要解决漏洞,有三种升级方式:

  1. 升级框架:这个影响就比较大了,而且框架的版本发布周期比较慢,目前还没有编译好的框架包,要升级框架就需要自己编译出所有的框架包,风险较大;
  2. 升级依赖包:影响较小,如果没有配置依赖包的地方,可能无法升级;(实际确认,无法单独升级)
  3. 修改当前版本依赖包并重新编译:影响较小,如果与最新版本跨度较大,可能修改点会很多;

综合比较,继续考虑使用第3个方案走走看,有了前文的经验,就直接修改依赖包版本到2.18.0

pom.xml
<version.org.apache.felix.configadmin>1.9.20</version.org.apache.felix.configadmin>
<version.org.apache.felix.framework>5.6.12</version.org.apache.felix.framework>
<version.org.apache.felix6.framework>6.0.3</version.org.apache.felix6.framework>
- <version.org.apache.logging.log4j>2.17.0</version.org.apache.logging.log4j>
+ <version.org.apache.logging.log4j>2.18.0</version.org.apache.logging.log4j>
<version.org.apache.servicemix.bundles.javax-inject>1_3</version.org.apache.servicemix.bundles.javax-inject>
<version.org.jboss.logging>3.4.1.Final</version.org.jboss.logging>
<version.org.mockito>3.7.7</version.org.mockito>

编译:

root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] OPS4J Pax Logging (Build POM) 1.11.9 ............... SUCCESS [ 2.355 s]
[INFO] OPS4J Pax Logging - API (Java9) .................... SUCCESS [ 2.039 s]
[INFO] OPS4J Pax Logging - API ............................ SUCCESS [ 4.926 s]
[INFO] OPS4J Pax Logging - Log4Jv1 implementation ......... SUCCESS [ 2.235 s]
[INFO] OPS4J Pax Logging - Log4Jv2 implementation ......... SUCCESS [ 3.051 s]
[INFO] OPS4J Pax Logging - Log4j v2 Extra packages ........ SUCCESS [ 0.146 s]
[INFO] OPS4J Pax Logging - Logback implementation ......... SUCCESS [ 0.950 s]
[INFO] OPS4J Pax Logging - Integration Tests .............. SUCCESS [ 0.354 s]
[INFO] OPS4J Pax Logging - Karaf .......................... SUCCESS [ 0.014 s]
[INFO] OPS4J Pax Logging - Karaf KAR Logger ............... SUCCESS [ 0.142 s]
[INFO] OPS4J Pax Logging - Karaf KAR ...................... SUCCESS [ 1.710 s]
[INFO] OPS4J Pax Logging - Karaf Distribution ............. SUCCESS [ 2.522 s]
[INFO] OPS4J Pax Logging - Karaf Integration Tests 1.11.9 . SUCCESS [ 0.703 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 22.711 s
[INFO] Finished at: 2022-10-20T03:50:21Z
[INFO] ------------------------------------------------------------------------

把新编译的pax-logging-api和pax-logging-log4j替换到依赖仓库中,重新编译交付件,发现日志功能异常,服务不再打印任何日志了,定位都无从下手;

从参考资料[1]的代码提交记录看,org.ops4j.pax.logging为了升级log4j依赖包,不单单是改了版本,还涉及一些代码修改点,怀疑是有关系的:

pax-logging-api/src/main/java/org/apache/logging/log4j/LogManager.java
pax-logging-api/src/main/java/org/apache/logging/log4j/status/StatusLogger.java
pax-logging-api/src/main/java/org/apache/logging/log4j/util/PaxPropertySource.java
pax-logging-it/pom.xml
pax-logging-log4j2/src/main/java/org/apache/logging/log4j/core/config/plugins/util/ResolverUtil.java
pax-logging-log4j2/src/main/java/org/apache/logging/log4j/core/impl/ThrowableProxy.java
pax-logging-log4j2/src/main/java/org/apache/logging/log4j/core/pattern/DatePatternConverter.java
pax-logging-log4j2/src/main/java/org/ops4j/pax/logging/log4j2/internal/PaxLoggingServiceImpl.java
pax-logging-samples/fragment-log4j2/src/main/java/org/ops4j/pax/logging/log4j2/extra/ListAppender.java
pom.xml

对比发现,修改点不多,也不复杂,就尝试将更新的代码移植到1.11.9版本上;
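
移植的大致做法是把参考资料[1]中的提交摘到1.11.9对应的分支上,示意如下(tag名以实际仓库为准,若有冲突需要手动解决):

git clone https://github.com/ops4j/org.ops4j.pax.logging.git
cd org.ops4j.pax.logging
# 基于1.11.9对应的tag拉出本地分支
git checkout -b logging-1.11.9-log4j-2.18.0 logging-1.11.9
# 摘取升级log4j到2.18.0的提交
git cherry-pick 7c007343fe9844a17e9c6eaae3a833e6c19a579a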

然后使用前文用过的容器编译环境编译jar包:

root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] OPS4J Pax Logging (Build POM) 1.11.9 ............... SUCCESS [ 2.355 s]
[INFO] OPS4J Pax Logging - API (Java9) .................... SUCCESS [ 2.039 s]
[INFO] OPS4J Pax Logging - API ............................ SUCCESS [ 4.926 s]
[INFO] OPS4J Pax Logging - Log4Jv1 implementation ......... SUCCESS [ 2.235 s]
[INFO] OPS4J Pax Logging - Log4Jv2 implementation ......... SUCCESS [ 3.051 s]
[INFO] OPS4J Pax Logging - Log4j v2 Extra packages ........ SUCCESS [ 0.146 s]
[INFO] OPS4J Pax Logging - Logback implementation ......... SUCCESS [ 0.950 s]
[INFO] OPS4J Pax Logging - Integration Tests .............. SUCCESS [ 0.354 s]
[INFO] OPS4J Pax Logging - Karaf .......................... SUCCESS [ 0.014 s]
[INFO] OPS4J Pax Logging - Karaf KAR Logger ............... SUCCESS [ 0.142 s]
[INFO] OPS4J Pax Logging - Karaf KAR ...................... SUCCESS [ 1.710 s]
[INFO] OPS4J Pax Logging - Karaf Distribution ............. SUCCESS [ 2.522 s]
[INFO] OPS4J Pax Logging - Karaf Integration Tests 1.11.9 . SUCCESS [ 0.703 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 23.641 s
[INFO] Finished at: 2022-10-20T03:55:39Z
[INFO] ------------------------------------------------------------------------

升级log4j的版本编译成功。

把新编译的pax-logging-api和pax-logging-log4j替换到依赖仓库中,重新编译交付件,发现日志功能正常;

解决方案

把新编译的pax-logging-api和pax-logging-log4j替换到依赖仓库中,重新编译交付件,日志功能正常。

参考资料

  1. https://github.com/ops4j/org.ops4j.pax.logging/commit/7c007343fe9844a17e9c6eaae3a833e6c19a579a

漏洞描述

漏洞原理参考资料[1],简单来说就是:当服务器SSL/TLS的瞬时Diffie-Hellman公钥长度小于等于1024位时,存在被恢复出明文信息的风险。

复现方法很简单,使用nmap -sV -Pn --script ssl-dh-params port ip 命令扫描[2],存在如下漏洞信息:

nmap.exe -sV -Pn --script ssl-dh-params 443 192.168.1.10
Starting Nmap 7.92 ( https://nmap.org ) at 2022-07-09 11:14
Nmap scan report for 192.168.1.10
Host is up (0.0033s latency).
Not shown: 996 closed tcp ports (reset)

| ssl-dh-params:
| VULNERABLE:
| Diffie-Hellman Key Exchange Insufficient Group Strength
| State: VULNERABLE
| Transport Layer Security (TLS) services that use Diffie-Hellman groups
| of insufficient strength, especially those using one of a few commonly
| shared groups, may be susceptible to passive eavesdropping attacks.
| Check results:
| WEAK DH GROUP 1
| Cipher Suite: TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
| Modulus Type: Safe prime
| Modulus Source: RFC2409/Oakley Group 2
| Modulus Length: 1024
| Generator Length: 8
| Public Key Length: 1024
| References:
|_ https://weakdh.org

修复方案

参考[3,4],修改方案如下:

[root@node1 etc]# cat org.ops4j.pax.web.cfg
...
# Excluded SSL/TLS Cipher Suites comma-separated list of Regular Expressions
org.ops4j.pax.web.ssl.ciphersuites.excluded=.*NULL.*,.*RC4.*,.*MD5.*,.*DES.*,.*DSS.*,TLS_DHE.*,SSL.*,.*anon.*,.*EXPORT.*
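
除了nmap,也可以用openssl s_client快速验证DHE套件是否已被排除,示意如下(IP和端口按实际环境替换;若配置生效,握手应直接失败):

# 仅用DHE类套件发起TLSv1.2握手,预期返回handshake failure
openssl s_client -connect 192.168.1.10:443 -tls1_2 -cipher 'DHE' < /dev/null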

修改后,再次使用nmap -sV -Pn --script ssl-dh-params port ip查看扫描结果,漏洞解决:

nmap.exe -sV -Pn --script ssl-dh-params 443 192.168.1.10(主机IP)
Starting Nmap 7.92 ( https://nmap.org ) at 2022-07-07 11:53
Nmap scan report for 192.168.1.10
Host is up (0.0032s latency).
Not shown: 997 closed tcp ports (reset
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 7.4 (protocol 2.0)
111/tcp open rpcbind 2-4 (RPC #100000)
...
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 18.74 seconds

需要注意的是,添加完上面的参数后,可能会出现一个新的问题,扫描结果如下:

nmap.exe -sV --script ssl-enum-ciphers -p 443 192.168.1.10
Starting Nmap 6.40 ( http://nmap.org ) at 2022-08-20 22:26 CST
Nmap scan report for matrix-node1 (192.168.1.10)
Host is up (0.000064s latency).
PORT STATE SERVICE VERSION
443/tcp open https-alt
| ssl-enum-ciphers:
| TLSv1.0:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA - strong
| ...
| compressors:
| NULL
| TLSv1.1:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA - strong
| ...
| compressors:
| NULL
| TLSv1.2:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA - strong
| ...
| compressors:
| NULL
|_ least strength: strong

修改配置之前,扫描结果里显示仅开启了TLSv1.2;而修改配置之后,发现TLSv1.0和TLSv1.1都被开启了,这两个协议也需要关闭:

[root@node1 etc]# cat org.ops4j.pax.web.cfg
...
# Excluded SSL/TLS Cipher Suites comma-separated list of Regular Expressions
org.ops4j.pax.web.ssl.ciphersuites.excluded=.*NULL.*,.*RC4.*,.*MD5.*,.*DES.*,.*DSS.*,TLS_DHE.*,SSL.*,.*anon.*,.*EXPORT.*
org.ops4j.pax.web.ssl.protocols.excluded=TLSv1,TLSv1.1

参考资料

  1. https://access.redhat.com/zh_CN/articles/1480493
  2. https://www.cnblogs.com/zcg-cpdd/p/15573841.html
  3. https://stackoverflow.com/questions/30523324/how-to-config-local-jetty-ssl-to-avoid-weak-phermeral-dh-key-error
  4. https://github.com/codice/ddf/blob/master/distribution/ddf-common/src/main/resources/etc/org.ops4j.pax.web.cfg

本次发现的问题是踩到了新版本Kubernetes的坑,查找资料发现zhangsi-lzq大佬已经分析得很清楚了,此处转载过来仅做学习记录,并新增永久解决方案。

作者:zhangsi-lzq
本文出处:https://www.cnblogs.com/zhangsi-lzq/p/14292628.html

问题背景

Kubernetes升级为1.20版本后,原有的后端nfs存储storageclass无法自动创建PV。查看PVC状态一直为pending状态。

分析过程

查看nfs-provisioner日志:

kubectl logs nfs-client-provisioner-5f696dc8bb-qhmsn
E0118 03:01:07.352670 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3737, ErrCode=NO_ERROR, debug=""
E0118 03:01:07.353951 1 reflector.go:322] github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:668: Failed to watch *v1.StorageClass: Get https://10.96.0.1:443/apis/storage.k8s.io/v1/storageclasses?resourceVersion=604432&timeoutSeconds=387&watch=true: dial tcp 10.96.0.1:443: connect: connection refused
W0118 03:01:07.366606 1 reflector.go:341] github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:665: watch of *v1.PersistentVolume ended with: too old resource version: 11565 (605173)
W0118 03:01:07.367679 1 reflector.go:341] github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:662: watch of *v1.PersistentVolumeClaim ended with: too old resource version: 11565 (605173)
I0118 03:08:28.340240 1 controller.go:987] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": started
E0118 03:08:28.343582 1 controller.go:1004] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference
I0118 03:16:08.373590 1 controller.go:987] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": started
E0118 03:16:08.382178 1 controller.go:1004] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference
I0118 03:30:41.647626 1 controller.go:987] provision "default/test-pvc" class "course-nfs-storage": started
E0118 03:30:41.658419 1 controller.go:1004] provision "default/test-pvc" class "course-nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference
I0118 03:31:08.373713 1 controller.go:987] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": started
I0118 03:31:08.373969 1 controller.go:987] provision "default/test-pvc" class "course-nfs-storage": started
E0118 03:31:08.382279 1 controller.go:1004] provision "default/test-pvc" class "course-nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference
E0118 03:31:08.382791 1 controller.go:1004] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference

报错信息selfLink was empty,于是上网查询相关内容,在官方1.20变更说明中[1]看到其中一条说明为:

Stop propagating SelfLink (deprecated in 1.16) in kube-apiserver

selfLink在1.16版本以后已经弃用,在1.20版本停用。而由于nfs-provisioner的实现是基于selfLink功能的(同时也会影响其他用到selfLink这个功能的第三方软件),需要等nfs-provisioner的制作方重新提供新的解决方案。

解决方案

目前可用的临时方案是:修改/etc/kubernetes/manifests/kube-apiserver.yaml文件,找到如下内容后,在最后添加一项参数:

spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.168.210.20
    - --.......
    - --feature-gates=RemoveSelfLink=false  #添加此行

如果是高可用的k8s集群,则需要在所有master节点上进行此操作。

添加后需要删除apiserver的所有pod进行重启

kubectl delete pod kube-apiserver-master01 -n kube-system
kubectl delete pod kube-apiserver-master02 -n kube-system
kubectl delete pod kube-apiserver-master03 -n kube-system

三台apiserver被kubelet重新拉起后,再次查询PVC,可以看到PVC状态都为Bound,可以正常与PV绑定了。

(新增)永久解决方案:查资料发现,nfs-provisioner项目实际上已经被放弃(retired)[2,3],转而提供了nfs-subdir-external-provisioner、nfs-ganesha-server-and-external-provisioner、csi-driver-nfs等新的provisioner,可以根据需要选择适配更新。

参考资料

  1. https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md
  2. https://github.com/kubernetes-retired/nfs-provisioner
  3. https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner

问题背景

  1. 查看kube-system下的系统组件,发现harbor、influxdb、coredns等组件反复重启;
  2. 使用kubectl get pod -n kube-system命令查看pod列表,发现命令会稳定的卡15s

原因分析

先看第一点,这几个pod反复重启已经遇到过几次,看过前面的问题排查文章[1,2]的应该知道,在业务高并发场景下可能出现。先使用top命令看一下负载情况:

load average: 14.76  18.45  17.85
Tasks: 1998 total, 7 running, 1937 sleeping, 0 stopped, 54 zombie
%CPU: 15.2 us, 8.3 sys, 0.7 ni, 75.3 id

cat /proc/cpuinfo|grep MHz| wc -l
40

实际负载不高,排除这个可能(实际定位过程中,也依据前面的经验,做了相关内核参数的优化,问题确实没什么改善)。那就继续看,先describe看几个异常pod的错误信息:

kubelet     Liveness probe failed: Get "http://177.177.138.139:8885/api/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
kubelet Readiness probe failed: Get "http://177.177.138.139:8083/"": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
kubelet Back-off restarting failed container

错误基本一致,就是kubelet调用pod提供的健康检查接口超时了,所以被不断地kill再重启。为什么调不通?模拟kubelet的调用操作,在pod所在节点上使用curl命令调用,结果显示timeout,再ping一下看看pod通信有没有问题:

ping 177.177.212.186
^C

ping不通!这也就讲通了为啥健康检查不过,因为kubelet与所在节点上的Pod根本就无法通信。为啥会这样?通过一顿验证,发现一个规律:**集群各节点无法访问自己节点上的pod,但可以访问其他节点上的pod**;

这个现象是比较奇怪的,一般来说影响节点与自己节点上pod通信的原因不多,对于使用calico cni网络插件的环境来说,可能的原因有:

  1. pod内的ip/arp/路由异常;
  2. calico网卡的arp配置异常;
  3. 请求被iptables拦截;

分别查看1,2相关配置,显示正常:

[root@influxdb-847b74cbc5-ddxcd]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
3: eth0@if79: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 32:81:0e:f4:dd:3a brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 177.177.212.186/32 scope global eth0
valid_lft f

[root@influxdb-847b74cbc5-ddxcd]# arp
Address HWtype HWaddress Flags Mask Iface
gateway ether ee:ee:ee:ee:ee:ee C eth0

[root@influxdb-847b74cbc5-ddxcd]# ip rou
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

[root@node]# cat /proc/sys/net/ipv4/conf/cali7acfda72e71/proxy_arp
1
[root@node]# cat /proc/sys/net/ipv4/conf/cali7acfda72e71/arp_ignore
0

对于第三点,通过iptables命令检查过,也没什么异常规则;

抓包看看啥情况(抓包命令可参考下面的示意):

  1. 在节点上ping本节点上的pod,同时在cali7acfda72e71上抓包,发现请求到了cali网卡,但没有响应;
  2. podping本节点ip,同时在cali7acfda72e71上抓包,发现cali网卡上有请求和响应,但依然无法ping通;

看起来像是请求被主动丢弃了,跟问题提出人确认问题环境的基本情况,发现该集群额外安装了EDR防病毒软件。为了确认该软件有没有影响,先停掉防病毒软件,观察一段时间后,发现环境恢复正常;重新启动防病毒软件,一段时间后问题复现;

与负责防病毒软件的技术人员沟通确认,该集群被划入了防病毒软件的默认策略组,该策略组里配置了防端口扫描和流量阈值规则,结果触发了网络防护,导致节点ip被封禁。经过软件提供方的调整,将该集群调整到单独的策略编组后,问题解决;

问题解决后,现象1和现象2都消失了。但回过头想想,为什么会出现现象2?当时环境好了没有细究,后来自己的测试环境也出现过这个现象,通过debug日志发现,命令卡在了调用metric服务的接口上,根本原因就是访问metric的pod不通,日志现象比较明显,这里就不贴了:

kubectl get pod -n kube-system -owide --v 10

解决方案

  1. 针对该集群环境,单独配置策略编组;
  2. 节点与Pod通信正常后,现象2消失;

参考资料

  1. https://lyyao09.github.io/2021/06/19/k8s/K8S%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5-%E4%B8%9A%E5%8A%A1%E9%AB%98%E5%B9%B6%E5%8F%91%E5%AF%BC%E8%87%B4Pod%E5%8F%8D%E5%A4%8D%E9%87%8D%E5%90%AF/
  2. https://lyyao09.github.io/2021/07/16/k8s/K8S%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5-%E4%B8%9A%E5%8A%A1%E9%AB%98%E5%B9%B6%E5%8F%91%E5%AF%BC%E8%87%B4Pod%E5%8F%8D%E5%A4%8D%E9%87%8D%E5%90%AF(%E7%BB%AD)/
  3. https://developer.aliyun.com/article/409634
  4. https://www.css3.io/31linuxxi-tong-diao-you.htmls

问题背景

为了解决节点上请求部分service延迟63s的问题[1],我们临时把OS的版本换成了Redhat 8.4(内核版本4.18.0-305),在VMware上虚拟出3节点集群后部署跨三层环境失败,提示Harbor部署失败。

原因分析

说明:距离定位这个问题已经有一段时间了,其实最终也没完全定位出根本原因,所以当时也没有整理记录定位过程,这里就简单描述一下,做个记录。

通过查看Harbor的日志,发现部署失败的原因是健康检查失败:Harbor的podIp:port请求在跨三层下超时,抓包发现请求经过vxlan.calico后止步于SYN报文,连接一直停留在SYN_SENT状态;

实测发现,该环境上不仅是Harbor的podIp:port请求超时,其他pod服务的请求经过跨三层的网络也同样存在问题,说明是一个共性问题(并且pod网段的七层请求如http受影响,icmp请求不受影响);

继续通过抓包确认,podIp:port的请求已经发送到节点网卡上,但跨三层对端的节点没有收到。所以,可以排除主机上的iptables和路由的影响。实际上,也确实跟踪了iptables的请求过程,确认请求没有被丢弃;

综上,初步怀疑请求被跨三层网络的中间设备(如交换机、路由器)丢弃了,之后相关同事在交换机上抓包,发现ping包可以抓到,但http请求依然抓不到,说明交换机侧也没有收到报文,问题原因进一步缩小,可能情况有:

  1. 出口网卡丢弃;
  2. 出网卡后,入交换机之前丢弃

通过ethtool -S xxx命令统计虚机网卡报文信息,没有发现丢弃情况,说明问题不在虚机的出网卡;
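
统计命令示意如下(网卡名ens192为假设值,重点关注drop/error相关计数是否在增长):

ethtool -S ens192 | grep -iE 'drop|err'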

关于VMware丢弃报文的情况,找到一些资料[2-6],比如混杂模式、mss配置都会有影响:

1) 通过ping命令指定报文长度,发现跨网段依次ping一下pod的ip均返回正常,确认不是mss问题;
ping -s 1000 177.177.x.x
ping -s 1472 177.177.x.x
ping -s 1500 177.177.x.x

2) 通过临时修改网卡为混杂模式,测试问题依然存在;

进一步在服务器物理网卡上抓包(登录esxi后台,使用pktcap-uw --uplink vmnicX --dir 2 -o result.pcap,其中vmnicX表示虚机关联的上行口物理网卡, dir 2表示同时抓取双向请求), 依然是ping包可以抓到,但http请求依然抓不到,说明服务器物理网卡上也没有收到报文;

最后,丢包范围缩小到VMware下的虚机请求出网卡后、到服务器物理网卡前,这中间涉及到虚拟化的实现,具体还有什么处理就不清楚了,最后改为使用物理服务器部署;

解决方案

临时改用物理服务器部署跨三层集群成功,如果要使用VMware虚机部署,还需要继续排查根因。

参考资料

  1. https://lyyao09.github.io/2022/03/19/k8s/K8S%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5-Calico%E7%9A%84Vxlan%E6%A8%A1%E5%BC%8F%E4%B8%8B%E8%8A%82%E7%82%B9%E5%8F%91%E8%B5%B7K8s%20Service%E8%AF%B7%E6%B1%82%E5%BB%B6%E8%BF%9F/
  2. https://kb.vmware.com/s/article/2113783
  3. https://kb.vmware.com/s/article/52936
  4. https://docs.vmware.com/en/VMware-NSX-T-Data-Center/2.3/troubleshooting/GUID-4E4A9468-1F2B-44E1-A474-AA048A88BF1E.html
  5. https://communities.vmware.com/t5/vCloud-Networking-and-Security/No-SYN-ACK-reply-from-VM-on-virtual-wire-VXLAN/td-p/376545
  6. https://kb.vmware.com/s/article/2051814?lang=zh_CN

问题背景

K8S集群中修改calico的网络为vxlan模式后,发现部分service在节点上无法访问(实际上是延迟访问,延迟时间稳定在1min3s左右)。

[root@node2 ~]# time curl -s http://10.96.91.255
real 1m3.136s
user 0m0.005s
sys 0m0.005s

原因分析

先确认问题范围,在环境上找多个service依次测试发现,如果调用service的节点和实际pod不在同一个节点上,则出现延迟,否则请求正常。也就是说跨节点的访问才有问题。而直接用service对应的podIP访问,也不存在问题,猜测问题可能出在service转发到pod的过程。

再确认基本环境,OS、K8S、calico等基础环境没有发生任何变化,仅仅是把calico的网络模式从BGP改为了vxlan,但是这个改动改变了集群内service到pod的请求路径,也即所有的容器请求需要走节点上新增的vxlan.calico接口封装一下。网络模式修改前没有问题,修改后必现,之后切回BGP模式问题就消失了,说明问题可能跟新增的vxlan.calico接口有关系。

先看下环境情况,并触发service请求:

# 验证环境:node2(10.10.72.11)——> node1(10.10.72.10)
# 验证方法:node2上curl service:10.96.91.255 ——> node1上pod:166.166.166.168:8082
[root@node2 ~]# ip addr
10: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UNKNOWN group default
link/ether 66:2d:bf:44:a6:8b brd ff:ff:ff:ff:ff:ff
inet 166.166.104.10/32 brd 166.166.104.10 scope global vxlan.calico
valid_lft forever preferred_lft forever

[root@node1 ~]# ip addr
15: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UNKNOWN group default
link/ether 66:f9:37:c3:7e:94 brd ff:ff:ff:ff:ff:ff
inet 166.166.166.175/32 brd 166.166.166.175 scope global vxlan.calico
valid_lft forever preferred_lft forever

[root@node2 ~]# time curl http://10.96.91.255

node1的主机网卡上抓包看看封装后的请求是否已到达:

[root@node1 ~]# tcpdump -n -vv -i eth0 host 10.10.72.11 and udp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
07:19:42.730996 IP (tos 0x0, ttl 64, id 6470, offset 0, flags [none], proto UDP (17), length 110)
10.10.72.11.nim-vdrshell > 10.10.72.10.4789: [bad udp cksum 0xffff -> 0x3af0!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 39190, offset 0, flags [DF], proto TCP (6), length 60)
166.166.104.10.35632 > 166.166.166.168.us-cli: Flags [S], cksum 0xe556 (correct), seq 3025623348, win 29200, options [mss 1460,sackOK,TS val 101892130 ecr 0,nop,wscale 7], length 0
07:19:43.733741 IP (tos 0x0, ttl 64, id 6804, offset 0, flags [none], proto UDP (17), length 110)
10.10.72.11.nim-vdrshell > 10.10.72.10.4789: [bad udp cksum 0xffff -> 0x3af0!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 39191, offset 0, flags [DF], proto TCP (6), length 60)
166.166.104.10.35632 > 166.166.166.168.us-cli: Flags [S], cksum 0xe16b (correct), seq 3025623348, win 29200, options [mss 1460,sackOK,TS val 101893133 ecr 0,nop,wscale 7], length 0
07:19:45.736729 IP (tos 0x0, ttl 64, id 7403, offset 0, flags [none], proto UDP (17), length 110)
10.10.72.11.nim-vdrshell > 10.10.72.10.4789: [bad udp cksum 0xffff -> 0x3af0!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 39192, offset 0, flags [DF], proto TCP (6), length 60)
166.166.104.10.35632 > 166.166.166.168.us-cli: Flags [S], cksum 0xd998 (correct), seq 3025623348, win 29200, options [mss 1460,sackOK,TS val 101895136 ecr 0,nop,wscale 7], length 0
07:19:49.744801 IP (tos 0x0, ttl 64, id 9648, offset 0, flags [none], proto UDP (17), length 110)
10.10.72.11.nim-vdrshell > 10.10.72.10.4789: [bad udp cksum 0xffff -> 0x3af0!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 39193, offset 0, flags [DF], proto TCP (6), length 60)
166.166.104.10.35632 > 166.166.166.168.us-cli: Flags [S], cksum 0xc9f0 (correct), seq 3025623348, win 29200, options [mss 1460,sackOK,TS val 101899144 ecr 0,nop,wscale 7], length 0
07:19:57.768735 IP (tos 0x0, ttl 64, id 12853, offset 0, flags [none], proto UDP (17), length 110)
10.10.72.11.nim-vdrshell > 10.10.72.10.4789: [bad udp cksum 0xffff -> 0x3af0!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 39194, offset 0, flags [DF], proto TCP (6), length 60)
166.166.104.10.35632 > 166.166.166.168.us-cli: Flags [S], cksum 0xaa98 (correct), seq 3025623348, win 29200, options [mss 1460,sackOK,TS val 101907168 ecr 0,nop,wscale 7], length 0
07:20:05.087057 IP (tos 0x0, ttl 64, id 8479, offset 0, flags [none], proto UDP (17), length 164)
10.10.72.10.34565 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3425, offset 0, flags [DF], proto UDP (17), length 114)
166.166.166.168.57850 > 166.166.104.6.domain: [udp sum ok] 10121+ AAAA? influxdb-nginx-service.kube-system.svc.kube-system.svc.cluster.local. (86)
07:20:05.087076 IP (tos 0x0, ttl 64, id 54475, offset 0, flags [none], proto UDP (17), length 164)
10.10.72.10.51841 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3424, offset 0, flags [DF], proto UDP (17), length 114)
166.166.166.168.57984 > 166.166.104.6.domain: [udp sum ok] 20020+ A? influxdb-nginx-service.kube-system.svc.kube-system.svc.cluster.local. (86)
07:20:05.087671 IP (tos 0x0, ttl 64, id 13540, offset 0, flags [none], proto UDP (17), length 257)
10.10.72.11.60395 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19190, offset 0, flags [DF], proto UDP (17), length 207)
166.166.104.6.domain > 166.166.166.168.57850: [udp sum ok] 10121 NXDomain*- q: AAAA? influxdb-nginx-service.kube-system.svc.kube-system.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1647633218 7200 1800 86400 5 (179)
07:20:05.087702 IP (tos 0x0, ttl 64, id 13541, offset 0, flags [none], proto UDP (17), length 257)
10.10.72.11.48571 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19191, offset 0, flags [DF], proto UDP (17), length 207)
166.166.104.6.domain > 166.166.166.168.57984: [udp sum ok] 20020 NXDomain*- q: A? influxdb-nginx-service.kube-system.svc.kube-system.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1647633218 7200 1800 86400 5 (179)
07:20:05.088801 IP (tos 0x0, ttl 64, id 8480, offset 0, flags [none], proto UDP (17), length 152)
10.10.72.10.55780 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3427, offset 0, flags [DF], proto UDP (17), length 102)
166.166.166.168.56015 > 166.166.104.6.domain: [udp sum ok] 19167+ AAAA? influxdb-nginx-service.kube-system.svc.svc.cluster.local. (74)
07:20:05.089048 IP (tos 0x0, ttl 64, id 13542, offset 0, flags [none], proto UDP (17), length 245)
10.10.72.11.50151 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19192, offset 0, flags [DF], proto UDP (17), length 195)
166.166.104.6.domain > 166.166.166.168.56015: [udp sum ok] 19167 NXDomain*- q: AAAA? influxdb-nginx-service.kube-system.svc.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1647633218 7200 1800 86400 5 (167)
07:20:05.089212 IP (tos 0x0, ttl 64, id 8481, offset 0, flags [none], proto UDP (17), length 148)
10.10.72.10.50272 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3430, offset 0, flags [DF], proto UDP (17), length 98)
166.166.166.168.54926 > 166.166.104.6.domain: [udp sum ok] 40948+ A? influxdb-nginx-service.kube-system.svc.cluster.local. (70)
07:20:05.089403 IP (tos 0x0, ttl 64, id 13543, offset 0, flags [none], proto UDP (17), length 241)
10.10.72.11.59882 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19193, offset 0, flags [DF], proto UDP (17), length 191)
166.166.104.6.domain > 166.166.166.168.54926: [udp sum ok] 40948 NXDomain*- q: A? influxdb-nginx-service.kube-system.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1647633218 7200 1800 86400 5 (163)
07:20:05.089524 IP (tos 0x0, ttl 64, id 8482, offset 0, flags [none], proto UDP (17), length 134)
10.10.72.10.58964 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3431, offset 0, flags [DF], proto UDP (17), length 84)
166.166.166.168.50263 > 166.166.104.6.domain: [udp sum ok] 18815+ A? influxdb-nginx-service.kube-system.svc. (56)
07:20:05.089681 IP (tos 0x0, ttl 64, id 13544, offset 0, flags [none], proto UDP (17), length 134)
10.10.72.11.51874 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19194, offset 0, flags [DF], proto UDP (17), length 84)
166.166.104.6.domain > 166.166.166.168.50263: [udp sum ok] 18815 ServFail- q: A? influxdb-nginx-service.kube-system.svc. 0/0/0 (56)
07:20:05.089706 IP (tos 0x0, ttl 64, id 8483, offset 0, flags [none], proto UDP (17), length 134)
10.10.72.10.59891 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3433, offset 0, flags [DF], proto UDP (17), length 84)
166.166.166.168.49202 > 166.166.104.6.domain: [udp sum ok] 58612+ AAAA? influxdb-nginx-service.kube-system.svc. (56)
07:20:05.089859 IP (tos 0x0, ttl 64, id 13545, offset 0, flags [none], proto UDP (17), length 134)
10.10.72.11.44146 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19195, offset 0, flags [DF], proto UDP (17), length 84)
166.166.104.6.domain > 166.166.166.168.49202: [udp sum ok] 58612 ServFail- q: AAAA? influxdb-nginx-service.kube-system.svc. 0/0/0 (56)

从抓包结果看,出现一个可疑点:前面几个没有得到响应的报文中提示bad udp cksum 0xffff,而后面请求正常的报文提示的是no cksum。

根据这个错误信息,搜索发现是个已知bug,相关的详细定位可以参考[1]-[3],这里就不细说了。大概原因如下所述:

内核中存在一个和VXLAN处理有关的缺陷,该缺陷会导致Checksum Offloading不能正确完成。这个缺陷仅仅在很边缘的场景下才会表现出来。

VXLANUDP头被NAT过的前提下,如果:

  1. VXLAN设备禁用(这是RFC的建议)了UDP Checksum
  2. VXLAN设备启用了Tx Checksum Offloading

就会导致生成错误的UDP Checksum
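
在节点上可以用下面的命令大致确认这两个条件(vxlan.calico为上文中的设备名):

# 查看vxlan设备的详细属性,包括UDP checksum相关标志
ip -d link show vxlan.calico
# 查看是否开启了Tx Checksum Offloading
ethtool -k vxlan.calico | grep tx-checksum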

从资料[1]看,K8S在v1.18.5版本已经修复了这个问题,但我的问题是在v1.21.0上发现的,所以不确定只升级K8S是否可以解决该问题,或者升级后还需要额外配置什么?

从资料[3]和[4]看,calico在v3.20.0版本做了修改:在kernels < v5.7时也禁用了vxlan.calico接口的Offloading功能。

本地临时禁用并验证:

[root@node2 ~]# ethtool --offload vxlan.calico rx off tx off
Actual changes:
rx-checksumming: off
tx-checksumming: off
tx-checksum-ip-generic: off
tcp-segmentation-offload: off
tx-tcp-segmentation: off [requested on]
tx-tcp-ecn-segmentation: off [requested on]
tx-tcp6-segmentation: off [requested on]
tx-tcp-mangleid-segmentation: off [requested on]
udp-fragmentation-offload: off [requested on]
[root@node2 ~]# time curl http://10.96.91.255
real 0m0.009s
user 0m0.002s
sys 0m0.007s

请求恢复正常。

解决方案

  1. 临时解决: ethtool --offload vxlan.calico rx off tx off
  2. 永久解决:升级calico >=v3.20.0或升级内核到5.6.13, 5.4.41, 4.19.123, 4.14.181,单独升级K8S >= v1.18.5版本待确认是否能解决

参考资料

  1. https://blog.gmem.cc/nodeport-63s-delay-due-to-kernel-issue
  2. https://cloudnative.to/blog/kubernetes-1-17-vxlan-63s-delay/
  3. https://bowser1704.github.io/posts/vxlan-bug/
  4. https://github.com/projectcalico/calico/issues/3145
  5. https://github.com/projectcalico/felix/pull/2811/files

问题描述

Docker守护进程在异常断电后卡在activating状态,并且内存占用在无限增加。

[root@node01 docker]# service docker status
Redirecting to /bin/systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/docker.service.d
└─restart.conf
Active: activating (start) since Tue 2021-12-07 15:44:32 CST; 1min 18s ago
Docs: https://docs.docker.com
Main PID: 274797 (dockerd)
Tasks: 481
Memory: 15.4G -- 占用不正常
CGroup: /system
[root@node01 docker]# service docker status
Redirecting to /bin/systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/docker.service.d
└─restart.conf
Active: deactivating (stop-sigterm)
Docs: https://docs.docker.com
Main PID: 274797 (dockerd)
Tasks: 481
Memory: 247.3G -- 一直在疯狂增加
CGroup: /system.slice/docker.service
└─274

定位过程

首先,查看docker版本和docker info信息

[root@node01 files]# docker version
Client: Docker Engine - Community
Version: 20.10.7
API version: 1.41
Go version: go1.13.15
Git commit: f0df350
Built: Wed Jun 2 11:58:10 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.7
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: b0f5bc3
runc:
Version: 1.0.0-rc95
GitCommit: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
docker-init:
Version: 0.19.0
GitCommit: de40ad0
[root@node01 files]# docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
scan: Docker Scan (Docker Inc., v0.8.0)

Server:
Server Version: 20.10.7
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc version: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-957.21.3.el7.x86_64
Operating System: CentOS 7.6
OSType: linux
Architecture: x86_64
Total Memory: 256GiB
Name: node01
Docker Root Dir: /var/lib/docker
Debug Mode: false
File Descriptors: 366
Goroutines: 290
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true

版本还算比较新,查看docker日志:

Dec 07 14:57:57 node01 dockerd[239420]: time="2021-12-07T14:57:57.831370861+08:00" level=info msg="Starting up"
Dec 07 14:57:57 node01 dockerd[239420]: time="2021-12-07T14:57:57.950367668+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Dec 07 14:57:57 node01 dockerd[239420]: time="2021-12-07T14:57:57.950430356+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Dec 07 14:57:57 node01 dockerd[239420]: time="2021-12-07T14:57:57.950486773+08:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix://run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Dec 07 14:57:57 node01 dockerd[239420]: time="2021-12-07T14:57:57.950524941+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Dec 07 14:57:58 node01 dockerd[239420]: time="2021-12-07T14:57:58.004622322+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Dec 07 14:57:58 node01 dockerd[239420]: time="2021-12-07T14:57:58.004663918+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Dec 07 14:57:58 node01 dockerd[239420]: time="2021-12-07T14:57:58.004697382+08:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix://run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Dec 07 14:57:58 node01 dockerd[239420]: time="2021-12-07T14:57:58.004726533+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Dec 07 14:58:14 node01 dockerd[239420]: time="2021-12-07T14:58:14.832862519+08:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Dec 07 14:59:00 node01 dockerd[239420]: time="2021-12-07T14:59:00.039554832+08:00" level=info msg="Loading containers: start."
Dec 07 15:02:27 node01 dockerd[239420]: fatal error: runtime: out of memory
Dec 07 15:02:27 node01 dockerd[239420]: runtime stack:
Dec 07 15:02:27 node01 dockerd[239420]: runtime.throw(0x5604d0deee9d, 0x16)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/panic.go:774 +0x74
Dec 07 15:02:28 node01 dockerd[239420]: runtime.sysMap(0xfb6c000000, 0x5c000000, 0x5604d35dab58)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mem_linux.go:169 +0xc7
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).sysAlloc(0x5604d35be240, 0x5beee000, 0x7f6fd1fbab88, 0x5604cf22bf6b)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/malloc.go:701 +0x1cf
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).grow(0x5604d35be240, 0x2df77, 0xffffffff)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mheap.go:1255 +0xa5
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).allocSpanLocked(0x5604d35be240, 0x2df77, 0x5604d35dab68, 0x5604cf212927)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mheap.go:1170 +0x268
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).alloc_m(0x5604d35be240, 0x2df77, 0x7f6c01180100, 0x7f6c01184350)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mheap.go:1022 +0xc6
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).alloc.func1()
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mheap.go:1093 +0x4e
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).alloc(0x5604d35be240, 0x2df77, 0x7f6fd1010100, 0x7f6c01184350)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mheap.go:1092 +0x8c
Dec 07 15:02:28 node01 dockerd[239420]: runtime.largeAlloc(0x5beee000, 0x5604cf250001, 0x7f6c01184350)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/malloc.go:1138 +0x99
Dec 07 15:02:28 node01 dockerd[239420]: runtime.mallocgc.func1()
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/malloc.go:1033 +0x48
Dec 07 15:02:28 node01 dockerd[239420]: runtime.systemstack(0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/asm_amd64.s:370 +0x63
Dec 07 15:02:28 node01 dockerd[239420]: runtime.mstart()
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/proc.go:1146
Dec 07 15:02:28 node01 dockerd[239420]: goroutine 1 [running]:
Dec 07 15:02:28 node01 dockerd[239420]: runtime.systemstack_switch()
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc0008bff70 sp=0xc0008bff68 pc=0x5604cf2528e0
Dec 07 15:02:28 node01 dockerd[239420]: runtime.mallocgc(0x5beee000, 0x5604d1e8fde0, 0x1, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/malloc.go:1032 +0x8a7 fp=0xc0008c0010 sp=0xc0008bff70 pc=0x5604cf200ef7
Dec 07 15:02:28 node01 dockerd[239420]: runtime.growslice(0x5604d1e8fde0, 0xef119e4000, 0x3107eaa, 0x3107eaa, 0x3107eab, 0x5604cf20146b, 0xfb6bbbfb80, 0x7f6c01184350)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/slice.go:181 +0x1e4 fp=0xc0008c0078 sp=0xc0008c0010 pc=0x5604cf23b0b4
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt.(*Cursor).first(0xc0008c0288)
Dec 07 15:02:28 node01 dockerd[239420]: docker/vendor/go.etcd.io/bbolt/cursor.go:182 +0x138 fp=0xc0008c00d8 sp=0xc0008c0078 pc=0x5604d012d348
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt.(*Cursor).next(0xc0008c0288, 0x0, 0x9f, 0x9f, 0x7f7150506429, 0x2b8, 0xc0008c0188, 0x5604cf23ee10)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt/cursor.go:234 +0x84 fp=0xc0008c0128 sp=0xc0008c00d8 pc=0x5604d012d684
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt.(*Cursor).Next(0xc0008c0288, 0x7f715050638a, 0x9f, 0x9f, 0xfb6bbcf720, 0x9f, 0x2b8)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt/cursor.go:75 +0x43 fp=0xc0008c0198 sp=0xc0008c0128 pc=0x5604d012ca43
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libkv/store/boltdb.(*BoltDB).List.func1(0xc0034350a0, 0xc0008c0200, 0xc0034350a0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libkv/store/boltdb/boltdb.go:288 +0x19a fp=0xc0008c02b8 sp=0xc0008c0198 pc=0x5604d0142a4a
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt.(*DB).View(0xc0001dc200, 0xc0008c03f8, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt/db.go:725 +0xaa fp=0xc0008c0340 sp=0xc0008c02b8 pc=0x5604d0130a3a
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libkv/store/boltdb.(*BoltDB).List(0xc00482f2c0, 0xc001a7af00, 0x5e, 0x0, 0x0, 0x0, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libkv/store/boltdb/boltdb.go:279 +0x1b3 fp=0xc0008c0430 sp=0xc0008c0340 pc=0x5604d0141583
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore.(*cache).kmap(0xc003bf5900, 0x5604d220c1a0, 0xc001e7c160, 0x0, 0x5e, 0xc001a7ae40)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore/cache.go:43 +0x286 fp=0xc0008c0560 sp=0xc0008c0430 pc=0x5604cfcc81a6
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore.(*cache).list(0xc003bf5900, 0x5604d220c1a0, 0xc001e7c160, 0x0, 0x0, 0x0, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore/cache.go:164 +0x7e fp=0xc0008c0678 sp=0xc0008c0560 pc=0x5604cfcc988e
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore.(*datastore).List(0xc0079ff680, 0xc001a7ae40, 0x5e, 0x5604d220c1a0, 0xc001e7c160, 0x0, 0x0, 0x0, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore/datastore.go:517 +0x205 fp=0xc0008c0730 sp=0xc0008c0678 pc=0x5604cfccc745
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork.(*network).getEndpointsFromStore(0xc00289a380, 0xc004a99258, 0x7, 0x5604d2206920, 0xc001a515f0, 0xc002f3ff48)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/store.go:190 +0x343 fp=0xc0008c09f8 sp=0xc0008c0730 pc=0x5604d01acaf3
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork.(*controller).reservePools(0xc0004b7400)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/controller.go:977 +0x4c1 fp=0xc0008c0c28 sp=0xc0008c09f8 pc=0x5604d0159701
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork.New(0xc001055d00, 0x9, 0x10, 0xc0007ac870, 0xc001387fb0, 0xc001055d00, 0x9)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/controller.go:245 +0x615 fp=0xc0008c0dd8 sp=0xc0008c0c28 pc=0x5604d0154815
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon.(*Daemon).initNetworkController(0xc00000c1e0, 0xc000207080, 0xc001387fb0, 0xc000236f50, 0xc00000c1e0, 0xc0007170c8, 0xc001387fb0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon/daemon_unix.go:855 +0xa9 fp=0xc0008c0e90 sp=0xc0008c0dd8 pc=0x5604d0a49429
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon.(*Daemon).restore(0xc00000c1e0, 0xc00004e5c0, 0xc000220000)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon/daemon.go:490 +0x50b fp=0xc0008c1088 sp=0xc0008c0e90 pc=0x5604d0a3898b
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon.NewDaemon(0x5604d21de4c0, 0xc00004e5c0, 0xc000207080, 0xc0007ac870, 0x0, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon/daemon.go:1147 +0x2be8 fp=0xc0008c19c8 sp=0xc0008c1088 pc=0x5604d0a3c938
Dec 07 15:02:28 node01 dockerd[239420]: main.(*DaemonCli).start(0xc0007ab410, 0xc000218600, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/cmd/dockerd/daemon.go:195 +0x743 fp=0xc0008c1d00 sp=0xc0008c19c8 pc=0x5604d0dbb3c3
Dec 07 15:02:28 node01 dockerd[239420]: main.runDaemon(...)

日志中存在明显的异常堆栈打印,分析可知,启动过程中走到Loading containers: start.之后卡住,然后打印fatal error: runtime: out of memory,也就是内存爆了。根据堆栈信息,可以看出异常的调用路径如下:

NewDaemon —> restore() —> initNetworkController —> libnetwork.New() —> reservePools() —> getEndpointsFromStore() —> List —> cache.list(kvObject) —> cache.kmap(kvObject) —> List(keyPrefix) —> Next —> next —> first

拉取对应版本的moby代码,根据上述调用路径定位到出问题的代码。
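大致操作类似下面这样(以docker 20.10.7对应的moby源码为例,vendor内的文件路径按堆栈推断,具体以实际代码为准):

# 拉取moby源码并切换到对应版本
git clone https://github.com/moby/moby.git
cd moby && git checkout v20.10.7
# 堆栈中的List函数位于vendor进来的libkv boltdb实现中
less vendor/github.com/docker/libkv/store/boltdb/boltdb.go

对应的List实现如下: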

// List returns the range of keys starting with the passed in prefix
func (b *BoltDB) List(keyPrefix string) ([]*store.KVPair, error) {
    var (
        db  *bolt.DB
        err error
    )
    b.Lock()
    defer b.Unlock()

    kv := []*store.KVPair{}

    if db, err = b.getDBhandle(); err != nil {
        return nil, err
    }
    defer b.releaseDBhandle()

    err = db.View(func(tx *bolt.Tx) error {
        bucket := tx.Bucket(b.boltBucket)
        if bucket == nil {
            return store.ErrKeyNotFound
        }

        cursor := bucket.Cursor()
        prefix := []byte(keyPrefix)

        for key, v := cursor.Seek(prefix); bytes.HasPrefix(key, prefix); key, v = cursor.Next() {

            dbIndex := binary.LittleEndian.Uint64(v[:libkvmetadatalen])
            v = v[libkvmetadatalen:]
            val := make([]byte, len(v))
            copy(val, v)

            kv = append(kv, &store.KVPair{
                Key:       string(key),
                Value:     val,
                LastIndex: dbIndex,
            })
        }
        return nil
    })
    if len(kv) == 0 {
        return nil, store.ErrKeyNotFound
    }
    return kv, err
}

可以看出,这段逻辑是按前缀遍历读取boltdb数据库(libnetwork用它在本地缓存网络配置)。从之前的了解看,这个数据库文件在异常断电等场景下很容易损坏,所以怀疑是db文件损坏,导致此处的游标遍历超出了预期的循环次数,而每次循环都会拷贝value并追加到切片,分配的内存不断增长,最终申请不到内存,触发了Go运行时的fatal error: runtime: out of memory。

docker社区查找相关issue[1] [2],发现确实存在boltdb数据库损坏的现象,不过最终报错的现象不太一样。最后,在docker社区也提了个issue[3],社区反馈也是怀疑boltdb数据库损坏,并建议可以把local-kv.db文件删除再重启来恢复。

个人觉得,上面报错的地方可以优化一下:在遍历之前先对db文件做一次完整性检查,发现损坏就提前报错返回,而不是不停地吃内存(由于当时的异常环境已经被破坏,这个想法需要等问题复现后再验证)。

解决方案

  1. 临时解决:删除local-kv.db文件再重启docker服务(示例命令见下)。
  2. 永久解决:按上文思路优化异常处理代码,对损坏的db文件提前检查、提前报错。
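临时恢复的大致操作如下(local-kv.db默认位于/var/lib/docker/network/files/目录下,具体路径以实际环境为准;建议先备份而不是直接删除):

systemctl stop docker
# 把可能损坏的boltdb文件移走备份,docker启动时会重新生成
mv /var/lib/docker/network/files/local-kv.db /var/lib/docker/network/files/local-kv.db.bak
systemctl start docker

注意:该文件中缓存了网络相关配置,重建后自定义的docker网络可能需要重新创建。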

参考资料

  1. https://github.com/moby/moby/issues/37238
  2. https://github.com/moby/moby/issues/42099
  3. https://github.com/moby/moby/issues/43072