
Symptom

On a Kubernetes cluster deployed on Kylin V10 SP03, the kube-proxy component fails to delete conntrack entries with the conntrack command.

[root@node1 ~]# docker run --privileged --net=host --rm  kube-proxy conntrack -D -p icmp -s 192.168.1.2
conntrack v1.4.5 (conntrack-tools): Operation failed: Operation not supported

Root Cause Analysis

Since the failing conntrack binary comes from the docker image, first rule out any docker influence by installing the conntrack tool on the host itself and running the same command:

[root@node1 ~]# conntrack -D -p icmp -s 192.168.1.2
conntrack v1.4.5 (conntrack-tools): Operation failed: Operation not supported

It still fails, so next rule out the conntrack tool itself by calling the libnetfilter_conntrack library directly from the following C program, which constructs the same delete request:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <arpa/inet.h>

#include <libnetfilter_conntrack/libnetfilter_conntrack.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack_tcp.h>

int main(void)
{
    int ret;
    struct nfct_handle *h;
    struct nf_conntrack *ct;

    /* Build a conntrack object describing the entry to delete. */
    ct = nfct_new();
    if (!ct) {
        perror("nfct_new");
        return 0;
    }

    nfct_set_attr_u8(ct, ATTR_L3PROTO, AF_INET);
    nfct_set_attr_u32(ct, ATTR_IPV4_SRC, inet_addr("1.1.1.1"));
    nfct_set_attr_u32(ct, ATTR_IPV4_DST, inet_addr("2.2.2.2"));

    nfct_set_attr_u8(ct, ATTR_L4PROTO, IPPROTO_TCP);
    nfct_set_attr_u16(ct, ATTR_PORT_SRC, htons(20));
    nfct_set_attr_u16(ct, ATTR_PORT_DST, htons(10));

    h = nfct_open(CONNTRACK, 0);
    if (!h) {
        perror("nfct_open");
        nfct_destroy(ct);
        return -1;
    }

    /* Ask the kernel to destroy the matching conntrack entry. */
    ret = nfct_query(h, NFCT_Q_DESTROY, ct);

    printf("TEST: delete conntrack ");
    if (ret == -1)
        printf("(%d)(%s)\n", ret, strerror(errno));
    else
        printf("(OK)\n");

    nfct_close(h);
    nfct_destroy(ct);

    ret == -1 ? exit(EXIT_FAILURE) : exit(EXIT_SUCCESS);
}

It still fails:

[root@single ~]# ./conntrack_delete
TEST: delete conntrack (-1)(Operation not supported)

According to gdb, the failure occurs inside the call into libnetfilter_conntrack.so, so the preliminary judgment is that the problem is kernel-related:

(gdb) b nfct_query
Breakpoint 3 at 0x7ffff7fa6a34
(gdb) s
Breakpoint 3, 0x00007ffff7fa6a34 in nfct_query () from /lib64/libnetfilter_conntrack.so.3

Checking the kernel version list on the Kylin website turned up no related bug, and updating to the latest kernel, 4.19.90-52.39, did not help; the problem persisted.

Rolling back to a Kylin V10 SP02 environment (kernel 4.19.90-24.4), the problem does not occur.

So this is almost certainly a bug introduced by the SP03 kernel. After an issue was filed, the vendor planned to fix it in kernel 4.19.90-52.40.

Solution

Upgrade the kernel to 4.19.90-52.40.
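
After the upgrade, the fix can be verified by checking the running kernel and re-running the originally failing delete; a minimal sketch (the source address is the placeholder from the example above):

[root@node1 ~]# uname -r                              # expect 4.19.90-52.40 or later
[root@node1 ~]# conntrack -D -p icmp -s 192.168.1.2   # should no longer fail with "Operation not supported"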

References

https://update.cs2c.com.cn/NS/V10/V10SP3/os/adv/lic/base/x86_64/Packages/

https://update.cs2c.com.cn/NS/V10/V10SP3/os/adv/lic/updates/x86_64/Packages/

Symptom

A Pod in the Kubernetes cluster periodically sends ICMPv6 ping requests to a device outside the cluster; occasionally a request gets no response.

Root Cause Analysis

Packet capture analysis

Captures were taken on the node and on the Pod's interface. They show that the request is answered on the node, but the reply never makes it into the Pod, so the preliminary judgment is that the packet is lost on the way from the node into the Pod.
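
For reference, a capture setup of this kind can be reproduced with tcpdump; a minimal sketch, assuming the node uplink is bond0 and the Pod's host-side veth has been looked up beforehand (both interface names are placeholders):

# capture ICMPv6 on the node's uplink
[root@node1 ~]# tcpdump -i bond0 -nn icmp6 -w node.pcap
# capture ICMPv6 on the Pod's host-side veth (name depends on the CNI, e.g. cali*/veth*)
[root@node1 ~]# tcpdump -i <pod-veth> -nn icmp6 -w pod.pcap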

Is the NIC dropping packets?

Checking the network configuration on the cluster: the environment uses a bonded interface:

[root@node1 ~]# ifconfig bond0
bond0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST,MASTER> mtu 1500
inet xx.xx.xx.xx netmask 255.255.0.0 broadcast xx.xx.255.255
inet6 xxxx::xxx:xxxx:xxxx:xxxx prefixlen 64 scopeid 0x20<link>
inet6 xx.xx.xx.xx prefixlen 64 scopeid 0x0<global>
ether xx:xx:xx:xx:xx:xx txqueuelen 1000 (Ethernet)
RX packets 94617918 bytes 39566668050 (36.8 GiB)
RX errors 0 dropped 5011121212 overruns 0 frame 0
TX packets 58685914 bytes 77971464576 (72.6 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

The RX dropped counter is very large and keeps increasing. Since bond misconfiguration has caused packet loss before, check the bonding configuration:

[root@node1 ~]# cat /sys/class/net/bond0/bonding/mode
802.3ad 4

[root@node1 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 64:2f:c7:c2:b1:8b
Active Aggregator Info:
...

MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 64:2f:c7:c2:b1:8b
Slave queue ID: 0
Aggregator ID: 1
...

From the configuration, Link Failure Count: 1 means the link went down exactly once, which does not look related to continuous packet loss; the dmesg log confirms this.

[root@node1 ~]# dmesg -T |grep bond
[ 7 6 17:52:08 2023] bonding: bond0 is being created...
[ 7 6 17:52:08 2023] bonding: bond0 already exists
[ 7 6 17:52:08 2023] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[ 7 6 17:52:08 2023] bond0: Enslaving ens3f0 as a backup interface with a down link
[ 7 6 17:52:09 2023] bond0: Enslaving ens1f0 as a backup interface with a down link
[ 7 6 17:52:09 2023] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[ 7 6 17:52:09 2023] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[ 7 6 17:52:09 2023] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ 7 6 17:52:09 2023] bond0: link status definitely up for interface ens3f0, 10000 Mbps full duplex
[ 7 6 17:52:09 2023] bond0: first active interface up!
[ 7 6 17:52:09 2023] bond0: link status definitely up for interface ens1f0, 10000 Mbps full duplex
...
[ 3 21 01:20:53 2024] bond0: link status definitely down for interface ens3f0, disabling it
[ 3 21 01:24:18 2024] bond0: link status definitely up for interface ens3f0, 10000 Mbps full duplex
...
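
To confirm in real time whether the RX drop counter is still climbing, the interface statistics can simply be watched; a minimal sketch, assuming the bond interface is bond0:

[root@node1 ~]# watch -d -n 1 cat /sys/class/net/bond0/statistics/rx_dropped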

Back to the packet captures:

Capture on the node:
45906 2024-05-08 08:33:19.680488 1000:ff00::1:212 1000:ff00::101 ICMPv6 94 Echo (ping) request id=0xef02, seq=1 (reply in 45911)
45911 2024-05-08 08:33:19.681123 1000:ff00::101 1000:ff00::1:212 ICMPv6 94 Echo (ping) reply id=0xef02, seq=1 (request in 45906)

45917 2024-05-08 08:33:23.640357 1000:ff00::1:212 1000:ff00::101 ICMPv6 94 Echo (ping) request id=0xef02, seq=1 (reply in 45921)
45921 2024-05-08 08:33:23.650087 1000:ff00::101 1000:ff00::1:212 ICMPv6 94 Echo (ping) reply id=0xef02, seq=1 (request in 45917)

45923 2024-05-08 08:33:24.652114 1000:ff00::1:212 1000:ff00::101 ICMPv6 94 Echo (ping) request id=0xef02, seq=1 (reply in 45924)
45924 2024-05-08 08:33:24.654495 1000:ff00::101 1000:ff00::1:212 ICMPv6 94 Echo (ping) reply id=0xef02, seq=1 (request in 45923)

45925 2024-05-08 08:33:26.653971 1000:ff00::1:212 1000:ff00::101 ICMPv6 94 Echo (ping) request id=0xef02, seq=1 (reply in 45926)
45926 2024-05-08 08:33:26.660779 1000:ff00::101 1000:ff00::1:212 ICMPv6 94 Echo (ping) reply id=0xef02, seq=1 (request in 45925)

Capture on the Pod interface:
38001 2024-05-08 08:33:23.640134 fd00:7a23::466 1000:ff00::101 ICMPv6 94 Echo (ping) id=0xef02 (no response found!)
38004 2024-05-08 08:33:24.652076 fd00:7a23::466 1000:ff00::101 ICMPv6 94 Echo (ping) id=0xef02 (no response found!)
38005 2024-05-08 08:33:26.653936 fd00:7a23::466 1000:ff00::101 ICMPv6 94 Echo (ping) id=0xef02 (no response found!)

Comparing the two: the Pod sent 3 requests and got no response to any of them, which matches the business configuration of a 2s timeout plus 3 retries. From the node's perspective, however, there are 4 request/reply pairs, and judging by the timestamps they line up starting with the second one.

This suggests the first request/reply pair was triggered by a different service. The field confirmed that two services do ping the device, and that first request uses id 0xef02, the same id as the following 3 requests, so the working theory is that the identical id is what keeps the later packets from being answered.

The ICMPv6 references [1] contain no explicit official statement that two ping packets with the same id are a problem, although some informal write-ups [2] say it is not allowed. To confirm, a C++ program similar to the business service was used in a lab environment to simulate pinging the device: with two containers sending requests to an IP outside the cluster using the same id, the problem reproduces reliably. Running two instances of the system's native ping at the same time does not reproduce it.

Analyzing the captures from the two reproductions, together with reference [3], explains why the two behave differently:

1) Why does the C++ ping break? Its icmp id is a sequentially incremented counter, and when a request fails the program retries with the same id, so once an id collides every retry fails too, until the next polling round probes with a new id and recovers.

2) Why is the native ping fine? Its icmp id is derived from the ping process's PID ANDed with a hexadecimal mask, so every independent ping invocation ends up with its own, effectively random, id for its ICMP packets (see the sketch below).
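
The difference is easy to observe with the native ping: two independent ping processes get different ids, which show up as separate conntrack entries. A minimal sketch (the target address is a placeholder; use ping6 on older iputils versions):

# start two native pings in parallel; each derives its icmp id from its own PID
[root@node1 ~]# ping -c 3 1000:ff00::101 &
[root@node1 ~]# ping -c 3 1000:ff00::101 &
# the two resulting entries carry different id= values
[root@node1 ~]# cat /proc/net/nf_conntrack | grep icmpv6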

Why do identical ids cause trouble at all, and which mechanism exactly interferes with the responses? After repeatedly constructing the scenario in the lab, the root cause and the answers to several questions are as follows:

Root cause: with identical icmp ids, the conntrack entries recorded by the system cannot tell which of the two responses belongs to which requester. As shown below, there are two ping requests, one from the node and one from the container, using the same id; both of them match the first entry, so the ping inside the container never receives its reply:

[root@node1 ~]# cat /proc/net/nf_conntrack|grep icmpv6
ipv6 10 icmpv6 58 29 src=1000:0000:0000:0000:0000:0000:0212:0165 dst=1000:0000:0000:0000:0000:0000:0212:0160 type=128 code=0 id=14640 src=1000:0000:0000:0000:0000:0000:0212:0160 dst=1000:0000:0000:0000:0000:0000:0212:0165 type=129 code=0 id=0 mark=0 zone=0 use=2

ipv6 10 icmpv6 58 29 src=fd00:0111:0111:0000:0c11:b42f:f17e:a683 dst=1000:0000:0000:0000:0000:0000:0212:0160 type=128 code=0 id=14640 src=1000:0000:0000:0000:0000:0000:0212:0160 dst=1000:0000:0000:0000:0000:0000:0212:0165 type=129 code=0 id=14640 mark=0 zone=0 use=2

Question 1: why is it fine for two pings on the same node to use the same id?

Because both requests and responses stay on the node, only one conntrack entry is recorded and the ids are identical, so even though requests and responses are not matched one-to-one, both requesters still get a reply.

ipv6   10 icmpv6  58 29 src=1000:0000:0000:0000:0000:0000:0212:0165 dst=1000:0000:0000:0000:0000:0000:0212:0160 type=128 code=0 id=14640 src=1000:0000:0000:0000:0000:0000:0212:0160 dst=1000:0000:0000:0000:0000:0000:0212:0165 type=129 code=0 id=0 mark=0 zone=0 use=2

Question 2: why does it only break when one ping is on the node and the other in a container?

That combination is not essential. Any traffic that leaves the cluster, goes through NAT, and reuses the same ping id hits this problem; NAT just happens to be a mandatory mechanism for container traffic leaving the cluster. A non-container setup that uses NAT would, in theory, hit the same issue.
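
The NAT dependency can be seen from the masquerade rules the CNI installs for traffic leaving the cluster; a minimal sketch (chain and rule names vary by CNI and version):

[root@node1 ~]# ip6tables -t nat -L POSTROUTING -n -v | grep -i masq
[root@node1 ~]# iptables  -t nat -L POSTROUTING -n -v | grep -i masq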

Question 3: why are requests with different ids fine?

With the same source and destination IPs, requests with different ids create separate conntrack entries keyed by id, so the responses can be told apart by id and delivered correctly:

[root@node1 ~]# cat /proc/net/nf_conntrack|grep icmpv6

ipv6 10 icmpv6 58 29 src=fd00:0111:0111:0000:0c11:b42f:f17e:a683 dst=1000:0000:0000:0000:0000:0000:0212:0160 type=128 code=0 id=14640 src=1000:0000:0000:0000:0000:0000:0212:0160 dst=1000:0000:0000:0000:0000:0000:0212:0165 type=129 code=0 id=14640 mark=0 zone=0 use=2

ipv6 10 icmpv6 58 29 src=fd00:0111:0111:0000:0c11:b42f:f17e:a683 dst=1000:0000:0000:0000:0000:0000:0212:0160 type=128 code=0 id=53764 src=1000:0000:0000:0000:0000:0000:0212:0160 dst=1000:0000:0000:0000:0000:0000:0212:0165 type=129 code=0 id=53764 mark=0 zone=0 use=2

Question 4: how long does the anomaly last when the same id keeps being used?

It is bounded by the conntrack entry timeout, which defaults to 30s.
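
The 30s corresponds to the ICMPv6 conntrack timeout, which can be inspected (and tuned, if really necessary) via sysctl; a minimal sketch:

[root@node1 ~]# sysctl net.netfilter.nf_conntrack_icmpv6_timeout   # defaults to 30 (seconds)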

Solution

  1. Have the business side stop reusing the same id for retried packets.
  2. Avoid having multiple services in the same environment ping the same device at the same time, or plan distinct id ranges to avoid collisions.

References

https://datatracker.ietf.org/doc/html/rfc4443

https://community.icinga.com/t/how-to-avoid-icmp-identifiers-colliding/5290

https://hechao.li/2018/09/27/How-Is-Ping-Deduplexed/

Symptom

On one node of a Kubernetes cluster, docker stats shows no resource usage:

[root@node1 ~]# docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O
1c9bec808f61 k8s_busybox_xxx -- -- -- --
86f38791af8f k8s_kube-controller-manager_xxx -- -- -- --
60d98fe39332 k8s_kube-scheduler_xxx -- -- -- --
a81320ad61e8 k8s_calico-kube-controllers_xxx -- -- -- --
4cf98fb540ba k8s_calico-node_xxx -- -- -- --
9747e7ce0032 k8s_kube-proxy_xxx -- -- -- --
...

Root Cause Analysis

The docker logs are full of errors like this:

time="xxx" level=error msg="collecting stats for xxx: no metrics received"

The error message itself does not lead anywhere, so the next question is where this command reads its data from. According to [1], the numbers are computed from the cgroup filesystem.

Pick a random container id and check whether the relevant metric files under the cgroup hierarchy look sane:

[root@node1 ~]# cat /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podxxx/docker-xxx.scope/cpuacct.usage
32068181

[root@node1 ~]# cat /sys/fs/cgroup/memory/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podxxx/docker-xxx.scope/memory.limit_in_bytes
9223372036854771712

Nothing looks wrong there, and the docker issue tracker has nothing relevant either.

The containerd service, however, keeps printing the following error:

/sys/fs/cgroup/cpuacct/kubepods.slice/besteffort.slice/podxxx/xxx/cpuacct.stat is expected to have 4 fields

Based on that message, look at the cgroup file again; its content does not look like 4 fields:

[root@node1 ~]# cat /sys/fs/cgroup/cpuacct/kubepods.slice/kubepods-besteffort.slice/podxxx/xxx/cpuacct.stat
user 32
system 89
sched_delay 0

On a healthy node the same cgroup file looks like this:

[root@node1 ~]# cat /sys/fs/cgroup/cpuacct/kubepods.slice/kubepods-burstable.slice/podxxx/xxx/cpuacct.stat
user 1568
system 6436

The comparison shows that the problem environment has an extra sched_delay field, which accounts for CPU time lost to scheduling delay. According to [2], the error originates from https://github.com/containerd/cgroups: it is reported when the library reads fields from /sys/fs/cgroup/cpuacct/cpuacct.stat. The restriction is unreasonable and was fixed in containerd/cgroups@5fe29ea.
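
A quick way to compare nodes is to count the lines of cpuacct.stat, which is exactly what the cgroups library is strict about; a minimal sketch (the cgroup path is a placeholder):

# a healthy node prints 2 (user/system); the affected kernel prints 3 because of the extra sched_delay line
[root@node1 ~]# cat /sys/fs/cgroup/cpuacct/kubepods.slice/<...>/cpuacct.stat | wc -l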

According to the change history, the containerd/cgroups releases carrying the fix are:

v3.0.3  v3.0.2 v3.0.1 v3.0.0 v1.1.0

On the containerd side, v1.7.0 is the first release that upgraded its vendored cgroups library to a fixed version, which resolves the problem.

Why does only one node show the problem?

According to the fix notes, the problem is only triggered on certain kernels. Comparing kernels, the faulty node runs 4.14.0 while the healthy nodes run 5.x.

Solution

1. Upgrade containerd to v1.7.0 or later; or

2. Upgrade the operating system kernel.

References

1.https://cloud.tencent.com/developer/article/1096453

2.https://github.com/milvus-io/milvus/issues/22982

3.https://github.com/containerd/cgroups/pull/231

Background

We need a way to view the decoded Kubernetes data stored in etcd. The open-source tool [1] has not been maintained for a long time; according to its issues, the project has since been adopted into the etcd-io organization.

Build Steps

Following the official documentation [2], clone the source:

[root@node1]# git clone git@github.com:etcd-io/auger.git
Cloning into 'auger'...
remote: Enumerating objects: 712, done.
remote: Counting objects: 100% (229/229), done.
remote: Compressing objects: 100% (106/106), done.
remote: Total 712 (delta 179), reused 150 (delta 123), pack-reused 483
Receiving objects: 100% (712/712), 247.44 KiB | 186.00 KiB/s, done.
Resolving deltas: 100% (409/409), done.

Build the release:

[root@node1]# cd auger/
[root@node1 auger]# make release
Building release in temp directory /tmp/tmp.VtO7q4KrPY
docker run \
-v /tmp/tmp.VtO7q4KrPY/auger:/go/src/github.com/etcd-io/auger \
-w /go/src/github.com/etcd-io/auger \
golang:1.21.8 \
/bin/bash -c "make -f /go/src/github.com/etcd-io/auger/Makefile release-docker-build GOARCH=amd64 GOOS=linux"
Unable to find image 'golang:1.21.8' locally
1.21.8: Pulling from library/golang
71215d55680c: Pull complete
3cb8f9c23302: Pull complete
5f899db30843: Pull complete
c29f45468664: Pull complete
6de33e7b6490: Pull complete
6dbaf8e5f127: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:856073656d1a517517792e6cdd2f7a5ef080d3ca2dff33e518c8412f140fdd2d
Status: Downloaded newer image for golang:1.21.8
export GOPATH=/go
GOOS=linux GOARCH=amd64 GO111MODULE=on go build
go: go.mod requires go >= 1.22.0 (running go 1.21.8; GOTOOLCHAIN=local)
make: *** [/go/src/github.com/etcd-io/auger/Makefile:66: release-docker-build] Error 1
make: *** [release] Error 2

The Go version does not match; update it in the Makefile:

[root@node1 auger]# vim Makefile
NAME ?= auger
PKG ?= github.com/etcd-io/$(NAME)
GO_VERSION ?= 1.22.0
...

Build again:

[root@node1 auger]# make release
Building release in temp directory /tmp/tmp.s0Ue7zvIop
docker run \
-v /tmp/tmp.s0Ue7zvIop/auger:/go/src/github.com/etcd-io/auger \
-w /go/src/github.com/etcd-io/auger \
golang:1.22.0 \
/bin/bash -c "make -f /go/src/github.com/etcd-io/auger/Makefile release-docker-build GOARCH=amd64 GOOS=linux"
Unable to find image 'golang:1.22.0' locally
1.22.0: Pulling from library/golang
7bb465c29149: Pull complete
...
Digest: sha256:7b297d9abee021bab9046e492506b3c2da8a3722cbf301653186545ecc1e00bb
Status: Downloaded newer image for golang:1.22.0
export GOPATH=/go
GOOS=linux GOARCH=amd64 GO111MODULE=on go build
go: downloading github.com/coreos/etcd v3.1.11+incompatible
go: downloading github.com/google/safetext v0.0.0-20220914124124-e18e3fe012bf
go: downloading github.com/spf13/cobra v1.8.0
go: downloading github.com/coreos/bbolt v1.3.1-coreos.3
go: downloading k8s.io/apimachinery v0.30.0
go: downloading proxy.golang.org/xxx io timeout
...

Many module downloads fail through the default proxy.golang.org proxy, so change GOPROXY:

[root@node1 ~]# docker exec -it 9b41dd00e91a sh
# go env
...
GOPROXY='https://proxy.golang.org,direct'

[root@node1 auger]# vim Makefile
# Build used inside docker by 'release'
release-docker-build:
export GOPATH=/go
GOOS=$(GOOS) GOARCH=$(GOARCH) GO111MODULE=on GOPROXY='https://goproxy.cn,direct' go build

Build again:

[root@node1 auger]# make release
Building release in temp directory /tmp/tmp.34OgmWJGLU
docker run \
-v /tmp/tmp.34OgmWJGLU/auger:/go/src/github.com/etcd-io/auger \
-w /go/src/github.com/etcd-io/auger \
golang:1.22.0 \
/bin/bash -c "make -f /go/src/github.com/etcd-io/auger/Makefile release-docker-build GOARCH=amd64 GOOS=linux"
export GOPATH=/go
GOOS=linux GOARCH=amd64 GO111MODULE=on GOPROXY='https://goproxy.cn,direct' go build
go: downloading github.com/coreos/etcd v3.1.11+incompatible
...
build/auger built!

The build succeeds, but running the binary complains about missing glibc versions:

[root@node1 auger]# ./build/auger -help
./build/auger: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./build/auger)
./build/auger: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./build/auger)

The local glibc is indeed older than what the binary requires:

[root@node1 l14185]# rpm -qa|grep glibc
glibc-2.17-326.el7_9.x86_64

There are two ways to fix this:

  1. Build inside an image whose glibc version matches the node's;
  2. Build directly on the node.

Taking the second option as an example: install the required Go version on the node and run go build directly:

[root@node1]# GOOS=linux GOARCH=amd64 go build -o build/auger
[root@node1 auger]# ll build/
total 39916
-rwxr-xr-x 1 root root 40871798 May 13 19:12 auger

Usage

Show the help:

[root@iZbp1esczkzr2k2fughijkZ auger]# ./build/auger
Inspect and analyze kubernetes objects in binary storage
encoding used with etcd 3+ and boltdb.

Usage:
auger [command]

Available Commands:
analyze Analyze kubernetes data from the boltdb '.db' files etcd persists to.
checksum Checksum a etcd keyspace.
completion Generate the autocompletion script for the specified shell
decode Decode objects from the kubernetes binary key-value store encoding.
encode Encode objects to the kubernetes binary key-value store encoding.
extract Extracts kubernetes data from the boltdb '.db' files etcd persists to.
help Help about any command

Flags:
-h, --help help for auger

Use "auger [command] --help" for more information about a command.

View decoded etcd data:

[root@node1]# ETCDCTL_API=3 etcdctl get /registry/pods/kube-system/coredns-795cc9c45c-j7nl4 | ./auger decode
apiVersion: v1
kind: Pod
metadata:
generateName: coredns-795cc9c45c-
labels:
k8s-app: kube-dns
pod-template-hash: 795cc9c45c
name: coredns-795cc9c45c-j7nl4
namespace: kube-system
spec:
containers:
- args:
- -conf
- /etc/coredns/Corefile
name: coredns
ports:
...
volumeMounts:
- mountPath: /etc/coredns
name: config-volume
readOnly: true
- mountPath: /tmp
name: tmp
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: coredns-token-9dldj
readOnly: true
nodeName: node1
tolerations:
- key: CriticalAddonsOnly
operator: Exists
...
volumes:
- emptyDir: {}
name: tmp
...
status:
conditions:
- lastProbeTime: null
type: Initialized
...
containerStatuses:
- containerID: docker://f85d0fd1422a3860d574eb88b5dc23c165d5adb3eccb242a1a847bd0cfc98227
...
hostIP: 192.168.10.10
phase: Running
podIP: 10.10.166.139
qosClass: Burstable

Caveats

When running auger directly against etcd's database file, make sure the etcd service is stopped, or work on a copy of the database file; otherwise the parse will hang.
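
If etcd cannot be stopped, another option is to take an etcdctl snapshot instead of copying the file with cp; the snapshot is a boltdb file that auger can read. A minimal sketch (endpoint and certificate paths are assumptions):

[root@node1 ~]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/server.crt --key=/etc/etcd/server.key \
    snapshot save /root/etcd.db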

When it hangs, strace looks like this:

[root@node1]# strace ./auger checksum -f /var/lib/etcd/default.etcd/member/snap/db
execve("./auger", ["./auger", "checksum", "-f", "/var/lib/etcd/default.etcd/membe"...], [/* 25 vars */]) = 0
brk(NULL) = 0x3e75000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4a242fc000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=37465, ...}) = 0
mmap(NULL, 37465, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4a242f2000
close(3) = 0
...
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f49da795000
openat(AT_FDCWD, "/var/lib/etcd/default.etcd/member/snap/db", O_RDWR|O_CREAT|O_CLOEXEC, 0400) = 3
fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1581252610, u64=9172183252402700290}}) = -1 EPERM (Operation not permitted)
fcntl(3, F_GETFL) = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDWR|O_LARGEFILE) = 0
flock(3, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xc000100148, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1edd920, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1edd920, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1edd920, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1edd920, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1edd920, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
...

Copy the database file and compare the checksums:

[root@node1]# cp /var/lib/etcd/default.etcd/member/snap/db /root/etcd.db
[root@node1]# ./auger checksum -f /root/etcd.db
checksum: 2125275681
compact-revision: 6609891
revision: 6610932

[root@node2 ~]# cp /var/lib/etcd/default.etcd/member/snap/db /root/etcd.db
[root@node2 ~]# ./auger checksum -f /root/etcd.db -r 6610932
checksum: 2125275681
compact-revision: 6610743
revision: 6610932

References

https://github.com/jpbetz/auger

https://github.com/etcd-io/auger

Background

We need to see packet captures from the production environment in real time. According to [1, 2], there are two ways to set this up:

  • Option 1: use Wireshark's remote interface feature
  • Option 2: use Wireshark's SSH remote capture feature

Using the remote interface feature

1. Install dependencies

yum install glibc-static

2. Download the rpcapd source package

Download 4.0.1-WpcapSrc.zip [3], which is close to the Wireshark version in use.

3. Configure and build

[root@node1 ~]# CFLAGS=-static ./configure
...
checking for flex... no
checking for bison... no
checking for capable lex... insufficient
configure: error: Your operating system's lex is insufficient to compile
libpcap. flex is a lex replacement that has many advantages, including
being able to compile libpcap. For more information, see
http://www.gnu.org/software/flex/flex.html .

Install the missing dependencies reported by configure:

yum install flex bison

make then fails:

[root@node1 libpcap]# make
gcc -O2 -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" @V_HAVE_REMOTE@ -c ./pcap-linux.c
gcc: error: @V_HAVE_REMOTE@: No such file or directory
make: *** [pcap-linux.o] Error 1

Switching to the 4.1.1-WpcapSrc.zip source package, the build succeeds:

[root@node1 libpcap]# make
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./pcap-linux.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./pcap-usb-linux.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./fad-getad.c
sed -e 's/.*/static const char pcap_version_string[] = "libpcap version &";/' ./VERSION > version.h
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./pcap.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./inet.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./gencode.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./optimize.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./nametoaddr.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./etherent.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./savefile.c
rm -f bpf_filter.c
ln -s ./bpf/net/bpf_filter.c bpf_filter.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c bpf_filter.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./bpf_image.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./bpf_dump.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c scanner.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -Dyylval=pcap_lval -c grammar.c
sed -e 's/.*/char pcap_version[] = "&";/' ./VERSION > version.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c version.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./pcap-new.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./pcap-remote.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./sockutils.c
ar rc libpcap.a pcap-linux.o pcap-usb-linux.o fad-getad.o pcap.o inet.o gencode.o optimize.o nametoaddr.o etherent.o savefile.o bpf_filter.o bpf_image.o bpf_dump.o scanner.o grammar.o version.o pcap-new.o pcap-remote.o sockutils.o
ranlib libpcap.a
sed -e 's|@includedir[@]|/usr/local/include|g' \
-e 's|@libdir[@]|/usr/local/lib|g' \
-e 's|@DEPLIBS[@]||g' \
pcap-config.in >pcap-config.tmp
mv pcap-config.tmp pcap-config
chmod a+x pcap-config
[root@node1 libpcap]#
[root@node1 libpcap]# cd rpcapd
[root@node1 rpcapd]# make
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c rpcapd.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c daemon.c
daemon.c: In function 'daemon_AuthUserPwd':
daemon.c:684:30: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
if (strcmp(usersp->sp_pwdp, (char *) crypt(password, usersp->sp_pwdp) ) != 0)
^
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c utils.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c fileconf.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c ../pcap-remote.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c ../sockutils.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c ../pcap-new.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -o rpcapd rpcapd.o daemon.o utils.o fileconf.o pcap-remote.o sockutils.o pcap-new.o -L../ -lpcap -lcrypt

Start the rpcapd service:

[root@node1 rpcapd]#  ./rpcapd -4 -n -p 2002
Press CTRL + C to stop the server...

Check that it is listening:

[root@node1 ~]# netstat -anp|grep -w 2002
tcp 0 0 0.0.0.0:2002 0.0.0.0:* LISTEN 28399/./rpcapd

Start Wireshark and add the host and port under Capture -> Options -> Manage Interfaces -> Remote Interfaces. This fails with a "PCAP not found" error; according to [4], installing npcap fixes it.

After the remote interface connects, the server side prints the following error:

[root@node1 rpcapd]#  ./rpcapd -4 -n -p 2002
Press CTRL + C to stop the server...
Not enough space in the temporary send buffer

After applying the fix from [4] and reconfiguring Wireshark, the connection times out and the server still reports errors:

[root@node1 rpcapd]#  ./rpcapd -n -p 2002
Press CTRL + C to stop the server...
Not enough space in the temporary send buffer.
The RPCAP runtime timeout has expired
I'm exiting from the child loop
Child terminated

Since this approach depends on rpcapd, which has to be compiled for the matching environment, it was not adopted.

Using the SSH remote capture feature

Start Wireshark, find SSH remote capture under Capture -> Options -> Input, and click the gear icon on its left to open the SSH login settings.

In the dialog, fill in the SSH connection parameters: server address, port, user name, and password (or a key/certificate).

Once configured, click Start to begin the remote capture.
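
Under the hood this feature simply runs a capture over SSH, so the same result can be obtained by hand by piping tcpdump into Wireshark; a minimal sketch for a Linux/macOS workstation (host, interface, and filter are placeholders):

wireshark -k -i <(ssh root@node1 'tcpdump -i bond0 -U -s0 -w - not port 22')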

References

1.https://zhuanlan.zhihu.com/p/551549544

2.https://blog.csdn.net/weixin_40991654/article/details/126779792

3.https://www.winpcap.org/archive/

4.https://blog.csdn.net/m0_37678467/article/details/127940287

Troubleshooting sometimes requires access to the Red Hat knowledge base. Following [1], a few steps are enough:

Steps

Step 1: go to https://access.redhat.com/ and create an account;

Step 2: visit https://developers.redhat.com/products/rhel/download to activate the subscription (you will receive an e-mail to activate);

Step 3: visit https://access.redhat.com/management and confirm that the account has a developer subscription:

14904535	Red Hat Developer Subscription for Individuals
14904536 Red Hat Beta Access

Step 4: register a RHEL system with the registered user name and password:

subscription-manager register --auto-attach --username ******** --password ********

Step 5: open https://access.redhat.com/solutions/6178422 to check whether the knowledge base is accessible.

As for registering a RHEL system, the quickest way here is to use Vagrant to bring up a RHEL 8 VM. That, too, only takes a few steps:

# Initialize the Vagrantfile
$ vagrant init generic/rhel8
A `Vagrantfile` has been placed in this directory. You are now
ready to `vagrant up` your first virtual environment! Please read
the comments in the Vagrantfile as well as documentation on
`vagrantup.com` for more information on using Vagrant.

# Bring up the RHEL 8 system
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'generic/rhel8' could not be found. Attempting to find and install...
default: Box Provider: virtualbox
default: Box Version: >= 0
==> default: Loading metadata for box 'generic/rhel8'
default: URL: https://vagrantcloud.com/generic/rhel8
==> default: Adding box 'generic/rhel8' (v4.3.12) for provider: virtualbox
default: Downloading: https://vagrantcloud.com/generic/boxes/rhel8/versions/4.3.12/providers/virtualbox/amd64/vagrant.box
default:
default: Calculating and comparing box checksum...
==> default: Successfully added box 'generic/rhel8' (v4.3.12) for 'virtualbox'!
==> default: Importing base box 'generic/rhel8'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'generic/rhel8' version '4.3.12' is up to date...
==> default: Setting the name of the VM: Redhat8_default_1715673627487_1933
==> default: Vagrant has detected a configuration issue which exposes a
==> default: vulnerability with the installed version of VirtualBox. The
==> default: current guest is configured to use an E1000 NIC type for a
==> default: network adapter which is vulnerable in this version of VirtualBox.
==> default: Ensure the guest is trusted to use this configuration or update
==> default: the NIC type using one of the methods below:
==> default:
==> default: https://www.vagrantup.com/docs/virtualbox/configuration.html#default-nic-type
==> default: https://www.vagrantup.com/docs/virtualbox/networking.html#virtualbox-nic-type
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
The guest machine entered an invalid state while waiting for it
to boot. Valid states are 'starting, running'. The machine is in the
'paused' state. Please verify everything is configured
properly and try again.

If the provider you're using has a GUI that comes with it,
it is often helpful to open that and watch the machine, since the
GUI often has more helpful error messages than Vagrant can retrieve.
For example, if you're using VirtualBox, run `vagrant up` while the
VirtualBox GUI is open.

The primary issue for this error is that the provider you're using
is not properly configured. This is very rarely a Vagrant issue.

# The command above left the VM in the paused state; bring it up again
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Checking if box 'generic/rhel8' version '4.3.12' is up to date...
==> default: Unpausing the VM...

# SSH into the freshly installed RHEL system
$ vagrant ssh
Register this system with Red Hat Insights: insights-client --register
Create an account or view all your systems at https://red.ht/insights-dashboard

# Register the system from inside RHEL
[root@rhel8 ~]# subscription-manager register --auto-attach --username ******** --password ********
Registering to: subscription.rhsm.redhat.com:443/subscription
The system has been registered with ID: xxxx-xxxx-xxxx-xxxx-xxxx
The registered system name is: rhel8.localdomain

# When done with the system, just shut it down
$ vagrant halt
==> default: Attempting graceful shutdown of VM...
default:
default: Vagrant insecure key detected. Vagrant will automatically replace
default: this with a newly generated keypair for better security.
default:
default: Inserting generated public key within guest...
default: Removing insecure key from the guest if it's present...
default: Key inserted! Disconnecting and reconnecting using new SSH key...

References

https://wangzheng422.github.io/docker_env/notes/2022/2022.04.no-cost.rhel.sub.html

Background

In a Kubernetes environment, a service accidentally restarted the system dbus service, after which all Pods failed to start, with logs like:

unable to ensure pod container exists: failed to create container for [kubepods besteffort ...] : dbus: connection closed by user

Root Cause Analysis

The error message leads to a related issue [1]; the cause is as follows:

When creating Pods, kubelet talks to /var/run/dbus/system_bus_socket. If the dbus service restarts for any reason, /var/run/dbus/system_bus_socket is re-created, but kubelet keeps sending data to the old socket, which produces the error above.
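
The stale-socket situation is easy to confirm on a node: the socket file's inode changes every time dbus re-creates it, while kubelet keeps complaining about its old connection; a minimal sketch:

[root@node1 ~]# stat -c '%i %n' /var/run/dbus/system_bus_socket         # inode changes after a dbus restart
[root@node1 ~]# journalctl -u kubelet | grep 'dbus: connection closed'  # kubelet still on the old connection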

Solution

Workaround: restart the kubelet service.

Permanent fix: upgrade Kubernetes to v1.25 or later.

Follow-up Issue

After dbus and kubelet were restarted, non-root SSH logins became slow. The secure log shows the following errors:

pam_systemd(crond:session): Failed to create session: Activation of org.freedesktop.login1 timed out
pam_systemd(crond:session): Failed to create session: Connection timed out

According to [2], sshd depends on the systemd-logind service, which in turn depends on dbus; restarting systemd-logind fixes it:

[root@core log]# systemctl restart systemd-logind 

References

1.https://github.com/kubernetes/kubernetes/issues/100328

2.https://www.jianshu.com/p/bb66d7f8c859

Symptom

SSH between all the nodes of a Kubernetes cluster is failing; normal SSH operations no longer work.

Root Cause Analysis

The first suspicion is a wrong password, so check whether the password being used matches the actual one; the password stored by the business was confirmed to match, ruling out a mismatch.

Next, check whether some unexpected IP is connecting with a wrong password.

The environment uses IPv6. Note that by default netstat truncates IPv6 addresses, which makes the full address hard to read; the -W option is needed to display them completely. The truncated output looks like this:

[root@node1 ~]# netstat -anp -v|grep -w 22 
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 156183/sshd: /usr/s
tcp6 0 0 :::22 :::* LISTEN 156183/sshd: /usr/s
tcp6 0 0 2000:8080:5a0a:2f:59732 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:44072 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:35666 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:42998 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:59834 2000:8080:5a0a:2f40::22 ESTABLISHED 170769/java
tcp6 0 0 2000:8080:5a0a:2f:59652 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:39430 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:35648 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:36852 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:43162 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:35002 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:36052 2000:8080:5a0a:2f40::22 ESTABLISHED 170769/java

With -W, the SSH connections show the full IPv6 addresses:

[root@node1 ~]# netstat -anp -W|grep -w 22 |grep -v ::4
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 156183/sshd: /usr/s
tcp6 0 0 :::22 :::* LISTEN 156183/sshd: /usr/s
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:48950 2000:8080:5a0a:2f40:8002::5:22 ESTABLISHED 170769/java
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52506 2000:8080:5a0a:2f40:8002::5:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:56798 2000:8080:5a0a:2f40:8002::6:22 ESTABLISHED 170769/java
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52624 2000:8080:5a0a:2f40:8002::5:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:56860 2000:8080:5a0a:2f40:8002::6:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52396 2000:8080:5a0a:2f40:8002::5:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52398 2000:8080:5a0a:2f40:8002::5:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:22 2000:8080:5a0a:2f40:8002::5:45532 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52202 2000:8080:5a0a:2f40:8002::5:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52348 2000:8080:5a0a:2f40:8002::5:22 ESTABLISHED 170769/java

From these records, there are at least no SSH connections from unexpected IPs right now, so next confirm whether a wrong password previously caused the account to be locked.

/var/log/secure has already rotated, so the initial time of the problem cannot be determined from it. The system has not rebooted recently, so check the login failures in the journalctl --boot output instead; that shows the time the problem started, and that the source IP 2000:8080:5a0a:2f47::2 kept logging in with a wrong password:

cat boot.log |grep "Failed password"|less
Mar 26 10:42:19 node1 sshd[114043]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 34968 ssh2
Mar 26 10:42:23 node1 sshd[114043]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 34968 ssh2
Mar 26 10:42:25 node1 sshd[114043]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 34968 ssh2
Mar 26 10:42:28 node1 sshd[114043]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 34968 ssh2
Mar 26 10:42:31 node1 sshd[116187]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 35194 ssh2
Mar 26 10:42:34 node1 sshd[116187]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 35194 ssh2
Mar 26 10:42:36 node1 sshd[116187]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 35194 ssh2

Normally, after failed logins trigger the lock, it unlocks automatically once the configured time has passed. In this environment, however, with no wrong-password attempts currently happening, the correct password still cannot log in.

Temporarily comment out the auth lockout line in /etc/pam.d/system-auth and /etc/pam.d/password-auth to verify whether the SSH failures come from the lockout configuration:

# auth required pam_tally2.so onerr=fail deny=5 unlock_time=900 even_deny_root

After this change SSH stays healthy for a while; putting the line back makes SSH fail again, so the configuration is confirmed as the cause. The OS team explained that the pam_tally2 lockout module used here is old and has been deprecated because of defects, one of which is exactly this: once locked by wrong passwords, the account cannot be unlocked even when the correct password is used. The recommended replacement is the faillock module, configured as follows:

[root@node1 ~]# vim /etc/pam.d/system-auth (or vi /etc/pam.d/login)
# Add the following at the top of the file:
auth [success=1 default=bad] pam_unix.so
auth [default=die] pam_faillock.so authfail deny=5 even_deny_root unlock_time=900 root_unlock_time=10
auth sufficient pam_faillock.so authsucc deny=5 even_deny_root unlock_time=900 root_unlock_time=10
auth required pam_deny.so

[root@node1 ~]# vim /etc/pam.d/password-auth (or vi /etc/pam.d/sshd)
# Add the following as the second line (the first line is #%PAM-1.0):
auth [success=1 default=bad] pam_unix.so
auth [default=die] pam_faillock.so authfail deny=5 even_deny_root unlock_time=900 root_unlock_time=10
auth sufficient pam_faillock.so authsucc deny=5 even_deny_root unlock_time=900 root_unlock_time=10
auth required pam_deny.so

Note: with pam_faillock, neither remote nor local logins show any hint that the account is locked; the only symptom is that during the lock window even the correct password fails, and once unlocked, login works again.
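
With pam_faillock the lock state can at least be inspected and cleared manually; a minimal sketch (the user name is a placeholder):

[root@node1 ~]# faillock --user admin           # list the recorded failures for the user
[root@node1 ~]# faillock --user admin --reset   # clear them and unlock the account immediately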

As for why this happened at all: it turned out the customer's vulnerability-scanning platform deliberately probes with weak passwords. Normally it scans only once; why it triggered multiple scans this time is unclear.

Solution

For the password-lockout hardening, use the faillock module instead of the legacy pam_tally2 module.

1. Locate the harbor nginx ConfigMap, e.g. harbor-cm.yaml, and edit it: vim harbor-cm.yaml

http {
...
server {
listen 80;
server_tokens off;
client_max_body_size 0;

location / {
proxy_pass http://localhost:80/; -- delete this line
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

proxy_buffering off;
proxy_request_buffering off;
return 403; -- add this line
}
...

2. Run kubectl apply -f harbor-cm.yaml to apply the new configuration.
3. Locate the YAML of the harbor Pod, e.g. harbor1.yaml, edit it with vim harbor1.yaml, and change the harbor nginx probes from the default path / to /api/systeminfo:

- image: goharbor/nginx-photon:v1.6.4
imagePullPolicy: IfNotPresent
livenessProbe:
httpGet:
path: /api/systeminfo -- changed
port: 80
initialDelaySeconds: 1
periodSeconds: 10
name: nginx
ports:
- containerPort: 80
readinessProbe:
httpGet:
path: /api/systeminfo -- changed
port: 80
initialDelaySeconds: 1
periodSeconds: 10
  4. Run kubectl apply -f harbor1.yaml to restart the harbor service.
  5. Log in to the harbor page again; it is expected to return 403 Forbidden and refuse the login.

Background

When deploying a Kubernetes cluster on servers with the domestic Hygon C86 7265 CPU, calico-node fails to start. The Pod status looks like this:

[root@node1 ~]# kubectl get pod -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system calico-kube-controllers-7c7986989c-bwvw4 0/1 Pending 0 5m11s <none> <none>
kube-system calico-node-v64fv 0/1 CrashLoopBackOff 5 5m11s 10.10.26.120 node1
kube-system coredns-6db7677797-jkhpd 0/1 Pending 0 5m11s <none> <none>
kube-system coredns-6db7677797-r58c5 0/1 Pending 0 5m11s <none> <none>
kube-system kube-apiserver-node1 1/1 Running 6 5m23s 10.10.26.120 node1
kube-system kube-controller-manager-node1 1/1 Running 8 5m28s 10.10.26.120 node1
kube-system kube-proxy-ncw4g 1/1 Running 0 5m11s 10.10.26.120 node1
kube-system kube-scheduler-node1 1/1 Running 6 5m29s 10.10.26.120 node1

Root Cause Analysis

Look at the detailed error log:

[root@node1 ~]# kubectl logs -n kube-system calico-node-v64fv
2024-04-03 14:29:25.424 [INFO][9] startup/startup.go 427: Early log level set to info
2024-04-03 14:29:25.425 [INFO][9] startup/utils.go 131: Using HOSTNAME environment (lowercase) for node name node1
2024-04-03 14:29:25.425 [INFO][9] startup/utils.go 139: Determined node name: node1
2024-04-03 14:29:25.428 [INFO][9] startup/startup.go 106: Skipping datastore connection test
CRNGT failed.
SIGABRT: abort
PC=0x7efbf7409a9f m=13 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x7efbf7409a9f
stack: frame={sp:0x7efbaa7fb780, fp:0x0} stack=[0x7efba9ffc250,0x7efbaa7fbe50)fffff 0x00007efbf7de02cc
0x00007efbaa7fb6c0: 0x00007efbf73c2340 0x00007efbf7ffbed0
0x00007efbaa7fb6d0: 0x00007efbf73c8cd0 0x00007efbf7ffbed0
0x00007efbaa7fb6e0: 0x0000000000000001 0x00007efbf7de06be
0x00007efbaa7fb6f0: 0x000000000000015f 0x00007efbf7783360
0x00007efbaa7fb700: 0x00007efbf7ffb9e0 0x0000000004904060
0x00007efbaa7fb710: 0x00007efbaa7fbdf0 0x0000000000000000
0x00007efbaa7fb720: 0x0000000000000020 0x00007efb94000dd0
0x00007efbaa7fb730: 0x00007efb94000dd0 0x00007efbf7de5574
0x00007efbaa7fb740: 0x0000000000000005 0x0000000000000000
0x00007efbaa7fb750: 0x0000000000000005 0x00007efbf73c2340
0x00007efbaa7fb760: 0x00007efbaa7fb9b0 0x00007efbf7dd2ae7
0x00007efbaa7fb770: 0x0000000000000001 0x00007efbf74db5df
0x00007efbaa7fb780: <0x0000000000000000 0x00007efbf777c850
0x00007efbaa7fb790: 0x0000000000000000 0x0000000000000000
0x00007efbaa7fb7a0: 0x0000000000000000 0x0000000000000000
0x00007efbaa7fb7b0: 0x000000000000037f 0x0000000000000000
0x00007efbaa7fb7c0: 0x0000000000000000 0x0002ffff00001fa0
0x00007efbaa7fb7d0: 0x0000000000000000 0x0000000000000000
0x00007efbaa7fb7e0: 0x0000000000000000 0x0000000000000000
0x00007efbaa7fb7f0: 0x0000000000000000 0x0000000000000000
0x00007efbaa7fb800: 0xfffffffe7fffffff 0xffffffffffffffff
0x00007efbaa7fb810: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb820: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb830: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb840: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb850: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb860: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb870: 0xffffffffffffffff 0xffffffffffffffff
runtime: unknown pc 0x7efbf7409a9f
stack: frame={sp:0x7efbaa7fb780, fp:0x0} stack=[0x7efba9ffc250,0x7efbaa7fbe50)
0x00007efbaa7fb680: 0x00007efbaa7fb6c0 0x00007efbf8000558
0x00007efbaa7fb690: 0x0000000000000000 0x00007efbf8000558
0x00007efbaa7fb6a0: 0x0000000000000001 0x0000000000000000
0x00007efbaa7fb6b0: 0x00000000ffffffff 0x00007efbf7de02cc
0x00007efbaa7fb6c0: 0x00007efbf73c2340 0x00007efbf7ffbed0
0x00007efbaa7fb6d0: 0x00007efbf73c8cd0 0x00007efbf7ffbed0
0x00007efbaa7fb6e0: 0x0000000000000001 0x00007efbf7de06be
0x00007efbaa7fb6f0: 0x000000000000015f 0x00007efbf7783360
0x00007efbaa7fb700: 0x00007efbf7ffb9e0 0x0000000004904060
0x00007efbaa7fb710: 0x00007efbaa7fbdf0 0x0000000000000000
0x00007efbaa7fb720: 0x0000000000000020 0x00007efb94000dd0
0x00007efbaa7fb730: 0x00007efb94000dd0 0x00007efbf7de5574
0x00007efbaa7fb740: 0x0000000000000005 0x0000000000000000
0x00007efbaa7fb750: 0x0000000000000005 0x00007efbf73c2340
0x00007efbaa7fb760: 0x00007efbaa7fb9b0 0x00007efbf7dd2ae7
0x00007efbaa7fb770: 0x0000000000000001 0x00007efbf74db5df
0x00007efbaa7fb780: <0x0000000000000000 0x00007efbf777c850
0x00007efbaa7fb790: 0x0000000000000000 0x0000000000000000
0x00007efbaa7fb7a0: 0x0000000000000000 0x0000000000000000
0x00007efbaa7fb7b0: 0x000000000000037f 0x0000000000000000
0x00007efbaa7fb7c0: 0x0000000000000000 0x0002ffff00001fa0
0x00007efbaa7fb7d0: 0x0000000000000000 0x0000000000000000
0x00007efbaa7fb7e0: 0x0000000000000000 0x0000000000000000
0x00007efbaa7fb7f0: 0x0000000000000000 0x0000000000000000
0x00007efbaa7fb800: 0xfffffffe7fffffff 0xffffffffffffffff
0x00007efbaa7fb810: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb820: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb830: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb840: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb850: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb860: 0xffffffffffffffff 0xffffffffffffffff
0x00007efbaa7fb870: 0xffffffffffffffff 0xffffffffffffffff

goroutine 131 [syscall]:
runtime.cgocall(0x2629971, 0xc00067da20)
/usr/local/go-cgo/src/runtime/cgocall.go:156 +0x5c fp=0xc00067d9f8 sp=0xc00067d9c0 pc=0x41081c
crypto/internal/boring._Cfunc__goboringcrypto_RAND_bytes(0xc0006ba680, 0x20)
_cgo_gotypes.go:1140 +0x4c fp=0xc00067da20 sp=0xc00067d9f8 pc=0x66a0ac
crypto/internal/boring.randReader.Read(0x0, {0xc0006ba680, 0x20, 0x20})
/usr/local/go-cgo/src/crypto/internal/boring/rand.go:21 +0x31 fp=0xc00067da48 sp=0xc00067da20 pc=0x66e691
crypto/internal/boring.(*randReader).Read(0x3333408, {0xc0006ba680, 0xc00067dab0, 0x45acb2})
<autogenerated>:1 +0x34 fp=0xc00067da78 sp=0xc00067da48 pc=0x6754f4
io.ReadAtLeast({0x336aba0, 0x3333408}, {0xc0006ba680, 0x20, 0x20}, 0x20)
/usr/local/go-cgo/src/io/io.go:328 +0x9a fp=0xc00067dac0 sp=0xc00067da78 pc=0x4b6ffa
io.ReadFull(...)
/usr/local/go-cgo/src/io/io.go:347
crypto/tls.(*Conn).makeClientHello(0xc000410700)
/usr/local/go-cgo/src/crypto/tls/handshake_client.go:107 +0x6a5 fp=0xc00067dbe8 sp=0xc00067dac0 pc=0x728f25
crypto/tls.(*Conn).clientHandshake(0xc000410700, {0x33dd910, 0xc000a91880})
/usr/local/go-cgo/src/crypto/tls/handshake_client.go:157 +0x96 fp=0xc00067de78 sp=0xc00067dbe8 pc=0x7295f6
crypto/tls.(*Conn).clientHandshake-fm({0x33dd910, 0xc000a91880})
/usr/local/go-cgo/src/crypto/tls/handshake_client.go:148 +0x39 fp=0xc00067dea0 sp=0xc00067de78 pc=0x759899
crypto/tls.(*Conn).handshakeContext(0xc000410700, {0x33dd980, 0xc00057a240})
/usr/local/go-cgo/src/crypto/tls/conn.go:1445 +0x3d1 fp=0xc00067df70 sp=0xc00067dea0 pc=0x727bf1
crypto/tls.(*Conn).HandshakeContext(...)
/usr/local/go-cgo/src/crypto/tls/conn.go:1395
net/http.(*persistConn).addTLS.func2()
/usr/local/go-cgo/src/net/http/transport.go:1534 +0x71 fp=0xc00067dfe0 sp=0xc00067df70 pc=0x7d4dd1
runtime.goexit()
/usr/local/go-cgo/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc00067dfe8 sp=0xc00067dfe0 pc=0x4759c1
created by net/http.(*persistConn).addTLS
/usr/local/go-cgo/src/net/http/transport.go:1530 +0x345

goroutine 1 [select]:
net/http.(*Transport).getConn(0xc00033e140, 0xc0006b6f80, {{}, 0x0, {0xc000c6a0a0, 0x5}, {0xc00046c680, 0xd}, 0x0})
/usr/local/go-cgo/src/net/http/transport.go:1372 +0x5d2
net/http.(*Transport).roundTrip(0xc00033e140, 0xc000650e00)
/usr/local/go-cgo/src/net/http/transport.go:581 +0x774
net/http.(*Transport).RoundTrip(0x2cc6880, 0xc000a6b1d0)
/usr/local/go-cgo/src/net/http/roundtrip.go:18 +0x19
k8s.io/client-go/transport.(*bearerAuthRoundTripper).RoundTrip(0xc000a98360, 0xc000650a00)
/go/pkg/mod/k8s.io/client-go@v0.23.3/transport/round_trippers.go:317 +0x242
net/http.send(0xc000650900, {0x336a040, 0xc000a98360}, {0x2e2a640, 0x4d0701, 0x4a61720})
/usr/local/go-cgo/src/net/http/client.go:252 +0x5d8
net/http.(*Client).send(0xc000a983f0, 0xc000650900, {0x2ec14f9, 0xe, 0x4a61720})
/usr/local/go-cgo/src/net/http/client.go:176 +0x9b
net/http.(*Client).do(0xc000a983f0, 0xc000650900)
/usr/local/go-cgo/src/net/http/client.go:725 +0x908
net/http.(*Client).Do(...)
/usr/local/go-cgo/src/net/http/client.go:593
k8s.io/client-go/rest.(*Request).request(0xc000650700, {0x33dd980, 0xc00057a240}, 0x4a95f98)
/go/pkg/mod/k8s.io/client-go@v0.23.3/rest/request.go:980 +0x439
k8s.io/client-go/rest.(*Request).Do(0x20, {0x33dd948, 0xc000056060})
/go/pkg/mod/k8s.io/client-go@v0.23.3/rest/request.go:1038 +0xcc
k8s.io/client-go/kubernetes/typed/core/v1.(*configMaps).Get(0xc00004fa40, {0x33dd948, 0xc000056060}, {0x2ec14f9, 0xe}, {{{0x0, 0x0}, {0x0, 0x0}}, {0x0, ...}})
/go/pkg/mod/k8s.io/client-go@v0.23.3/kubernetes/typed/core/v1/configmap.go:78 +0x15a
github.com/projectcalico/calico/node/pkg/lifecycle/startup.Run()
/go/src/github.com/projectcalico/calico/node/pkg/lifecycle/startup/startup.go:148 +0x422
main.main()
/go/src/github.com/projectcalico/calico/node/cmd/calico-node/main.go:142 +0x732

goroutine 8 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0xc000202900)
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:1283 +0x6a
created by k8s.io/klog/v2.init.0
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:420 +0xfb

goroutine 29 [select]:
net/http.setRequestCancel.func4()
/usr/local/go-cgo/src/net/http/client.go:398 +0x94
created by net/http.setRequestCancel
/usr/local/go-cgo/src/net/http/client.go:397 +0x43e

goroutine 30 [chan receive]:
net/http.(*persistConn).addTLS(0xc0005617a0, {0x33dd980, 0xc00057a240}, {0xc00046c680, 0x9}, 0x0)
/usr/local/go-cgo/src/net/http/transport.go:1540 +0x365
net/http.(*Transport).dialConn(0xc00033e140, {0x33dd980, 0xc00057a240}, {{}, 0x0, {0xc000c6a0a0, 0x5}, {0xc00046c680, 0xd}, 0x0})
/usr/local/go-cgo/src/net/http/transport.go:1614 +0xab7
net/http.(*Transport).dialConnFor(0x0, 0xc00077d6b0)
/usr/local/go-cgo/src/net/http/transport.go:1446 +0xb0
created by net/http.(*Transport).queueForDial
/usr/local/go-cgo/src/net/http/transport.go:1415 +0x3d7

goroutine 183 [select]:
google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc0001ac0a0)
/go/pkg/mod/google.golang.org/grpc@v1.40.0/balancer_conn_wrappers.go:71 +0xa5
created by google.golang.org/grpc.newCCBalancerWrapper
/go/pkg/mod/google.golang.org/grpc@v1.40.0/balancer_conn_wrappers.go:62 +0x246

goroutine 184 [chan receive]:
google.golang.org/grpc.(*addrConn).resetTransport(0xc000660000)
/go/pkg/mod/google.golang.org/grpc@v1.40.0/clientconn.go:1219 +0x48f
created by google.golang.org/grpc.(*addrConn).connect
/go/pkg/mod/google.golang.org/grpc@v1.40.0/clientconn.go:849 +0x147

goroutine 194 [select]:
google.golang.org/grpc/internal/transport.(*http2Client).keepalive(0xc00000c5a0)
/go/pkg/mod/google.golang.org/grpc@v1.40.0/internal/transport/http2_client.go:1569 +0x169
created by google.golang.org/grpc/internal/transport.newHTTP2Client
/go/pkg/mod/google.golang.org/grpc@v1.40.0/internal/transport/http2_client.go:350 +0x18a5

goroutine 195 [IO wait]:
internal/poll.runtime_pollWait(0x7efbf7e43728, 0x72)
/usr/local/go-cgo/src/runtime/netpoll.go:303 +0x85
internal/poll.(*pollDesc).wait(0xc000a92980, 0xc0005b4000, 0x0)
/usr/local/go-cgo/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go-cgo/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000a92980, {0xc0005b4000, 0x8000, 0x8000})
/usr/local/go-cgo/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc000a92980, {0xc0005b4000, 0x1060100000000, 0x8})
/usr/local/go-cgo/src/net/fd_posix.go:56 +0x29
net.(*conn).Read(0xc00060e078, {0xc0005b4000, 0x9c8430, 0xc00033c500})
/usr/local/go-cgo/src/net/net.go:183 +0x45
bufio.(*Reader).Read(0xc000197380, {0xc0003c04a0, 0x9, 0x18})
/usr/local/go-cgo/src/bufio/bufio.go:227 +0x1b4
io.ReadAtLeast({0x3365ca0, 0xc000197380}, {0xc0003c04a0, 0x9, 0x9}, 0x9)
/usr/local/go-cgo/src/io/io.go:328 +0x9a
io.ReadFull(...)
/usr/local/go-cgo/src/io/io.go:347
golang.org/x/net/http2.readFrameHeader({0xc0003c04a0, 0x9, 0x3f69d15}, {0x3365ca0, 0xc000197380})
/go/pkg/mod/golang.org/x/net@v0.0.0-20220520000938-2e3eb7b945c2/http2/frame.go:237 +0x6e
golang.org/x/net/http2.(*Framer).ReadFrame(0xc0003c0460)
/go/pkg/mod/golang.org/x/net@v0.0.0-20220520000938-2e3eb7b945c2/http2/frame.go:498 +0x95
google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc00000c5a0)
/go/pkg/mod/google.golang.org/grpc@v1.40.0/internal/transport/http2_client.go:1495 +0x41f
created by google.golang.org/grpc/internal/transport.newHTTP2Client
/go/pkg/mod/google.golang.org/grpc@v1.40.0/internal/transport/http2_client.go:355 +0x18ef

goroutine 196 [select]:
google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc000204230, 0x1)
/go/pkg/mod/google.golang.org/grpc@v1.40.0/internal/transport/controlbuf.go:406 +0x11b
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000197440)
/go/pkg/mod/google.golang.org/grpc@v1.40.0/internal/transport/controlbuf.go:533 +0x85
google.golang.org/grpc/internal/transport.newHTTP2Client.func3()
/go/pkg/mod/google.golang.org/grpc@v1.40.0/internal/transport/http2_client.go:405 +0x65
created by google.golang.org/grpc/internal/transport.newHTTP2Client
/go/pkg/mod/google.golang.org/grpc@v1.40.0/internal/transport/http2_client.go:403 +0x1f45

goroutine 132 [select]:
crypto/tls.(*Conn).handshakeContext.func2()
/usr/local/go-cgo/src/crypto/tls/conn.go:1421 +0x9e
created by crypto/tls.(*Conn).handshakeContext
/usr/local/go-cgo/src/crypto/tls/conn.go:1420 +0x1bd

rax 0x0
rbx 0x6
rcx 0xffffffffffffffff
rdx 0x0
rdi 0x2
rsi 0x7efbaa7fb780
rbp 0x7efbaa7fbdf0
rsp 0x7efbaa7fb780
r8 0x0
r9 0x7efbaa7fb780
r10 0x8
r11 0x246
r12 0x0
r13 0x20
r14 0x7efb94000dd0
r15 0x7efb94000dd0
rip 0x7efbf7409a9f
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
Calico node failed to start

According to the team, earlier releases installed on this environment were fine and only the latest one fails. Since the latest release upgraded calico, the first suspicion was a calico regression, but upgrading calico to the newest version did not help.

Searching on the error CRNGT failed turns up reports [1] of the same failure: a bug in AMD Ryzen 9 3000 series CPUs causes RDRAND to produce invalid random numbers on certain BIOS versions, and the fix is a BIOS upgrade.

Using the detection method from [1], create a main.go file:

package main

import (
    "crypto/rand"
    "fmt"
)

func main() {
    a := make([]byte, 10)
    _, err := rand.Read(a)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(a))
}

Run it as follows; on the problem server this does not reproduce the issue:

$ GOEXPERIMENT=boringcrypto go run main.go

Further reading [2] describes the same AMD Ryzen 9 3000 series problem and gives another way to check:

you@ubuntu-live:~$ wget https://cdn.arstechnica.net/wp-content/uploads/2019/10/rdrand-test.zip
you@ubuntu-live:~$ unzip rdrand-test.zip
you@ubuntu-live:~$ cd rdrand-test
you@ubuntu-live:~$ ./amd-rdrand.bug

That download link is dead, however. Further searching turned up another test tool [3] that ships prebuilt binaries; on the faulty server it behaves like this:

[root@node1 ~]# ./RDRAND_Tester_Linux_x86_64 -- first run: the numbers differ
RDRAND Tester v20210328 x86_64
Compiled on Apr 16 2021
Compiled with GNU Compiler Collection (GCC) 10.3.0
Running on Hygon C86 7265 24-core Processor
This CPU supports the following instructions:
RDRAND: Supported
RDSEED: Supported

Testing RDRAND...
try: 1 success: 1 random number: 17705883718297935842 (0xf5b7ef6e97855fe2)
try: 2 success: 1 random number: 6443855104021096318 (0x596d2c137b43e77e)
try: 3 success: 1 random number: 10126471306861746785 (0x8c88740051ae1a61)
try: 4 success: 1 random number: 13463061200056996464 (0xbad666d4c2bdd270)
try: 5 success: 1 random number: 7695825692332247646 (0x6acd10b164c9ca5e)
try: 6 success: 1 random number: 1263849930341660097 (0x118a18d0c36ab5c1)
try: 7 success: 1 random number: 2580393233033016710 (0x23cf65f953c13586)
try: 8 success: 1 random number: 1842118076754864861 (0x199084a17f4caadd)
try: 9 success: 1 random number: 2896900625228522073 (0x2833dbc52c5a6259)
try: 10 success: 1 random number: 3899901262805814503 (0x361f3b8934a34ce7)
try: 11 success: 1 random number: 3597359862242937122 (0x31ec63bc2e3d0922)
try: 12 success: 1 random number: 12246743104637488545 (0xa9f52bf7b761cda1)
try: 13 success: 1 random number: 16491679937497687446 (0xe4de3786c7fc6596)
try: 14 success: 1 random number: 7270227793600200162 (0x64e509a8b1b63de2)
try: 15 success: 1 random number: 15697857806096052438 (0xd9d9fe80faf2b0d6)
try: 16 success: 1 random number: 2546933488048450266 (0x235886835dacaada)
try: 17 success: 1 random number: 6670897529050922874 (0x5c93c9f5701c7f7a)
try: 18 success: 1 random number: 14670415794664541721 (0xcb97c97024428e19)
try: 19 success: 1 random number: 2452728878003037248 (0x2209d7eb5fb6a440)
try: 20 success: 1 random number: 16252906931536406850 (0xe18decbe1db62942)

The RDRAND instruction of this CPU appears to be working.
The numbers generated should be different and random.
If the numbers generated appears to be similar, the RDRAND instruction is
broken.

[root@node1 ~]# ./RDRAND_Tester_Linux_x86_64 -- subsequent runs: the numbers are all identical
RDRAND Tester v20210328 x86_64
Compiled on Apr 16 2021
Compiled with GNU Compiler Collection (GCC) 10.3.0
Running on Hygon C86 7265 24-core Processor
This CPU supports the following instructions:
RDRAND: Supported
RDSEED: Supported

Testing RDRAND...
try: 1 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 2 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 3 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 4 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 5 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 6 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 7 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 8 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 9 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 10 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 11 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 12 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 13 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 14 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 15 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 16 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 17 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 18 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 19 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)
try: 20 success: 1 random number: 18446744073709551615 (0xffffffffffffffff)

The RDRAND instruction of this CPU appears to be broken!
The numbers generated are NOT random but the CPU returns the success flag.

在正常的服务器上执行效果如下:

[root@node1 ~]# ./RDRAND_Tester_Linux_x86_64 -- repeated runs, no identical numbers
RDRAND Tester v20210328 x86_64
Compiled on Apr 16 2021
Compiled with GNU Compiler Collection (GCC) 10.3.0
Running on Hygon C86 7265 24-core Processor
This CPU supports the following instructions:
RDRAND: Supported
RDSEED: Supported

Testing RDRAND...
try: 1 success: 1 random number: 17914541561690204462 (0xf89d3ca29284292e)
try: 2 success: 1 random number: 14332812162628513309 (0xc6e860b931deee1d)
try: 3 success: 1 random number: 11906898495071391800 (0xa53dcd18875d1038)
try: 4 success: 1 random number: 5465211412374691004 (0x4bd854d2d9011cbc)
try: 5 success: 1 random number: 13927489571584093018 (0xc14861f96f3a2b5a)
try: 6 success: 1 random number: 70328156090550554 (0x00f9db15d97c491a)
try: 7 success: 1 random number: 9065062530023621999 (0x7dcd9257a0c3056f)
try: 8 success: 1 random number: 283806862943046502 (0x03f048d69289cb66)
try: 9 success: 1 random number: 7602503365830811759 (0x698184880c0ea06f)
try: 10 success: 1 random number: 3090051278467342602 (0x2ae2114416c9a10a)
try: 11 success: 1 random number: 2685951337108651825 (0x25466a82a458bf31)
try: 12 success: 1 random number: 15486706753868706299 (0xd6ebd5bd94fcb1fb)
try: 13 success: 1 random number: 11789666617122680772 (0xa39d4f52ede0efc4)
try: 14 success: 1 random number: 1388997005975229823 (0x1346b56aef3c157f)
try: 15 success: 1 random number: 11566015841037137779 (0xa082be20c78f3773)
try: 16 success: 1 random number: 14397918040333260716 (0xc7cfae2c9b4097ac)
try: 17 success: 1 random number: 10383120616855762267 (0x901841305bb8f55b)
try: 18 success: 1 random number: 6694856356368217838 (0x5ce8e8629f97f6ee)
try: 19 success: 1 random number: 2307408338273596455 (0x20058fa892927427)
try: 20 success: 1 random number: 6317182892917504808 (0x57ab245f0985bb28)

The RDRAND instruction of this CPU appears to be working.
The numbers generated should be different and random.
If the numbers generated appears to be similar, the RDRAND instruction is
broken.

Compare the BIOS of the two servers:

Faulty server:
[root@node1 ~]# dmidecode -t bios
dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 3.1 present.

Handle 0x0068, DMI type 0, 26 bytes
BIOS Information
Vendor: Byosoft
Version: 3.07.09P01
Release Date: 12/16/2020
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 0 MB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 17.0

Handle 0x0070, DMI type 13, 22 bytes
BIOS Language Information
Language Description Format: Long
Installable Languages: 2
en|US|iso8859-1
zh|CN|unicode
Currently Installed Language: zh|CN|unicode
Healthy server:
[root@node1 ~]# dmidecode -t bios
dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 3.1 present.

Handle 0x0069, DMI type 0, 26 bytes
BIOS Information
Vendor: Byosoft
Version: 5.19
Release Date: 03/04/2022
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 16 MB
Characteristics:
ISA is supported
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
ACPI is supported
USB legacy is supported
ATAPI Zip drive boot is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
System is a virtual machine
BIOS Revision: 5.19

Handle 0x0070, DMI type 13, 22 bytes
BIOS Language Information
Language Description Format: Long
Installable Languages: 2
en|US|iso8859-1
zh|CN|unicode
Currently Installed Language: zh|CN|unicode

The faulty server runs BIOS Version 3.07.09P01 while the healthy server runs Version 5.19, so the BIOS difference is almost certainly the cause. After upgrading the BIOS and retesting, random numbers are generated correctly and calico-node starts normally.

Solution

Upgrade the BIOS.

References

1.https://github.com/projectcalico/calico/issues/7001
2.https://arstechnica.com/gadgets/2019/10/how-a-months-old-amd-microcode-bug-destroyed-my-weekend/
3.https://github.com/cjee21/RDRAND-Tester