
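Background

While the service is running stably, we want to control its behavior by editing a temporary configuration file, and the change must take effect immediately.

Implementation

In the Karaf framework, the configuration in every /etc/xxx.cfg file can be modified and applied at runtime. A custom cfg file can reuse the same mechanism. In the example below the component registers a ManagedService under service.pid=xxx, so Karaf passes the contents of etc/xxx.cfg to updated() whenever that file changes:
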
package com.xxx.xxx;

import org.osgi.framework.BundleContext;
import org.osgi.service.cm.ManagedService;
import org.osgi.service.component.annotations.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Dictionary;

@Component(immediate = true, property = "service.pid=xxx")
public class AutoConfig implements ManagedService {
    protected static final Logger LOG = LoggerFactory.getLogger("xxx");

    // Key expected in etc/xxx.cfg, for example: xxx.xxx.enabled=false
    private static final String FLAG = "xxx.xxx.enabled";

    // volatile: written by the Config Admin thread, read by business threads
    public static volatile boolean flag = true;

    @Activate
    public void activate(BundleContext context) {
        LOG.info("AutoConfig activated.");
    }

    @Deactivate
    public void deactivate() {
        LOG.info("AutoConfig deactivated.");
    }

    @Override
    public void updated(Dictionary properties) {
        try {
            // Config Admin passes null when no configuration exists for this PID
            if (null == properties) {
                LOG.info("Configuration updated with null properties, use default.");
                return;
            }

            Object object = properties.get(FLAG);
            if (null == object) {
                LOG.info("Key {} not found in configuration, use default.", FLAG);
                return;
            }

            // String.valueOf avoids a ClassCastException for non-String values
            String valueFromConfigFile = String.valueOf(object).trim();
            if ("TRUE".equalsIgnoreCase(valueFromConfigFile)) {
                flag = true;
            } else if ("FALSE".equalsIgnoreCase(valueFromConfigFile)) {
                flag = false;
            } else {
                LOG.error("Invalid value {} from config file, use default true.", valueFromConfigFile);
                flag = true;
            }

            LOG.info("Configuration updated, flag={}", flag);
        } catch (Exception e) {
            LOG.error("Error updating configuration, use default true.", e);
            flag = true;
        }
    }
}

References

https://github.com/apache/karaf/tree/karaf-4.2.16/examples/karaf-config-example

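Background

When deleting or moving files or folders on Windows, the following error sometimes appears:

The action can't be completed because the folder or a file in it is open in another program. Close the folder or file and try again.

The message is clear: the file or folder is held open by another program. But how do we find out which one?

This post introduces Handle[1], a small tool published by Microsoft that quickly shows which process is holding a file, without rebooting the machine.

Usage

Download Handle from reference [1].

Open a command prompt and change to the directory where Handle was downloaded. Run handle <filename> to identify the process that holds the file, for example:
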
handle C:\path\to\your\file.txt

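Handle lists every open handle that matches the file, together with the owning process ID (PID) and process name. Once the process is identified, it can be terminated by its PID:

# Command Prompt (cmd); add /F if the process ignores the normal termination request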
taskkill /PID <process_id>

# PowerShell
Stop-Process -Id <process_id>

References

https://learn.microsoft.com/en-us/sysinternals/downloads/handle

Symptom

On a Kubernetes cluster deployed on Kylin V10 SP03, the kube-proxy component fails to delete conntrack entries with the conntrack command:

[root@node1 ~]# docker run --privileged --net=host --rm  kube-proxy conntrack -D -p icmp -s 192.168.1.2
conntrack v1.4.5 (conntrack-tools): Operation failed: Operation not supported

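Root cause analysis

The failing scenario uses the conntrack tool shipped in the Docker image. To rule out Docker as a factor, conntrack was installed on the host and the same command was run there:
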
[root@node1 ~]# conntrack -D -p icmp -s 192.168.1.2
conntrack v1.4.5 (conntrack-tools): Operation failed: Operation not supported

The same error appears. To rule out the conntrack tool itself as well, the following C program calls the libnetfilter_conntrack library directly to issue the delete:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <arpa/inet.h>

#include <libnetfilter_conntrack/libnetfilter_conntrack.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack_tcp.h>

int main(void)
{
    int ret;
    int saved_errno;
    struct nfct_handle *h;
    struct nf_conntrack *ct;

    /* Build the conntrack object that describes the entry to delete. */
    ct = nfct_new();
    if (!ct) {
        perror("nfct_new");
        return EXIT_FAILURE;
    }

    nfct_set_attr_u8(ct, ATTR_L3PROTO, AF_INET);
    nfct_set_attr_u32(ct, ATTR_IPV4_SRC, inet_addr("1.1.1.1"));
    nfct_set_attr_u32(ct, ATTR_IPV4_DST, inet_addr("2.2.2.2"));

    nfct_set_attr_u8(ct, ATTR_L4PROTO, IPPROTO_TCP);
    nfct_set_attr_u16(ct, ATTR_PORT_SRC, htons(20));
    nfct_set_attr_u16(ct, ATTR_PORT_DST, htons(10));

    h = nfct_open(CONNTRACK, 0);
    if (!h) {
        perror("nfct_open");
        nfct_destroy(ct);
        return EXIT_FAILURE;
    }

    /* Ask the kernel to destroy the matching entry. */
    ret = nfct_query(h, NFCT_Q_DESTROY, ct);
    saved_errno = errno; /* keep errno before printf can overwrite it */

    printf("TEST: delete conntrack ");
    if (ret == -1)
        printf("(%d)(%s)\n", ret, strerror(saved_errno));
    else
        printf("(OK)\n");

    nfct_close(h);
    nfct_destroy(ct);

    return ret == -1 ? EXIT_FAILURE : EXIT_SUCCESS;
}

The error persists:

[root@single ~]# ./conntrack_delete
TEST: delete conntrack (-1)(Operation not supported)

Checking with gdb shows that the failure occurs inside the call into libnetfilter_conntrack.so, which suggests the problem is related to the kernel:

(gdb) b nfct_query
Breakpoint 3 at 0x7ffff7fa6a34
(gdb) s
Breakpoint 3, 0x00007ffff7fa6a34 in nfct_query () from /lib64/libnetfilter_conntrack.so.3

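The kernel version list on the Kylin website shows no related bug. After updating to the latest kernel available at the time, 4.19.90-52.39, the problem still occurs.

Rolling back to a Kylin V10 SP02 environment (kernel 4.19.90-24.4) makes the problem disappear.

This largely confirms that the bug was introduced by the SP03 kernel. After an issue was filed, the vendor planned to fix it in kernel 4.19.90-52.40.

Solution

Update the kernel to 4.19.90-52.40.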
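
To check a node against the fixed release, the version comparison can be scripted. The following is a minimal sketch, not a vendor tool: the helper names are made up for illustration, and it assumes the release string starts with the `A.B.C-D.E` pattern used in this post. It only makes sense for the SP03 kernel line, since the SP02 kernel mentioned above has a lower number but does not show the bug.

```python
import re

# Fixed release as reported in the text above.
FIXED_RELEASE = "4.19.90-52.40"

# Assumed layout of the release string: A.B.C-D.E followed by any suffix.
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def release_key(release):
    """Turn a release string into a tuple of integers that compares numerically."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unexpected release format: %r" % release)
    return tuple(int(group) for group in match.groups())


def is_at_least(release, minimum=FIXED_RELEASE):
    """Return True when `release` is the same as or newer than `minimum`."""
    return release_key(release) >= release_key(minimum)


print(is_at_least("4.19.90-52.39"))  # kernel that still showed the problem
print(is_at_least("4.19.90-52.40"))  # planned fix release
```

Comparing integer tuples instead of raw strings avoids mistakes such as "52.9" sorting after "52.40".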

References

https://update.cs2c.com.cn/NS/V10/V10SP3/os/adv/lic/base/x86_64/Packages/

https://update.cs2c.com.cn/NS/V10/V10SP3/os/adv/lic/updates/x86_64/Packages/

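Symptom

A Pod in the Kubernetes cluster periodically sends ICMPv6 ping requests to a device outside the cluster, and occasionally a request gets no response.

Root cause analysis

Packet capture

Captures were taken on the node and on the Pod's interface. They confirm that the reply reaches the node but never enters the Pod, so the packet is most likely lost on the way from the node into the Pod.

Is the NIC dropping packets?

Checking the NIC configuration on the cluster shows that the environment uses a bonded interface:
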
[root@node1 ~]# ifconfig bond0
bond0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST,MASTER> mtu 1500
inet xx.xx.xx.xx netmask 255.255.0.0 broadcast xx.xx.255.255
inet6 xxxx::xxx:xxxx:xxxx:xxxx prefixlen 64 scopeid 0x20<link>
inet6 xx.xx.xx.xx prefixlen 64 scopeid 0x0<global>
ether xx:xx:xx:xx:xx:xx txqueuelen 1000 (Ethernet)
RX packets 94617918 bytes 39566668050 (36.8 GiB)
RX errors 0 dropped 5011121212 overruns 0 frame 0
TX packets 58685914 bytes 77971464576 (72.6 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

The RX dropped counter is very large and keeps growing. Because a bond misconfiguration had caused packet loss in the past, the bond settings were checked:

[root@node1 ~]# cat /sys/class/net/bond0/bonding/mode
802.3ad 4

[root@node1 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 64:2f:c7:c2:b1:8b
Active Aggregator Info:
...

MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent Hw addr: 64:2f:c7:c2:b1:8b
Slave queue ID: 0
Aggregator ID: 1
...

The output shows Link Failure Count: 1, meaning the link failed once. That is unlikely to explain continuous packet loss, and the dmesg log confirms it:

[root@node1 ~]# dmesg -T |grep bond
[ 7 6 17:52:08 2023]bonding: bond0 is being created...
[ 7 6 17:52:08 2023]bonding: bond0 already exists
[ 7 6 17:52:08 2023]IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[ 7 6 17:52:08 2023]bond0: Enslaving ens3f0 as a backup interface with a down link
[ 7 6 17:52:09 2023]bond0: Enslaving ens1f0 as a backup interface with a down link
[ 7 6 17:52:09 2023]bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[ 7 6 17:52:09 2023]IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[ 7 6 17:52:09 2023]IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ 7 6 17:52:09 2023]bond0: link status definitely up for interface ens3f0, 10000 Mbps full duplex
[ 7 6 17:52:09 2023]bond0: first active interface up!
[ 7 6 17:52:09 2023]bond0: link status definitely up for interface ens1f0, 10000 Mbps full duplex
...
[ 3 21 01:20:53 2024]bond0: link status definitely down for interface ens3f0, disabling it
[ 3 21 01:24:18 2024]bond0: link status definitely up for interface ens3f0, 10000 Mbps full duplex
...

Back to the capture results:

Capture on the node:
45906 2024-05-08 08:33:19.680488 1000:ff00::1:212 1000:ff00::101 ICMPv6 94 Echo (ping) request id=0xef02, seq=1 (reply in 45911)
45911 2024-05-08 08:33:19.681123 1000:ff00::101 1000:ff00::1:212 ICMPv6 94 Echo (ping) reply id=0xef02, seq=1 (request in 45906)

45917 2024-05-08 08:33:23.640357 1000:ff00::1:212 1000:ff00::101 ICMPv6 94 Echo (ping) request id=0xef02, seq=1 (reply in 45921)
45921 2024-05-08 08:33:23.650087 1000:ff00::101 1000:ff00::1:212 ICMPv6 94 Echo (ping) reply id=0xef02, seq=1 (request in 45917)

45923 2024-05-08 08:33:24.652114 1000:ff00::1:212 1000:ff00::101 ICMPv6 94 Echo (ping) request id=0xef02, seq=1 (reply in 45924)
45924 2024-05-08 08:33:24.654495 1000:ff00::101 1000:ff00::1:212 ICMPv6 94 Echo (ping) reply id=0xef02, seq=1 (request in 45923)

45925 2024-05-08 08:33:26.653971 1000:ff00::1:212 1000:ff00::101 ICMPv6 94 Echo (ping) request id=0xef02, seq=1 (reply in 45926)
45926 2024-05-08 08:33:26.660779 1000:ff00::101 1000:ff00::1:212 ICMPv6 94 Echo (ping) reply id=0xef02, seq=1 (request in 45925)

Capture on the Pod interface:
38001 2024-05-08 08:33:23.640134 fd00:7a23::466 1000:ff00::101 ICMPv6 94 Echo (ping) id=0xef02 (no response found!)
38004 2024-05-08 08:33:24.652076 fd00:7a23::466 1000:ff00::101 ICMPv6 94 Echo (ping) id=0xef02 (no response found!)
38005 2024-05-08 08:33:26.653936 fd00:7a23::466 1000:ff00::101 ICMPv6 94 Echo (ping) id=0xef02 (no response found!)

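The comparison shows that the Pod sent 3 requests and none of them got a response, which matches the application's settings of a 2 s timeout plus 3 retries. From the node's point of view, however, there are 4 request/reply pairs, and judging by the timestamps the Pod's requests line up with the 2nd through 4th pairs.

This suggests that the 1st request/reply pair was triggered by another service. On-site staff confirmed that two services do ping the device. That first request uses id 0xef02, the same id as the 3 requests that follow, so the preliminary conclusion is that the identical id prevents the later requests from getting responses.

The ICMPv6 specification[1] contains no explicit statement that two ping flows with the same id cause problems, while some informal write-ups[2] say identical ids are not allowed. To verify, a C++ program similar to the application's was used in our own lab to ping a device: two containers sent requests with the same id to an IP address outside the cluster, and the problem reproduced reliably. Running two instances of the system's native ping command at the same time did not reproduce it.

Analysis of the captures from both reproductions, together with reference [3], explains the difference:

1) Why does the C++ ping fail? The program assigns ping ids by sequential increment. When a request fails it retries, and the retries reuse the same id. Once an id collides, the retries are bound to fail as well, until the next polling round probes with a new id and recovers.

2) Why does the native ping work? Its id is computed by bitwise-ANDing the ping process ID with a hexadecimal mask, so each independent ping run effectively uses a different id in its ICMP packets.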
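
The two id strategies can be contrasted with a small sketch. It is illustrative only: the mask 0xFFFF is an assumption based on the 16-bit ICMP identifier field (the text only says "a hexadecimal mask"), and `SequentialIdent` is a hypothetical stand-in for the application's C++ code, which is not shown in this post.

```python
import os


def native_ping_ident(pid=None):
    """Identifier derived from the process ID, as the native ping is described to do.

    Assumption: the mask is 0xFFFF, the width of the ICMP identifier field.
    """
    if pid is None:
        pid = os.getpid()
    return pid & 0xFFFF


class SequentialIdent:
    """Hypothetical model of the application: ids count up, retries reuse the id."""

    def __init__(self, start=0):
        self._next = start

    def new_probe(self):
        """Allocate the id for a new polling round."""
        ident = self._next & 0xFFFF
        self._next += 1
        return ident

    @staticmethod
    def retry_idents(ident, retries):
        """Every retry of one probe carries the same id, so a collision persists."""
        return [ident] * retries


gen = SequentialIdent(start=0xEF02)
probe = gen.new_probe()
print(hex(probe), [hex(i) for i in gen.retry_idents(probe, 3)])
print(hex(native_ping_ident(0x12345)))
```

With the sequential scheme, two services that start counting from similar values can land on the same id, and every retry repeats the collision; ids derived from process IDs differ between independently started ping processes in most cases.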

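Why is an identical id a problem, and which mechanism interferes with the response? After repeatedly constructing test scenarios in the lab, the root cause and the answers to several follow-up questions are as follows.

Root cause: when the ICMP ids are identical, the conntrack entries recorded by the system cannot tell which requester a reply belongs to. In the output below there are 2 ping flows, one from the node and one from a container, and both use the same id. Replies for both flows match the first entry, so the ping inside the container never receives its reply:
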
[root@node1 ~]# cat /proc/net/nf_conntrack|grep icmpv6
ipv6 10 icmpv6 58 29 src=1000:0000:0000:0000:0000:0000:0212:0165 dst=1000:0000:0000:0000:0000:0000:0212:0160 type=128 code=0 id=14640 src=1000:0000:0000:0000:0000:0000:0212:0160 dst=1000:0000:0000:0000:0000:0000:0212:0165 type=129 code=0 id=0 mark=0 zone=0 use=2

ipv6 10 icmpv6 58 29 src=fd00:0111:0111:0000:0c11:b42f:f17e:a683 dst=1000:0000:0000:0000:0000:0000:0212:0160 type=128 code=0 id=14640 src=1000:0000:0000:0000:0000:0000:0212:0160 dst=1000:0000:0000:0000:0000:0000:0212:0165 type=129 code=0 id=14640 mark=0 zone=0 use=2
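
The collision can be reasoned about with a deliberately simplified model. This is a sketch for intuition only, not how the kernel implements conntrack: it assumes replies are looked up by (reply source, reply destination, id), and that every request leaving the node carries the node address as its source after NAT. The class and method names are made up for illustration.

```python
from typing import Dict, NamedTuple, Optional


class ReplyKey(NamedTuple):
    src: str    # address the reply comes from (the pinged device)
    dst: str    # address the reply is sent to (the node address after NAT)
    ident: int  # ICMP identifier


class ConntrackModel:
    """Toy lookup table keyed by the expected reply tuple."""

    def __init__(self, node_addr: str) -> None:
        self.node_addr = node_addr
        self.entries: Dict[ReplyKey, str] = {}

    def send_echo(self, origin: str, target: str, ident: int) -> bool:
        """Record an outgoing echo request.

        Returns False when another flow already owns the same reply tuple.
        """
        key = ReplyKey(target, self.node_addr, ident)
        if key in self.entries:
            return False
        self.entries[key] = origin
        return True

    def deliver_reply(self, target: str, ident: int) -> Optional[str]:
        """Return which requester a reply from `target` with `ident` is handed to."""
        return self.entries.get(ReplyKey(target, self.node_addr, ident))


ct = ConntrackModel(node_addr="1000::212:165")
device = "1000::212:160"

ct.send_echo("node", device, 14640)       # ping started on the node
ct.send_echo("container", device, 14640)  # same id from the container
print(ct.deliver_reply(device, 14640))    # the reply goes to the node flow

ct.send_echo("container", device, 53764)  # a different id gets its own entry
print(ct.deliver_reply(device, 53764))
```

In this model the container's request with id 14640 has no reply tuple of its own, which mirrors the two entries above that share the same reply source, destination, and id.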

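Question 1: Why is there no problem when two pings on the node use the same id?

Because both the requests and the replies are handled on the node itself, the two flows share a single conntrack entry with the same id. Even if requests and replies are not paired one-to-one, both requesters still receive responses.
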
ipv6   10 icmpv6  58 29 src=1000:0000:0000:0000:0000:0000:0212:0165 dst=1000:0000:0000:0000:0000:0000:0212:0160 type=128 code=0 id=14640 src=1000:0000:0000:0000:0000:0000:0212:0160 dst=1000:0000:0000:0000:0000:0000:0212:0165 type=129 code=0 id=0 mark=0 zone=0 use=2

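Question 2: Why does the problem only appear when one ping runs on the node and one in a container with the same id?

The node/container combination is not essential. Any traffic that leaves the cluster, goes through NAT, and shares a ping id hits this problem, and NAT is exactly what containers rely on to reach addresses outside the cluster. A non-container setup that uses NAT would in theory behave the same way.

Question 3: Why do requests with different ids work?

With the same source and destination IP, requests with different ids each create a separate conntrack entry, so replies can be told apart by id and delivered correctly:
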
[root@node1 ~]# cat /proc/net/nf_conntrack|grep icmpv6

ipv6 10 icmpv6 58 29 src=fd00:0111:0111:0000:0c11:b42f:f17e:a683 dst=1000:0000:0000:0000:0000:0000:0212:0160 type=128 code=0 id=14640 src=1000:0000:0000:0000:0000:0000:0212:0160 dst=1000:0000:0000:0000:0000:0000:0212:0165 type=129 code=0 id=14640 mark=0 zone=0 use=2

ipv6 10 icmpv6 58 29 src=fd00:0111:0111:0000:0c11:b42f:f17e:a683 dst=1000:0000:0000:0000:0000:0000:0212:0160 type=128 code=0 id=53764 src=1000:0000:0000:0000:0000:0000:0212:0160 dst=1000:0000:0000:0000:0000:0000:0212:0165 type=129 code=0 id=53764 mark=0 zone=0 use=2

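Question 4: How long does the failure last when the same id is used?

It is determined by the aging time of the conntrack entry, which defaults to 30 s (sysctl net.netfilter.nf_conntrack_icmpv6_timeout).

Solution

  1. On the application side, do not reuse the same id for retried packets.
  2. Avoid having multiple services ping the same device at the same time in one environment, or assign them distinct ids to prevent collisions.

References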

https://datatracker.ietf.org/doc/html/rfc4443

https://community.icinga.com/t/how-to-avoid-icmp-identifiers-colliding/5290

https://hechao.li/2018/09/27/How-Is-Ping-Deduplexed/

Symptom

On one node of the Kubernetes cluster, docker stats shows no resource usage:

[root@node1 ~]# docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O
1c9bec808f61 k8s_busybox_xxx -- -- -- --
86f38791af8f k8s_kube-controller-manager_xxx -- -- -- --
60d98fe39332 k8s_kube-scheduler_xxx -- -- -- --
a81320ad61e8 k8s_calico-kube-controllers_xxx -- -- -- --
4cf98fb540ba k8s_calico-node_xxx -- -- -- --
9747e7ce0032 k8s_kube-proxy_xxx -- -- -- --
...

Root cause analysis

The Docker log contains a large number of errors like this:

time="xxx" level=error msg="collecting stats for xxx:no metrics received"

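Searching for this error message turned up nothing relevant, so the next step was to find out where the command reads its data from. According to reference [1], the values are computed from the cgroup filesystem.

Pick a container ID at random and check whether the relevant metric files in its cgroup look normal:
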
[root@node1 ~]# cat /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podxxx/docker-xxx.scope/cpuacct.usage
32068181

[root@node1 ~]# cat /sys/fs/cgroup/memory/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podxxx/docker-xxx.scope/memory.limit_in_bytes
9223372036854771712

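Nothing looks wrong here, and no related issue was found in the Docker issue tracker.

Checking the containerd service log shows that it keeps printing the following error:
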
/sys/fs/cgroup/cpuacct/kubepods.slice/besteffort.slice/podxxx/xxx/cpuacct.stat is expected to have 4 fields

Following that error message, check the corresponding cgroup file again. Its content does not match the expected 4 fields:

[root@node1 ~]# cat /sys/fs/cgroup/cpuacct/kubepods.slice/kubepods-besteffort.slice/podxxx/xxx/cpuacct.stat
user 32
system 89
sched_delay 0

On a healthy node, the same file looks like this:

[root@node1 ~]# cat /sys/fs/cgroup/cpuacct/kubepods.slice/kubepods-burstable.slice/podxxx/xxx/cpuacct.stat
user 1568
system 6436

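The comparison shows an extra sched_delay field in the problem environment, which represents CPU time lost to scheduling delay. According to reference [2], the problem originates in https://github.com/containerd/cgroups: the error is reported when it reads the fields of /sys/fs/cgroup/cpuacct/cpuacct.stat. This restriction is unreasonable and was fixed in containerd/cgroups@5fe29ea.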
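
To make the failure mode concrete, here is a minimal sketch of a parser that reads cpuacct.stat by field name instead of expecting a fixed number of fields. It is not the containerd/cgroups code (that project is written in Go); the function names are made up for illustration.

```python
def parse_cpuacct_stat(text):
    """Parse 'name value' lines into a dict without assuming how many lines exist."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip empty lines
        if len(parts) != 2:
            raise ValueError("malformed line: %r" % line)
        name, value = parts
        stats[name] = int(value)
    return stats


def user_system(text):
    """Return the (user, system) counters and ignore any additional fields."""
    stats = parse_cpuacct_stat(text)
    return stats["user"], stats["system"]


# Content taken from the two nodes shown above.
problem_node = "user 32\nsystem 89\nsched_delay 0\n"
healthy_node = "user 1568\nsystem 6436\n"

print(user_system(problem_node))
print(user_system(healthy_node))
```

Looking fields up by name keeps the reader working when a kernel adds counters such as sched_delay, whereas a check for exactly 4 whitespace-separated fields rejects the file outright.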

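According to the commit history, the fix is included in the following containerd/cgroups versions:

v3.0.3  v3.0.2 v3.0.1 v3.0.0 v1.1.0

On the containerd side, releases from v1.7.0 onward depend on a containerd/cgroups version that includes the fix, which resolves the problem.

Why does only one node have this problem?

According to the description of the fix, only certain kernels trigger it. Comparing kernel versions shows that the affected node runs 4.14.0, while the healthy nodes run 5.x.

Solution

1. Upgrade containerd to v1.7.0 or later;

2. Upgrade the operating system kernel.

References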

1.https://cloud.tencent.com/developer/article/1096453

2.https://github.com/milvus-io/milvus/issues/22982

3.https://github.com/containerd/cgroups/pull/231

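Background

The goal is to find a method or tool for viewing the decoded Kubernetes data stored in etcd. The open-source tool in reference [1] has not been maintained for a long time; according to its issues, the tool has since moved to the etcd-io organization.

Build steps

Following the official documentation[2], clone the source:
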
[root@node1]# git clone git@github.com:etcd-io/auger.git
Cloning into 'auger'...
remote: Enumerating objects: 712, done.
remote: Counting objects: 100% (229/229), done.
remote: Compressing objects: 100% (106/106), done.
remote: Total 712 (delta 179), reused 150 (delta 123), pack-reused 483
Receiving objects: 100% (712/712), 247.44 KiB | 186.00 KiB/s, done.
Resolving deltas: 100% (409/409), done.

Build the release:

[root@node1]# cd auger/
[root@node1 auger]# make release
Building release in temp directory /tmp/tmp.VtO7q4KrPY
docker run \
-v /tmp/tmp.VtO7q4KrPY/auger:/go/src/github.com/etcd-io/auger \
-w /go/src/github.com/etcd-io/auger \
golang:1.21.8 \
/bin/bash -c "make -f /go/src/github.com/etcd-io/auger/Makefile release-docker-build GOARCH=amd64 GOOS=linux"
Unable to find image 'golang:1.21.8' locally
1.21.8: Pulling from library/golang
71215d55680c: Pull complete
3cb8f9c23302: Pull complete
5f899db30843: Pull complete
c29f45468664: Pull complete
6de33e7b6490: Pull complete
6dbaf8e5f127: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:856073656d1a517517792e6cdd2f7a5ef080d3ca2dff33e518c8412f140fdd2d
Status: Downloaded newer image for golang:1.21.8
export GOPATH=/go
GOOS=linux GOARCH=amd64 GO111MODULE=on go build
go: go.mod requires go >= 1.22.0 (running go 1.21.8; GOTOOLCHAIN=local)
make: *** [/go/src/github.com/etcd-io/auger/Makefile:66: release-docker-build] Error 1
make: *** [release] Error 2

The error says the Go version does not match. Update the version in the Makefile:

[root@node1 auger]# vim Makefile
NAME ?= auger
PKG ?= github.com/etcd-io/$(NAME)
GO_VERSION ?= 1.22.0
...

Build again:

[root@node1 auger]# make release
Building release in temp directory /tmp/tmp.s0Ue7zvIop
docker run \
-v /tmp/tmp.s0Ue7zvIop/auger:/go/src/github.com/etcd-io/auger \
-w /go/src/github.com/etcd-io/auger \
golang:1.22.0 \
/bin/bash -c "make -f /go/src/github.com/etcd-io/auger/Makefile release-docker-build GOARCH=amd64 GOOS=linux"
Unable to find image 'golang:1.22.0' locally
1.22.0: Pulling from library/golang
7bb465c29149: Pull complete
...
Digest: sha256:7b297d9abee021bab9046e492506b3c2da8a3722cbf301653186545ecc1e00bb
Status: Downloaded newer image for golang:1.22.0
export GOPATH=/go
GOOS=linux GOARCH=amd64 GO111MODULE=on go build
go: downloading github.com/coreos/etcd v3.1.11+incompatible
go: downloading github.com/google/safetext v0.0.0-20220914124124-e18e3fe012bf
go: downloading github.com/spf13/cobra v1.8.0
go: downloading github.com/coreos/bbolt v1.3.1-coreos.3
go: downloading k8s.io/apimachinery v0.30.0
go: downloading proxy.golang.org/xxx io timeout
...

Many dependencies fail to download through the proxy.golang.org proxy, so change the GOPROXY setting:

[root@node1 ~]# docker exec -it 9b41dd00e91a sh
# go env
...
GOPROXY='https://proxy.golang.org,direct'

[root@node1 auger]# vim Makefile
# Build used inside docker by 'release'
release-docker-build:
export GOPATH=/go
GOOS=$(GOOS) GOARCH=$(GOARCH) GO111MODULE=on GOPROXY='https://goproxy.cn,direct' go build

Build again:

[root@node1 auger]# make release
Building release in temp directory /tmp/tmp.34OgmWJGLU
docker run \
-v /tmp/tmp.34OgmWJGLU/auger:/go/src/github.com/etcd-io/auger \
-w /go/src/github.com/etcd-io/auger \
golang:1.22.0 \
/bin/bash -c "make -f /go/src/github.com/etcd-io/auger/Makefile release-docker-build GOARCH=amd64 GOOS=linux"
export GOPATH=/go
GOOS=linux GOARCH=amd64 GO111MODULE=on GOPROXY='https://goproxy.cn,direct' go build
go: downloading github.com/coreos/etcd v3.1.11+incompatible
...
build/auger built!

The build succeeds, but running the binary reports that the required glibc versions are not found:

[root@node1 auger]# ./build/auger -help
./build/auger: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./build/auger)
./build/auger: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./build/auger)

Checking the local glibc shows the mismatch: the binary requires GLIBC_2.32 and GLIBC_2.34, while the node has 2.17:

[root@node1 l14185]# rpm -qa|grep glibc
glibc-2.17-326.el7_9.x86_64

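There are two ways to solve this:

  1. Change the build image to one whose glibc version matches the node's;
  2. Build directly on the node.

Taking the second option as an example, download the required Go release and run go build directly:
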
[root@node1]# GOOS=linux GOARCH=amd64 go build -o build/auger
[root@node1 auger]# ll build/
total 39916
-rwxr-xr-x 1 root root 40871798 May 13 19:12 auger

Usage

Show the help text:

[root@iZbp1esczkzr2k2fughijkZ auger]# ./build/auger
Inspect and analyze kubernetes objects in binary storage
encoding used with etcd 3+ and boltdb.

Usage:
auger [command]

Available Commands:
analyze Analyze kubernetes data from the boltdb '.db' files etcd persists to.
checksum Checksum a etcd keyspace.
completion Generate the autocompletion script for the specified shell
decode Decode objects from the kubernetes binary key-value store encoding.
encode Encode objects to the kubernetes binary key-value store encoding.
extract Extracts kubernetes data from the boltdb '.db' files etcd persists to.
help Help about any command

Flags:
-h, --help help for auger

Use "auger [command] --help" for more information about a command.

View the decoded etcd data:

[root@node1]# ETCDCTL_API=3 etcdctl get /registry/pods/kube-system/coredns-795cc9c45c-j7nl4 | ./auger decode
apiVersion: v1
kind: Pod
metadata:
  generateName: coredns-795cc9c45c-
  labels:
    k8s-app: kube-dns
    pod-template-hash: 795cc9c45c
  name: coredns-795cc9c45c-j7nl4
  namespace: kube-system
spec:
  containers:
  - args:
    - -conf
    - /etc/coredns/Corefile
    name: coredns
    ports:
    ...
    volumeMounts:
    - mountPath: /etc/coredns
      name: config-volume
      readOnly: true
    - mountPath: /tmp
      name: tmp
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: coredns-token-9dldj
      readOnly: true
  nodeName: node1
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  ...
  volumes:
  - emptyDir: {}
    name: tmp
  ...
status:
  conditions:
  - lastProbeTime: null
    type: Initialized
  ...
  containerStatuses:
  - containerID: docker://f85d0fd1422a3860d574eb88b5dc23c165d5adb3eccb242a1a847bd0cfc98227
  ...
  hostIP: 192.168.10.10
  phase: Running
  podIP: 10.10.166.139
  qosClass: Burstable

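Notes

When auger reads the database file directly, make sure the etcd service is stopped, or copy the etcd database file and parse the copy. Otherwise parsing hangs: auger requests an exclusive lock (flock) on the file, and the request fails with EAGAIN because the lock is already held, presumably by the running etcd.

The hang looks like this under strace:
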
[root@node1]# strace ./auger checksum -f /var/lib/etcd/default.etcd/member/snap/db
execve("./auger", ["./auger", "checksum", "-f", "/var/lib/etcd/default.etcd/membe"...], [/* 25 vars */]) = 0
brk(NULL) = 0x3e75000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4a242fc000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=37465, ...}) = 0
mmap(NULL, 37465, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4a242f2000
close(3) = 0
...
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f49da795000
openat(AT_FDCWD, "/var/lib/etcd/default.etcd/member/snap/db", O_RDWR|O_CREAT|O_CLOEXEC, 0400) = 3
fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1581252610, u64=9172183252402700290}}) = -1 EPERM (Operation not permitted)
fcntl(3, F_GETFL) = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDWR|O_LARGEFILE) = 0
flock(3, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xc000100148, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1edd920, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1edd920, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1edd920, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1edd920, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1edd920, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
...
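
The lock contention visible in the flock(3, LOCK_EX|LOCK_NB) line can be reproduced in isolation. The sketch below is an illustration on a temporary file, not auger or etcd code: one file descriptor plays the role of the process that already holds the database lock, the second plays the tool that tries to open the same file. It relies on flock semantics as found on Linux.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try a non-blocking exclusive flock; return the fd on success, None if busy."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:  # EAGAIN: another open file description holds the lock
        os.close(fd)
        return None
    return fd


def demo():
    """Return (owner_locked, second_locked_while_held, locked_after_release)."""
    with tempfile.NamedTemporaryFile() as db:
        owner = try_exclusive_lock(db.name)   # role of the running database process
        second = try_exclusive_lock(db.name)  # role of the tool reading the same file
        second_ok = second is not None
        if second is not None:
            os.close(second)

        os.close(owner)                       # closing the fd releases the lock
        after = try_exclusive_lock(db.name)
        after_ok = after is not None
        if after is not None:
            os.close(after)
        return owner is not None, second_ok, after_ok


print(demo())
```

A copy of the database file has no lock on it, which is why parsing a copy works while the original is still in use.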

Copy the database file and compare the checksum results across nodes:

[root@node1]# cp /var/lib/etcd/default.etcd/member/snap/db /root/etcd.db
[root@node1]# ./auger checksum -f /root/etcd.db
checksum: 2125275681
compact-revision: 6609891
revision: 6610932

[root@node2 ~]# cp /var/lib/etcd/default.etcd/member/snap/db /root/etcd.db
[root@node2 ~]# ./auger checksum -f /root/etcd.db -r 6610932
checksum: 2125275681
compact-revision: 6610743
revision: 6610932

References

https://github.com/jpbetz/auger

https://github.com/etcd-io/auger

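Background

We need to view packet captures from the production environment in real time. According to references [1, 2], there are two ways to set this up:

  • Option 1: use Wireshark's remote interface feature
  • Option 2: use Wireshark's SSH remote capture feature

Using the remote interface feature

1. Install dependencies
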
yum install glibc-static

2. Download the rpcapd source package

Download 4.0.1-WpcapSrc.zip[3], a version close to that of Wireshark.

3. Configure and build

[root@node1 ~]# CFLAGS=-static ./configure
...
checking for flex... no
checking for bison... no
checking for capable lex... insufficient
configure: error: Your operating system's lex is insufficient to compile
libpcap. flex is a lex replacement that has many advantages, including
being able to compile libpcap. For more information, see
http://www.gnu.org/software/flex/flex.html .

Install the missing dependencies named in the error message:

yum install flex bison

The build fails:

[root@node1 libpcap]# make
gcc -O2 -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" @V_HAVE_REMOTE@ -c ./pcap-linux.c
gcc: error: @V_HAVE_REMOTE@: No such file or directory
make: *** [pcap-linux.o] Error 1

After downloading the 4.1.1-WpcapSrc.zip source package instead, the build succeeds:

[root@node1 libpcap]# make
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./pcap-linux.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./pcap-usb-linux.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./fad-getad.c
sed -e 's/.*/static const char pcap_version_string[] = "libpcap version &";/' ./VERSION > version.h
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./pcap.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./inet.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./gencode.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./optimize.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./nametoaddr.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./etherent.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./savefile.c
rm -f bpf_filter.c
ln -s ./bpf/net/bpf_filter.c bpf_filter.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c bpf_filter.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./bpf_image.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./bpf_dump.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c scanner.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -Dyylval=pcap_lval -c grammar.c
sed -e 's/.*/char pcap_version[] = "&";/' ./VERSION > version.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c version.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./pcap-new.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./pcap-remote.c
gcc -O2 -fPIC -static -I. -DHAVE_CONFIG_H -D_U_="__attribute__((unused))" -DHAVE_REMOTE -c ./sockutils.c
ar rc libpcap.a pcap-linux.o pcap-usb-linux.o fad-getad.o pcap.o inet.o gencode.o optimize.o nametoaddr.o etherent.o savefile.o bpf_filter.o bpf_image.o bpf_dump.o scanner.o grammar.o version.o pcap-new.o pcap-remote.o sockutils.o
ranlib libpcap.a
sed -e 's|@includedir[@]|/usr/local/include|g' \
-e 's|@libdir[@]|/usr/local/lib|g' \
-e 's|@DEPLIBS[@]||g' \
pcap-config.in >pcap-config.tmp
mv pcap-config.tmp pcap-config
chmod a+x pcap-config
[root@node1 libpcap]#
[root@node1 libpcap]# cd rpcapd
[root@node1 rpcapd]# make
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c rpcapd.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c daemon.c
daemon.c: In function ‘daemon_AuthUserPwd’:
daemon.c:684:30: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
if (strcmp(usersp->sp_pwdp, (char *) crypt(password, usersp->sp_pwdp) ) != 0)
^
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c utils.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c fileconf.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c ../pcap-remote.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c ../sockutils.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -c ../pcap-new.c
gcc -pthread -DHAVE_REMOTE -DHAVE_SNPRINTF -I../ -o rpcapd rpcapd.o daemon.o utils.o fileconf.o pcap-remote.o sockutils.o pcap-new.o -L../ -lpcap -lcrypt

Start the rpcapd service

[root@node1 rpcapd]#  ./rpcapd -4 -n -p 2002
Press CTRL + C to stop the server...

Check that it is listening

[root@node1 ~]# netstat -anp|grep -w 2002
tcp 0 0 0.0.0.0:2002 0.0.0.0:* LISTEN 28399/./rpcapd

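Start Wireshark and add the host and port under Capture -> Options -> Manage Interfaces -> Remote Interfaces. Wireshark reports a "PCAP not found" error; according to reference [4], installing Npcap resolves it.

After the remote interface connects, rpcapd prints the following error:
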
[root@node1 rpcapd]#  ./rpcapd -4 -n -p 2002
Press CTRL + C to stop the server...
Not enough space in the temporary send buffer

After applying the fix from reference [4], Wireshark reports a timeout and rpcapd still fails:

[root@node1 rpcapd]#  ./rpcapd -n -p 2002
Press CTRL + C to stop the server...
Not enough space in the temporary send buffer.
The RPCAP runtime timeout has expired
I'm exiting from the child loop
Child terminated

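Because this approach depends on rpcapd, which has to be compiled in a matching environment, it was not adopted.

Using the SSH remote capture feature

Start Wireshark, find SSH remote capture under Capture -> Options -> Input, and click the settings icon on its left to open the SSH login settings.

In the dialog, fill in the SSH connection parameters: server address, port, user name, password (a key or certificate also works), and so on.

When the configuration is done, click Start to begin the remote capture.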
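
For environments where the GUI dialog is inconvenient, a similar result can be approximated from a script by running tcpdump over SSH and piping its output into Wireshark. This is a sketch under assumptions, not what Wireshark's SSH remote capture does internally: the host, user, and interface values are placeholders, tcpdump must be installed on the remote host, and the remote user needs permission to capture.

```python
import shlex
import subprocess


def remote_capture_commands(host, user, interface="eth0", port=22,
                            capture_filter="not port 22"):
    """Build the ssh and wireshark argument lists for a piped live capture.

    tcpdump: -U flushes each packet, -w - writes pcap data to stdout.
    wireshark: -k starts capturing immediately, -i - reads from stdin.
    The default filter keeps the SSH session itself out of the capture.
    """
    remote = ["tcpdump", "-U", "-i", interface, "-w", "-", capture_filter]
    remote_cmd = " ".join(shlex.quote(arg) for arg in remote)
    ssh_cmd = ["ssh", "-p", str(port), "%s@%s" % (user, host), remote_cmd]
    wireshark_cmd = ["wireshark", "-k", "-i", "-"]
    return ssh_cmd, wireshark_cmd


def start_capture(host, user, **kwargs):
    """Start both processes and connect ssh's stdout to wireshark's stdin."""
    ssh_cmd, wireshark_cmd = remote_capture_commands(host, user, **kwargs)
    ssh_proc = subprocess.Popen(ssh_cmd, stdout=subprocess.PIPE)
    ws_proc = subprocess.Popen(wireshark_cmd, stdin=ssh_proc.stdout)
    ssh_proc.stdout.close()  # let wireshark own the read end of the pipe
    return ssh_proc, ws_proc


# Placeholder address and user; only prints the commands, nothing is executed.
ssh_cmd, ws_cmd = remote_capture_commands("192.0.2.10", "root")
print(" ".join(ssh_cmd))
print(" ".join(ws_cmd))
```

Calling start_capture() with real connection details launches the pipeline; key-based SSH authentication is the practical choice here because the password prompt would otherwise compete with the capture stream.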

References

1.https://zhuanlan.zhihu.com/p/551549544

2.https://blog.csdn.net/weixin_40991654/article/details/126779792

3.https://www.winpcap.org/archive/

4.https://blog.csdn.net/m0_37678467/article/details/127940287

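For work, troubleshooting sometimes requires access to the Red Hat Knowledgebase. Following reference [1], a few steps are enough:

Steps

Step 1: go to https://access.redhat.com/ and create an account;

Step 2: visit https://developers.redhat.com/products/rhel/download to activate the subscription (an email arrives; confirm it);

Step 3: visit https://access.redhat.com/management and confirm that the account has a developer subscription:
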
14904535	Red Hat Developer Subscription for Individuals
14904536 Red Hat Beta Access

Step 4: register a RHEL system with the account's user name and password:

subscription-manager register --auto-attach --username ******** --password ********

Step 5: visit https://access.redhat.com/solutions/6178422 to check that the Knowledgebase is accessible.

To register a RHEL system quickly, Vagrant is used here to bring up a Red Hat 8 virtual machine. This also takes only a few steps:

# Initialize the Vagrantfile
$ vagrant init generic/rhel8
A `Vagrantfile` has been placed in this directory. You are now
ready to `vagrant up` your first virtual environment! Please read
the comments in the Vagrantfile as well as documentation on
`vagrantup.com` for more information on using Vagrant.

# Boot the Red Hat 8 system
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'generic/rhel8' could not be found. Attempting to find and install...
default: Box Provider: virtualbox
default: Box Version: >= 0
==> default: Loading metadata for box 'generic/rhel8'
default: URL: https://vagrantcloud.com/generic/rhel8
==> default: Adding box 'generic/rhel8' (v4.3.12) for provider: virtualbox
default: Downloading: https://vagrantcloud.com/generic/boxes/rhel8/versions/4.3.12/providers/virtualbox/amd64/vagrant.box
default:
default: Calculating and comparing box checksum...
==> default: Successfully added box 'generic/rhel8' (v4.3.12) for 'virtualbox'!
==> default: Importing base box 'generic/rhel8'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'generic/rhel8' version '4.3.12' is up to date...
==> default: Setting the name of the VM: Redhat8_default_1715673627487_1933
==> default: Vagrant has detected a configuration issue which exposes a
==> default: vulnerability with the installed version of VirtualBox. The
==> default: current guest is configured to use an E1000 NIC type for a
==> default: network adapter which is vulnerable in this version of VirtualBox.
==> default: Ensure the guest is trusted to use this configuration or update
==> default: the NIC type using one of the methods below:
==> default:
==> default: https://www.vagrantup.com/docs/virtualbox/configuration.html#default-nic-type
==> default: https://www.vagrantup.com/docs/virtualbox/networking.html#virtualbox-nic-type
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
The guest machine entered an invalid state while waiting for it
to boot. Valid states are 'starting, running'. The machine is in the
'paused' state. Please verify everything is configured
properly and try again.

If the provider you're using has a GUI that comes with it,
it is often helpful to open that and watch the machine, since the
GUI often has more helpful error messages than Vagrant can retrieve.
For example, if you're using VirtualBox, run `vagrant up` while the
VirtualBox GUI is open.

The primary issue for this error is that the provider you're using
is not properly configured. This is very rarely a Vagrant issue.

# After the command above the VM is left in the paused state, so bring it up once more
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Checking if box 'generic/rhel8' version '4.3.12' is up to date...
==> default: Unpausing the VM...

# SSH into the newly installed Red Hat system
$ vagrant ssh
Register this system with Red Hat Insights: insights-client --register
Create an account or view all your systems at https://red.ht/insights-dashboard

# Register the system from inside the Red Hat guest
[root@rhel8 ~]# subscription-manager register --auto-attach --username ******** --password ********
Registering to: subscription.rhsm.redhat.com:443/subscription
The system has been registered with ID: xxxx-xxxx-xxxx-xxxx-xxxx
The registered system name is: rhel8.localdomain

# When you are done with the system, simply shut it down
$ vagrant halt
==> default: Attempting graceful shutdown of VM...
default:
default: Vagrant insecure key detected. Vagrant will automatically replace
default: this with a newly generated keypair for better security.
default:
default: Inserting generated public key within guest...
default: Removing insecure key from the guest if it's present...
default: Key inserted! Disconnecting and reconnecting using new SSH key...

References

https://wangzheng422.github.io/docker_env/notes/2022/2022.04.no-cost.rhel.sub.html

Background

In a K8S environment, one workload restarted the system's dbus service by mistake, after which every Pod failed to start. The relevant log:

unable to ensure pod container exists: failed to create container for [kubepods besteffort ...] : dbus: connection closed by user

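Root Cause Analysis

Searching on the error message leads to a related issue [1], which gives the following cause:

When creating a Pod, the kubelet service talks to dbus through /var/run/dbus/system_bus_socket. If the dbus service restarts because of some fault, /var/run/dbus/system_bus_socket is recreated. The kubelet, however, keeps sending data over its connection to the old socket, which produces the error above.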
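The mechanism can be reproduced outside of kubelet and dbus. The following is a minimal sketch, assuming a Linux host and an ordinary Unix stream socket. It does not use the D-Bus protocol or any kubelet code, and `demo_stale_socket` and the `bus.sock` path are hypothetical names chosen for illustration. A client that keeps its old connection fails after the "daemon" recreates its socket file, while a fresh connection to the same path works.

```python
import os
import socket
import tempfile


def demo_stale_socket():
    """Return (old_connection_ok, new_connection_ok) after a simulated restart."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical path standing in for /var/run/dbus/system_bus_socket.
        path = os.path.join(tmpdir, "bus.sock")

        def start_daemon():
            srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            srv.bind(path)
            srv.listen(1)
            return srv

        # The daemon starts and a long-lived client connects to it.
        daemon = start_daemon()
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        accepted, _ = daemon.accept()

        # Simulated restart: the old endpoint goes away and the
        # socket file is recreated at the same path.
        accepted.close()
        daemon.close()
        os.unlink(path)
        daemon = start_daemon()

        # The client still holds its connection to the old daemon instance.
        try:
            client.sendall(b"ping")
            old_ok = True
        except OSError:
            old_ok = False
        finally:
            client.close()

        # Reconnecting (which is what restarting the client achieves)
        # reaches the new daemon instance.
        fresh = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        fresh.connect(path)
        accepted, _ = daemon.accept()
        fresh.sendall(b"ping")
        new_ok = accepted.recv(4) == b"ping"

        for s in (fresh, accepted, daemon):
            s.close()
        return old_ok, new_ok


if __name__ == "__main__":
    print(demo_stale_socket())
```

The old connection is bound to the daemon instance that accepted it, not to the path on disk, so recreating the file does not repair it. This is consistent with the temporary fix of restarting kubelet, which forces a new connection.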

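Solution

Temporary fix: restart the kubelet service.

Permanent fix: upgrade K8S to v1.25 or later.

Follow-up Issue

After the dbus and kubelet services had been restarted, SSH logins by non-root users became slow. The secure log showed the following errors: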

pam_systemd(crond:session): Failed to create session: Activation of org.freedesktop.login1 timed out
pam_systemd(crond:session): Failed to create session: Connection timed out

According to [2], SSH depends on the systemd-logind service, which in turn depends on the dbus service. Restarting systemd-logind resolved the problem:

[root@core log]# systemctl restart systemd-logind 

References

1.https://github.com/kubernetes/kubernetes/issues/100328

2.https://www.jianshu.com/p/bb66d7f8c859

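Symptom

Networking between all nodes of the K8S cluster was abnormal, and normal SSH operations could not be performed.

Root Cause Analysis

Given this symptom, the first suspect was a wrong password, so the password in use was compared with the actual one. The password stored by the application matched the actual password, which ruled out a password mismatch.

The next step was to check whether some unexpected IP was connecting with a wrong password.

The environment uses IPv6 addresses. Note that netstat truncates IPv6 addresses by default, so the full address cannot easily be read off; add the -W option to display them in full: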

[root@node1 ~]# netstat -anp -v|grep -w 22 
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 156183/sshd: /usr/s
tcp6 0 0 :::22 :::* LISTEN 156183/sshd: /usr/s
tcp6 0 0 2000:8080:5a0a:2f:59732 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:44072 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:35666 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:42998 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:59834 2000:8080:5a0a:2f40::22 ESTABLISHED 170769/java
tcp6 0 0 2000:8080:5a0a:2f:59652 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:39430 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:35648 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:36852 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:43162 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:35002 2000:8080:5a0a:2f40::22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f:36052 2000:8080:5a0a:2f40::22 ESTABLISHED 170769/java

With full IPv6 addresses, the SSH connections look like this:

[root@node1 ~]# netstat -anp -W|grep -w 22 |grep -v ::4
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 156183/sshd: /usr/s
tcp6 0 0 :::22 :::* LISTEN 156183/sshd: /usr/s
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:48950 2000:8080:5a0a:2f40:8002::5:22 ESTABLISHED 170769/java
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52506 2000:8080:5a0a:2f40:8002::5:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:56798 2000:8080:5a0a:2f40:8002::6:22 ESTABLISHED 170769/java
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52624 2000:8080:5a0a:2f40:8002::5:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:56860 2000:8080:5a0a:2f40:8002::6:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52396 2000:8080:5a0a:2f40:8002::5:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52398 2000:8080:5a0a:2f40:8002::5:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:22 2000:8080:5a0a:2f40:8002::5:45532 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52202 2000:8080:5a0a:2f40:8002::5:22 TIME_WAIT -
tcp6 0 0 2000:8080:5a0a:2f40:8002::5:52348 2000:8080:5a0a:2f40:8002::5:22 ESTABLISHED 170769/java
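To see why the truncated columns are unusable, the address:port fields from both listings can be checked with Python's standard `ipaddress` module. This is only an illustrative sketch: `split_endpoint` is a hypothetical helper, and it assumes the port is whatever follows the last colon, as in the netstat output above.

```python
import ipaddress


def split_endpoint(field):
    """Split a netstat 'address:port' field at the last colon.

    Returns (address, port), where address is None if the part before
    the last colon is not a valid IPv4/IPv6 address (e.g. it was truncated).
    The port is kept as a string because netstat may print '*'.
    """
    host, _, port = field.rpartition(":")
    try:
        address = str(ipaddress.ip_address(host))
    except ValueError:
        address = None
    return address, port


if __name__ == "__main__":
    for field in (
        "2000:8080:5a0a:2f40:8002::5:48950",  # from the -W listing
        "2000:8080:5a0a:2f:59732",            # truncated, from the first listing
        ":::22",                              # wildcard listener
    ):
        print(field, "->", split_endpoint(field))
```

A field from the -W listing parses into a valid address and port, whereas the truncated field has no valid address in front of the port, so the peer cannot be identified from it.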

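Judging from the records above, there are at least no SSH connections from unexpected IPs at the moment. The next step was to check whether earlier wrong-password attempts had locked the account.

The /var/log/secure log had already been rotated, so it could not show when the problem first began. The system had not been rebooted recently, so the next place to look was the login-failure entries in the journalctl --boot output. These revealed the time the problem started and showed the source IP 2000:8080:5a0a:2f47::2 repeatedly logging in with a wrong password: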

cat boot.log |grep "Failed password"|less
3月 26 10:42:19 node1 sshd[114043]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 34968 ssh2
3月 26 10:42:23 node1 sshd[114043]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 34968 ssh2
3月 26 10:42:25 node1 sshd[114043]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 34968 ssh2
3月 26 10:42:28 node1 sshd[114043]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 34968 ssh2
3月 26 10:42:31 node1 sshd[116187]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 35194 ssh2
3月 26 10:42:34 node1 sshd[116187]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 35194 ssh2
3月 26 10:42:36 node1 sshd[116187]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 35194 ssh2
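When the log is long, tallying the failures per user and source IP makes the offending client easier to spot than paging through grep output. The sketch below assumes the sshd message format shown above; `count_failed_logins` and the `SAMPLE` lines are illustrative names and data, with the sample lines copied from the log excerpt.

```python
import re
from collections import Counter

# Matches sshd lines such as:
#   "... sshd[114043]: Failed password for admin from 2000:...::2 port 34968 ssh2"
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port (?P<port>\d+)"
)


def count_failed_logins(lines):
    """Count failed password attempts per (user, source IP)."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[(match.group("user"), match.group("ip"))] += 1
    return counts


SAMPLE = [
    "3月 26 10:42:19 node1 sshd[114043]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 34968 ssh2",
    "3月 26 10:42:23 node1 sshd[114043]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 34968 ssh2",
    "3月 26 10:42:31 node1 sshd[116187]: Failed password for admin from 2000:8080:5a0a:2f47::2 port 35194 ssh2",
]

if __name__ == "__main__":
    for (user, ip), n in count_failed_logins(SAMPLE).most_common():
        print(f"{n:5d}  {user}  {ip}")
```

In practice the lines would come from the saved journal output, for example by iterating over the open boot.log file instead of `SAMPLE`.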

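Normally, once wrong-password logins have locked an account, it unlocks automatically after the configured time has passed. In the affected environment, however, logins with the correct password kept failing even though no wrong-password connections were being made at the time.

To verify whether the lockout configuration was causing the SSH failures, the auth-related configuration in /etc/pam.d/security-auth and /etc/pam.d/password-auth was temporarily commented out: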

# auth required pam_tally2.so onerr=fail deny=5 unlock_time=900 even_deny_root

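After the change and a period of observation, SSH returned to normal; once the configuration was restored, SSH failed again, which largely confirmed a configuration problem. According to colleagues responsible for the OS, the tally lockout module used here is an old module that has been deprecated because of defects. One of those defects is that once an account has been locked by wrong passwords, the lock is not released even when the correct password is supplied. They recommended replacing it with the faillock module, configured as follows: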

[root@node1 ~]# vim /etc/pam.d/system-auth or vi /etc/pam.d/login
# Add the following at the top of the file:
auth [success=1 default=bad] pam_unix.so
auth [default=die] pam_faillock.so authfail deny=5 even_deny_root unlock_time=900 root_unlock_time=10
auth sufficient pam_faillock.so authsucc deny=5 even_deny_root unlock_time=900 root_unlock_time=10
auth required pam_deny.so

[root@node1 ~]# vim /etc/pam.d/password-auth or vi /etc/pam.d/sshd
# Add the following starting at the second line (the first line is #%PAM-1.0):
auth [success=1 default=bad] pam_unix.so
auth [default=die] pam_faillock.so authfail deny=5 even_deny_root unlock_time=900 root_unlock_time=10
auth sufficient pam_faillock.so authsucc deny=5 even_deny_root unlock_time=900 root_unlock_time=10
auth required pam_deny.so

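Note: with the faillock module, neither remote nor local logins give any indication that the user is locked. The only symptom is that login fails during the lock period even with the correct password; once the lock is released, login works normally.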
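The intended behaviour of the `deny=5 unlock_time=900` settings can be pictured with a small model. This is a deliberately simplified sketch, not pam_faillock's actual implementation: `LockoutModel` is a hypothetical class, it ignores options such as fail_interval and root_unlock_time, and it assumes that attempts made while locked are not recorded.

```python
from dataclasses import dataclass, field


@dataclass
class LockoutModel:
    """Simplified lockout model: lock after `deny` consecutive failures,
    release `unlock_time` seconds after the last recorded failure."""
    deny: int = 5
    unlock_time: int = 900
    failures: list = field(default_factory=list)  # timestamps in seconds

    def is_locked(self, now):
        if len(self.failures) < self.deny:
            return False
        if now - self.failures[-1] >= self.unlock_time:
            self.failures.clear()  # the lock has expired
            return False
        return True

    def attempt(self, now, password_ok):
        """Return True if the login succeeds at time `now`."""
        if self.is_locked(now):
            # Rejected silently, even when the password is correct.
            return False
        if password_ok:
            self.failures.clear()
            return True
        self.failures.append(now)
        return False


if __name__ == "__main__":
    model = LockoutModel()
    for t in range(5):
        model.attempt(t, password_ok=False)       # five wrong passwords
    print(model.attempt(10, password_ok=True))    # during the lock period
    print(model.attempt(4 + 900, password_ok=True))  # after unlock_time
```

In this model a correct password is rejected during the lock period and accepted again after it ends, which matches the behaviour described in the note above.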

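As for why this happened, it eventually turned out that the customer's vulnerability-scanning platform had deliberately probed the nodes with weak passwords. It normally scans only once; why multiple scans were triggered is unclear.

Solution

For password-lockout hardening, use the faillock module in place of the old tally module.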