
Problem Background

Following up on "Upgrading Log4j in the Karaf Framework", I was notified today that the version needs to be bumped again, this time to 2.18.0, reportedly once more because of a vulnerability. Searching online turned up no newly disclosed vulnerability, only a notice published by Tencent titled "Apache Log4j official routine update".

Analysis

From the earlier analysis we know there are three ways to fix such a vulnerability:

  1. Upgrade the framework: the impact is large, and the framework's release cycle is slow. There is no pre-built framework package yet, so upgrading would mean compiling all framework packages ourselves, which is risky;
  2. Upgrade the dependency alone: small impact, but if there is nowhere to configure the dependency it may not be possible (confirmed in practice: it cannot be upgraded independently);
  3. Modify the dependency version in the current source and recompile: small impact, although if the version gap is large there may be many changes required;

Weighing the options, I again went with option 3. With the experience from the previous article, I changed the dependency version straight to 2.18.0:

pom.xml
<version.org.apache.felix.configadmin>1.9.20</version.org.apache.felix.configadmin>
<version.org.apache.felix.framework>5.6.12</version.org.apache.felix.framework>
<version.org.apache.felix6.framework>6.0.3</version.org.apache.felix6.framework>
- <version.org.apache.logging.log4j>2.17.0</version.org.apache.logging.log4j>
+ <version.org.apache.logging.log4j>2.18.0</version.org.apache.logging.log4j>
<version.org.apache.servicemix.bundles.javax-inject>1_3</version.org.apache.servicemix.bundles.javax-inject>
<version.org.jboss.logging>3.4.1.Final</version.org.jboss.logging>
<version.org.mockito>3.7.7</version.org.mockito>

Build:

root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] OPS4J Pax Logging (Build POM) 1.11.9 ............... SUCCESS [ 2.355 s]
[INFO] OPS4J Pax Logging - API (Java9) .................... SUCCESS [ 2.039 s]
[INFO] OPS4J Pax Logging - API ............................ SUCCESS [ 4.926 s]
[INFO] OPS4J Pax Logging - Log4Jv1 implementation ......... SUCCESS [ 2.235 s]
[INFO] OPS4J Pax Logging - Log4Jv2 implementation ......... SUCCESS [ 3.051 s]
[INFO] OPS4J Pax Logging - Log4j v2 Extra packages ........ SUCCESS [ 0.146 s]
[INFO] OPS4J Pax Logging - Logback implementation ......... SUCCESS [ 0.950 s]
[INFO] OPS4J Pax Logging - Integration Tests .............. SUCCESS [ 0.354 s]
[INFO] OPS4J Pax Logging - Karaf .......................... SUCCESS [ 0.014 s]
[INFO] OPS4J Pax Logging - Karaf KAR Logger ............... SUCCESS [ 0.142 s]
[INFO] OPS4J Pax Logging - Karaf KAR ...................... SUCCESS [ 1.710 s]
[INFO] OPS4J Pax Logging - Karaf Distribution ............. SUCCESS [ 2.522 s]
[INFO] OPS4J Pax Logging - Karaf Integration Tests 1.11.9 . SUCCESS [ 0.703 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 22.711 s
[INFO] Finished at: 2022-10-20T03:50:21Z
[INFO] ------------------------------------------------------------------------

I replaced the newly built pax-logging-api and pax-logging-log4j2 jars in the dependency repository and rebuilt the deliverables, only to find logging broken: the services no longer printed any logs at all, leaving nothing to go on for troubleshooting;

Looking at the commit history in reference [1], org.ops4j.pax.logging did not just bump the version when adapting to this log4j release; several source files were changed as well, which looked like the likely cause:

pax-logging-api/src/main/java/org/apache/logging/log4j/LogManager.java
pax-logging-api/src/main/java/org/apache/logging/log4j/status/StatusLogger.java
pax-logging-api/src/main/java/org/apache/logging/log4j/util/PaxPropertySource.java
pax-logging-it/pom.xml
pax-logging-log4j2/src/main/java/org/apache/logging/log4j/core/config/plugins/util/ResolverUtil.java
pax-logging-log4j2/src/main/java/org/apache/logging/log4j/core/impl/ThrowableProxy.java
pax-logging-log4j2/src/main/java/org/apache/logging/log4j/core/pattern/DatePatternConverter.java
pax-logging-log4j2/src/main/java/org/ops4j/pax/logging/log4j2/internal/PaxLoggingServiceImpl.java
pax-logging-samples/fragment-log4j2/src/main/java/org/ops4j/pax/logging/log4j2/extra/ListAppender.java
pom.xml

Comparing the changes showed they are few and not complex, so I ported the updated code onto the 1.11.9 source tree;
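One way to do the port is to apply the upstream commit from reference [1] directly; a rough sketch, assuming the 1.11.9 source is a git checkout (for a release tarball, the commit's .patch file can be applied with patch -p1 instead):

cd org.ops4j.pax.logging-logging-1.11.9
git remote add upstream https://github.com/ops4j/org.ops4j.pax.logging.git
git fetch upstream
git cherry-pick 7c007343fe9844a17e9c6eaae3a833e6c19a579a   # resolve any conflicts by hand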

Then build the jars with the containerized build environment used in the previous article:

root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] OPS4J Pax Logging (Build POM) 1.11.9 ............... SUCCESS [ 2.355 s]
[INFO] OPS4J Pax Logging - API (Java9) .................... SUCCESS [ 2.039 s]
[INFO] OPS4J Pax Logging - API ............................ SUCCESS [ 4.926 s]
[INFO] OPS4J Pax Logging - Log4Jv1 implementation ......... SUCCESS [ 2.235 s]
[INFO] OPS4J Pax Logging - Log4Jv2 implementation ......... SUCCESS [ 3.051 s]
[INFO] OPS4J Pax Logging - Log4j v2 Extra packages ........ SUCCESS [ 0.146 s]
[INFO] OPS4J Pax Logging - Logback implementation ......... SUCCESS [ 0.950 s]
[INFO] OPS4J Pax Logging - Integration Tests .............. SUCCESS [ 0.354 s]
[INFO] OPS4J Pax Logging - Karaf .......................... SUCCESS [ 0.014 s]
[INFO] OPS4J Pax Logging - Karaf KAR Logger ............... SUCCESS [ 0.142 s]
[INFO] OPS4J Pax Logging - Karaf KAR ...................... SUCCESS [ 1.710 s]
[INFO] OPS4J Pax Logging - Karaf Distribution ............. SUCCESS [ 2.522 s]
[INFO] OPS4J Pax Logging - Karaf Integration Tests 1.11.9 . SUCCESS [ 0.703 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 23.641 s
[INFO] Finished at: 2022-10-20T03:55:39Z
[INFO] ------------------------------------------------------------------------

The build with the upgraded log4j version succeeded.

I replaced the newly built pax-logging-api and pax-logging-log4j2 jars in the dependency repository and rebuilt the deliverables; this time logging worked correctly;
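For reference, if the dependency repository is a local Maven repository, the rebuilt jars can be installed with coordinates like the following; this is only a sketch, with file names and version taken from this build:

mvn install:install-file -Dfile=pax-logging-api-1.11.9.jar \
    -DgroupId=org.ops4j.pax.logging -DartifactId=pax-logging-api -Dversion=1.11.9 -Dpackaging=jar
mvn install:install-file -Dfile=pax-logging-log4j2-1.11.9.jar \
    -DgroupId=org.ops4j.pax.logging -DartifactId=pax-logging-log4j2 -Dversion=1.11.9 -Dpackaging=jar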

Solution

Replace the newly built pax-logging-api and pax-logging-log4j2 jars in the dependency repository and rebuild the deliverables; logging works correctly.

References

  1. https://github.com/ops4j/org.ops4j.pax.logging/commit/7c007343fe9844a17e9c6eaae3a833e6c19a579a

Vulnerability Description

The principle is described in reference [1]; in short, when the server's ephemeral SSL/TLS Diffie-Hellman public key is 1024 bits or shorter, there is a risk that the plaintext can be recovered.
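Besides the nmap script used below, the ephemeral key a server offers can also be checked with openssl; a sketch, using the same example host and port as the scans in this article:

# prints a line such as "Server Temp Key: DH, 1024 bits" when a DHE suite is negotiated
openssl s_client -connect 192.168.1.10:443 -cipher DHE < /dev/null 2>/dev/null | grep "Server Temp Key"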

Reproducing it is straightforward: scan with nmap -sV -Pn --script ssl-dh-params -p port ip [2], and the following vulnerability information shows up:

nmap.exe -sV -Pn --script ssl-dh-params -p 443 192.168.1.10
Starting Nmap 7.92 ( https://nmap.org ) at 2022-07-09 11:14
Nmap scan report for 192.168.1.10
Host is up (0.0033s latency).
Not shown: 996 closed tcp ports (reset)

| ssl-dh-params:
| VULNERABLE:
| Diffie-Hellman Key Exchange Insufficient Group Strength
| State: VULNERABLE
| Transport Layer Security (TLS) services that use Diffie-Hellman groups
| of insufficient strength, especially those using one of a few commonly
| shared groups, may be susceptible to passive eavesdropping attacks.
| Check results:
| WEAK DH GROUP 1
| Cipher Suite: TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
| Modulus Type: Safe prime
| Modulus Source: RFC2409/Oakley Group 2
| Modulus Length: 1024
| Generator Length: 8
| Public Key Length: 1024
| References:
|_ https://weakdh.org

Fix

Following [3] and [4], make the following change:

[root@node1 etc]# cat org.ops4j.pax.web.cfg
...
# Excluded SSL/TLS Cipher Suites comma-separated list of Regular Expressions
org.ops4j.pax.web.ssl.ciphersuites.excluded=.*NULL.*,.*RC4.*,.*MD5.*,.*DES.*,.*DSS.*,TLS_DHE.*,SSL.*,.*anon.*,.*EXPORT.*

After the change, scan again with nmap -sV -Pn --script ssl-dh-params -p port ip; the vulnerability is resolved:

nmap.exe -sV -Pn --script ssl-dh-params -p 443 192.168.1.10   (host IP)
Starting Nmap 7.92 ( https://nmap.org ) at 2022-07-07 11:53
Nmap scan report for 192.168.1.10
Host is up (0.0032s latency).
Not shown: 997 closed tcp ports (reset)
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 7.4 (protocol 2.0)
111/tcp open rpcbind 2-4 (RPC #100000)
...
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 18.74 seconds

Note that after adding the parameter above, a new problem may appear; the scan result now looks like this:

nmap.exe -sV --script ssl-enum-ciphers -p 443 192.168.1.10
Starting Nmap 6.40 ( http://nmap.org ) at 2022-08-20 22:26 CST
Nmap scan report for matrix-node1 (192.168.1.10)
Host is up (0.000064s latency).
PORT STATE SERVICE VERSION
443/tcp open https-alt
| ssl-enum-ciphers:
| TLSv1.0:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA - strong
| ...
| compressors:
| NULL
| TLSv1.1:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA - strong
| ...
| compressors:
| NULL
| TLSv1.2:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA - strong
| ...
| compressors:
| NULL
|_ least strength: strong

Before the configuration change, the scan showed only TLSv1.2 enabled; after the change, TLSv1.0 and TLSv1.1 are both enabled as well, and these two protocols also need to be disabled:

[root@node1 etc]# cat org.ops4j.pax.web.cfg
...
# Excluded SSL/TLS Cipher Suites comma-separated list of Regular Expressions
org.ops4j.pax.web.ssl.ciphersuites.excluded=.*NULL.*,.*RC4.*,.*MD5.*,.*DES.*,.*DSS.*,TLS_DHE.*,SSL.*,.*anon.*,.*EXPORT.*
org.ops4j.pax.web.ssl.protocols.excluded=TLSv1,TLSv1.1

References

  1. https://access.redhat.com/zh_CN/articles/1480493
  2. https://www.cnblogs.com/zcg-cpdd/p/15573841.html
  3. https://stackoverflow.com/questions/30523324/how-to-config-local-jetty-ssl-to-avoid-weak-phermeral-dh-key-error
  4. https://github.com/codice/ddf/blob/master/distribution/ddf-common/src/main/resources/etc/org.ops4j.pax.web.cfg

The problem this time was a pitfall in a newer Kubernetes release. While researching it I found that zhangsi-lzq had already analyzed it thoroughly, so the analysis is reproduced here as a study note, with a permanent solution added.

Author: zhangsi-lzq
Original source: https://www.cnblogs.com/zhangsi-lzq/p/14292628.html

Problem Background

After upgrading Kubernetes to 1.20, the existing NFS-backed StorageClass could no longer create PVs automatically, and PVCs stayed in Pending state.

Analysis

Check the nfs-provisioner logs:

kubectl logs nfs-client-provisioner-5f696dc8bb-qhmsn
E0118 03:01:07.352670 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3737, ErrCode=NO_ERROR, debug=""
E0118 03:01:07.353951 1 reflector.go:322] github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:668: Failed to watch *v1.StorageClass: Get https://10.96.0.1:443/apis/storage.k8s.io/v1/storageclasses?resourceVersion=604432&timeoutSeconds=387&watch=true: dial tcp 10.96.0.1:443: connect: connection refused
W0118 03:01:07.366606 1 reflector.go:341] github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:665: watch of *v1.PersistentVolume ended with: too old resource version: 11565 (605173)
W0118 03:01:07.367679 1 reflector.go:341] github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:662: watch of *v1.PersistentVolumeClaim ended with: too old resource version: 11565 (605173)
I0118 03:08:28.340240 1 controller.go:987] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": started
E0118 03:08:28.343582 1 controller.go:1004] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference
I0118 03:16:08.373590 1 controller.go:987] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": started
E0118 03:16:08.382178 1 controller.go:1004] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference
I0118 03:30:41.647626 1 controller.go:987] provision "default/test-pvc" class "course-nfs-storage": started
E0118 03:30:41.658419 1 controller.go:1004] provision "default/test-pvc" class "course-nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference
I0118 03:31:08.373713 1 controller.go:987] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": started
I0118 03:31:08.373969 1 controller.go:987] provision "default/test-pvc" class "course-nfs-storage": started
E0118 03:31:08.382279 1 controller.go:1004] provision "default/test-pvc" class "course-nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference
E0118 03:31:08.382791 1 controller.go:1004] provision "default/auth-platorm-redis-data-auth-platorm-redis-cluster-0" class "course-nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference

The error message is selfLink was empty. Searching online leads to the official 1.20 changelog [1], which contains this entry:

Stop propagating SelfLink (deprecated in 1.16) in kube-apiserver

selfLink was deprecated in 1.16 and is disabled in 1.20. Since the nfs-provisioner implementation relies on selfLink (any other third-party software that uses selfLink is affected as well), we would have to wait for the nfs-provisioner maintainers to provide a new solution.

Solution

The workaround currently available: edit /etc/kubernetes/manifests/kube-apiserver.yaml, find the following section, and append one more parameter at the end:

spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.168.210.20
    - --.......
    - --feature-gates=RemoveSelfLink=false   # add this line

For a highly available K8s cluster, this change has to be made on every master node.

After adding the flag, delete all the apiserver pods so they restart:

kubectl delete pod kube-apiserver-master01 -n kube-system
kubectl delete pod kube-apiserver-master02 -n kube-system
kubectl delete pod kube-apiserver-master03 -n kube-system

After the three apiservers are pulled up again by kubelet, querying the PVCs shows them all in Bound state; they are bound to PVs normally again.
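A quick way to confirm, for reference (standard kubectl; adjust the namespace as needed):

kubectl get pvc -A    # every PVC should now show STATUS "Bound" with a volume assigned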

(Added) Permanent solution: further research shows the nfs-provisioner project has effectively been retired [2,3] and superseded by three follow-up provisioners (see the retirement notice in [2]), which can be adopted and adapted as needed.

References

  1. https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md
  2. https://github.com/kubernetes-retired/nfs-provisioner
  3. https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner

Problem Background

  1. Checking the system components under kube-system showed harbor, influxdb, coredns and other components restarting repeatedly;
  2. Running kubectl get pod -n kube-system consistently hung for about 15 seconds before returning;

Root Cause Analysis

Start with the first symptom. These pods restarting repeatedly has come up several times before; as the earlier troubleshooting articles [1,2] describe, it can happen under high business concurrency. First check the load with top:

load average: 14.76  18.45  17.85
Tasks: 1998 total, 7 running, 1937 sleeping, 0 stopped, 54 zombie
%CPU: 15.2 us, 8.3 sys, 0.7 ni, 75.3 id

cat /proc/cpuinfo|grep MHz| wc -l
40

The actual load is not high, which rules that out (during the investigation the kernel parameters were also tuned based on the earlier experience, and indeed it made no difference). Moving on, describe a few of the failing pods to see their error messages:

kubelet     Liveness probe failed: Get "http://177.177.138.139:8885/api/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
kubelet Readiness probe failed: Get "http://177.177.138.139:8083/"": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
kubelet Back-off restarting failed container

The errors are essentially the same: kubelet's calls to the health-check endpoints exposed by the pods time out, so the containers keep being killed and restarted. Why do the calls fail? Reproducing kubelet's probe with curl on the pod's node also times out, so ping the pod to check whether basic connectivity works at all:

ping 177.177.212.186
^C

The ping fails! That explains the failing health checks: kubelet simply cannot talk to the pods on its own node. Why? After a round of testing, a pattern emerged: **every node in the cluster is unable to reach the pods running on itself, but can reach pods on other nodes**.

This is an unusual symptom. There are not many things that can break communication between a node and its own pods; with the calico CNI plugin, the likely causes are:

  1. Broken ip/arp/route configuration inside the pod;
  2. Broken arp settings on the calico interface;
  3. Requests being intercepted by iptables;

Checking the configuration for points 1 and 2 shows everything is normal:

[root@influxdb-847b74cbc5-ddxcd]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
3: eth0@if79: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 32:81:0e:f4:dd:3a brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 177.177.212.186/32 scope global eth0
valid_lft f

[root@influxdb-847b74cbc5-ddxcd]# arp
Address HWtype HWaddress Flags Mask Iface
gateway ether ee:ee:ee:ee:ee:ee C eth0

[root@influxdb-847b74cbc5-ddxcd]# ip rou
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

[root@node]# cat /proc/sys/net/ipv4/conf/cali7acfda72e71/proxy_arp
1
[root@node]# cat /proc/sys/net/ipv4/conf/cali7acfda72e71/arp_ignore
0

As for point 3, the iptables rules were inspected as well, and nothing abnormal was found;
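For reference, the checks amounted to looking for DROP/REJECT rules and counters that keep increasing; a rough sketch (the address pattern is just this environment's pod network):

iptables -L -n -v --line-numbers | grep -Ei 'drop|reject'
iptables -t nat -L -n -v | grep 177.177
iptables -t raw -L -n -v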

Time to capture some packets:

  1. Ping a pod on the same node from that node while capturing on cali7acfda72e71: the request reaches the cali interface, but there is no reply;
  2. Ping the node IP from inside the pod while capturing on cali7acfda72e71: both request and reply appear on the cali interface, yet the ping still fails (the capture command is sketched below);
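The capture itself is nothing special; a sketch, with the interface name taken from this node:

tcpdump -nn -e -i cali7acfda72e71 icmp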

It looks as though the requests are being dropped deliberately. Checking the basics of the environment with the person who reported the issue revealed that an EDR anti-virus agent was additionally installed on this cluster. To confirm whether it was involved, the agent was stopped; after observing for a while, the environment returned to normal. Restarting the agent made the problem reappear after some time;

Discussing this with the team responsible for the anti-virus software confirmed that the cluster had been placed in the software's default policy group, which contains port-scan protection and traffic-threshold rules. Those rules were triggered, the network protection kicked in, and the node IPs were blocked. After the vendor moved the cluster into a dedicated policy group, the problem was resolved;

With that fixed, both symptom 1 and symptom 2 disappeared. But looking back, why did symptom 2 occur at all? It was not investigated at the time since the environment had recovered, but the same symptom later appeared in my own test environment. The debug output below shows the command getting stuck on a call to the metrics service, and the root cause is the same: the metrics pod is unreachable. The log makes it obvious, so it is not pasted here:

kubectl get pod -n kube-system -owide --v 10

Solution

  1. Configure a dedicated policy group for this cluster in the anti-virus software;
  2. Once node-to-pod communication is back to normal, symptom 2 disappears as well;

References

  1. https://lyyao09.github.io/2021/06/19/k8s/K8S%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5-%E4%B8%9A%E5%8A%A1%E9%AB%98%E5%B9%B6%E5%8F%91%E5%AF%BC%E8%87%B4Pod%E5%8F%8D%E5%A4%8D%E9%87%8D%E5%90%AF/
  2. https://lyyao09.github.io/2021/07/16/k8s/K8S%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5-%E4%B8%9A%E5%8A%A1%E9%AB%98%E5%B9%B6%E5%8F%91%E5%AF%BC%E8%87%B4Pod%E5%8F%8D%E5%A4%8D%E9%87%8D%E5%90%AF(%E7%BB%AD)/
  3. https://developer.aliyun.com/article/409634
  4. https://www.css3.io/31linuxxi-tong-diao-you.htmls

Problem Background

To work around the 63-second service access delay on nodes [1], we temporarily switched the OS to Red Hat 8.4 (kernel 4.18.0-305). After creating a 3-node cluster of VMs on VMware, deploying across a layer-3 boundary failed, reporting that Harbor could not be deployed.

Root Cause Analysis

Note: this was investigated quite a while ago and the root cause was never fully identified, so the process was not written up at the time; this is only a brief record.

Harbor's logs show the deployment failed on its health check, because requests to Harbor's podIP:port time out across the layer-3 boundary; packet captures show the requests pass through vxlan.calico and then stall at SYN_SENT;

Further testing showed it is not only Harbor's podIP:port requests that time out: requests to other pod services crossing the layer-3 network have the same problem, so this is a general issue (layer-7 traffic such as HTTP on the pod network is affected, while ICMP is not);

Continued captures confirmed that the podIP:port requests do reach the node's NIC, but the node on the other side of the layer-3 boundary never receives them. That rules out host iptables and routing; in fact the iptables path was traced as well, confirming the requests are not dropped there;

Putting this together, the initial suspicion was that the requests were dropped by an intermediate device (switch or router) in the layer-3 network. Colleagues then captured on the switch: ping packets were visible, but the HTTP requests still were not, so the switch side never received them either. That narrows the possibilities down to:

  1. Dropped at the VM's outbound NIC;
  2. Dropped after leaving the NIC but before reaching the switch;

Packet statistics from ethtool -S <nic> on the VM's NIC show no drops, so the problem is not at the VM's outbound NIC;
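A sketch of that check (the interface name is an example):

ethtool -S ens192 | grep -Ei 'drop|err|discard'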

On VMware dropping packets there is some material [2-6]; promiscuous mode and MSS/MTU settings, for example, can both have an effect:

1) Pinging the pod IPs across the subnets with explicit packet sizes all returns normally, confirming this is not an MSS/MTU problem:
ping -s 1000 177.177.x.x
ping -s 1472 177.177.x.x
ping -s 1500 177.177.x.x

2) Temporarily switching the NIC to promiscuous mode made no difference either;

Capturing further on the server's physical NIC (log in to the ESXi shell and run pktcap-uw --uplink vmnicX --dir 2 -o result.pcap, where vmnicX is the physical uplink NIC associated with the VM and --dir 2 captures both directions) gives the same picture: ping packets are captured, HTTP requests are not, so the server's physical NIC never receives them either;

The drop is therefore narrowed down to somewhere between the VMware VM's outbound NIC and the server's physical NIC. That span is inside the virtualization layer, and what exactly happens there is unclear; in the end the deployment was moved to physical servers;

Solution

Deploying the cross-layer-3 cluster on physical servers worked as a temporary measure; to deploy on VMware VMs, the root cause still needs to be tracked down.

References

  1. https://lyyao09.github.io/2022/03/19/k8s/K8S%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5-Calico%E7%9A%84Vxlan%E6%A8%A1%E5%BC%8F%E4%B8%8B%E8%8A%82%E7%82%B9%E5%8F%91%E8%B5%B7K8s%20Service%E8%AF%B7%E6%B1%82%E5%BB%B6%E8%BF%9F/
  2. https://kb.vmware.com/s/article/2113783
  3. https://kb.vmware.com/s/article/52936
  4. https://docs.vmware.com/en/VMware-NSX-T-Data-Center/2.3/troubleshooting/GUID-4E4A9468-1F2B-44E1-A474-AA048A88BF1E.html
  5. https://communities.vmware.com/t5/vCloud-Networking-and-Security/No-SYN-ACK-reply-from-VM-on-virtual-wire-VXLAN/td-p/376545
  6. https://kb.vmware.com/s/article/2051814?lang=zh_CN

Problem Background

After switching calico to VXLAN mode in the K8s cluster, some services could no longer be accessed from the nodes (more precisely, access was delayed, consistently by about 1 minute 3 seconds).

[root@node2 ~]# time curl -s http://10.96.91.255
real 1m3.136s
user 0m0.005s
sys 0m0.005s

Root Cause Analysis

First scope the problem. Testing several services on the environment shows that the delay appears whenever the node making the service call and the backing pod are on different nodes; otherwise the request is fine. In other words, only cross-node access is affected. Accessing the pod IP behind the service directly also works, so the problem is probably somewhere in the service-to-pod path.
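A sketch of that scoping, using the service and pod addresses that appear later in this article:

time curl -s http://10.96.91.255           # via the service VIP: ~1m3s when the backing pod is on another node
time curl -s http://166.166.166.168:8082   # straight to the pod IP: returns immediately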

Next confirm the basics. Nothing in the OS, K8s, or calico itself changed; the only change was switching calico's network mode from BGP to VXLAN. That change alters the service-to-pod path inside the cluster: all container traffic now has to be encapsulated through the new vxlan.calico interface on each node. The problem did not exist before the switch, reproduces every time after it, and disappears again after switching back to BGP, so it is most likely related to the new vxlan.calico interface.

Take a look at the environment and trigger a service request:

# Test environment: node2 (10.10.72.11) --> node1 (10.10.72.10)
# Test method: on node2, curl the service 10.96.91.255 --> pod 166.166.166.168:8082 on node1
[root@node2 ~]# ip addr
10: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UNKNOWN group default
link/ether 66:2d:bf:44:a6:8b brd ff:ff:ff:ff:ff:ff
inet 166.166.104.10/32 brd 166.166.104.10 scope global vxlan.calico
valid_lft forever preferred_lft forever

[root@node1 ~]# ip addr
15: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UNKNOWN group default
link/ether 66:f9:37:c3:7e:94 brd ff:ff:ff:ff:ff:ff
inet 166.166.166.175/32 brd 166.166.166.175 scope global vxlan.calico
valid_lft forever preferred_lft forever

[root@node2 ~]# time curl http://10.96.91.255

node1的主机网卡上抓包看看封装后的请求是否已到达:

[root@node1 ~]# tcpdump -n -vv -i eth0 host 10.10.72.11 and udp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
07:19:42.730996 IP (tos 0x0, ttl 64, id 6470, offset 0, flags [none], proto UDP (17), length 110)
10.10.72.11.nim-vdrshell > 10.10.72.10.4789: [bad udp cksum 0xffff -> 0x3af0!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 39190, offset 0, flags [DF], proto TCP (6), length 60)
166.166.104.10.35632 > 166.166.166.168.us-cli: Flags [S], cksum 0xe556 (correct), seq 3025623348, win 29200, options [mss 1460,sackOK,TS val 101892130 ecr 0,nop,wscale 7], length 0
07:19:43.733741 IP (tos 0x0, ttl 64, id 6804, offset 0, flags [none], proto UDP (17), length 110)
10.10.72.11.nim-vdrshell > 10.10.72.10.4789: [bad udp cksum 0xffff -> 0x3af0!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 39191, offset 0, flags [DF], proto TCP (6), length 60)
166.166.104.10.35632 > 166.166.166.168.us-cli: Flags [S], cksum 0xe16b (correct), seq 3025623348, win 29200, options [mss 1460,sackOK,TS val 101893133 ecr 0,nop,wscale 7], length 0
07:19:45.736729 IP (tos 0x0, ttl 64, id 7403, offset 0, flags [none], proto UDP (17), length 110)
10.10.72.11.nim-vdrshell > 10.10.72.10.4789: [bad udp cksum 0xffff -> 0x3af0!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 39192, offset 0, flags [DF], proto TCP (6), length 60)
166.166.104.10.35632 > 166.166.166.168.us-cli: Flags [S], cksum 0xd998 (correct), seq 3025623348, win 29200, options [mss 1460,sackOK,TS val 101895136 ecr 0,nop,wscale 7], length 0
07:19:49.744801 IP (tos 0x0, ttl 64, id 9648, offset 0, flags [none], proto UDP (17), length 110)
10.10.72.11.nim-vdrshell > 10.10.72.10.4789: [bad udp cksum 0xffff -> 0x3af0!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 39193, offset 0, flags [DF], proto TCP (6), length 60)
166.166.104.10.35632 > 166.166.166.168.us-cli: Flags [S], cksum 0xc9f0 (correct), seq 3025623348, win 29200, options [mss 1460,sackOK,TS val 101899144 ecr 0,nop,wscale 7], length 0
07:19:57.768735 IP (tos 0x0, ttl 64, id 12853, offset 0, flags [none], proto UDP (17), length 110)
10.10.72.11.nim-vdrshell > 10.10.72.10.4789: [bad udp cksum 0xffff -> 0x3af0!] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 64, id 39194, offset 0, flags [DF], proto TCP (6), length 60)
166.166.104.10.35632 > 166.166.166.168.us-cli: Flags [S], cksum 0xaa98 (correct), seq 3025623348, win 29200, options [mss 1460,sackOK,TS val 101907168 ecr 0,nop,wscale 7], length 0
07:20:05.087057 IP (tos 0x0, ttl 64, id 8479, offset 0, flags [none], proto UDP (17), length 164)
10.10.72.10.34565 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3425, offset 0, flags [DF], proto UDP (17), length 114)
166.166.166.168.57850 > 166.166.104.6.domain: [udp sum ok] 10121+ AAAA? influxdb-nginx-service.kube-system.svc.kube-system.svc.cluster.local. (86)
07:20:05.087076 IP (tos 0x0, ttl 64, id 54475, offset 0, flags [none], proto UDP (17), length 164)
10.10.72.10.51841 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3424, offset 0, flags [DF], proto UDP (17), length 114)
166.166.166.168.57984 > 166.166.104.6.domain: [udp sum ok] 20020+ A? influxdb-nginx-service.kube-system.svc.kube-system.svc.cluster.local. (86)
07:20:05.087671 IP (tos 0x0, ttl 64, id 13540, offset 0, flags [none], proto UDP (17), length 257)
10.10.72.11.60395 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19190, offset 0, flags [DF], proto UDP (17), length 207)
166.166.104.6.domain > 166.166.166.168.57850: [udp sum ok] 10121 NXDomain*- q: AAAA? influxdb-nginx-service.kube-system.svc.kube-system.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1647633218 7200 1800 86400 5 (179)
07:20:05.087702 IP (tos 0x0, ttl 64, id 13541, offset 0, flags [none], proto UDP (17), length 257)
10.10.72.11.48571 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19191, offset 0, flags [DF], proto UDP (17), length 207)
166.166.104.6.domain > 166.166.166.168.57984: [udp sum ok] 20020 NXDomain*- q: A? influxdb-nginx-service.kube-system.svc.kube-system.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1647633218 7200 1800 86400 5 (179)
07:20:05.088801 IP (tos 0x0, ttl 64, id 8480, offset 0, flags [none], proto UDP (17), length 152)
10.10.72.10.55780 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3427, offset 0, flags [DF], proto UDP (17), length 102)
166.166.166.168.56015 > 166.166.104.6.domain: [udp sum ok] 19167+ AAAA? influxdb-nginx-service.kube-system.svc.svc.cluster.local. (74)
07:20:05.089048 IP (tos 0x0, ttl 64, id 13542, offset 0, flags [none], proto UDP (17), length 245)
10.10.72.11.50151 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19192, offset 0, flags [DF], proto UDP (17), length 195)
166.166.104.6.domain > 166.166.166.168.56015: [udp sum ok] 19167 NXDomain*- q: AAAA? influxdb-nginx-service.kube-system.svc.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1647633218 7200 1800 86400 5 (167)
07:20:05.089212 IP (tos 0x0, ttl 64, id 8481, offset 0, flags [none], proto UDP (17), length 148)
10.10.72.10.50272 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3430, offset 0, flags [DF], proto UDP (17), length 98)
166.166.166.168.54926 > 166.166.104.6.domain: [udp sum ok] 40948+ A? influxdb-nginx-service.kube-system.svc.cluster.local. (70)
07:20:05.089403 IP (tos 0x0, ttl 64, id 13543, offset 0, flags [none], proto UDP (17), length 241)
10.10.72.11.59882 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19193, offset 0, flags [DF], proto UDP (17), length 191)
166.166.104.6.domain > 166.166.166.168.54926: [udp sum ok] 40948 NXDomain*- q: A? influxdb-nginx-service.kube-system.svc.cluster.local. 0/1/0 ns: cluster.local. SOA ns.dns.cluster.local. hostmaster.cluster.local. 1647633218 7200 1800 86400 5 (163)
07:20:05.089524 IP (tos 0x0, ttl 64, id 8482, offset 0, flags [none], proto UDP (17), length 134)
10.10.72.10.58964 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3431, offset 0, flags [DF], proto UDP (17), length 84)
166.166.166.168.50263 > 166.166.104.6.domain: [udp sum ok] 18815+ A? influxdb-nginx-service.kube-system.svc. (56)
07:20:05.089681 IP (tos 0x0, ttl 64, id 13544, offset 0, flags [none], proto UDP (17), length 134)
10.10.72.11.51874 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19194, offset 0, flags [DF], proto UDP (17), length 84)
166.166.104.6.domain > 166.166.166.168.50263: [udp sum ok] 18815 ServFail- q: A? influxdb-nginx-service.kube-system.svc. 0/0/0 (56)
07:20:05.089706 IP (tos 0x0, ttl 64, id 8483, offset 0, flags [none], proto UDP (17), length 134)
10.10.72.10.59891 > 10.10.72.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 3433, offset 0, flags [DF], proto UDP (17), length 84)
166.166.166.168.49202 > 166.166.104.6.domain: [udp sum ok] 58612+ AAAA? influxdb-nginx-service.kube-system.svc. (56)
07:20:05.089859 IP (tos 0x0, ttl 64, id 13545, offset 0, flags [none], proto UDP (17), length 134)
10.10.72.11.44146 > 10.10.72.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 4096
IP (tos 0x0, ttl 63, id 19195, offset 0, flags [DF], proto UDP (17), length 84)
166.166.104.6.domain > 166.166.166.168.49202: [udp sum ok] 58612 ServFail- q: AAAA? influxdb-nginx-service.kube-system.svc. 0/0/0 (56)

One thing stands out in the capture: the first few packets are flagged bad udp cksum 0xffff, while the last few packets, belonging to the request that finally succeeds, show no cksum.

Searching on that message turns up a known bug; references [1]-[3] give the detailed analysis, so only a summary follows:

The kernel has a defect in its VXLAN handling that prevents checksum offloading from completing correctly. The defect only manifests in a fairly narrow scenario.

Given that the VXLAN UDP header has been NATed, if:

  1. the VXLAN device disables UDP checksums (as the RFC recommends), and
  2. the VXLAN device has TX checksum offloading enabled,

then an incorrect UDP checksum is generated.

According to [1], Kubernetes fixed this in v1.18.5, yet my problem appeared on v1.21.0, so it is unclear whether upgrading K8s alone would resolve it, or whether additional configuration is required after the upgrade.

From [3] and [4], calico addressed it in v3.20.0 by also disabling offloading on the vxlan.calico interface for kernels older than v5.7.
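Before applying the workaround, the device's current offload state can be inspected; a quick sketch:

ethtool --show-offload vxlan.calico | grep -E 'checksum|segmentation'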

Temporarily disable offloading locally and verify:

[root@node2 ~]# ethtool --offload vxlan.calico rx off tx off
Actual changes:
rx-checksumming: off
tx-checksumming: off
tx-checksum-ip-generic: off
tcp-segmentation-offload: off
tx-tcp-segmentation: off [requested on]
tx-tcp-ecn-segmentation: off [requested on]
tx-tcp6-segmentation: off [requested on]
tx-tcp-mangleid-segmentation: off [requested on]
udp-fragmentation-offload: off [requested on]
[root@node2 ~]# time curl http://10.96.91.255
real 0m0.009s
user 0m0.002s
sys 0m0.007s

The request is back to normal.

Solution

  1. Temporary fix: ethtool --offload vxlan.calico rx off tx off
  2. Permanent fix: upgrade calico to >= v3.20.0, or upgrade the kernel to 5.6.13 / 5.4.41 / 4.19.123 / 4.14.181; whether upgrading K8s alone to >= v1.18.5 is sufficient remains to be confirmed

References

  1. https://blog.gmem.cc/nodeport-63s-delay-due-to-kernel-issue
  2. https://cloudnative.to/blog/kubernetes-1-17-vxlan-63s-delay/
  3. https://bowser1704.github.io/posts/vxlan-bug/
  4. https://github.com/projectcalico/calico/issues/3145
  5. https://github.com/projectcalico/felix/pull/2811/files

Problem Description

After an abnormal power-off, the Docker daemon gets stuck in the activating state and its memory usage keeps growing without bound.

[root@node01 docker]# service docker status
Redirecting to /bin/systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/docker.service.d
└─restart.conf
Active: activating (start) since Tue 2021-12-07 15:44:32 CST; 1min 18s ago
Docs: https://docs.docker.com
Main PID: 274797 (dockerd)
Tasks: 481
Memory: 15.4G -- abnormally high
CGroup: /system
[root@node01 docker]# service docker status
Redirecting to /bin/systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/docker.service.d
└─restart.conf
Active: deactivating (stop-sigterm)
Docs: https://docs.docker.com
Main PID: 274797 (dockerd)
Tasks: 481
Memory: 247.3G -- keeps climbing rapidly
CGroup: /system.slice/docker.service
└─274

Troubleshooting Process

First, check the docker version and docker info output:

[root@node01 files]# docker version
Client: Docker Engine - Community
Version: 20.10.7
API version: 1.41
Go version: go1.13.15
Git commit: f0df350
Built: Wed Jun 2 11:58:10 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.7
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: b0f5bc3
runc:
Version: 1.0.0-rc95
GitCommit: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
docker-init:
Version: 0.19.0
GitCommit: de40ad0
[root@node01 files]# docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
scan: Docker Scan (Docker Inc., v0.8.0)

Server:
Server Version: 20.10.7
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc version: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-957.21.3.el7.x86_64
Operating System: CentOS 7.6
OSType: linux
Architecture: x86_64
Total Memory: 256GiB
Name: node01
Docker Root Dir: /var/lib/docker
Debug Mode: false
File Descriptors: 366
Goroutines: 290
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true

The version is fairly recent. Check the docker logs:
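The daemon log below comes from journald; roughly the kind of command used to pull it (the time window is an example):

journalctl -u docker --since "2021-12-07 14:57" --no-pager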

Dec 07 14:57:57 node01 dockerd[239420]: time="2021-12-07T14:57:57.831370861+08:00" level=info msg="Starting up"
Dec 07 14:57:57 node01 dockerd[239420]: time="2021-12-07T14:57:57.950367668+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Dec 07 14:57:57 node01 dockerd[239420]: time="2021-12-07T14:57:57.950430356+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Dec 07 14:57:57 node01 dockerd[239420]: time="2021-12-07T14:57:57.950486773+08:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix://run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Dec 07 14:57:57 node01 dockerd[239420]: time="2021-12-07T14:57:57.950524941+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Dec 07 14:57:58 node01 dockerd[239420]: time="2021-12-07T14:57:58.004622322+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Dec 07 14:57:58 node01 dockerd[239420]: time="2021-12-07T14:57:58.004663918+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Dec 07 14:57:58 node01 dockerd[239420]: time="2021-12-07T14:57:58.004697382+08:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix://run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Dec 07 14:57:58 node01 dockerd[239420]: time="2021-12-07T14:57:58.004726533+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Dec 07 14:58:14 node01 dockerd[239420]: time="2021-12-07T14:58:14.832862519+08:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Dec 07 14:59:00 node01 dockerd[239420]: time="2021-12-07T14:59:00.039554832+08:00" level=info msg="Loading containers: start."
Dec 07 15:02:27 node01 dockerd[239420]: fatal error: runtime: out of memory
Dec 07 15:02:27 node01 dockerd[239420]: runtime stack:
Dec 07 15:02:27 node01 dockerd[239420]: runtime.throw(0x5604d0deee9d, 0x16)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/panic.go:774 +0x74
Dec 07 15:02:28 node01 dockerd[239420]: runtime.sysMap(0xfb6c000000, 0x5c000000, 0x5604d35dab58)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mem_linux.go:169 +0xc7
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).sysAlloc(0x5604d35be240, 0x5beee000, 0x7f6fd1fbab88, 0x5604cf22bf6b)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/malloc.go:701 +0x1cf
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).grow(0x5604d35be240, 0x2df77, 0xffffffff)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mheap.go:1255 +0xa5
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).allocSpanLocked(0x5604d35be240, 0x2df77, 0x5604d35dab68, 0x5604cf212927)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mheap.go:1170 +0x268
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).alloc_m(0x5604d35be240, 0x2df77, 0x7f6c01180100, 0x7f6c01184350)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mheap.go:1022 +0xc6
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).alloc.func1()
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mheap.go:1093 +0x4e
Dec 07 15:02:28 node01 dockerd[239420]: runtime.(*mheap).alloc(0x5604d35be240, 0x2df77, 0x7f6fd1010100, 0x7f6c01184350)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/mheap.go:1092 +0x8c
Dec 07 15:02:28 node01 dockerd[239420]: runtime.largeAlloc(0x5beee000, 0x5604cf250001, 0x7f6c01184350)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/malloc.go:1138 +0x99
Dec 07 15:02:28 node01 dockerd[239420]: runtime.mallocgc.func1()
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/malloc.go:1033 +0x48
Dec 07 15:02:28 node01 dockerd[239420]: runtime.systemstack(0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/asm_amd64.s:370 +0x63
Dec 07 15:02:28 node01 dockerd[239420]: runtime.mstart()
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/proc.go:1146
Dec 07 15:02:28 node01 dockerd[239420]: goroutine 1 [running]:
Dec 07 15:02:28 node01 dockerd[239420]: runtime.systemstack_switch()
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc0008bff70 sp=0xc0008bff68 pc=0x5604cf2528e0
Dec 07 15:02:28 node01 dockerd[239420]: runtime.mallocgc(0x5beee000, 0x5604d1e8fde0, 0x1, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/malloc.go:1032 +0x8a7 fp=0xc0008c0010 sp=0xc0008bff70 pc=0x5604cf200ef7
Dec 07 15:02:28 node01 dockerd[239420]: runtime.growslice(0x5604d1e8fde0, 0xef119e4000, 0x3107eaa, 0x3107eaa, 0x3107eab, 0x5604cf20146b, 0xfb6bbbfb80, 0x7f6c01184350)
Dec 07 15:02:28 node01 dockerd[239420]: /usr/local/go/src/runtime/slice.go:181 +0x1e4 fp=0xc0008c0078 sp=0xc0008c0010 pc=0x5604cf23b0b4
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt.(*Cursor).first(0xc0008c0288)
Dec 07 15:02:28 node01 dockerd[239420]: docker/vendor/go.etcd.io/bbolt/cursor.go:182 +0x138 fp=0xc0008c00d8 sp=0xc0008c0078 pc=0x5604d012d348
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt.(*Cursor).next(0xc0008c0288, 0x0, 0x9f, 0x9f, 0x7f7150506429, 0x2b8, 0xc0008c0188, 0x5604cf23ee10)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt/cursor.go:234 +0x84 fp=0xc0008c0128 sp=0xc0008c00d8 pc=0x5604d012d684
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt.(*Cursor).Next(0xc0008c0288, 0x7f715050638a, 0x9f, 0x9f, 0xfb6bbcf720, 0x9f, 0x2b8)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt/cursor.go:75 +0x43 fp=0xc0008c0198 sp=0xc0008c0128 pc=0x5604d012ca43
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libkv/store/boltdb.(*BoltDB).List.func1(0xc0034350a0, 0xc0008c0200, 0xc0034350a0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libkv/store/boltdb/boltdb.go:288 +0x19a fp=0xc0008c02b8 sp=0xc0008c0198 pc=0x5604d0142a4a
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt.(*DB).View(0xc0001dc200, 0xc0008c03f8, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/go.etcd.io/bbolt/db.go:725 +0xaa fp=0xc0008c0340 sp=0xc0008c02b8 pc=0x5604d0130a3a
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libkv/store/boltdb.(*BoltDB).List(0xc00482f2c0, 0xc001a7af00, 0x5e, 0x0, 0x0, 0x0, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libkv/store/boltdb/boltdb.go:279 +0x1b3 fp=0xc0008c0430 sp=0xc0008c0340 pc=0x5604d0141583
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore.(*cache).kmap(0xc003bf5900, 0x5604d220c1a0, 0xc001e7c160, 0x0, 0x5e, 0xc001a7ae40)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore/cache.go:43 +0x286 fp=0xc0008c0560 sp=0xc0008c0430 pc=0x5604cfcc81a6
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore.(*cache).list(0xc003bf5900, 0x5604d220c1a0, 0xc001e7c160, 0x0, 0x0, 0x0, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore/cache.go:164 +0x7e fp=0xc0008c0678 sp=0xc0008c0560 pc=0x5604cfcc988e
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore.(*datastore).List(0xc0079ff680, 0xc001a7ae40, 0x5e, 0x5604d220c1a0, 0xc001e7c160, 0x0, 0x0, 0x0, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/datastore/datastore.go:517 +0x205 fp=0xc0008c0730 sp=0xc0008c0678 pc=0x5604cfccc745
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork.(*network).getEndpointsFromStore(0xc00289a380, 0xc004a99258, 0x7, 0x5604d2206920, 0xc001a515f0, 0xc002f3ff48)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/store.go:190 +0x343 fp=0xc0008c09f8 sp=0xc0008c0730 pc=0x5604d01acaf3
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork.(*controller).reservePools(0xc0004b7400)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/controller.go:977 +0x4c1 fp=0xc0008c0c28 sp=0xc0008c09f8 pc=0x5604d0159701
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork.New(0xc001055d00, 0x9, 0x10, 0xc0007ac870, 0xc001387fb0, 0xc001055d00, 0x9)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/vendor/libnetwork/controller.go:245 +0x615 fp=0xc0008c0dd8 sp=0xc0008c0c28 pc=0x5604d0154815
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon.(*Daemon).initNetworkController(0xc00000c1e0, 0xc000207080, 0xc001387fb0, 0xc000236f50, 0xc00000c1e0, 0xc0007170c8, 0xc001387fb0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon/daemon_unix.go:855 +0xa9 fp=0xc0008c0e90 sp=0xc0008c0dd8 pc=0x5604d0a49429
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon.(*Daemon).restore(0xc00000c1e0, 0xc00004e5c0, 0xc000220000)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon/daemon.go:490 +0x50b fp=0xc0008c1088 sp=0xc0008c0e90 pc=0x5604d0a3898b
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon.NewDaemon(0x5604d21de4c0, 0xc00004e5c0, 0xc000207080, 0xc0007ac870, 0x0, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/daemon/daemon.go:1147 +0x2be8 fp=0xc0008c19c8 sp=0xc0008c1088 pc=0x5604d0a3c938
Dec 07 15:02:28 node01 dockerd[239420]: main.(*DaemonCli).start(0xc0007ab410, 0xc000218600, 0x0, 0x0)
Dec 07 15:02:28 node01 dockerd[239420]: /docker/cmd/dockerd/daemon.go:195 +0x743 fp=0xc0008c1d00 sp=0xc0008c19c8 pc=0x5604d0dbb3c3
Dec 07 15:02:28 node01 dockerd[239420]: main.runDaemon(...)

The log contains an obvious crash stack. The daemon gets stuck right after Loading containers: start. and then prints fatal error: runtime: out of memory, i.e. it runs out of memory. From the stack trace, the failing call path is:

NewDaemon —> restore() —> initNetworkController —> libnetwork.New() —> reservePools() —>  getEndpointsFromStore() —> List —> cache.list(kvObject) —> cache.kmap(kvObject) —> List(keyPrefix) —> Next —> next —> first

Pull the source code for this version and locate the code along the call path above:

// List returns the range of keys starting with the passed in prefix
func (b *BoltDB) List(keyPrefix string) ([]*store.KVPair, error) {
    var (
        db  *bolt.DB
        err error
    )
    b.Lock()
    defer b.Unlock()

    kv := []*store.KVPair{}

    if db, err = b.getDBhandle(); err != nil {
        return nil, err
    }
    defer b.releaseDBhandle()

    err = db.View(func(tx *bolt.Tx) error {
        bucket := tx.Bucket(b.boltBucket)
        if bucket == nil {
            return store.ErrKeyNotFound
        }

        cursor := bucket.Cursor()
        prefix := []byte(keyPrefix)

        for key, v := cursor.Seek(prefix); bytes.HasPrefix(key, prefix); key, v = cursor.Next() {

            dbIndex := binary.LittleEndian.Uint64(v[:libkvmetadatalen])
            v = v[libkvmetadatalen:]
            val := make([]byte, len(v))
            copy(val, v)

            kv = append(kv, &store.KVPair{
                Key:       string(key),
                Value:     val,
                LastIndex: dbIndex,
            })
        }
        return nil
    })
    if len(kv) == 0 {
        return nil, store.ErrKeyNotFound
    }
    return kv, err
}

This code reads the boltdb database that caches network configuration. From past experience, this database is easily corrupted by an abnormal power-off, so the suspicion is that a corrupted file makes the iteration here loop far more times than expected; every iteration allocates new variables and memory, until the process finally runs out of memory and is OOM-killed.

Searching the docker community for related issues [1] [2] shows that boltdb corruption has indeed been seen before, although the final error looks somewhat different. In the end I also opened an issue [3]; the community feedback likewise suspected boltdb corruption and suggested deleting the local-kv.db file and restarting to recover.

Personally I think the failing code could be improved: check the db file up front and raise an error early instead of consuming memory indefinitely (since the broken environment has been wiped, this idea will have to wait until the problem reproduces).
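As a side note, the bolt file can be sanity-checked offline with the bbolt CLI (from go.etcd.io/bbolt); a sketch, assuming docker's default data root:

# go install go.etcd.io/bbolt/cmd/bbolt@latest
bbolt check /var/lib/docker/network/files/local-kv.db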

Solution

  1. Temporary fix: delete the local-kv.db file and restart the docker service (a sketch follows below).
  2. Permanent fix: improve the code path above so it fails fast on a corrupted database.
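A sketch of the temporary fix, assuming docker's default data root; moving the file aside instead of deleting it keeps a copy for later analysis:

systemctl stop docker
mv /var/lib/docker/network/files/local-kv.db /var/lib/docker/network/files/local-kv.db.bak
systemctl start docker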

References

  1. https://github.com/moby/moby/issues/37238
  2. https://github.com/moby/moby/issues/42099
  3. https://github.com/moby/moby/issues/43072

Problem Background

The karaf framework has no direct dependency on the log4j package, so simply upgrading log4j in the project (or the fact that the project contains no log4j at all) does not address the recently disclosed vulnerabilities (CVE-2021-44228, CVE-2021-45046, CVE-2021-45105).

Analysis

Investigation shows that the logging packages karaf actually depends on are org.ops4j.pax.logging.xxx, which in turn depend on log4j, essentially wrapping it. So there are three ways to fix the vulnerabilities:

  1. Upgrade the framework: the impact is large, and the framework's release cycle is slow. There is no pre-built framework package yet, so upgrading would mean compiling all framework packages ourselves, which is risky;
  2. Upgrade the dependency alone: small impact, but if there is nowhere to configure the dependency it may not be possible (confirmed in practice: it cannot be upgraded independently);
  3. Modify the dependency version in the current source and recompile: small impact, although if the version gap is large there may be many changes required;

Weighing the options, option 3 seems worth a try. From the commit history in reference [1], the change org.ops4j.pax.logging made to fix the log4j vulnerability involves only a version bump of the log4j dependency, and the pax-logging version span is merely 1.11.9 to 1.11.12, which is small. To see what actually needs changing, first try to build:

pom.xml
<version.org.apache.felix.configadmin>1.9.20</version.org.apache.felix.configadmin>
<version.org.apache.felix.framework>5.6.12</version.org.apache.felix.framework>
<version.org.apache.felix6.framework>6.0.3</version.org.apache.felix6.framework>
- <version.org.apache.logging.log4j>2.16.0</version.org.apache.logging.log4j>
+ <version.org.apache.logging.log4j>2.17.0</version.org.apache.logging.log4j>
<version.org.apache.servicemix.bundles.javax-inject>1_3</version.org.apache.servicemix.bundles.javax-inject>
<version.org.jboss.logging>3.4.1.Final</version.org.jboss.logging>
<version.org.mockito>3.7.7</version.org.mockito>

Download the version currently in use. Judging by the modules in the source tree, JDK 9 may be required:

[root@node org.ops4j.pax.logging-logging-1.11.9]# ll
total 144
-rw-r--r-- 1 root root 939 Feb 23 2021 CONTRIBUTORS.txt
-rw-r--r-- 1 root root 755 Feb 23 2021 license-header.txt
-rw-r--r-- 1 root root 12525 Feb 23 2021 LICENSE.txt
drwxr-xr-x 4 root root 99 Dec 21 17:51 pax-logging-api
drwxr-xr-x 4 root root 46 Dec 21 17:51 pax-logging-api-java9
drwxr-xr-x 4 root root 46 Dec 21 17:51 pax-logging-it
drwxr-xr-x 6 root root 106 Feb 23 2021 pax-logging-it-karaf
drwxr-xr-x 4 root root 99 Dec 21 17:51 pax-logging-log4j2
drwxr-xr-x 4 root root 99 Dec 21 17:51 pax-logging-log4j2-extra
drwxr-xr-x 4 root root 99 Dec 21 17:51 pax-logging-logback
drwxr-xr-x 2 root root 21 Feb 23 2021 pax-logging-report
drwxr-xr-x 10 root root 166 Feb 23 2021 pax-logging-samples
drwxr-xr-x 4 root root 99 Dec 21 17:51 pax-logging-service
-rw-r--r-- 1 root root 46604 Dec 21 17:46 pom.xml
-rw-r--r-- 1 root root 67356 Feb 23 2021 readme.adoc
drwxr-xr-x 3 root root 18 Feb 23 2021 src

Start by pulling the maven:3.3.9-jdk-9 build image and giving it a try:

[root@node]# docker pull maven:3.3.9-jdk-9
3.3.9-jdk-9: Pulling from library/maven
...
2ce3b259f3e2: Pull complete
Digest: sha256:ad6b04c52e7f83c05e8840e0b1de0c39ba097c1e40efb294e740db303468cbe8
Status: Downloaded newer image for maven:3.3.9-jdk-9
docker.io/library/maven:3.3.9-jdk-9

Start the build, compiling the unmodified source first:

[root@node]# docker run -it -v /home:/home docker.io/library/maven:3.3.9-jdk-9 bash
root@aae0956cb558:/home# cd org.ops4j.pax.logging-logging-1.11.9
root@aae0956cb558:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
[INFO] Scanning for projects...
[WARNING] Error injecting: org.apache.maven.wagon.providers.http.HttpWagon
java.lang.ExceptionInInitializerError
at java.base/javax.crypto.JceSecurityManager.<clinit>(JceSecurityManager.java:66)
at java.base/javax.crypto.Cipher.getConfiguredPermission(Cipher.java:2610)
at java.base/javax.crypto.Cipher.getMaxAllowedKeyLength(Cipher.java:2634)
...
Caused by: java.lang.SecurityException: Can not initialize cryptographic mechanism
at java.base/javax.crypto.JceSecurity.<clinit>(JceSecurity.java:118)
... 96 more
Caused by: java.lang.SecurityException: Can't read cryptographic policy directory: unlimited
at java.base/javax.crypto.JceSecurity.setupJurisdictionPolicies(JceSecurity.java:324)
at java.base/javax.crypto.JceSecurity.access$000(JceSecurity.java:73)
at java.base/javax.crypto.JceSecurity$1.run(JceSecurity.java:109)
at java.base/javax.crypto.JceSecurity$1.run(JceSecurity.java:106)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.crypto.JceSecurity.<clinit>(JceSecurity.java:105)
... 96 more

According to reference [2], the problem is that a security-related configuration file cannot be found under $JAVA_HOME/conf in the build image. A quick check shows the conf directory does not exist at all, so the build image itself is broken:

root@aae0956cb558:/home/org.ops4j.pax.logging-logging-1.11.9# env
JAVA_HOME=/usr/lib/jvm/java-9-openjdk-amd64

root@aae0956cb558:/home/org.ops4j.pax.logging-logging-1.11.9# ls /usr/lib/jvm/java-9-openjdk-amd64
bin docs include jmods legal lib man src.zip

// Normally the $JAVA_HOME/conf directory contains the following files
java-9-openjdk logging.properties management net.properties security sound.properties

So try a newer image; pull maven:3.6-openjdk-11:

[root@node]# docker pull maven:3.6-openjdk-11
3.6-openjdk-11: Pulling from library/maven
...
Digest: sha256:1d29ccf46ef2a5e64f7de3d79a63f9bcffb4dc56be0ae3daed5ca5542b38aa2d
Status: Downloaded newer image for maven:3.6-openjdk-11
docker.io/library/maven:3.6-openjdk-11

Start the build:

[root@node]# docker run -it -v /home:/home maven:3.6-openjdk-11 bash
root@ff2407bc2d9e:/# cd /home/org.ops4j.pax.logging-logging-1.11.9
root@ff2407bc2d9e:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
[INFO] Scanning for projects...
Downloading from knopflerfish: http://resources.knopflerfish.org/repo/maven2/release/org/ops4j/master/4.3.0/master-4.3.0.pom
...
[INFO]
[INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @ pax-logging-api-java9 ---
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-maven) @ pax-logging-api-java9 ---
[INFO]
[INFO] --- maven-toolchains-plugin:3.0.0:toolchain (default) @ pax-logging-api-java9 ---
[INFO] Required toolchain: jdk [ version='[9, )' ]
[ERROR] No toolchain found for type jdk
[ERROR] Cannot find matching toolchain definitions for the following toolchain types:
jdk [ version='[9, )' ]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for OPS4J Pax Logging (Build POM) 1.11.9:
[INFO]
[INFO] OPS4J Pax Logging (Build POM) ...................... SUCCESS [ 35.399 s]
[INFO] OPS4J Pax Logging - API (Java9) .................... FAILURE [ 10.045 s]
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-toolchains-plugin:3.0.0:toolchain (default) on project pax-logging-api-java9: Cannot find matching toolchain definitions for the following toolchain types:
[ERROR] jdk [ version='[9, )' ]
[ERROR] Please make sure you define the required toolchains in your ~/.m2/toolchains.xml file.

The first problem is gone, but the error message now indicates that a JDK 9 toolchain is required, so switch to yet another image, maven:3.5-jdk-9-slim:

[root@node]# docker pull maven:3.5-jdk-9-slim
3.5-jdk-9-slim: Pulling from library/maven
...
7afb9733d3e4: Pull complete
Digest: sha256:f5d85a2b5498c0a36a6515722e108969ff2fcfec5bef6c8ef83c8ebc4b671af1
Status: Downloaded newer image for maven:3.5-jdk-9-slim
docker.io/library/maven:3.5-jdk-9-slim

Build again:

[root@node]# docker run -it -v /home:/home docker.io/library/maven:3.5-jdk-9-slim  bash
root@895be557c3cd:/# cd /home/org.ops4j.pax.logging-logging-1.11.9
root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
[INFO] Scanning for projects...
Downloading from knopflerfish: http://resources.knopflerfish.org/repo/maven2/release/org/ops4j/master/4.3.0/master-4.3.0.pom
...
[INFO]
[INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @ pax-logging-api-java9 ---
[INFO]
[INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-maven) @ pax-logging-api-java9 ---
[INFO]
[INFO] --- maven-toolchains-plugin:3.0.0:toolchain (default) @ pax-logging-api-java9 ---
[INFO] Required toolchain: jdk [ version='[9, )' ]
[ERROR] No toolchain found for type jdk
[ERROR] Cannot find matching toolchain definitions for the following toolchain types:
jdk [ version='[9, )' ]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] OPS4J Pax Logging (Build POM) 1.11.9 ............... SUCCESS [01:27 min]
[INFO] OPS4J Pax Logging - API (Java9) .................... FAILURE [ 17.560 s]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-toolchains-plugin:3.0.0:toolchain (default) on project pax-logging-api-java9: Cannot find matching toolchain definitions for the following toolchain types:
[ERROR] jdk [ version='[9, )' ]
[ERROR] Please make sure you define the required toolchains in your ~/.m2/toolchains.xml file.

The build image now meets the requirement, yet the build still fails. Per reference [3], a toolchains.xml file with the appropriate configuration has to be created under ~/.m2/:

Check the Java-related environment variables with env:

root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# env
...
JAVA_HOME=/docker-java-home
JAVA_VERSION=9.0.4+12
JAVA_DEBIAN_VERSION=9.0.4+12-4
...

Set the version and jdkHome fields accordingly:


root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# cat ~/.m2/toolchains.xml
<?xml version="1.0" encoding="UTF-8"?>
<toolchains xmlns="http://maven.apache.org/TOOLCHAINS/1.1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/TOOLCHAINS/1.1.0 http://maven.apache.org/xsd/toolchains-1.1.0.xsd">
<toolchain>
<type>jdk</type>
<provides>
<version>9</version>
<vendor>oracle</vendor>
</provides>
<configuration>
<jdkHome>/docker-java-home</jdkHome>
</configuration>
</toolchain>
</toolchains>

Continue the build:

root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
[INFO] Scanning for projects...
[INFO] Inspecting build with total of 19 modules...
[INFO] Installing Nexus Staging features:
[INFO] ... total of 19 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
[INFO] Reactor Summary:
[INFO]
[INFO] OPS4J Pax Logging (Build POM) 1.11.9 ............... SUCCESS [ 0.335 s]
[INFO] OPS4J Pax Logging - API (Java9) .................... SUCCESS [ 40.867 s]
[INFO] OPS4J Pax Logging - API ............................ SUCCESS [ 51.690 s]
[INFO] OPS4J Pax Logging - API tests ...................... FAILURE [02:11 min]

The failure is in the API tests module. The test modules do not affect the deliverables, so just comment them out in pom.xml;

Build again:

root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] OPS4J Pax Logging (Build POM) 1.11.9 ............... SUCCESS [ 0.275 s]
[INFO] OPS4J Pax Logging - API (Java9) .................... SUCCESS [ 1.873 s]
[INFO] OPS4J Pax Logging - API ............................ SUCCESS [ 4.672 s]
[INFO] OPS4J Pax Logging - Log4Jv1 implementation ......... SUCCESS [ 39.435 s]
[INFO] OPS4J Pax Logging - Log4Jv2 implementation ......... SUCCESS [ 8.208 s]
[INFO] OPS4J Pax Logging - Log4j v2 Extra packages ........ SUCCESS [ 0.131 s]
[INFO] OPS4J Pax Logging - Logback implementation ......... SUCCESS [ 15.241 s]
[INFO] OPS4J Pax Logging - Integration Tests .............. SUCCESS [01:48 min]
[INFO] OPS4J Pax Logging - Karaf .......................... SUCCESS [ 0.014 s]
[INFO] OPS4J Pax Logging - Karaf KAR Logger ............... SUCCESS [ 0.148 s]
[INFO] OPS4J Pax Logging - Karaf KAR ...................... SUCCESS [ 28.098 s]
[INFO] OPS4J Pax Logging - Karaf Distribution ............. SUCCESS [ 2.002 s]
[INFO] OPS4J Pax Logging - Karaf Integration Tests 1.11.9 . SUCCESS [ 36.557 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:07 min
[INFO] Finished at: 2021-12-21T09:45:18Z
[INFO] ------------------------------------------------------------------------

Finally, success! Now edit pom.xml and bump the log4j dependency:

- <version.org.apache.logging.log4j>2.14.0</version.org.apache.logging.log4j>
+ <version.org.apache.logging.log4j>2.17.0</version.org.apache.logging.log4j>

Build:

root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
[INFO] Scanning for projects...
[INFO] Inspecting build with total of 13 modules...
[INFO] Installing Nexus Staging features:
[INFO] ... total of 13 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
[INFO] ------------------------------------------------------------------------
...
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /home/org.ops4j.pax.logging-logging-1.11.9/pax-logging-log4j2/src/main/java/org/apache/logging/log4j/core/impl/ThrowableProxy.java:[85,54] cannot find symbol
symbol: variable EMPTY_THROWABLE_PROXY_ARRAY
location: class org.apache.logging.log4j.core.impl.ThrowableProxyHelper
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] OPS4J Pax Logging (Build POM) 1.11.9 ............... SUCCESS [ 0.302 s]
[INFO] OPS4J Pax Logging - API (Java9) .................... SUCCESS [ 1.836 s]
[INFO] OPS4J Pax Logging - API ............................ SUCCESS [ 10.592 s]
[INFO] OPS4J Pax Logging - Log4Jv1 implementation ......... SUCCESS [ 2.017 s]
[INFO] OPS4J Pax Logging - Log4Jv2 implementation ......... FAILURE [ 4.279 s]

Another error. Comparing against the upstream code where log4j had already been upgraded, the assignment of the suppressedProxies field in the ThrowableProxy constructor changed: the old version used ThrowableProxyHelper.EMPTY_THROWABLE_PROXY_ARRAY, while the new version uses ThrowableProxy.EMPTY_ARRAY. Not a big deal, just change it:

src/main/java/org/apache/logging/log4j/core/impl/ThrowableProxy.java
public class ThrowableProxy implements Serializable {

    // log4j 2.17.x keeps the empty-array constant on ThrowableProxy itself
    static final ThrowableProxy[] EMPTY_ARRAY = {};

    ThrowableProxy() {
        this.throwable = null;
        this.name = null;
        this.extendedStackTrace = ExtendedStackTraceElement.EMPTY_ARRAY;
        this.causeProxy = null;
        this.message = null;
        this.localizedMessage = null;
        // was ThrowableProxyHelper.EMPTY_THROWABLE_PROXY_ARRAY before the upgrade
        this.suppressedProxies = ThrowableProxy.EMPTY_ARRAY;
    }

再次编译:

root@895be557c3cd:/home/org.ops4j.pax.logging-logging-1.11.9# mvn clean install -Dmaven.test.skip=true
[INFO] Scanning for projects...
[INFO] Inspecting build with total of 13 modules...
[INFO] Installing Nexus Staging features:
[INFO] ... total of 13 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] OPS4J Pax Logging (Build POM) 1.11.9 ............... SUCCESS [ 0.290 s]
[INFO] OPS4J Pax Logging - API (Java9) .................... SUCCESS [ 1.819 s]
[INFO] OPS4J Pax Logging - API ............................ SUCCESS [ 4.717 s]
[INFO] OPS4J Pax Logging - Log4Jv1 implementation ......... SUCCESS [ 2.057 s]
[INFO] OPS4J Pax Logging - Log4Jv2 implementation ......... SUCCESS [ 2.654 s]
[INFO] OPS4J Pax Logging - Log4j v2 Extra packages ........ SUCCESS [ 0.144 s]
[INFO] OPS4J Pax Logging - Logback implementation ......... SUCCESS [ 0.908 s]
[INFO] OPS4J Pax Logging - Integration Tests .............. SUCCESS [ 4.402 s]
[INFO] OPS4J Pax Logging - Karaf .......................... SUCCESS [ 0.012 s]
[INFO] OPS4J Pax Logging - Karaf KAR Logger ............... SUCCESS [ 0.138 s]
[INFO] OPS4J Pax Logging - Karaf KAR ...................... SUCCESS [ 1.588 s]
[INFO] OPS4J Pax Logging - Karaf Distribution ............. SUCCESS [ 2.114 s]
[INFO] OPS4J Pax Logging - Karaf Integration Tests 1.11.9 . SUCCESS [ 0.163 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 23.440 s
[INFO] Finished at: 2021-12-21T09:51:37Z
[INFO] ------------------------------------------------------------------------

升级log4j的版本编译成功。

解决方案

Replace the newly built pax-logging-api and pax-logging-log4j bundles in the dependency repository, rebuild the deliverables, and verify that the vulnerability is resolved and logging works normally.
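As a quick post-upgrade sanity check, the sketch below is my own illustration rather than part of the original upgrade steps (class name and log message are made up): it logs one line through slf4j from an OSGi bundle activator, and if the replaced pax-logging bundles are wired correctly the line should show up in the Karaf log (typically data/log/karaf.log).

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical smoke-test bundle: if pax-logging-api still delegates to the
// rebuilt log4j2 backend, this line appears in the Karaf log on bundle start.
public class LoggingSmokeTest implements BundleActivator {

    private static final Logger LOG = LoggerFactory.getLogger(LoggingSmokeTest.class);

    @Override
    public void start(BundleContext context) {
        LOG.info("pax-logging smoke test: slf4j -> pax-logging-api -> log4j2 backend is alive");
    }

    @Override
    public void stop(BundleContext context) {
        LOG.info("pax-logging smoke test bundle stopped");
    }
}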

参考资料

  1. https://github.com/ops4j/org.ops4j.pax.logging
  2. https://github.com/docker-library/openjdk/issues/101
  3. https://blog.csdn.net/yiqiu3812/article/details/103298980

本次发现的问题是踩到了老版本Kubernetes的坑,查找资料发现fatedier大佬已经做了很棒的分析,此处转载过来仅做学习记录。

作者:fatedier
本文出处:https://blog.fatedier.com/2020/04/17/pod-loopcrash-of-k8s-subpath/
文章版权归本人所有,欢迎转载,但未经作者同意必须保留此段声明,且在文章页面明显位置给出原文链接,否则保留追究法律责任的权利。

问题背景

In Kubernetes, for a container that mounts a volume with subPath, if the configmap (or other volume source) is updated and the container later exits unexpectedly, the container keeps crashing and can no longer start.

社区相关 issue #68211,问题已经在v1.19版本解决。

分析过程

复现步骤

---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  volumes:
  - configMap:
      name: extra-cfg
    name: extra-cfg
  containers:
  - name: test
    image: ubuntu:bionic
    command: ["sleep", "30"]
    resources:
      requests:
        cpu: 100m
    volumeMounts:
    - name: extra-cfg
      mountPath: /etc/extra.ini
      subPath: extra.ini
---
apiVersion: v1
data:
  extra.ini: |
    somedata
kind: ConfigMap
metadata:
  name: extra-cfg

Apply 此配置,Pod 启动完成后,修改 configmap 的内容,等待 30 秒后容器自动退出,kubelet 重启容器,此时观察到容器持续 mount 失败。

Error: failed to start container "test": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:424: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/e044883a-48da-4d28-b304-1a57dcb32203/volume-subpaths/extra-cfg/test/0\\\" to rootfs \\\"/var/lib/docker/overlay2/31b076d0012aad47aa938b482de24ecda8b41505489a22f63b8a3e4ce39b43ba/merged\\\" at \\\"/var/lib/docker/overlay2/31b076d0012aad47aa938b482de24ecda8b41505489a22f63b8a3e4ce39b43ba/merged/etc/extra.ini\\\" caused \\\"no such file or directory\\\"\"": unknown

原因分析

Configmap Volume 的更新

当容器第一次启动前,kubelet 先将 configmap 中的内容下载到 Pod 对应的 Volume 目录下,例如 /var/lib/kubelet/pods/{Pod UID}/volumes/kubernetes.io~configmap/extra-cfg

To make updates to the contents of this volume atomic (the whole directory is replaced at once), kubelet updates it through a layer of symlinks; the directory looks like this:

drwxrwxrwx 3 root root 4.0K Mar 29 03:12 .
drwxr-xr-x 3 root root 4.0K Mar 29 03:12 ..
drwxr-xr-x 2 root root 4.0K Mar 29 03:12 ..2020_03_29_03_12_44.788930127
lrwxrwxrwx 1 root root 31 Mar 29 03:12 ..data -> ..2020_03_29_03_12_44.788930127
lrwxrwxrwx 1 root root 16 Mar 29 03:12 extra.ini -> ..data/extra.ini

extra.ini is a symlink to ..data/extra.ini, ..data is a symlink to ..2020_03_29_03_12_44.788930127, and the timestamp-named directory holds the real content.

configmap 更新后,会生成新的时间戳的目录存放更新后的内容。

创建新的软链 ..data_tmp 指向新的时间戳目录,之后重命名为 ..data,重命名是一个原子操作。

最后删除旧的时间戳目录。
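The swap can be sketched in a few lines of java.nio.file code (paths and names here are illustrative only; kubelet's actual implementation is in Go). The key point is that the final step is a single rename of the ..data symlink:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicSymlinkSwap {

    // Publish a freshly written, timestamp-named directory by repointing ..data to it.
    // Readers that resolve extra.ini -> ..data/extra.ini either see the old content
    // or the new content, never a half-updated directory.
    static void publish(Path volumeDir, Path newTimestampedDir) throws IOException {
        Path tmpLink = volumeDir.resolve("..data_tmp");
        Path dataLink = volumeDir.resolve("..data");

        Files.deleteIfExists(tmpLink);
        // The link target is relative, e.g. "..2020_03_29_03_12_44.788930127"
        Files.createSymbolicLink(tmpLink, newTimestampedDir.getFileName());

        // On Linux this is rename(2) on the symlink itself, which atomically
        // replaces the existing ..data link; afterwards the old timestamped
        // directory can be removed.
        Files.move(tmpLink, dataLink, StandardCopyOption.ATOMIC_MOVE);
    }
}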

容器挂载 subpath Volume 的准备

Once the configmap volume is ready, kubelet bind mounts the file selected by subPath inside the configmap to a special location: /var/lib/kubelet/pods/{Pod UID}/volume-subpaths/extra-cfg/{container name}/0

cat /proc/self/mountinfo|grep extra
2644 219 8:1 /var/lib/kubelet/pods/{Pod UID}/volumes/kubernetes.io~configmap/extra-cfg/..2020_03_29_03_12_13.444136014/extra.ini /var/lib/kubelet/pods/{Pod UID}/volume-subpaths/extra-cfg/test/0 rw,relatime shared:99 - ext4 /dev/sda1 rw,data=ordered

可以看出,bind mount 的文件其实是真实文件的时间戳目录下的内容。

After the configmap is updated, that timestamped directory is deleted, and the bind-mount source shows up in mountinfo with //deleted appended:

cat /proc/self/mountinfo|grep extra
2644 219 8:1 /var/lib/kubelet/pods/{Pod UID}/volumes/kubernetes.io~configmap/extra-cfg/..2020_03_29_03_12_13.444136014/extra.ini//deleted /var/lib/kubelet/pods/{Pod UID}/volume-subpaths/extra-cfg/test/0 rw,relatime shared:99 - ext4 /dev/sda1 rw,data=ordered
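That //deleted suffix is easy to spot programmatically. Below is a minimal sketch of my own (not a kubelet code excerpt) that scans /proc/self/mountinfo for subpath bind mounts whose source file has been deleted:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StaleSubpathMountCheck {

    // Walk /proc/self/mountinfo; after splitting on spaces, index 3 is the mount
    // root inside its filesystem and index 4 is the mount point. A root ending in
    // "//deleted" means the bind-mount source file is gone -- exactly the state
    // that makes the next container restart fail.
    public static void main(String[] args) throws IOException {
        for (String line : Files.readAllLines(Paths.get("/proc/self/mountinfo"))) {
            String[] f = line.split(" ");
            if (f.length > 4 && f[3].endsWith("//deleted") && f[4].contains("/volume-subpaths/")) {
                System.out.println("stale subpath mount: " + f[4] + " (source " + f[3] + ")");
            }
        }
    }
}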

Bind Mount

当容器启动时,需要将 /var/lib/kubelet/pods/{Pod UID}/volume-subpaths/extra-cfg/test/0 挂载到容器中。

如果原来的时间戳目录被删除,则 mount 会出错: mount: mount(2) failed: No such file or directory

通过简单的命令模拟这个问题:

# touch a b c
# mount --bind a b
# rm -f a
# mount --bind b c
mount: mount(2) failed: No such file or directory

可以看到,当 a 删除后,b 挂载点无法再被 mount。所以,当容器异常退出需要重启后,如果 configmap 被更新,原先的时间戳文件被删除,这个 subpath 就无法再被 mount 到容器中。

解决方案

Configmap 变更后 Unmount

社区相关 PR: https://github.com/kubernetes/kubernetes/pull/82784

在容器重启前,检查 subpath 挂载点的源文件和新的目标 subpath 文件是否一致。

configmap 被更新后,时间戳目录变更,则检查到不一致。将 /var/lib/kubelet/pods/{Pod UID}/volume-subpaths/extra-cfg/test/0 Unmount,再重新 Bind Mount 当前最新的时间戳目录下的对应文件。

Judging from the comments on that community PR, the approach may carry risks that are not fully understood (one commenter pointed out it is unsafe on kernels older than 4.18), so the PR stalled for a long time.

通过一段时间的测试,尚未发现明显的问题。

不使用 subpath

使用其他方式绕过这个问题。

例如可以将 Configmap 整个 Mount 到容器的其他目录下,再在容器启动时通过软链的方式链接到对应的路径下。
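One way to sketch that workaround (the mount path /etc/config-full and the file name are assumptions for illustration; the original article does not prescribe them): mount the whole configmap without subPath, then have the container entrypoint link the expected path to it before the application starts.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LinkConfigAtStartup {

    // Run this (or an equivalent shell one-liner) as the first step of the
    // container entrypoint. Because the whole configmap volume is mounted,
    // updates keep propagating; only the stable symlink below is container-local.
    public static void main(String[] args) throws IOException {
        Path expected = Paths.get("/etc/extra.ini");              // path the app reads (assumed)
        Path mounted = Paths.get("/etc/config-full/extra.ini");   // full configmap mount (assumed)
        Files.deleteIfExists(expected);
        Files.createSymbolicLink(expected, mounted);
    }
}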

为什么使用间接 Bind Mount 而不是直接 Mount 软链接

参考 https://kubernetes.io/blog/2018/04/04/fixing-subpath-volume-vulnerability/ 这篇文章。

可以看出原先使用的就是直接 Mount 软链接的方式,但是存在安全漏洞,symlink race 。恶意程序可以通过构造一个软链接,使特权程序(kubelet) 将超出权限范围之外的文件内容挂载到用户容器中。

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  initContainers:
  - name: prep-symlink
    image: "busybox"
    command: ["bin/sh", "-ec", "ln -s / /mnt/data/symlink-door"]
    volumeMounts:
    - name: my-volume
      mountPath: /mnt/data
  containers:
  - name: my-container
    image: "busybox"
    command: ["/bin/sh", "-ec", "ls /mnt/data; sleep 999999"]
    volumeMounts:
    - mountPath: /mnt/data
      name: my-volume
      subPath: symlink-door
  volumes:
  - name: my-volume
    emptyDir: {}

使用如上的配置,通过 emptyDir,在 initContainer 中在挂载的 Volume 目录中创建了一个指向根目录的软链接。

之后正常的容器启动,但是指定了 subpath,如果 kubelet 直接 Mount 软链接,会将宿主机的根目录 Mount 到用户容器中。

为了解决这个问题,需要解析出软链接对应的真实文件路径,并且判断此路径是否是在 Volume 目录下,校验通过后才能挂载到容器中。但是由于校验和挂载之间存在时间差,此文件还是有可能会被篡改。
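A minimal sketch of that resolve-then-validate check (illustrative Java of my own, not the actual kubelet code, which is in Go):

import java.io.IOException;
import java.nio.file.Path;

public class SubPathValidation {

    // Resolve the subPath, following symlinks, and require the real location to
    // stay inside the volume directory. The remaining problem is the TOCTOU gap:
    // the path can be swapped between this check and the actual mount, which is
    // why kubelet additionally pins the resolved file with a bind mount.
    static Path resolveAndValidate(Path volumeDir, String subPath) throws IOException {
        Path volumeReal = volumeDir.toRealPath();
        Path real = volumeDir.resolve(subPath).toRealPath();
        if (!real.startsWith(volumeReal)) {
            throw new IOException("subPath escapes its volume: " + real);
        }
        return real;
    }
}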

社区讨论后,通过引入中间 Bind Mount 的机制,相当于给这个文件加锁,将原文件的路径固化,之后再 Mount 到容器中时,只会 Mount 当时创建挂载点时的源文件。

更新

The fix PR submitted to the community, #89629, has been merged.

功能总结

导出数据包

Method: File | Export Packet Dissections | As "CSV" (Comma Separated Values packet summary) file…

  1. 导出格式有纯文本、CSV、XML、JSON等;
  2. 不仅可以输出选中列,还可以输出当前页面展示的列,以及所有列(在导出弹框中设置);
  3. 可以设置包格式,比如导出统计行、统计头、包详细数据等;

添加展示列

方法:Package Details 面板中展开包详情,找到指定字段右键单击并选择Apply as Column 选项

显示一个TCP/UDP 会话

Method 1: select a packet, right-click and choose Conversation Filter | [TCP|UDP].

Method 2: select a packet, right-click and choose Follow | [TCP|UDP] Stream.

Method 3: use Statistics | Conversations from the menu.

Method 4: in the TCP header, right-click the stream index field and choose Apply as Filter.

命令总结

捕获过滤命令

Capture filters can only match on protocol, address, and port fields; unlike display filters, they cannot match on decoded packet contents.

1. 主机相关过滤命令*

  • host 10.3.1.1: 捕获到达/来自10.3.1.1主机的数据(支持IPv6地址)。
  • not host 10.3.1.1: 捕获除了到达/来自10.3.1.1主机的所有数据。
  • src host 10.3.1.1: 捕获来自10.3.1.1 主机上的数据。
  • dst host 10.3.1.1: 捕获到达10.3.1.1 主机上的数据。
  • host 10.3.1.1 or host 10.3.1.2: 捕获到达/来自10.3.1.1主机上的数据,和到达/来自10.3.1.2 主机的数据。

2. 端口相关过滤命令*

  • port 53: 捕获到达/来自端口号为53的UDP/TCP 数据(典型的DNS 数据)。
  • not port 53 : 捕获除到达/来自端口号为53的所有UDP/TCP 数据。
  • port 80: 捕获到达/来自端口号为80的UDP/TCP 数据(典型的HTTP 数据)。
  • udp port 67 : 捕获到达/来自端口号为67的UDP 数据(典型的DHCP 数据)。
  • tcp port 21: 捕获到达/来自端口号为21的TCP 数据(典型的FTP 数据)。
  • portrange 1-80: 捕获到达/来自1~80端口号的UDP/TCP 数据。
  • tcp portrange 1-80: 捕获到达/来自1~80端口号的TCP 数据。

3. 主机和端口混合过滤命令*

  • port 20 or port 21 :捕获到达/来自20 或21 端口号的所有UDP/TCP 数据。
  • host 10.3.1.1 and port 80: 捕获到达/来自端口号为80, 并且是到达/来自10.3.1.1主机的UDP/TCP 数据。
  • host 10.3.1.1 and not port 80: captures UDP/TCP traffic to/from host 10.3.1.1 on any port other than 80.
  • udp src port 68 and udp dst port 67: 捕获来自端口为68, 目标端口号为67 的所有UDP 数据(典型的DHCP 客户端到DHCP 服务器的数据) 。
  • udp src port 67 and udp dst port 68: 捕获来自端口号为67, 目标端口号为68 的所有UDP 数据(典型的DHCP 服务器到DHCP 客户端的数据)。

4. IP地址范围过滤命令

  • net 192.168.0.0/24: captures traffic to/from any host on the 192.168.0.0/24 network.
  • net 192.168.0.0 mask 255.255.255.0: captures traffic to/from any host on the 192.168.0.0 network (same filter written with an explicit netmask).
  • ip6 net 2406:da00:ff00::/64: captures traffic to/from any host on the 2406:da00:ff00::/64 (IPv6) network.
  • not dst net 192.168.0.0/24: captures everything except traffic destined for the 192.168.0.0/24 network.
  • dst net 192.168.0.0/24: captures all traffic destined for the 192.168.0.0/24 network.
  • src net 192.168.0.0/24: captures all traffic originating from the 192.168.0.0/24 network.

5. 广播或多播地址过滤命令

  • ip broadcast: 捕获到255.255.255.255 的数据。
  • ip multicast: 捕获通过224.0.0.0~239.255.255.255的数据。
  • dst host ff02::1: captures traffic sent to the IPv6 all-nodes link-local multicast address.
  • dst host ff02::2: captures traffic sent to the IPv6 all-routers link-local multicast address (ff02::1 is answered by every node on the link, ff02::2 only by routers).

6. MAC地址过滤命令

  • ether host 00:08:15:00:08:15: 捕获到达/来自00:08:15:00:08:15主机的数据。
  • ether src 02:0A:42:23:41:AC: 捕获来自02:0A:42:23:41:AC 主机的数据。
  • ether dst 02:0A:42:23:41:AC: 捕获到达02:0A:42:23:41:AC 主机的数据。
  • not ether host 00:08:15:00:08:15:捕获到达/来自除了00:08:15:00:08:15的任何MAC
    地址的流量。

7. 特定ICMP协议过滤命令

  • icmp:捕获所有ICMP 数据包。
  • icmp[0]=8 : 捕获所有ICMP 字段类型为8 (Echo Request) 的数据包。
  • icmp[0]=17: 捕获所有ICMP 字段类型为17 (Address Mask Request) 的数据包。
  • icmp[0]=8 or icmp[0]=0: 捕获所有ICMP 字段类型为8 (Echo Request) 或ICMP
    字段类型为0 (Echo Reply) 的数据包。
  • icmp[0]=3 and not icmp[1]=4 :捕获所有ICMP 字段类型为3 (Destination
    Unreachable) 的包,除了ICMP 字段类型为3/代码为4 (Fragmentation Needed and
    Don’t Fragment was Set) 的数据包。

显示过滤命令

显示过滤器可以帮助用户在捕捉结果中进行数据查找。该过滤器可以在得到的捕捉结果中修改,以显示有用数据。

既支持协议过滤也支持内容过滤。

1. 通用语法格式

Protocol | [String1] [String2] | Comparison-Operator | Value | Logical-Operations | Other-expression

where Protocol is a layer 2-7 protocol, [String1] [String2] are protocol sub-fields, Comparison-Operator and Value form the comparison, and Logical-Operations joins the result with Other-expression (another filter expression).

The comparison operators are the six relational operators below, plus contains and matches:

English form   C-style form   Meaning
eq             ==             equal to
ne             !=             not equal to
gt             >              greater than
lt             <              less than
ge             >=             greater than or equal to
le             <=             less than or equal to
contains       -              the field contains the given value
matches        -              the field matches the given regular expression

There are four logical operators:

English form   C-style form   Meaning
and            &&             logical AND
or             ||             logical OR
xor            ^^             logical XOR
not            !              logical NOT

2. 协议过滤命令*

  • arp: 显示所有ARP 流量,包括免费ARP 、ARP 请求和ARP 应答。
  • ip(v6): 显示所有IPv4/IPv6 流量,包括有IPv4(IPv6) 头部嵌入式的包(如ICMP 目标不可达的数据包,返回到ICMP 头后进入到IPv4 头部)。
    • ip(v6).src
    • ip(v6).dst
    • ip(v6).host
    • ip(v6).addr
  • tcp: 显示所有基于TCP 的流量数据。

3. 应用过滤命令*

  • bootp: 显示所有DHCP 流量(ipv4下基于BOOTP,ipv6下不是基于BOOTP,过滤时使用dhcpv6) 。
  • dns: 显示所有DNS 流量,包括基于TCP 传输和UDP 的DNS 请求和响应。
  • tftp: 显示所有TFTP (简单文件传输协议)流量。
  • http: 显示所有HTTP 命令、响应和数据传输包。但是不显示TCP 握手包、TCP确认包或TCP 断开连接的包。
  • http contains “GET”: 显示HTTP 客户端发送给HTTP 服务器的所有GET 请求数据。
  • icmp: 显示所有ICMP 流量。

4. 字段存在过滤命令

  • bootp.option.hostname: 显示所有DHCP 流量,包含主机名( DHCP 是基于BOOTP) 。
  • http.host: 显示所有包含有HTTP 主机名字段的HTTP 包。该包通常是由客户端发送给一个Web 服务器的请求。
  • ftp.request.command: 显示所有FTP 命令数据,如USER 、PASS 或RETR 命令。
  • ftp.request.arg matches “admin”: 显示匹配admin 字符串的数据。
  • tcp.analysis.flags: 显示所有与TCP 标识相关的包,包括丢包、重发或者零窗口标志。
  • tcp.analysis.zero_window: 显示被标志的包,来表示发送方的缓冲空间已满。

5. 逻辑运算过滤命令

  • && or and: ip.src==10.2.2.2 && tcp.port==80 shows all IPv4 traffic from source host 10.2.2.2 with port 80.
  • || or or: tcp.port==80 || tcp.port==443 shows all TCP traffic to/from port 80 or 443.
  • ! or not: !arp shows all traffic except ARP.
  • != or ne: tcp.flags.syn != 1 shows TCP frames whose SYN flag is not set to 1.

注:

ip.addr != 10.2.2.2 表示显示IP 源或目标地址字段非10.2.2.2 的数据包。如果一个包的源或目标IP 地址字段中不包含10.2.2.2, 则显示该数据包。在该语法中使用了一个隐含或,并且不会过滤掉任何数据包。

!ip.addr == 10.2.2.2 表示显示在IP 源和目标地址字段不包含10.2.2.2 的数据包。当排除到达/来自一个特定IP 地址的数据时,这是一个合适的过滤器语法。

!tcp.flags.syn==1 shows every packet for which the SYN flag is not set to 1, including packets of other protocols: UDP and ARP packets also match this filter, because they contain no TCP SYN flag at all.

tcp.flags.syn != 1 表示仅显示包括SYN 设置为0 的TCP 包。

6. 时间过滤命令

  • frame.time_delta > 1: shows packets that arrived more than 1 second after the previous captured packet.
  • tcp.time_delta > 1: shows packets whose gap from the previous packet in the same TCP stream exceeds 1 second.

注:上述命令主要用于判断各种网络延迟。