    Normal   Scheduled               41m                  default-scheduler  Successfully assigned xx/xxx-v3falue7-6f59dd5766-npd2x to node1
    Warning  FailedCreatePodSandBox  26m (x301 over 41m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "xxx-v3falue7-6f59dd5766-npd2x": Error response from daemon: start failed: : pipe2: too many open files: unknown
    Normal   SandboxChanged          66s (x808 over 41m)  kubelet            Pod sandbox changed, it will be killed and re-created.
Because this cluster uses Docker as its container runtime (CRI), check the Docker logs first:

    time="2023-11-13T14:56:05.734166795+08:00" level=info msg="/etc/resolv.conf does not exist"
    time="2023-11-13T14:56:05.734193544+08:00" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4]"
    time="2023-11-13T14:56:05.734202079+08:00" level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
    time="2023-11-13T14:56:05.740830618+08:00" level=error msg="stream copy error: reading from a closed fifo"
    time="2023-11-13T14:56:05.740850537+08:00" level=error msg="stream copy error: reading from a closed fifo"
    time="2023-11-13T14:56:05.751993232+08:00" level=error msg="1622cfb1c90d926b867db7bcb0a86498ccad59db81223e861ac515ec75ed7c27 cleanup: failed to delete container from containerd: no such container"
    time="2023-11-13T14:56:05.752024358+08:00" level=error msg="Handler for POST /v1.41/containers/1622cfb1c90d926b867db7bcb0a86498ccad59db81223e861ac515ec75ed7c27/start returned error: start failed: : fork/exec /usr/bin/containerd-shim-runc-v2: too many open files: unknown"

According to the Docker logs, the error is: fork/exec /usr/bin/containerd-shim-runc-v2: too many open files: unknown. This all but confirms that **containerd has too many open file handles**.
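One way to check this suspicion is to look at what a process actually holds open. The following is a minimal sketch, assuming a Linux host with `/proc` mounted and enough privileges (usually root) to read another process's `fd` directory. The helper name `fd_kinds` and its `proc_root` parameter are hypothetical and not part of Docker or containerd tooling.

```python
import os
from collections import Counter


def fd_kinds(pid: int, proc_root: str = "/proc") -> Counter:
    """Tally a process's open descriptors by kind (pipe, socket, file, ...)."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to whatever the descriptor refers to.
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("/"):
            kinds["file"] += 1
        else:
            # Non-file targets look like "pipe:[12345]" or "socket:[67890]".
            kinds[target.split(":", 1)[0]] += 1
    return kinds
```

Calling `sum(fd_kinds(pid).values())` gives the total number of open descriptors, which can be compared with the soft "Max open files" value in `/proc/<pid>/limits`. The per-kind breakdown may hint at which resource is leaking, for example a large `pipe` count given the `pipe2` error in the pod events.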

    [root@node1 ~]# systemctl status containerd.service
    ● containerd.service - containerd container runtime
       Loaded: loaded (/usr/lib/systemd/system/containerd.service; disabled; vendor preset: disabled)
       Active: active (running) since Sat 2023-11-01 11:02:14 CST; 1 weeks 10 days ago
         Docs: https://containerd.io
     Main PID: 1999 (containerd)
        Tasks: 1622
       Memory: 3.5G
       CGroup: /system.slice/containerd.service
               ├─ 999 /usr/bin/containerd

    cat /proc/999/limits
    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    Max file size             unlimited            unlimited            bytes
    Max data size             unlimited            unlimited            bytes
    Max stack size            8388608              unlimited            bytes
    Max core file size        unlimited            unlimited            bytes
    Max resident set          unlimited            unlimited            bytes
    Max processes             319973               319973               processes
    Max open files            1024                 524288               files


    [Service]
    ExecStartPre=-/sbin/modprobe overlay
    ExecStart=/usr/bin/containerd
    KillMode=process
    Delegate=yes
    LimitNOFILE=1048576
    # Having non-zero Limit*s causes performance problems due to accounting overhead
    # in the kernel. We recommend using cgroups to do container-local accounting.
    LimitNPROC=infinity
    LimitCORE=infinity
    TasksMax=infinity
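
The unit file sets LimitNOFILE=1048576, while the `/proc/999/limits` output reports a soft "Max open files" of 1024 for the running process. A small sketch for putting the two values side by side is shown below. It assumes the unit text and the limits text have already been read into strings, and it ignores systemd drop-in overrides. The function names `unit_nofile` and `proc_nofile` are hypothetical helpers.

```python
import re


def unit_nofile(unit_text: str):
    """Return the last LimitNOFILE= value in a systemd unit, or None."""
    found = re.findall(r"^\s*LimitNOFILE=(\S+)", unit_text, flags=re.MULTILINE)
    return found[-1] if found else None


def proc_nofile(limits_text: str):
    """Return (soft, hard) 'Max open files' from /proc/<pid>/limits text."""
    m = re.search(r"^Max open files\s+(\S+)\s+(\S+)", limits_text,
                  flags=re.MULTILINE)
    return (m.group(1), m.group(2)) if m else None
```

With the values from this node, `unit_nofile` would yield "1048576" and `proc_nofile` would yield ("1024", "524288"), which makes the mismatch between the configured and the effective limit explicit.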