
Summary: Issues Encountered During KubeSphere 3.4.0 Offline (Air-Gapped) Deployment

Following the official KubeSphere documentation [1], this post records several problems encountered while setting up an offline (air-gapped) deployment environment.

Problem 1: Building the offline installation package on the Internet-connected host fails

During the build, some image pulls timed out, most likely due to network issues; retrying the export several times resolved it.

[root@node kubesphere]# ./kk artifact export -m manifest-sample.yaml -o kubesphere.tar.gz
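
If the pulls keep timing out, a hedged workaround (the same hint KubeKey prints later in this post when storage.googleapis.com is unreachable) is to route downloads through the CN mirror via the KKZONE environment variable before retrying the export:

# Hedged sketch: retry the artifact export behind the CN mirror.
export KKZONE=cn
./kk artifact export -m manifest-sample.yaml -o kubesphere.tar.gz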

Problem 2: Failure during the Harbor installation stage

The Harbor installation stage fails with an "unable to sign certificate: must specify a CommonName" error:

[root@node1 kubesphere]# ./kk init registry -f config-sample.yaml -a kubesphere.tar.gz
19:37:46 CST [GreetingsModule] Greetings
19:37:47 CST message: [master]
Greetings, KubeKey!
19:37:47 CST success: [master]
19:37:47 CST [UnArchiveArtifactModule] Check the KubeKey artifact md5 value
19:37:47 CST success: [LocalHost]
...
19:48:16 CST success: [master]
19:48:16 CST [ConfigureOSModule] configure the ntp server for each node
19:48:17 CST skipped: [master]
19:48:17 CST [InitRegistryModule] Fetch registry certs
19:48:18 CST success: [master]
19:48:18 CST [InitRegistryModule] Generate registry Certs
[certs] Using existing ca certificate authority
19:48:18 CST message: [LocalHost]
unable to sign certificate: must specify a CommonName
19:48:18 CST failed: [LocalHost]
error: Pipeline[InitRegistryPipeline] execute failed: Module[InitRegistryModule] exec failed:
failed: [LocalHost] [GenerateRegistryCerts] exec failed after 1 retries: unable to sign certificate: must specify a CommonName

Following reference [2], modify the registry-related configuration:

registry:
  type: harbor
  auths:
    "dockerhub.kubekey.local":
      username: admin
      password: Harbor12345
      certsPath: "/etc/docker/certs.d/dockerhub.kubekey.local"
  privateRegistry: "dockerhub.kubekey.local"
  namespaceOverride: "kubesphereio"
  registryMirrors: []
  insecureRegistries: []
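
After correcting the registry section, re-running the registry initialization should get past the certificate-generation step:

# Same command as before, now with the fixed config-sample.yaml.
./kk init registry -f config-sample.yaml -a kubesphere.tar.gz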

Problem 3: Failures during the cluster creation stage

KubeKey tries to download the Kubernetes binaries:

[root@node1 kubesphere]# ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz
23:29:32 CST [NodeBinariesModule] Download installation binaries
23:29:32 CST message: [localhost]
downloading amd64 kubeadm v1.22.12 ...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (6) Could not resolve host: storage.googleapis.com; 未知的错误
23:29:32 CST [WARN] Having a problem with accessing https://storage.googleapis.com? You can try again after setting environment 'export KKZONE=cn'
23:29:32 CST message: [LocalHost]
Failed to download kubeadm binary: curl -L -o /home/k8s/kubesphere/kubekey/kube/v1.22.12/amd64/kubeadm https://storage.googleapis.com/kubernetes-release/release/v1.22.12/bin/linux/amd64/kubeadm error: exit status 6
23:29:32 CST failed: [LocalHost]
error: Pipeline[CreateClusterPipeline] execute failed: Module[NodeBinariesModule] exec failed:
failed: [LocalHost] [DownloadBinaries] exec failed after 1 retries: Failed to download kubeadm binary: curl -L -o /home/k8s/kubesphere/kubekey/kube/v1.22.12/amd64/kubeadm https://storage.googleapis.com/kubernetes-release/release/v1.22.12/bin/linux/amd64/kubeadm error: exit status 6

This error occurs because config-sample.yaml was not generated by the kk command, so the Kubernetes version it specifies does not match the offline artifact and KubeKey falls back to downloading the binaries. The command's help output shows that the default KubeSphere version is v3.4.1:

[root@node1 kubesphere]# ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz -h
Create a Kubernetes or KubeSphere cluster

Usage:
kk create cluster [flags]

Flags:
-a, --artifact string Path to a KubeKey artifact
--container-manager string Container runtime: docker, crio, containerd and isula. (default "docker")
--debug Print detailed information
--download-cmd string The user defined command to download the necessary binary files. The first param '%s' is output path, the second param '%s', is the URL (default "curl -L -o %s %s")
-f, --filename string Path to a configuration file
-h, --help help for cluster
--ignore-err Ignore the error message, remove the host which reported error and force to continue
--namespace string KubeKey namespace to use (default "kubekey-system")
--skip-pull-images Skip pre pull images
--skip-push-images Skip pre push images
--with-kubernetes string Specify a supported version of kubernetes
--with-kubesphere Deploy a specific version of kubesphere (default v3.4.1)
--with-local-storage Deploy a local PV provisioner
--with-packages install operation system packages by artifact
--with-security-enhancement Security enhancement
-y, --yes
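
One way to avoid the version mismatch entirely (not the route taken here) is to let KubeKey generate the configuration file itself, so the recorded versions line up with what was packed into the artifact; a hedged sketch, assuming the standard kk create config subcommand:

# Hedged sketch: regenerate config-sample.yaml with kk so the versions match the artifact.
./kk create config --with-kubesphere v3.4.0 -f config-sample.yaml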

Here the command was instead modified to specify the KubeSphere version explicitly:

[root@node1 kubesphere]# ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz --with-kubesphere 3.4.0
W1205 00:36:57.266052 1453 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.96.0.10]; the provided value is: [169.254.25.10]
[init] Using Kubernetes version: v1.23.15
[preflight] Running pre-flight checks
[WARNING FileExisting-socat]: socat not found in system path
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 24.0.6. Latest validated version: 20.10
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileExisting-conntrack]: conntrack not found in system path
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
00:36:58 CST stdout: [master]
[preflight] Running pre-flight checks
W1205 00:36:58.323079 1534 removeetcdmember.go:80] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
W1205 00:36:58.327376 1534 cleanupnode.go:109] [reset] Failed to evaluate the "/var/lib/kubelet" directory. Skipping its unmount and cleanup: lstat /var/lib/kubelet: no such file or directory
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
00:36:58 CST message: [master]
init kubernetes cluster failed: Failed to exec command: sudo -E /bin/bash -c "/usr/local/bin/kubeadm init --config=/etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=FileExisting-crictl,ImagePull"
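
The fatal preflight error above is the missing conntrack binary (plus a socat warning). A hedged fix on a CentOS/RHEL-style host is to install the packages before re-running the create-cluster command, or to re-run with the --with-packages flag shown in the help output so KubeKey installs the OS packages from the artifact:

# Hedged sketch: package names assume a CentOS/RHEL-style system with a reachable repo.
yum install -y conntrack-tools socat
./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz --with-kubesphere 3.4.0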

Problem 4: Some Pods are stuck in the creating stage

kubesphere-system              ks-apiserver-86757d49bb-m9pp4          ContainerCreating
kubesphere-system ks-console-cbdb4558c-7z6lg Running
kubesphere-system ks-controller-manager-64b5dcb7d-9mrsw ContainerCreating
kubesphere-system ks-installer-ff66855c9-d8x4k Running

According to reference [3], installation progress can be followed with kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f; the abnormal Pods above only come up after all components have finished installing:

#####################################################
### Welcome to KubeSphere! ###
#####################################################

Console: http://10.10.10.30:30880
Account: admin
Password: P@88w0rd
NOTES:
1. After you log into the console, please check the
monitoring status of service components in
"Cluster Management". If any service is not
ready, please wait patiently until all components
are up and running.
2. Please change the default password after login.

#####################################################
https://kubesphere.io 2023-12-05 01:24:00
#####################################################
01:24:04 CST success: [master]
01:24:04 CST Pipeline[CreateClusterPipeline] execute successfully
Installation is complete.

Please check the result using the command:

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f
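
Once the installer reports success, a quick check (a usage sketch, not part of the original log) confirms that the previously stuck Pods have come up:

# All Pods in kubesphere-system should eventually reach Running.
kubectl get pods -n kubesphere-system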

Problem 5: metrics-server fails to start

The logs show that if the Harbor registry is installed on the master node, its ports conflict with metrics-server:

[root@master ~]# kubectl logs -f -n kube-system metrics-server-6d987cb45c-4swvd
panic: failed to create listener: failed to listen on 0.0.0.0:4443: listen tcp 0.0.0.0:4443: bind: address already in use

goroutine 1 [running]:
main.main()
/go/src/sigs.k8s.io/metrics-server/cmd/metrics-server/metrics-server.go:39 +0xfc
[root@master ~]# netstat -anp|grep 4443
tcp 0 0 0.0.0.0:4443 0.0.0.0:* LISTEN 22372/docker-proxy
tcp6 0 0 :::4443 :::* LISTEN 22378/docker-proxy

[root@master ~]# docker ps |grep harbor|grep 4443
1733e9580af5 goharbor/nginx-photon:v2.5.3 "nginx -g 'daemon of…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:4443->4443/tcp, :::4443->4443/tcp, 0.0.0.0:80->8080/tcp, :::80->8080/tcp, 0.0.0.0:443->8443/tcp, :::443->8443/tcp nginx

After changing the conflicting port, metrics-server recovered.
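
A hedged sketch of the port change, assuming Harbor was deployed via docker-compose under /opt/harbor (paths may differ on your installation): remap the published 4443 port and restart Harbor.

# Hedged sketch: free host port 4443 for metrics-server.
cd /opt/harbor
# edit docker-compose.yml and change the "4443:4443" mapping to e.g. "14443:4443"
docker-compose down
docker-compose up -d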

Problem 6: Some Pods fail to pull images

kubesphere-logging-system      opensearch-cluster-data-0            init:ImagePullBackOff
kubesphere-logging-system opensearch-cluster-master-0 init:ImagePullBackOff
istio-system istio-cni-node-vlzt7 ImagePullBackOff
kubesphere-controls-system kubesphere-router-test-55b5fcc887-xlzsh ImagePullBackOff

Inspection shows that the init containers fail because they use the busybox image, which was not included in the offline package beforehand:

initContainers:
- args:
  - chown -R 1000:1000 /usr/share/opensearch/data
  command:
  - sh
  - -c
  image: busybox:latest
  imagePullPolicy: Always

The other two image pull failures have the same cause: the images were not downloaded into the offline package in advance:

Normal   BackOff    21s (x51 over 15m)  kubelet            Back-off pulling image "dockerhub.kubekey.local/kubesphereio/install-cni:1.14.6"

After manually downloading the images and importing them into the offline environment, the abnormal Pods recovered.
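
A hedged sketch of the manual import, using busybox as an example (the install-cni:1.14.6 image is handled the same way, pushed under the dockerhub.kubekey.local/kubesphereio namespace it is referenced from):

# On a host with Internet access:
docker pull busybox:latest
docker save busybox:latest -o busybox.tar

# On the offline host, after copying busybox.tar over:
docker load -i busybox.tar
docker tag busybox:latest dockerhub.kubekey.local/kubesphereio/busybox:latest
docker push dockerhub.kubekey.local/kubesphereio/busybox:latest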

References

[1] https://kubesphere.io/zh/docs/v3.3/installing-on-linux/introduction/air-gapped-installation/

[2] https://github.com/kubesphere/kubekey/issues/1762#issuecomment-1681625989

[3] https://github.com/kubesphere/ks-installer/issues/907