Device: Nvidia Jetson AGX Orin 64GB
OS: Ubuntu 20.04
Jetpack Version: 5.1.2-b104
Issue Description:
1. Error Running K3S on Nvidia Jetson AGX Orin
After installing K3S on the Orin by following the official instructions from Rancher, I checked the system component pods. Some pods show as “not Ready”, and some critical pods that are present on an x86 setup are missing:
# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7f9dc8d998-rcw8z 0/1 Running 0 39h
kube-system helm-install-traefik-crd-d8nzr 1/1 Running 316 (5m11s ago) 39h
kube-system helm-install-traefik-ntgks 1/1 Running 316 (5m13s ago) 39h
kube-system local-path-provisioner-85674b7ddf-mfjxp 0/1 CrashLoopBackOff 428 (4m3s ago) 39h
kube-system metrics-server-68f955568b-55kg4 0/1 CrashLoopBackOff 428 (9s ago) 39h
On an x86 server (Ubuntu 22.04), the same installation steps produce pods that run fine:
# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default simple-pod 1/1 Running 0 123m
kube-system coredns-54fd9cb578-26982 1/1 Running 0 62m
kube-system helm-install-traefik-crd-wv2gn 1/1 Completed 0 3h19m
kube-system helm-install-traefik-m69k7 0/1 Completed 1 3h19m
kube-system local-path-provisioner-85674b7ddf-x7gdk 1/1 Running 0 3h19m
kube-system metrics-server-68f955568b-pm58m 2/2 Running 0 3h18m
kube-system svclb-traefik-d14f6f79-gzn2r 1/1 Running 0 3h18m
kube-system traefik-cb458865d-2szr4 1/1 Running 0 3h18m
Ignoring the above error, I tried running my application on the Orin. Pods can communicate by IP address, but not through the name of an exposed Service. The error message is “Domain name cannot be resolved,” which I suspect is caused by CoreDNS not being ready.
The logs from the CoreDNS pod on Orin are as follows, indicating that it cannot connect to the apiserver (the service address is 10.43.0.1:443):
[ERROR] plugin/kubernetes: Unhandled Error
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.31.2/tools/cache/reflector.go:243: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
[ERROR] plugin/kubernetes: Unhandled Error
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/ready: Still waiting on: "kubernetes"
The logs from the metrics-server pod show the same issue: it cannot connect to 10.43.0.1:443.
# kubectl logs -f metrics-server-5985cb9d7-kxtkg -n kube-system
Error: unable to load configmap based request-header-client-ca-file: Get "https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 10.43.0.1:443: i/o timeout
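Since both pods fail in the same way, it can help to summarize which addresses the timeouts point at. The following is only a sketch: the function name timeout_targets is made up for illustration, and it assumes the logs contain the "dial tcp <ip>:<port>: i/o timeout" wording shown above.

```shell
#!/bin/sh
# Sketch: count "i/o timeout" dial errors per target address.
# Reads pod logs on stdin and prints "<count> <ip:port>" lines.
timeout_targets() {
    # -o prints only the matching part of each line, -E enables extended regex.
    grep -oE 'dial tcp [0-9.]+:[0-9]+: i/o timeout' \
        | awk '{ sub(/:$/, "", $3); count[$3]++ }      # $3 is "<ip>:<port>:"
               END { for (t in count) print count[t], t }' \
        | sort
}
```

It could be fed with something like `kubectl logs -n kube-system <pod-name> | timeout_targets`. If every timeout points at 10.43.0.1:443, the problem is reaching the apiserver Service address rather than anything specific to one pod.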
I added a route for the 10.43.0.0/16 clusterIP network segment (ip route add 10.43.0.0/16 dev flannel.1) and checked the firewall, where I found no blocking policies, but the issue remains.
When I run curl -k https://10.43.0.1:443/healthz, the command hangs and then shows:
curl: (28) Failed to connect to 10.43.0.1 port 443: Connection timed out
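One further check, given as a sketch only: ClusterIP addresses are normally handled by NAT rules rather than by a route, so it may be worth confirming that a rule for 10.43.0.1 exists at all. The helper name has_clusterip_rule is hypothetical, and the check assumes the rules are written by kube-proxy in iptables mode, where the destination appears as "-d <ip>/32" in iptables-save output.

```shell
#!/bin/sh
# Sketch: report whether a saved iptables ruleset contains a rule that
# matches traffic destined for a given Service ClusterIP.
# Reads `iptables-save` output on stdin.
has_clusterip_rule() {
    ip="$1"
    # -q: quiet, -F: fixed string, -e: the pattern follows (it starts with a dash).
    if grep -qF -e "-d ${ip}/32"; then
        echo "rule present for ${ip}"
    else
        echo "no rule for ${ip}"
        return 1
    fi
}
```

A possible invocation is `sudo iptables-save -t nat | has_clusterip_rule 10.43.0.1`. A missing rule would suggest the Service rules were never programmed on this kernel; a present rule would point the search elsewhere.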
Running k3s check-config on the Orin reports missing network-related kernel options (notably CONFIG_IP_SET), which suggests the kernel needs to be recompiled to include them:
# k3s check-config
cat: /sys/kernel/security/apparmor/profiles: No such file or directory
Verifying binaries in /var/lib/rancher/k3s/data/7ddc159b1f5dc21c2fb0faf67a15be16624cfe19fb4dc756ec057f276b092b3a/bin:
- sha256sum: good
- links: good
System:
- /usr/sbin iptables v1.8.4 (legacy): ok
- swap: should be disabled
- routes: ok
Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000
info: reading kernel config from /proc/config.gz ...
Generally Necessary:
- cgroup hierarchy: cgroups Hybrid mounted, cpuset|memory controllers status: good
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_IP_NF_TARGET_REJECT: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_MULTIPORT: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled
Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_SET: missing
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
- "overlay":
- CONFIG_VXLAN: enabled
Optional (for encrypted networks):
- CONFIG_CRYPTO: enabled
- CONFIG_CRYPTO_AEAD: enabled
- CONFIG_CRYPTO_GCM: enabled (as module)
- CONFIG_CRYPTO_SEQIV: enabled
- CONFIG_CRYPTO_GHASH: enabled
- CONFIG_XFRM: enabled
- CONFIG_XFRM_USER: enabled
- CONFIG_XFRM_ALGO: enabled
- CONFIG_INET_ESP: enabled (as module)
- CONFIG_INET_XFRM_MODE_TRANSPORT: missing
- Storage Drivers:
- "overlay":
- CONFIG_OVERLAY_FS: enabled (as module)
STATUS: pass
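The output above is long, so a small filter can pull out just the options reported as missing. This is a sketch under assumptions: the function name list_missing_configs is made up, and the color-stripping step is there only in case the tool emits ANSI color codes when its output is piped.

```shell
#!/bin/sh
# Sketch: list kernel options that `k3s check-config` reports as missing.
# Reads the check-config output on stdin, prints one CONFIG_* name per line.
list_missing_configs() {
    esc=$(printf '\033')
    # 1st expression: strip ANSI color sequences (assumed possible in the output).
    # 2nd expression: keep only the option name from lines marked "missing".
    sed -n -e "s/${esc}\[[0-9;]*m//g" \
           -e 's/^[^C]*\(CONFIG_[A-Z0-9_]*\): missing.*$/\1/p'
}
```

For example, `k3s check-config 2>/dev/null | list_missing_configs`. On the output shown above this would list CONFIG_IP_SET and CONFIG_INET_XFRM_MODE_TRANSPORT, both of which appear under the optional sections.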
2. Flashing Failed After Kernel Customization in Jetpack 6.2
Suspecting that the problem comes from the missing kernel options, I upgraded to Jetpack 6.2 (which includes Ubuntu 22.04) and followed the Nvidia Jetson Linux Developer Guide to customize the kernel and create a new image and rootfs. I then used SDK Manager to flash this image to the Orin, but the flashing process fails with the following error messages:
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/bin/sync': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/cut': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/sha1sum': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/dirname': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/du': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/chattr': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/busybox': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/findmnt': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/whoami': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/bin/ps': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/bin/ip': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/w': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/xxd': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/diff': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/lsblk': No such file or directory
These errors indicate that several essential commands that were present in the original rootfs are missing after compiling the kernel. After flashing this image, the Orin device fails to boot.
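To see the state of the rootfs before flashing, the paths from the flash log above can be checked directly on the host. This is a minimal sketch: the function name check_rootfs_tools is hypothetical, and the path list is copied from the "cp: cannot stat" errors rather than from any official list of required tools.

```shell
#!/bin/sh
# Sketch: report which tools named in the flash log are absent from a rootfs tree.
# Usage: check_rootfs_tools <path-to-rootfs>
check_rootfs_tools() {
    rootfs="$1"
    missing=0
    # Relative paths taken from the "cp: cannot stat" errors in the flash log.
    for rel in bin/sync usr/bin/cut usr/bin/sha1sum usr/bin/dirname usr/bin/du \
               usr/bin/chattr usr/bin/busybox usr/bin/findmnt usr/bin/whoami \
               bin/ps bin/ip usr/bin/w usr/bin/xxd usr/bin/diff usr/bin/lsblk; do
        # -e follows symlinks; -L also accepts links that only resolve on the device.
        if [ ! -e "${rootfs}/${rel}" ] && [ ! -L "${rootfs}/${rel}" ]; then
            echo "missing: ${rel}"
            missing=$((missing + 1))
        fi
    done
    echo "total missing: ${missing}"
    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

It would be run against the Linux_for_Tegra/rootfs directory used by SDK Manager. If nearly every path is reported missing, that may mean the rootfs directory itself is empty or incomplete, which would be worth ruling out before looking at the kernel build.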
I then tried a different flashing method: I manually replaced the missing kernel modules and flashed the image onto the Orin, but the boot still fails with the following errors: