Failed to install K3s and customize the kernel on Jetson AGX Orin

Device: Nvidia Jetson AGX Orin 64GB
OS: Ubuntu 20.04
JetPack Version: 5.1.2-b104

Issue Description:

1. Error Running K3s on NVIDIA Jetson AGX Orin
Following the official instructions from Rancher, I installed K3s on the Orin and checked the system component pods. Some pods are not Ready, and some critical pods (svclb-traefik, traefik) are missing compared with an x86 setup:

# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS             RESTARTS          AGE
kube-system   coredns-7f9dc8d998-rcw8z                  0/1     Running            0                 39h
kube-system   helm-install-traefik-crd-d8nzr            1/1     Running            316 (5m11s ago)   39h
kube-system   helm-install-traefik-ntgks                1/1     Running            316 (5m13s ago)   39h
kube-system   local-path-provisioner-85674b7ddf-mfjxp   0/1     CrashLoopBackOff   428 (4m3s ago)    39h
kube-system   metrics-server-68f955568b-55kg4           0/1     CrashLoopBackOff   428 (9s ago)      39h

On an x86 server running Ubuntu 22.04, following the same installation steps, all pods run fine:

# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS             RESTARTS          AGE
default       simple-pod                                1/1     Running            0                 123m
kube-system   coredns-54fd9cb578-26982                  1/1     Running            0                 62m
kube-system   helm-install-traefik-crd-wv2gn            1/1     Completed          0                 3h19m
kube-system   helm-install-traefik-m69k7                0/1     Completed          1                 3h19m
kube-system   local-path-provisioner-85674b7ddf-x7gdk   1/1     Running            0                 3h19m
kube-system   metrics-server-68f955568b-pm58m           2/2     Running            0                 3h18m
kube-system   svclb-traefik-d14f6f79-gzn2r              1/1     Running            0                 3h18m
kube-system   traefik-cb458865d-2szr4                   1/1     Running            0                 3h18m
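A quick way to list the pods that are not fully Ready, instead of comparing the two tables by eye, is a small awk filter over the kubectl output. This is a minimal sketch that assumes the default column layout shown above (NAMESPACE NAME READY STATUS ...) and only looks at the READY and STATUS columns, so it will not flag pods such as the helm-install jobs on the Orin that report 1/1 but keep restarting.

```bash
# List pods whose READY count is not N/N, skipping pods that finished
# successfully (STATUS=Completed). Reads `kubectl get pods -A` output on stdin
# and assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk 'NR > 1 {
    split($3, r, "/")                      # READY column, e.g. "0/1"
    if (r[1] != r[2] && $4 != "Completed")
      print $1 "/" $2 " " $3 " " $4
  }'
}

# Usage on the node:
#   sudo k3s kubectl get pods -A | not_ready_pods
```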

Ignoring the above error and trying to run my application on the Orin, I found that pods can reach each other by IP but not by their exposed Service names. The error message is "Domain name cannot be resolved", which I suspect is caused by CoreDNS not being Ready.
The CoreDNS pod logs on the Orin are shown below. They indicate that it cannot connect to the API server (Service address 10.43.0.1:443):

[ERROR] plugin/kubernetes: Unhandled Error
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.31.2/tools/cache/reflector.go:243: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
[ERROR] plugin/kubernetes: Unhandled Error
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/ready: Still waiting on: "kubernetes"

The metrics-server pod logs show the same problem: it cannot connect to 10.43.0.1:443.

# kubectl logs -f metrics-server-5985cb9d7-kxtkg -n kube-system
Error: unable to load configmap based request-header-client-ca-file: Get "https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 10.43.0.1:443: i/o timeout
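Both components time out on the same address. To confirm that across more pods, the targets can be pulled out of the "dial tcp <host>:<port>: i/o timeout" errors. A minimal sketch, assuming the error text has exactly the format seen in these logs:

```bash
# Print the unique host:port targets found in "dial tcp <host>:<port>: i/o timeout"
# errors read from stdin (e.g. the output of kubectl logs).
dial_timeouts() {
  grep -o 'dial tcp [^ ]*: i/o timeout' \
    | sed -e 's/^dial tcp //' -e 's/: i\/o timeout$//' \
    | sort -u
}

# Usage on the node:
#   sudo k3s kubectl -n kube-system logs deploy/coredns | dial_timeouts
#   sudo k3s kubectl -n kube-system logs deploy/metrics-server | dial_timeouts
```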

I added a route for the 10.43.0.0/16 ClusterIP range (ip route add 10.43.0.0/16 dev flannel.1) and checked the firewall, which has no blocking policies, but the issue remains.
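For context, ClusterIPs such as 10.43.0.1 are virtual: with kube-proxy in iptables mode, traffic to them is DNATed by rules in the nat table's KUBE-SERVICES chain rather than routed, so a route alone is not expected to fix this. A minimal sketch for checking whether the rule for the kubernetes Service exists, assuming kube-proxy runs in iptables mode and that the rule text has the usual "-d <ip>/32 ... --dport <port>" form:

```bash
# Print the nat-table KUBE-SERVICES rules that match a given ClusterIP and port.
# Reads `iptables-save -t nat` output on stdin; no output means no such rule.
service_rules() {
  local ip="$1" port="$2"
  # Trailing spaces keep 10.43.0.1 from matching 10.43.0.10, and 443 from 4430.
  awk -v dst="-d ${ip}/32 " -v dport="--dport ${port} " '
    /^-A KUBE-SERVICES / && index($0, dst) && index($0, dport)'
}

# Usage on the node:
#   sudo iptables-save -t nat | service_rules 10.43.0.1 443
```

If the rule is present, the next thing to look at is where it forwards to (sudo k3s kubectl get endpoints kubernetes).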

Running curl -k https://10.43.0.1:443/healthz hangs and then fails with:
curl: (28) Failed to connect to 10.43.0.1 port 443: Connection timed out
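A timeout, as opposed to an immediate "connection refused", usually means packets are silently dropped or never translated, rather than reaching a host that rejects them. The bash-only sketch below distinguishes the two cases using the /dev/tcp pseudo-device and the coreutils timeout command (which exits with 124 on timeout); the 3-second default is an arbitrary choice.

```bash
# Classify a TCP connection attempt (bash only: uses /dev/tcp):
#   open    - the connection was established
#   timeout - no answer within the limit (packets likely dropped)
#   error   - failed immediately, e.g. connection refused or no route
tcp_probe() {
  local host="$1" port="$2" secs="${3:-3}" rc=0
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null || rc=$?
  case "$rc" in
    0)   echo open ;;
    124) echo timeout ;;
    *)   echo error ;;
  esac
}

# Usage on the node: compare the API server's ClusterIP with its host port.
#   tcp_probe 10.43.0.1 443
#   tcp_probe 127.0.0.1 6443
```

If the host port answers but the ClusterIP times out, the problem is likely in the Service translation path rather than in the API server itself.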

Running k3s check-config on the Orin reports missing network-related kernel options (notably CONFIG_IP_SET, and also CONFIG_INET_XFRM_MODE_TRANSPORT), which suggests the kernel must be recompiled to enable them. Note that both are listed under "Optional Features" and the overall status is still "pass".

# k3s check-config
cat: /sys/kernel/security/apparmor/profiles: No such file or directory

Verifying binaries in /var/lib/rancher/k3s/data/7ddc159b1f5dc21c2fb0faf67a15be16624cfe19fb4dc756ec057f276b092b3a/bin:
- sha256sum: good
- links: good

System:
- /usr/sbin iptables v1.8.4 (legacy): ok
- swap: should be disabled
- routes: ok

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: cgroups Hybrid mounted, cpuset|memory controllers status: good
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_IP_NF_TARGET_REJECT: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_MULTIPORT: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_SET: missing
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: missing
- Storage Drivers:
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)

STATUS: pass
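To check individual options directly, either on the running kernel (/proc/config.gz, which check-config reads above) or on a rebuilt kernel's .config before flashing, a small helper like the following can be used. It is a minimal sketch that only reads "CONFIG_X=value" lines; anything unset or absent is reported as missing.

```bash
# Print how each requested option is set in a kernel config file:
# "y" (built in), "m" (module) or "missing" (not set / not present).
kconfig_status() {
  local cfg="$1" opt val
  shift
  for opt in "$@"; do
    # Only "CONFIG_FOO=value" lines count; "# CONFIG_FOO is not set" does not.
    val="$(sed -n "s/^${opt}=\(.*\)\$/\1/p" "$cfg")"
    echo "${opt}: ${val:-missing}"
  done
}

# Usage on the Orin:
#   zcat /proc/config.gz > /tmp/running.config
#   kconfig_status /tmp/running.config CONFIG_IP_SET CONFIG_VXLAN CONFIG_INET_XFRM_MODE_TRANSPORT
```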

2. Flashing Failed After Kernel Customization in JetPack 6.2
Suspecting that the problem comes from options missing in the kernel, I upgraded to JetPack 6.2 (which is based on Ubuntu 22.04), followed the NVIDIA Jetson Linux Developer Guide to customize the kernel, and built a new image and rootfs. I then used SDK Manager to flash this image to the Orin, but flashing fails with the following errors:

15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/bin/sync': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/cut': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/sha1sum': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/dirname': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/du': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/chattr': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/busybox': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/findmnt': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/whoami': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/bin/ps': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/bin/ip': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/w': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/xxd': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/diff': No such file or directory
15:25:21 ERROR: Flash Jetson Linux - flash: cp: cannot stat '/home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/bin/lsblk': No such file or directory

These errors show that several essential commands present in the original rootfs are missing after compiling the kernel. After flashing this image, the Orin fails to boot.
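Before flashing, it may help to check whether the rootfs directory actually contains the files the flash script copies. The file list in the usage comment is taken from the error log above and is not exhaustive. One possible cause, not confirmed in this thread, is that Linux_for_Tegra/rootfs was not fully populated (for example, the sample root filesystem was not extracted there, or apply_binaries.sh was not run). If the rootfs uses a merged /usr layout (bin -> usr/bin), an empty usr/bin would explain both the bin/ and usr/bin/ entries.

```bash
# Print each given path (relative to the rootfs) that does not exist.
# Symlinks such as bin -> usr/bin are followed by the -e test.
missing_rootfs_files() {
  local rootfs="$1" f
  shift
  for f in "$@"; do
    [ -e "${rootfs}/${f}" ] || echo "$f"
  done
}

# Usage on the host, before flashing:
#   missing_rootfs_files /home/ps/nvidia/nvidia_sdk/JetPack_6.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs \
#     bin/sync usr/bin/cut usr/bin/sha1sum usr/bin/dirname usr/bin/du usr/bin/chattr \
#     usr/bin/busybox usr/bin/findmnt usr/bin/whoami bin/ps bin/ip usr/bin/w \
#     usr/bin/xxd usr/bin/diff usr/bin/lsblk
```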

I then tried a different flashing method, manually replacing the missing kernel modules before flashing the image onto the Orin, but it still fails to boot with the following errors:

Hi,
Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guide of deep learning frameworks on Jetson:

3. Tutorial

Startup deep learning tutorial:

4. Report issue

If these suggestions don't help and you want to report an issue to us, please share the model, the commands/steps, and the customized app (if any) so that we can reproduce it locally.

Thanks!

Hi,

JetPack 5.1.2 and JetPack 6.2 environments are quite different.
We recommend using this topic to track the k3s + JetPack 5.1.2 issue and filing a new topic for JetPack 6.2.

Could you test the commands below to set up k3s?

$ sudo apt install curl
$ curl -sfL https://get.k3s.io | sh -s - --docker
$ sudo k3s kubectl get pods --all-namespaces

Thanks.


Thanks for your advice.

I did follow these steps to install k3s on my Orin with JetPack 5.1.2. The only difference was that I used the default containerd as the container runtime instead of Docker.

Hi,

Could you check whether k3s works with the NVIDIA runtime?
Thanks.
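For reference, the k3s documentation describes detecting the NVIDIA container runtime at startup (if it is installed) and adding it to k3s's generated containerd config; pods then select it through a RuntimeClass. The sketch below assumes the runtime handler is registered under the name "nvidia", so it is worth checking the generated config.toml first.

```bash
# Print a RuntimeClass manifest mapping the name "nvidia" to the containerd
# runtime handler "nvidia" (assumed to be the handler k3s registers).
nvidia_runtimeclass() {
  cat <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF
}

# Usage on the Orin:
#   grep -n nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml   # runtime detected?
#   nvidia_runtimeclass | sudo k3s kubectl apply -f -
# Pods then opt in with "runtimeClassName: nvidia" in their spec.
```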

Any update on this issue? Were you able to solve it?
I am now encountering the same issue on a Jetson Orin Developer Kit 64GB.

Your CoreDNS pod is down.
Check whether /etc/resolv.conf contains a DNS server, for example:
nameserver 1.1.1.1
If it doesn't, add one; in an air-gapped environment, add a dummy IP.
Then restart the CoreDNS pod and check that it starts.
If everything works, reboot the Orin and check /etc/resolv.conf again. If the entry is gone, some other service is overwriting the file. Add the nameserver line again and then run
sudo chattr +i /etc/resolv.conf
to prevent other services from overwriting the file.
If you later want to edit /etc/resolv.conf, first run
sudo chattr -i /etc/resolv.conf
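The resolv.conf step above can be scripted so that it is safe to re-run. This is a minimal sketch: it only checks whether any nameserver line exists and does not judge whether that server is usable from inside pods.

```bash
# Append "nameserver <ip>" to a resolv.conf-style file only if it has no
# nameserver line yet, so repeated runs do not duplicate the entry.
ensure_nameserver() {
  local file="$1" ip="$2"
  grep -qE '^[[:space:]]*nameserver[[:space:]]' "$file" \
    || printf 'nameserver %s\n' "$ip" >> "$file"
}

# Usage on the Orin (as root), following the steps above:
#   ensure_nameserver /etc/resolv.conf 1.1.1.1
#   k3s kubectl -n kube-system rollout restart deployment coredns
```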

Thanks for the response, but the steps above didn't work. Eventually I learned that only a specific k3s version, "v1.30.6+k3s1", works with my configuration of JetPack 5.1.2 and DeepStream 6.3. I used the command below to install it.

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.30.6+k3s1" sh -

My /etc/resolv.conf file looks like this:

sudo kubectl get pods -A


Thanks a lot for your efforts, and I'm happy to see it running successfully.
I got it working simply by upgrading to JetPack 6.2, since I wasn't sure which older K3s version would be compatible with JetPack 5.1.2.


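127.0.0.53 is not a valid upstream server for k3s. It is the loopback address of the systemd-resolved stub resolver, so it cannot be reached as an upstream from inside the CoreDNS pod.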
You can read about it here
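One way to act on this is to point k3s at a resolv.conf without loopback nameservers through its --resolv-conf flag. The sketch below picks the first candidate file whose nameservers are all outside 127.0.0.0/8 and ::1. It assumes /run/systemd/resolve/resolv.conf is where systemd-resolved writes the real upstream servers, as on stock Ubuntu.

```bash
# Print the first readable resolv.conf candidate that has at least one
# nameserver and no loopback nameservers (127.0.0.0/8 or ::1).
# Returns 1 if no candidate qualifies.
pick_resolv_conf() {
  local f
  for f in "$@"; do
    [ -r "$f" ] || continue
    if awk '$1 == "nameserver" { n++; if ($2 ~ /^127\./ || $2 == "::1") bad = 1 }
            END { exit !(n > 0 && !bad) }' "$f"; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

# Usage on the Orin:
#   conf="$(pick_resolv_conf /etc/resolv.conf /run/systemd/resolve/resolv.conf)"
#   curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--resolv-conf $conf" sh -
```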