Jetson Orin Nano Dev Board Pods Stuck in ContainerCreating State

user109281 · July 5, 2024, 9:24pm

I have a Jetson Orin Nano and am trying to run k3s on it. However, none of the pods/containers are ever created:

tyler@orin-nano-01:~$ curl -sfL https://get.k3s.io | sh -s - --docker --write-kubeconfig-mode 644 --write-kubeconfig $HOME/.kube/config
[INFO]  Finding release for channel stable
[INFO]  Using v1.29.6+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.29.6+k3s1/sha256sum-arm64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.29.6+k3s1/k3s-arm64
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Skipping /usr/local/bin/ctr symlink to k3s, command exists in PATH at /usr/bin/ctr
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
tyler@orin-nano-01:~$ kubectl get nodes
NAME           STATUS   ROLES                  AGE   VERSION
orin-nano-01   Ready    control-plane,master   17s   v1.29.6+k3s1
tyler@orin-nano-01:~$ kubectl get pods -A
NAMESPACE     NAME                                     READY   STATUS              RESTARTS   AGE
kube-system   coredns-6799fbcd5-f9mgq                  0/1     ContainerCreating   0          7s
kube-system   helm-install-traefik-5892k               0/1     ContainerCreating   0          8s
kube-system   helm-install-traefik-crd-xlkb2           0/1     ContainerCreating   0          8s
kube-system   local-path-provisioner-6f5d79df6-5bjpw   0/1     ContainerCreating   0          7s
kube-system   metrics-server-54fd9b65b-2szjw           0/1     ContainerCreating   0          7s
tyler@orin-nano-01:~$ kubectl describe pod coredns-6799fbcd5-f9mgq -n kube-system
Name:                 coredns-6799fbcd5-f9mgq
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 orin-nano-01/192.168.1.230
Start Time:           Fri, 05 Jul 2024 15:28:04 -0500
Labels:               k8s-app=kube-dns
                      pod-template-hash=6799fbcd5
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-6799fbcd5
Containers:
  coredns:
    Container ID:  
    Image:         rancher/mirrored-coredns-coredns:1.10.1
    Image ID:      
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /etc/coredns/custom from custom-config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vdbv4 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  custom-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns-custom
    Optional:  true
  kube-api-access-vdbv4:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    Burstable
Node-Selectors:               kubernetes.io/os=linux
Tolerations:                  CriticalAddonsOnly op=Exists
                              node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                              node-role.kubernetes.io/master:NoSchedule op=Exists
                              node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector k8s-app=kube-dns
Events:
  Type     Reason                  Age              From               Message
  ----     ------                  ----             ----               -------
  Normal   Scheduled               19s              default-scheduler  Successfully assigned kube-system/coredns-6799fbcd5-f9mgq to orin-nano-01
  Warning  FailedCreatePodSandBox  15s              kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "coredns-6799fbcd5-f9mgq": Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod37077870_be63_4b9e_9bfa_872ce17336ca.slice/docker-6f56fb05536e2283dd3f00dc1a83ec36b0d7d5f1bde2ccd22643b87bcc0146ed.scope/cpu.weight: no such file or directory: unknown
  Warning  FailedCreatePodSandBox  5s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "coredns-6799fbcd5-f9mgq": Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod37077870_be63_4b9e_9bfa_872ce17336ca.slice/docker-a2c97b1da3c071dc360217b408e7fc0cd11fb1b8afd2b4e9c67283ddf1f5d083.scope/cpu.weight: no such file or directory: unknown
  Normal   SandboxChanged          4s (x2 over 9s)  kubelet            Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  1s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "coredns-6799fbcd5-f9mgq": Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod37077870_be63_4b9e_9bfa_872ce17336ca.slice/docker-497c9db51f20eb47b69e87fa46232f891b1747c6b403106989de0116c960ed71.scope/cpu.weight: no such file or directory: unknown

Some digging online seemed to point to cgroup-related problems, but my tinkering in that area hasn’t led anywhere.
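The missing cpu.weight file in the events above is a cgroup v2 interface file, and it only exists in a cgroup when the `cpu` controller is enabled for that cgroup. A minimal diagnostic sketch for checking this, assuming a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup; the function name `has_cgroup_controller` is hypothetical:

```shell
# Sketch (assumes cgroup v2 at /sys/fs/cgroup): succeed if the named
# controller is listed in a cgroup.controllers file.
has_cgroup_controller() {
  local controller="$1"
  local file="${2:-/sys/fs/cgroup/cgroup.controllers}"
  # Return 2 if the file cannot be read (e.g. not a cgroup v2 system).
  [ -r "$file" ] || return 2
  # The file is one space-separated line such as "cpuset cpu io memory pids";
  # split it into one name per line and match whole lines, so "cpu" != "cpuset".
  tr ' ' '\n' < "$file" | grep -qx -- "$controller"
}

# Example usage on the Jetson (the kubepods.slice path follows the layout in the errors above):
#   has_cgroup_controller cpu && echo "root cgroup: cpu controller available"
#   has_cgroup_controller cpu /sys/fs/cgroup/kubepods.slice/cgroup.controllers \
#     || echo "cpu controller not enabled for kubepods.slice"
```

A child cgroup's cgroup.controllers lists only the controllers its parent has enabled, so checking both the root and kubepods.slice can show where the `cpu` controller drops out.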

Some background:

- I’m running a custom kernel to enable the iSCSI TCP module. I followed this guide and enabled CONFIG_ISCSI_TCP=m and CONFIG_SCSI_ISCSI_ATTRS=m (to eventually support Longhorn pods). I’ve also enabled CONFIG_FAIR_GROUP_SCHED=y and CONFIG_RT_GROUP_SCHED=y in an attempt to fix this issue (to no avail). Everything else should be standard.
- When I was running the “standard” kernel, k3s was able to create and run the pods.
- I’m booting directly from an SSD, following this quick start guide.
- I’ve updated and upgraded packages with sudo apt update && sudo apt upgrade.
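Since the custom kernel is the main difference from the working setup, it may help to confirm which of the options mentioned above actually ended up in the build. A sketch, assuming the kernel configuration is available as a plain-text file (the .config from the build tree, or a decompressed /proc/config.gz if the kernel exposes one); `kconfig_value` and `report_kconfig` are hypothetical helper names:

```shell
# Print the value of one kernel config option from a .config-style file,
# or "not set" if it is absent or disabled ("# CONFIG_FOO is not set").
kconfig_value() {
  local option="$1" file="$2" line
  # Enabled options look like "CONFIG_FOO=y", "CONFIG_FOO=m" or "CONFIG_FOO=<value>".
  line=$(grep -E "^${option}=" "$file" | tail -n 1)
  if [ -n "$line" ]; then
    printf '%s\n' "${line#*=}"
  else
    printf 'not set\n'
  fi
}

# Report the options discussed in this thread, one per line.
report_kconfig() {
  local file="$1" opt
  for opt in CONFIG_PREEMPT_RT CONFIG_RT_GROUP_SCHED CONFIG_FAIR_GROUP_SCHED \
             CONFIG_ISCSI_TCP CONFIG_SCSI_ISCSI_ATTRS; do
    printf '%-26s %s\n' "$opt" "$(kconfig_value "$opt" "$file")"
  done
}

# Example (only if /proc/config.gz exists on the running kernel):
#   zcat /proc/config.gz > /tmp/running.config && report_kconfig /tmp/running.config
```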

I’m running the latest version of JetPack:

tyler@orin-nano-01:~$ apt list --installed | grep nvidia-jetpack

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

nvidia-jetpack-dev/stable,now 6.0+b106 arm64 [installed,automatic]
nvidia-jetpack-runtime/stable,now 6.0+b106 arm64 [installed,automatic]
nvidia-jetpack/stable,now 6.0+b106 arm64 [installed]

Other machine info:

tyler@orin-nano-01:~$ uname -a
Linux orin-nano-01 5.15.136-rt-tegra #5 SMP PREEMPT_RT Fri Jul 5 13:52:58 CDT 2024 aarch64 aarch64 aarch64 GNU/Linux
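The `uname -a` output shows `PREEMPT_RT`, i.e. the real-time variant of the kernel, which is what the resolution later in this thread turned out to hinge on. A small sketch for checking this before installing k3s; the name `is_preempt_rt_kernel` is hypothetical, and it only string-matches the kernel version string:

```shell
# Succeed if the given uname string (default: the running kernel's `uname -v`)
# advertises a PREEMPT_RT (real-time) build.
is_preempt_rt_kernel() {
  local info="${1:-$(uname -v)}"
  case "$info" in
    *PREEMPT_RT*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   is_preempt_rt_kernel && echo "running a PREEMPT_RT kernel"
```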

Can anyone provide guidance on what I might be missing or what additional steps I should take to fix this?

AastaLLL · July 8, 2024, 4:18am

Hi,

There is a known issue with the NVIDIA container runtime.
Could you check the comment below and see whether it helps with your issue as well?

Thanks.

user109281 · July 8, 2024, 3:15pm

I ran sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml --mode=csv and now get this output:

tyler@orin-nano-01:~$ nvidia-ctk cdi list
INFO[0000] Found 2 CDI devices                          
nvidia.com/gpu=0
nvidia.com/gpu=all
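To script this check, a sketch that filters `nvidia-ctk cdi list` output down to the fully qualified device names. The helper name is hypothetical, and the pattern assumes names of the vendor/class=name form shown above:

```shell
# Read `nvidia-ctk cdi list` output on stdin and print only the device names
# (lines like "nvidia.com/gpu=all"), dropping log lines such as "INFO[0000] ...".
list_cdi_devices() {
  grep -E '^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+=[^[:space:]]+$'
}

# Example (stderr merged in case the tool logs there):
#   nvidia-ctk cdi list 2>&1 | list_cdi_devices | grep -qx 'nvidia.com/gpu=all' \
#     && echo "CDI spec exposes nvidia.com/gpu=all"
```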

However, the same issue persists for all pods:

  Warning  Failed            25s (x2 over 47s)      kubelet            Error: failed to start container "coredns": Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2f270a53_b746_4fef_8e7c_0030e1910a27.slice/docker-coredns.scope/cpu.max: no such file or directory: unknown

This is my Docker daemon config for reference:

tyler@orin-nano-01:~$ sudo cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

I’m not sure whether this is a red herring, but when I try to verify that CDI devices can be used from a Docker container, I get this:

tyler@orin-nano-01:~$ docker run --rm -ti --runtime=nvidia nvcr.io/nvidia/k8s/cuda-sample:devicequery-cuda12.5.0-ubuntu22.04
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: failed to create NVIDIA Container Runtime: failed to construct OCI spec modifier: requirements not met: cuda>=12.5||brand=unknown&&driver>=470&&driver<471||brand=grid&&driver>=470&&driver<471||brand=tesla&&driver>=470&&driver<471||brand=nvidia&&driver>=470&&driver<471||brand=quadro&&driver>=470&&driver<471||brand=quadrortx&&driver>=470&&driver<471||brand=nvidiartx&&driver>=470&&driver<471||brand=vapps&&driver>=470&&driver<471||brand=vpc&&driver>=470&&driver<471||brand=vcs&&driver>=470&&driver<471||brand=vws&&driver>=470&&driver<471||brand=cloudgaming&&driver>=470&&driver<471||brand=unknown&&driver>=535&&driver<536||brand=grid&&driver>=535&&driver<536||brand=tesla&&driver>=535&&driver<536||brand=nvidia&&driver>=535&&driver<536||brand=quadro&&driver>=535&&driver<536||brand=quadrortx&&driver>=535&&driver<536||brand=nvidiartx&&driver>=535&&driver<536||brand=vapps&&driver>=535&&driver<536||brand=vpc&&driver>=535&&driver<536||brand=vcs&&driver>=535&&driver<536||brand=vws&&driver>=535&&driver<536||brand=cloudgaming&&driver>=535&&driver<536||brand=unknown&&driver>=550&&driver<551||brand=grid&&driver>=550&&driver<551||brand=tesla&&driver>=550&&driver<551||brand=nvidia&&driver>=550&&driver<551||brand=quadro&&driver>=550&&driver<551||brand=quadrortx&&driver>=550&&driver<551||brand=nvidiartx&&driver>=550&&driver<551||brand=vapps&&driver>=550&&driver<551||brand=vpc&&driver>=550&&driver<551||brand=vcs&&driver>=550&&driver<551||brand=vws&&driver>=550&&driver<551||brand=cloudgaming&&driver>=550&&driver<551 not met: unknown.

The output of nvidia-smi for reference:

tyler@orin-nano-01:~$ nvidia-smi
Mon Jul  8 09:43:28 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.3.0                Driver Version: N/A          CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                  N/A  | N/A              N/A |                  N/A |
| N/A   N/A  N/A               N/A /  N/A | Not Supported        |     N/A          N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
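The requirement string in that error asks for `cuda>=12.5` or one of several desktop driver branches (470, 535, 550). nvidia-smi reports `CUDA Version: 12.2` and `Driver Version: N/A`, so, as the error suggests, none of the alternatives can be satisfied. A sketch of that version comparison, assuming GNU coreutils `sort -V`; both function names are hypothetical:

```shell
# Print the "CUDA Version:" value from nvidia-smi output read on stdin.
nvidia_smi_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if dotted version $1 >= $2 (relies on GNU `sort -V` ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example:
#   host_cuda=$(nvidia-smi | nvidia_smi_cuda_version)   # "12.2" on this board
#   version_ge "$host_cuda" 12.5 || echo "image needs cuda>=12.5, host reports $host_cuda"
```

Installing a newer CUDA toolkit does not change the version nvidia-smi reports here; the native deviceQuery run later in the thread still shows driver 12.2 with runtime 12.5.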

So I installed CUDA 12.5, apparently successfully, but nvidia-smi’s output is unchanged:

tyler@orin-nano-01:~$ nvidia-smi
Mon Jul  8 10:13:06 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.3.0                Driver Version: N/A          CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                  N/A  | N/A              N/A |                  N/A |
| N/A   N/A  N/A               N/A /  N/A | Not Supported        |     N/A          N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Again, I’m not sure whether this information is relevant or just a distraction.

AastaLLL · July 10, 2024, 6:19am

Hi,

The container you tested is built for a desktop environment.

For Orin, please try the l4t-cuda container below instead:

Thanks.

user109281 · July 10, 2024, 4:22pm

Ah, I see. After running docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-cuda:12.2.12-devel /bin/bash and compiling deviceQuery from the cuda-samples repo (tag v12.2) inside the container, I get these results:

root@orin-nano-01:/cuda-samples-12.2/Samples/1_Utilities/deviceQuery# ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Orin"
  CUDA Driver Version / Runtime Version          12.2 / 12.2
  CUDA Capability Major/Minor version number:    8.7
  Total amount of global memory:                 7622 MBytes (7991873536 bytes)
  (008) Multiprocessors, (128) CUDA Cores/MP:    1024 CUDA Cores
  GPU Max Clock rate:                            624 MHz (0.62 GHz)
  Memory Clock rate:                             624 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        167936 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 12.2, NumDevs = 1
Result = PASS

This closely matches the output of running deviceQuery natively on the Jetson Orin Nano:

tyler@orin-nano-01:~/Downloads/cuda-samples-12.2/Samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Orin"
  CUDA Driver Version / Runtime Version          12.2 / 12.5
  CUDA Capability Major/Minor version number:    8.7
  Total amount of global memory:                 7622 MBytes (7991873536 bytes)
  (008) Multiprocessors, (128) CUDA Cores/MP:    1024 CUDA Cores
  GPU Max Clock rate:                            624 MHz (0.62 GHz)
  Memory Clock rate:                             624 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        167936 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 12.5, NumDevs = 1
Result = PASS
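Both runs end in `Result = PASS` and report the same driver version (12.2); only the runtime differs (12.2 in the container, 12.5 natively). So GPU access itself appears to work both on the host and through the NVIDIA runtime in Docker, which leaves the cgroup error unexplained. A sketch for pulling those fields out of deviceQuery output when comparing runs; the helper names are hypothetical and the patterns assume the summary-line format shown above:

```shell
# Print "<driver> <runtime>" from deviceQuery's final summary line (stdin).
devicequery_versions() {
  sed -n 's/.*CUDA Driver Version = \([0-9.]*\), CUDA Runtime Version = \([0-9.]*\),.*/\1 \2/p'
}

# Succeed if deviceQuery output on stdin contains "Result = PASS".
devicequery_passed() {
  grep -q '^Result = PASS'
}

# Example:
#   ./deviceQuery > native.txt
#   devicequery_passed < native.txt && devicequery_versions < native.txt   # e.g. "12.2 12.5"
```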

So it seems we’re back to square one?

user109281 · July 10, 2024, 9:22pm

So I went back, and it turns out that disabling the real-time (PREEMPT_RT) configuration for the kernel and building the generic one instead fixes this issue. Now the pods run as expected!

tyler@orin-nano-01:~$ kubectl get pods -A
NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE
kube-system   coredns-6799fbcd5-c9nhn                  1/1     Running     0          41s
kube-system   helm-install-traefik-8wjzw               0/1     Completed   1          41s
kube-system   helm-install-traefik-crd-9cmjv           0/1     Completed   0          41s
kube-system   local-path-provisioner-6f5d79df6-kkp6c   1/1     Running     0          41s
kube-system   metrics-server-54fd9b65b-rkt78           1/1     Running     0          41s
kube-system   svclb-traefik-cd82205c-d4qw6             2/2     Running     0          25s
kube-system   traefik-7d5f6474df-5whb6                 1/1     Running     0          25s
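To check this without reading the table by eye (for example after rebuilding a kernel), a sketch that lists any pod that is neither Completed nor Running with all containers ready. It assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...); `unhealthy_pods` is a hypothetical name:

```shell
# Read `kubectl get pods -A` output on stdin and print "namespace/name STATUS READY"
# for each pod that is not Completed and not Running with all containers ready.
unhealthy_pods() {
  awk 'NR > 1 {
    split($3, ready, "/")                 # READY column, e.g. "1/1"
    if ($4 == "Completed") next
    if ($4 != "Running" || ready[1] != ready[2])
      print $1 "/" $2 " " $4 " " $3
  }'
}

# Example:
#   kubectl get pods -A | unhealthy_pods    # no output once everything is up
```

Against the earlier listing, this would print every kube-system pod with status ContainerCreating; against the listing above, it prints nothing.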

AastaLLL · July 12, 2024, 8:32am

Thanks for the update!

system · July 30, 2024, 7:14am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
