deviceQuery on Jetson Nano works in Docker but not in Kubernetes

What am I doing wrong here? deviceQuery works in a Docker container but not when running the same image in Kubernetes.

raj@raj-desktop:~/nvdli-data$ docker run -it jitteam/devicequery ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.2 / 10.0
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 1972 MBytes (2067636224 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
raj@raj-desktop:~/nvdli-data$ kubectl apply -f devicequery.yaml
pod/devicequery unchanged
raj@raj-desktop:~/nvdli-data$ kubectl logs devicequery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

The pod YAML:

apiVersion: v1
kind: Pod
metadata:
  name: devicequery
spec:
  containers:
  - name: nvidia
    image: jitteam/devicequery:latest
    command: [ "./deviceQuery" ]
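
For completeness: if the NVIDIA device plugin is also running on the node, the GPU is normally requested explicitly in the pod spec. A minimal sketch, assuming the standard k8s-device-plugin resource name nvidia.com/gpu (not used in the runs above):

apiVersion: v1
kind: Pod
metadata:
  name: devicequery-gpu          # hypothetical variant of the pod above
spec:
  containers:
  - name: nvidia
    image: jitteam/devicequery:latest
    command: [ "./deviceQuery" ]
    resources:
      limits:
        nvidia.com/gpu: 1        # resource advertised by the NVIDIA k8s-device-plugin, if installed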

It was because Kubernetes [k3s] was using containerd instead of Docker, and the NVIDIA device configuration for containerd is not clear. Is there a sample config.toml that will work for Jetson Nano?
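
For reference, the containerd-side analogue of the Docker nvidia-runtime setting usually looks roughly like the snippet below. This is only a sketch: the section names assume containerd's CRI v1 plugin layout, the binary path assumes the nvidia-container-runtime package from JetPack, and k3s keeps its generated config under /var/lib/rancher/k3s/agent/etc/containerd/ rather than /etc/containerd/.

# Sketch of the relevant containerd config.toml section (section names and
# binary path are assumptions, not a verified Jetson Nano configuration)
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v1"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
      SystemdCgroup = true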

Installation Guide — NVIDIA Cloud Native Technologies documentation

If k3s is configured to use Docker, it works correctly.

This may be related to: Support for containerd due to Kubernetes 1.20 changes on ARM64 devices · Issue #1468 · NVIDIA/nvidia-docker (github.com)

Hi,

Here is a tutorial for running device_query with Kubernetes.
Could you check if you set up your environment in a similar way?

Thanks.

Yes, k3s with Docker as the container runtime works. The issue is that k3s with containerd (which is the default) doesn't work. Just as the Docker daemon has to be changed to use nvidia-container-runtime, a similar change is needed in the containerd config so it uses the NVIDIA runtime. I made the config change per the documentation and believe it is correct, but running deviceQuery as a pod with k3s/containerd/nvidia-container-runtime still does not work.

This is what the config change looks like:

diff config.toml.orig config.toml
68c68
< default_runtime_name = "runc"
---
> default_runtime_name = "nvidia"
85a86,94
>       SystemdCgroup = true
>     [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
>       privileged_without_host_devices = false
>       runtime_engine = ""
>       runtime_root = ""
>       runtime_type = "io.containerd.runc.v1"
>       [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
>         BinaryName = "/usr/bin/nvidia-container-runtime"
>         SystemdCgroup = true
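
With the change above, nvidia becomes the default runtime, so pods should not need anything extra. If it were not the default, the usual alternative is a RuntimeClass whose handler matches the runtime name in the containerd config, referenced from the pod. A sketch, assuming the nvidia runtime entry above and Kubernetes 1.20+ (older clusters use node.k8s.io/v1beta1):

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia                  # must match the containerd runtime name (runtimes.nvidia)
---
apiVersion: v1
kind: Pod
metadata:
  name: devicequery-ctr
spec:
  runtimeClassName: nvidia
  containers:
  - name: nvidia
    image: jitteam/devicequery:latest
    command: [ "./deviceQuery" ]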

And the deviceQuery log:
kubectl logs devicequery-ctr
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
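
A first thing to check for this kind of failure is whether k3s's embedded containerd actually loaded the edited config. A sketch of how that might be verified (command names assume k3s was installed via the standard script and runs as the k3s systemd service):

# Sketch: confirm the nvidia runtime entry is visible to the CRI runtime.
# Note: k3s regenerates /var/lib/rancher/k3s/agent/etc/containerd/config.toml
# from config.toml.tmpl on startup, so edits to the generated file may be lost.
sudo systemctl restart k3s
sudo k3s crictl info | grep -A 6 nvidia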

Announcing containerd Support for the NVIDIA GPU Operator | NVIDIA Developer Blog

It looks like the GPU Operator is not yet supported on Jetson Nano.

Hi,

If k8s is used, it should work on Jetson Nano.
Here is a previous discussion for your reference:

Thanks.