I’m attempting to get this Jetson playing nicely with Kubernetes, using CRI-O.
I’ve installed the latest stable NVIDIA Container Toolkit on the Jetson; nvidia-container-runtime --version reports:
NVIDIA Container Runtime version 1.13.5
commit: 6b8589dcb4dead72ab64f14a5912886e6165c079
spec: 1.1.0-rc.2
runc version 1.1.7-0ubuntu1~20.04.1
spec: 1.0.2-dev
go: go1.18.1
libseccomp: 2.5.1
CRI-O is configured with the following:
[crio.runtime]
default_runtime = "nvidia"
[crio.runtime.runtimes.nvidia]
runtime_path = "/usr/bin/nvidia-container-runtime"
runtime_type = "oci"
runtime_root = "/run/nvidia-container-runtime"
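In case the config isn’t being picked up at all, the one sanity check I know of (assuming the usual systemd unit) is to restart CRI-O and watch its logs:
sudo systemctl restart crio
sudo journalctl -u crio -f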
nvidia-device-plugin is installed, and the node now advertises nvidia.com/gpu: 1 as an allocatable resource.
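That shows up under the node’s Capacity/Allocatable:
kubectl describe node jetson1 | grep -i 'nvidia.com/gpu'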
The node itself looks like this (kubectl get nodes -o wide):
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
jetson1 Ready control-plane 7d16h v1.27.4+k0s 192.168.4.53 <none> Ubuntu 20.04.6 LTS 5.10.104-tegra cri-o://1.27.1
I’ve applied a RuntimeClass (though I thought I could do without it, since CRI-O’s default_runtime is already nvidia):
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gpu-enabled-class
handler: nvidia
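As I understand it, the handler value has to match the runtime name CRI-O knows about (the nvidia in [crio.runtime.runtimes.nvidia] above), and the class itself shows as registered:
kubectl get runtimeclass gpu-enabled-class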
And this is the Pod that I’m testing:
---
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-query
spec:
  runtimeClassName: gpu-enabled-class
  restartPolicy: OnFailure
  containers:
    - name: nvidia-query
      image: dudo/test_cuda
      resources:
        limits:
          nvidia.com/gpu: 1
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
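For completeness, I apply and watch it with the usual (nvidia-query.yaml being whatever I saved the manifest as):
kubectl apply -f nvidia-query.yaml
kubectl logs -f nvidia-query
kubectl describe pod nvidia-query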
This pod executes as intended, running this script, but it doesn’t utilize any GPU according to jtop. If I run the same script directly on the Jetson, jtop shows the GPU being utilized as expected, but from the container, nada.
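The only extra check I’ve thought of so far: my understanding is that on Jetson the runtime operates in csv mode (the mode setting in /etc/nvidia-container-runtime/config.toml, with the mount lists under /etc/nvidia-container-runtime/host-files-for-container.d/), so the Tegra device nodes and L4T libraries should be getting injected into the container. While the container is running, something like
kubectl exec nvidia-query -- ls /dev | grep -E 'nvhost|nvmap'
kubectl exec nvidia-query -- ls /usr/lib/aarch64-linux-gnu/tegra
should show whether they made it in (those paths are my assumption, taken from the host’s layout).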
Any ideas on what might be misconfigured? Any recommendations on how to debug this further?