Hello,
I am trying to create a Kubernetes cluster on top of Jetson AGX Xavier devices running JetPack 4.6.1. I created the cluster successfully and after that, in order to expose the GPUs to the service pods, following NVIDIA's README:
https://github.com/NVIDIA/k8s-device-plugin
I tried to add NVIDIA's Kubernetes device plugin. After doing that, the plugin could not start due to the following error:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 19s kubelet Pod sandbox changed, it will be killed and re-created.
Normal Killing 19s kubelet Stopping container nvidia-device-plugin-ctr
Normal Pulled 16s kubelet Container image "nvcr.io/nvidia/k8s-device-plugin:v0.14.0" already present on machine
Normal Created 15s kubelet Created container nvidia-device-plugin-ctr
Warning Failed 10s kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: src: /etc/vulkan/icd.d/nvidia_icd.json, src_lnk: /usr/lib/aarch64-linux-gnu/tegra/nvidia_icd.json, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/vulkan/icd.d/nvidia_icd.json, dst_lnk: /usr/lib/aarch64-linux-gnu/tegra/nvidia_icd.json
src: /usr/lib/aarch64-linux-gnu/libcuda.so, src_lnk: tegra/libcuda.so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcuda.so, dst_lnk: tegra/libcuda.so
src: /usr/lib/aarch64-linux-gnu/libdrm_nvdc.so, src_lnk: tegra/libdrm.so.2, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libdrm_nvdc.so, dst_lnk: tegra/libdrm.so.2
src: /usr/lib/aarch64-linux-gnu/libv4l2.so.0.0.999999, src_lnk: tegra/libnvv4l2.so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libv4l2.so.0.0.999999, dst_lnk: tegra/libnvv4l2.so
src: /usr/lib/aarch64-linux-gnu/libv4lconvert.so.0.0.999999, src_lnk: tegra/libnvv4lconvert.so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libv4lconvert.so.0.0.999999, dst_lnk: tegra/libnvv4lconvert.so
src: /usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvargus.so, src_lnk: ../../../tegra/libv4l2_nvargus.so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvargus.so, dst_lnk: ../../../tegra/libv4l2_nvargus.so
src: /usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvcuvidvideocodec.so, src_lnk: ../../../tegra/libv4l2_nvcuvidvideocodec.so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvcuvidvideocodec.so, dst_lnk: ../../../tegra/libv4l2_nvcuvidvideocodec.so
src: /usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvvidconv.so, src_lnk: ../../../tegra/libv4l2_nvvidconv.so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvvidconv.so, dst_lnk: ../../../tegra/libv4l2_nvvidconv.so
src: /usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvvideocodec.so, src_lnk: ../../../tegra/libv4l2_nvvideocodec.so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvvideocodec.so, dst_lnk: ../../../tegra/libv4l2_nvvideocodec.so
src: /usr/lib/aarch64-linux-gnu/libvulkan.so.1.2.141, src_lnk: tegra/libvulkan.so.1.2.141, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libvulkan.so.1.2.141, dst_lnk: tegra/libvulkan.so.1.2.141
src: /usr/lib/aarch64-linux-gnu/tegra/libcuda.so, src_lnk: libcuda.so.1.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/tegra/libcuda.so, dst_lnk: libcuda.so.1.1
src: /usr/lib/aarch64-linux-gnu/tegra/libnvbufsurface.so, src_lnk: libnvbufsurface.so.1.0.0, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/tegra/libnvbufsurface.so, dst_lnk: libnvbufsurface.so.1.0.0
src: /usr/lib/aarch64-linux-gnu/tegra/libnvbufsurftransform.so, src_lnk: libnvbufsurftransform.so.1.0.0, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/tegra/libnvbufsurftransform.so, dst_lnk: libnvbufsurftransform.so.1.0.0
src: /usr/lib/aarch64-linux-gnu/tegra/libnvbuf_utils.so, src_lnk: libnvbuf_utils.so.1.0.0, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/tegra/libnvbuf_utils.so, dst_lnk: libnvbuf_utils.so.1.0.0
src: /usr/lib/aarch64-linux-gnu/tegra/libnvdsbufferpool.so, src_lnk: libnvdsbufferpool.so.1.0.0, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/tegra/libnvdsbufferpool.so, dst_lnk: libnvdsbufferpool.so.1.0.0
src: /usr/lib/aarch64-linux-gnu/tegra/libnvid_mapper.so, src_lnk: libnvid_mapper.so.1.0.0, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/tegra/libnvid_mapper.so, dst_lnk: libnvid_mapper.so.1.0.0
src: /usr/share/glvnd/egl_vendor.d/10_nvidia.json, src_lnk: ../../../lib/aarch64-linux-gnu/tegra-egl/nvidia.json, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/share/glvnd/egl_vendor.d/10_nvidia.json, dst_lnk: ../../../lib/aarch64-linux-gnu/tegra-egl/nvidia.json
src: /usr/lib/aarch64-linux-gnu/libcudnn.so.8, src_lnk: libcudnn.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn.so.8, dst_lnk: libcudnn.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libcudnn.so, src_lnk: /etc/alternatives/libcudnn_so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn.so, dst_lnk: /etc/alternatives/libcudnn_so
src: /usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8, src_lnk: libcudnn_ops_infer.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8, dst_lnk: libcudnn_ops_infer.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so, src_lnk: /etc/alternatives/libcudnn_ops_infer_so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so, dst_lnk: /etc/alternatives/libcudnn_ops_infer_so
src: /usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8, src_lnk: libcudnn_ops_train.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8, dst_lnk: libcudnn_ops_train.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so, src_lnk: /etc/alternatives/libcudnn_ops_train_so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so, dst_lnk: /etc/alternatives/libcudnn_ops_train_so
src: /usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8, src_lnk: libcudnn_adv_infer.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8, dst_lnk: libcudnn_adv_infer.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so, src_lnk: /etc/alternatives/libcudnn_adv_infer_so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so, dst_lnk: /etc/alternatives/libcudnn_adv_infer_so
src: /usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8, src_lnk: libcudnn_cnn_infer.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8, dst_lnk: libcudnn_cnn_infer.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so, src_lnk: /etc/alternatives/libcudnn_cnn_infer_so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so, dst_lnk: /etc/alternatives/libcudnn_cnn_infer_so
src: /usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8, src_lnk: libcudnn_adv_train.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8, dst_lnk: libcudnn_adv_train.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so, src_lnk: /etc/alternatives/libcudnn_adv_train_so, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so, dst_lnk: /etc/alternatives/libcudnn_adv_train_so
src: /usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8, src_lnk: libcudnn_cnn_train.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8, dst_lnk: libcudnn_cnn_train.so.8.2.1
src: /usr/include/cudnn_adv_infer.h, src_lnk: /etc/alternatives/cudnn_adv_infer_h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/include/cudnn_adv_infer.h, dst_lnk: /etc/alternatives/cudnn_adv_infer_h
src: /usr/include/cudnn_adv_train.h, src_lnk: /etc/alternatives/cudnn_adv_train_h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/include/cudnn_adv_train.h, dst_lnk: /etc/alternatives/cudnn_adv_train_h
src: /usr/include/cudnn_backend.h, src_lnk: /etc/alternatives/cudnn_backend_h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/include/cudnn_backend.h, dst_lnk: /etc/alternatives/cudnn_backend_h
src: /usr/include/cudnn_cnn_infer.h, src_lnk: /etc/alternatives/cudnn_cnn_infer_h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/include/cudnn_cnn_infer.h, dst_lnk: /etc/alternatives/cudnn_cnn_infer_h
src: /usr/include/cudnn_cnn_train.h, src_lnk: /etc/alternatives/cudnn_cnn_train_h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/include/cudnn_cnn_train.h, dst_lnk: /etc/alternatives/cudnn_cnn_train_h
src: /usr/include/cudnn.h, src_lnk: /etc/alternatives/libcudnn, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/include/cudnn.h, dst_lnk: /etc/alternatives/libcudnn
src: /usr/include/cudnn_ops_infer.h, src_lnk: /etc/alternatives/cudnn_ops_infer_h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/include/cudnn_ops_infer.h, dst_lnk: /etc/alternatives/cudnn_ops_infer_h
src: /usr/include/cudnn_ops_train.h, src_lnk: /etc/alternatives/cudnn_ops_train_h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/include/cudnn_ops_train.h, dst_lnk: /etc/alternatives/cudnn_ops_train_h
src: /usr/include/cudnn_version.h, src_lnk: /etc/alternatives/cudnn_version_h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/include/cudnn_version.h, dst_lnk: /etc/alternatives/cudnn_version_h
src: /etc/alternatives/libcudnn, src_lnk: /usr/include/aarch64-linux-gnu/cudnn_v8.h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/libcudnn, dst_lnk: /usr/include/aarch64-linux-gnu/cudnn_v8.h
src: /etc/alternatives/libcudnn_adv_infer_so, src_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/libcudnn_adv_infer_so, dst_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8
src: /etc/alternatives/libcudnn_adv_train_so, src_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/libcudnn_adv_train_so, dst_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8
src: /etc/alternatives/libcudnn_cnn_infer_so, src_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/libcudnn_cnn_infer_so, dst_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8
src: /etc/alternatives/libcudnn_cnn_train_so, src_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/libcudnn_cnn_train_so, dst_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8
src: /etc/alternatives/libcudnn_ops_infer_so, src_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/libcudnn_ops_infer_so, dst_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8
src: /etc/alternatives/libcudnn_ops_train_so, src_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/libcudnn_ops_train_so, dst_lnk: /usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8
src: /etc/alternatives/libcudnn_so, src_lnk: /usr/lib/aarch64-linux-gnu/libcudnn.so.8, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/libcudnn_so, dst_lnk: /usr/lib/aarch64-linux-gnu/libcudnn.so.8
src: /etc/alternatives/cudnn_adv_infer_h, src_lnk: /usr/include/aarch64-linux-gnu/cudnn_adv_infer_v8.h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/cudnn_adv_infer_h, dst_lnk: /usr/include/aarch64-linux-gnu/cudnn_adv_infer_v8.h
src: /etc/alternatives/cudnn_backend_h, src_lnk: /usr/include/aarch64-linux-gnu/cudnn_backend_v8.h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/cudnn_backend_h, dst_lnk: /usr/include/aarch64-linux-gnu/cudnn_backend_v8.h
src: /etc/alternatives/cudnn_cnn_train_h, src_lnk: /usr/include/aarch64-linux-gnu/cudnn_cnn_train_v8.h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/cudnn_cnn_train_h, dst_lnk: /usr/include/aarch64-linux-gnu/cudnn_cnn_train_v8.h
src: /etc/alternatives/cudnn_ops_train_h, src_lnk: /usr/include/aarch64-linux-gnu/cudnn_ops_train_v8.h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/cudnn_ops_train_h, dst_lnk: /usr/include/aarch64-linux-gnu/cudnn_ops_train_v8.h
src: /etc/alternatives/cudnn_adv_train_h, src_lnk: /usr/include/aarch64-linux-gnu/cudnn_adv_train_v8.h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/cudnn_adv_train_h, dst_lnk: /usr/include/aarch64-linux-gnu/cudnn_adv_train_v8.h
src: /etc/alternatives/cudnn_cnn_infer_h, src_lnk: /usr/include/aarch64-linux-gnu/cudnn_cnn_infer_v8.h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/cudnn_cnn_infer_h, dst_lnk: /usr/include/aarch64-linux-gnu/cudnn_cnn_infer_v8.h
src: /etc/alternatives/cudnn_ops_infer_h, src_lnk: /usr/include/aarch64-linux-gnu/cudnn_ops_infer_v8.h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/cudnn_ops_infer_h, dst_lnk: /usr/include/aarch64-linux-gnu/cudnn_ops_infer_v8.h
src: /etc/alternatives/cudnn_version_h, src_lnk: /usr/include/aarch64-linux-gnu/cudnn_version_v8.h, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/etc/alternatives/cudnn_version_h, dst_lnk: /usr/include/aarch64-linux-gnu/cudnn_version_v8.h
src: /usr/lib/aarch64-linux-gnu/libcudnn_static.a, src_lnk: /etc/alternatives/libcudnn_stlib, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libcudnn_static.a, dst_lnk: /etc/alternatives/libcudnn_stlib
src: /usr/lib/libvisionworks_sfm.so.0.90, src_lnk: libvisionworks_sfm.so.0.90.4, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/libvisionworks_sfm.so.0.90, dst_lnk: libvisionworks_sfm.so.0.90.4
src: /usr/lib/libvisionworks.so, src_lnk: libvisionworks.so.1.6, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/libvisionworks.so, dst_lnk: libvisionworks.so.1.6
src: /usr/lib/libvisionworks_tracking.so.0.88, src_lnk: libvisionworks_tracking.so.0.88.2, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/libvisionworks_tracking.so.0.88, dst_lnk: libvisionworks_tracking.so.0.88.2
src: /usr/lib/aarch64-linux-gnu/libnvinfer.so.8, src_lnk: libnvinfer.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libnvinfer.so.8, dst_lnk: libnvinfer.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8, src_lnk: libnvinfer_plugin.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8, dst_lnk: libnvinfer_plugin.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libnvparsers.so.8, src_lnk: libnvparsers.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libnvparsers.so.8, dst_lnk: libnvparsers.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libnvonnxparser.so.8, src_lnk: libnvonnxparser.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libnvonnxparser.so.8, dst_lnk: libnvonnxparser.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libnvinfer.so, src_lnk: libnvinfer.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libnvinfer.so, dst_lnk: libnvinfer.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so, src_lnk: libnvinfer_plugin.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so, dst_lnk: libnvinfer_plugin.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libnvparsers.so, src_lnk: libnvparsers.so.8.2.1, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libnvparsers.so, dst_lnk: libnvparsers.so.8.2.1
src: /usr/lib/aarch64-linux-gnu/libnvonnxparser.so, src_lnk: libnvonnxparser.so.8, dst: /run/containerd/io.containerd.runtime.v2.task/k8s.io/nvidia-device-plugin-ctr/rootfs/usr/lib/aarch64-linux-gnu/libnvonnxparser.so, dst_lnk: libnvonnxparser.so.8
, stderr: nvidia-container-cli: mount error: open failed: /sys/fs/cgroup/devices/system.slice/containerd.service/kubepods-besteffort-pod42d1b19e_561f_489a_9bff_1afbbdab3791.slice/devices.allow: no such file or directory: unknown
Warning BackOff 9s (x2 over 10s) kubelet Back-off restarting failed container nvidia-device-plugin-ctr in pod nvidia-device-plugin-daemonset-pz65q_kube-system(42d1b19e-561f-489a-9bff-1afbbdab3791)
I get the same error when I try to deploy a container from NGC (compatible with JetPack 4.6.1).
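To make the failing path in the stderr line above easier to inspect, here is a minimal Python sketch that extracts the path from an "open failed: ... no such file or directory" message and reports the deepest parent directory that actually exists on the node. The helper names `missing_path` and `deepest_existing_parent` are my own and hypothetical; they are not part of nvidia-container-cli, containerd, or Kubernetes, and the result depends on the machine it runs on.

```python
import os
import re

# Hypothetical helper: matches the "open failed: <path>: no such file or directory"
# fragment seen in the nvidia-container-cli stderr above.
OPEN_FAILED = re.compile(r"open failed: (?P<path>/\S+?): no such file or directory")


def missing_path(stderr_text):
    """Return the path named in an 'open failed' message, or None if absent."""
    match = OPEN_FAILED.search(stderr_text)
    return match.group("path") if match else None


def deepest_existing_parent(path):
    """Walk up from `path` until a directory that exists is found."""
    current = os.path.dirname(path)
    while current not in ("/", "") and not os.path.exists(current):
        current = os.path.dirname(current)
    return current


if __name__ == "__main__":
    # The stderr text is copied from the pod events above.
    stderr_text = (
        "nvidia-container-cli: mount error: open failed: "
        "/sys/fs/cgroup/devices/system.slice/containerd.service/"
        "kubepods-besteffort-pod42d1b19e_561f_489a_9bff_1afbbdab3791.slice/"
        "devices.allow: no such file or directory: unknown"
    )
    path = missing_path(stderr_text)
    print("path from error:", path)
    print("exists on this node:", os.path.exists(path))
    print("deepest existing parent:", deepest_existing_parent(path))
```

Running it on the affected node shows how much of the cgroup hierarchy that nvidia-container-cli expects is actually present, which may help narrow down where the lookup diverges.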
After several hours trying to figure out what is going wrong, I found that the nvidia-container-toolkit that JetPack 4.6.1 ships is version 1.7.0-1, while the plugin requires >= 1.11.0. The problem is that when I try to install nvidia-container-toolkit_1.11.0 on the AGX, I get dependency errors:
orfeas@xavier-agx-01:~/Documents/jetpack_debs/jetpack_511$ sudo dpkg -i nvidia-container-toolkit_1.11.0_rc.1-1_arm64.deb
(Reading database ... 170933 files and directories currently installed.)
Preparing to unpack nvidia-container-toolkit_1.11.0_rc.1-1_arm64.deb ...
Unpacking nvidia-container-toolkit (1.11.0~rc.1-1) over (1.7.0-1) ...
dpkg: dependency problems prevent configuration of nvidia-container-toolkit:
nvidia-container-toolkit depends on libnvidia-container-tools (>= 1.10.0-1); however:
Version of libnvidia-container-tools on system is 1.7.0-1.
dpkg: error processing package nvidia-container-toolkit (--install):
dependency problems - leaving unconfigured
Errors were encountered while processing:
nvidia-container-toolkit
This is probably because nvidia-container-toolkit_1.11.0_rc.1-1_arm64.deb is compatible only with JetPack 5.*.
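For reference, the dependency check that dpkg reports above (libnvidia-container-tools >= 1.10.0-1, installed 1.7.0-1) can be illustrated with a small Python sketch. This is a simplification I wrote for illustration, not dpkg's real comparison algorithm: it only compares the leading numeric part of the version and ignores epochs, pre-release suffixes such as `~rc.1`, and the Debian revision. The function names are hypothetical.

```python
import re


def upstream_numbers(version):
    """Return the leading dotted numbers of a Debian-style version as a tuple.

    Simplification (assumption): the epoch, any '~rc' style suffix and the
    Debian revision after '-' are ignored, so '1.11.0~rc.1-1' becomes (1, 11, 0).
    """
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"no numeric version found in {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def satisfies_minimum(installed, minimum):
    """True if the installed version is at least the minimum (numeric part only)."""
    return upstream_numbers(installed) >= upstream_numbers(minimum)


if __name__ == "__main__":
    # Versions taken from the dpkg output above.
    print(satisfies_minimum("1.7.0-1", "1.10.0-1"))   # False
    print(upstream_numbers("1.11.0~rc.1-1"))          # (1, 11, 0)
```

Comparing integer tuples rather than strings matters here: as plain strings "1.7.0" sorts after "1.10.0", which would hide the problem.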
I could simply reflash the Jetsons with JetPack 5.* to fix the problem, but NVIDIA's images for JetPack 5.* are about 12GB (with PyTorch), which is prohibitive in the case I am studying. For JetPack 4.6.1, NVIDIA's images are about 1.9GB (with PyTorch), which is acceptable; that is why I am trying to set this up on JetPack 4.6.1.
What can I do to get my cluster up and running with GPU support on JetPack 4.6.1 (or on JetPack 5.* but without 12GB images…)?
My Kubernetes version is 1.26.3.
This is the first time I am posting on the NVIDIA developer forum, so if you need further information I can post it.
Thank you in advance.