Nvidia-container-runtime 1.13 (experimental branch) on k3s

shak04995 · June 8, 2023, 7:41am

Hi!
Currently, I am using three Jetson Nanos to set up a K3s + containerd cluster with GPU support. By default, they come with nvidia-container-runtime 1.7, which has a bug, so I upgraded to the experimental branch to get nvidia-container-runtime 1.13. That worked, but the CUDA and cudart libraries are no longer pulled into the containers from the host OS, as reported on the L4T Base webpage (given that JetPack 5.x uses 1.9). Therefore, my question is: is there any way to configure the runtime to pull the libraries from the OS, or do I need to build the images with the CUDA libraries?

I also tried bind mounting the paths into my container, but then TF reserves only between 10 MB and 50 MB of memory, so my application fails with OOM.

Any guidance will be much appreciated!

AastaLLL · June 8, 2023, 8:27am

Hi,

We need to check this with our internal team.
We will share more information with you later.

Thanks.

elezar · June 8, 2023, 9:06am

Firstly, it is not required to use the experimental branch. The stable repositories for installing the NVIDIA Container Toolkit packages can be configured by following the steps outlined in our documentation. For Tegra support in the device plugin, at least NVIDIA Container Toolkit v1.11.0 is required as this automatically includes the mounts required to detect Jetpack-based systems.
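As a rough illustration of the version requirement above, here is a minimal Python sketch that checks whether an installed toolkit version string meets the v1.11.0 minimum. The function names are hypothetical, and the version string is assumed to be something you copy from your package manager; the sketch does not query the system itself.

```python
# Minimum NVIDIA Container Toolkit version mentioned above for Tegra
# support in the device plugin.
MIN_TEGRA_VERSION = (1, 11, 0)


def parse_version(text):
    """Turn strings such as 'v1.13.0' or '1.13.0~rc.1-1' into (1, 13, 0).

    Hypothetical helper: package revisions and pre-release suffixes after
    '-' or '~' are ignored, and missing components are padded with zeros.
    """
    core = text.strip().lstrip("v").split("-")[0].split("~")[0]
    parts = [int(part) for part in core.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def supports_tegra_device_plugin(version_text):
    """Return True if the given version is at least v1.11.0."""
    return parse_version(version_text) >= MIN_TEGRA_VERSION


if __name__ == "__main__":
    for candidate in ("1.7.0", "1.11.0", "1.13.0"):
        print(candidate, supports_tegra_device_plugin(candidate))
```

Comparing tuples of integers avoids the string-comparison pitfall where "1.7" would sort after "1.13".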

Regarding the CUDA libraries from the host no longer being included: this was a design decision to enable portability of containers. Ideally, containers would package their runtime dependencies such as the CUDA Toolkit and Runtime library (the same holds for cuDNN and cuBLAS, for example).

There is an (undocumented) option to revert to the previous (<1.10.0) behaviour, but it should only be used as a last resort.

Note that as implemented, the NVIDIA Container Runtime considers the files l4t.csv, drivers.csv, and devices.csv in /etc/nvidia-container-runtime/host-files-for-container.d by default. Would including the relevant libraries in drivers.csv be a reasonable workaround for you at the moment?
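To make the drivers.csv suggestion more concrete, here is a minimal Python sketch that renders candidate entries as text. It assumes each line has the form `type, path` with a type of dev, dir, lib, or sym, as in the CSV files shipped on Jetson images; please check the existing files on your device before relying on this. The helper names and the example paths are hypothetical, and the sketch only prints the lines rather than editing anything under /etc.

```python
# Entry types assumed to be accepted in the host-files-for-container.d CSVs.
ALLOWED_KINDS = ("dev", "dir", "lib", "sym")


def csv_line(kind, path):
    """Format one entry, e.g. 'lib, /usr/lib/libfoo.so' (assumed format)."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown entry type: {kind!r}")
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute host path, got: {path!r}")
    return f"{kind}, {path}"


def render_csv(entries):
    """Join (kind, path) pairs into newline-terminated CSV text."""
    return "\n".join(csv_line(kind, path) for kind, path in entries) + "\n"


if __name__ == "__main__":
    # Hypothetical paths for illustration only; use the locations that
    # actually exist on your host.
    example = [
        ("dir", "/usr/local/cuda-10.2"),
        ("sym", "/usr/local/cuda"),
    ]
    print(render_csv(example), end="")
```

The printed lines can then be reviewed and appended to drivers.csv by hand, which keeps the change easy to undo.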

shak04995 · June 9, 2023, 8:05pm

Hi @elezar!

Sorry for my late response but thank you for your comments! It worked as before!

May I ask why, when using bind mounts, TF would only reserve a few MB, but through the runtime it would reserve what it needs (sometimes ~200 MB to 1 GB)?

Again many thanks!

Best,
Isaac

AastaLLL · June 26, 2023, 4:12am

Hi,

This depends on the amount of memory available at runtime.
Thanks.
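
As a rough way to see how much memory is available before launching the application, the Python sketch below reads the MemAvailable field from /proc/meminfo. On Jetson the CPU and GPU share the same physical memory, so this value is a reasonable first indicator, though it is only an approximation of what TF can actually reserve. The function name is hypothetical.

```python
def mem_available_kib(meminfo_text):
    """Return the MemAvailable value in KiB from /proc/meminfo-style text.

    Returns None if the field is missing (for example on very old kernels).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line looks like: 'MemAvailable:    1234567 kB'
            return int(line.split()[1])
    return None


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        available = mem_available_kib(handle.read())
    if available is None:
        print("MemAvailable not reported by this kernel")
    else:
        print(f"Available memory: {available / 1024:.0f} MiB")
```

Running this inside the container just before the workload starts makes it easier to compare the bind-mount and runtime setups under the same conditions.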
