I have been trying to compile libnvidia-container from source (with some success) on multiple platforms, including musl-based.
When I actually run nvidia-container-cli, it starts, but as soon as I invoke an action such as nvidia-container-cli list, I get an error loading libnvidia-ml.so.1:
$ ./nvidia-container-cli list
nvidia-container-cli: initialization error: load library failed: Error loading shared library libnvidia-ml.so.1: no such file or directory
Looking at the library on a normal JetPack install, I see that it lives at the following path, installed by the following deb package:
$ find /usr -name 'libnvidia-ml*'
$ dpkg -S /usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs/libnvidia-ml.so
- Is the source for that available anywhere? I want to get it installed on OSes that do not use rpm/yum/zypper, including musl-based ones (no glibc).
- Why is the library search coded into the app itself, rather than left to the normal dynamic linker and LD_LIBRARY_PATH (or similar) to find it?
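(For what it's worth, dlopen(3) does honor LD_LIBRARY_PATH, so as a stopgap I can point it at the stub directory from my find output above. A sketch, untested on musl:)

```shell
# Workaround sketch: dlopen(3) searches LD_LIBRARY_PATH, so exposing the
# directory holding libnvidia-ml.so should let nvidia-container-cli find it.
# The stub path is the JetPack 5 location from the find output above.
STUBS=/usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs
export LD_LIBRARY_PATH=$STUBS${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
echo "$LD_LIBRARY_PATH"
```

Then re-run ./nvidia-container-cli list; whether the stub is a functional NVML at runtime is a separate question.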
It looks like the library is included in the CUDA package.
Here is the OTA download link for your reference: Index
Those are Debian packages, not plain binaries. It is always possible to extract them, but that gets messy, and they won't work on non-glibc-based systems anyway.
Is the source available?
Also, I find it curious: in order to run containers with GPU access, I need the CUDA libraries both outside the container (to start it) and inside the container (to run CUDA apps)?
@avi24 on JetPack 4, yes, you need CUDA/cuDNN/TensorRT installed on your device, and they will be mounted into the container by --runtime nvidia. You don't need the CUDA/cuDNN/TensorRT packages installed inside JetPack 4 containers; the containers should be derived from l4t-base.
On JetPack 5 (which you are presumably using, since your CUDA version is 11.4), the CUDA Toolkit etc. are installed inside the container and not mounted by --runtime nvidia. However, there are still some lower-level drivers that get mounted, which you can find in l4t.csv.
Hi @dusty_nv, thanks for hopping in to answer.
On JetPack 5 (which you are presumably using since your CUDA version is 11.4), the CUDA Toolkit etc. are installed inside the container and not mounted by --runtime nvidia
So the host and the container each have their own copy. I can always mount them myself (e.g. with docker run -v, or the equivalent for other container runtimes), but the NVIDIA Container Toolkit no longer takes ownership of mounting them in. Is that correct?
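By "mount them myself" I mean something like the following; the image tag and paths are illustrative guesses on my part, not tested:

```shell
# Sketch: manually bind-mount the host CUDA toolkit into a container rather
# than relying on the toolkit's CSV-driven mounts. Image tag and paths are
# placeholders.
CUDA_DIR=/usr/local/cuda-11.4
CMD="docker run --rm --runtime nvidia \
  -v $CUDA_DIR:$CUDA_DIR:ro \
  nvcr.io/nvidia/l4t-base:r35.1.0 ls $CUDA_DIR"
echo "$CMD"
# only attempt it where docker exists; harmless no-op elsewhere
if command -v docker >/dev/null 2>&1; then eval "$CMD" || true; fi
```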
But doesn’t the host still need them? Without them, we are missing the .so libraries needed for
That is correct - if you check l4t.csv, there are no files from /usr/local/cuda listed in there. There are, however, lower-level GPU drivers that get mounted from
Your issue with nvidia-container-cli and libnvidia-ml.so aside, you should only need the CUDA Toolkit on the device if you want to use it outside a container (like compiling code with NVCC, etc.). FWIW, I haven't used nvidia-container-cli and just stick with docker run --runtime nvidia.
That was exactly my issue: nvidia-container-cli doesn't link libnvidia-ml at build time, but actually searches for libnvidia-ml.so.1 at runtime. I found the search paths defined here and the library loaded here in the source. It uses dlopen; no idea why.