Libnvidia-ml location and source?

I have been trying to compile libnvidia-container from source (with some success) on multiple platforms, including musl-based.

When I actually try to run nvidia-container-cli, it starts, but as soon as I try an action, like nvidia-container-cli list, I get an error loading the library:

$ ./nvidia-container-cli list
nvidia-container-cli: initialization error: load library failed: Error loading shared library no such file or directory

Looking at the library on a normal Jetpack, I see that the library is at the following path, installed by the following deb pkg:

$ find /usr -name 'libnvidia-ml*'
$ dpkg -S /usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs/
cuda-nvml-dev-11-4: /usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs/
  1. Is the source for that available anywhere? I want to get it installed on OSes that do not use rpm/yum/zypper, including musl-based (no glibc).
  2. Why is the search coded into the app itself, rather than relying on the normal dynamic linker and LD_LIBRARY_PATH or similar to find it?



It looks like the library is included in the CUDA package.
Here is the OTA download link for your reference: Index



Those are Debian packages, not plain binaries. It is always possible to extract them, but that gets messy.
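For what it's worth, a .deb is just an ar archive wrapping a control tarball and a data tarball, so it can be unpacked without dpkg. A rough sketch of the steps (it builds a dummy archive on the spot so the commands are runnable anywhere; with a real package like cuda-nvml-dev-11-4 you would run the same ar/tar steps on the downloaded .deb, whose payload is usually data.tar.xz rather than data.tar.gz):

```shell
set -e

# A .deb is an ar archive containing debian-binary, control.tar.*, and
# data.tar.* (the actual files). Build a dummy one so the extraction
# steps below can run on any machine.
mkdir -p demo/payload/usr/lib && cd demo
echo 'fake library' > payload/usr/lib/libnvidia-ml.so.1
echo '2.0' > debian-binary
tar czf control.tar.gz -T /dev/null        # empty control member
tar czf data.tar.gz -C payload usr         # the file payload
ar rc dummy.deb debian-binary control.tar.gz data.tar.gz

# The extraction itself -- the same two steps apply to a real .deb:
mkdir -p extracted && cd extracted
ar x ../dummy.deb data.tar.gz
tar xzf data.tar.gz
ls usr/lib/
```

This only gets you the files, of course; it does nothing about dependencies or the glibc/musl mismatch discussed below.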

They also won’t work on non-glibc-based systems.

Is the source available?

Also, I find it interesting: in order to run containers with access to GPUs, I need the CUDA libraries both outside the container (to start it) and inside the container (to run CUDA apps)?

@avi24 on JetPack 4, yes, you need CUDA/cuDNN/TensorRT installed on your device, and they will be mounted into the container by --runtime nvidia. You don’t need the CUDA/cuDNN/TensorRT packages installed inside JetPack 4 containers. The containers should be derived from l4t-base.

On JetPack 5 (which you are presumably using, since your CUDA version is 11.4), the CUDA Toolkit etc. are installed inside the container and not mounted by --runtime nvidia. However, there are still some drivers that get mounted, which you can find in /etc/nvidia-container-runtime/host-files-for-container.d/l4t.csv

Hi @dusty_nv ; thanks for hopping in to answer.

On JetPack 5 (which you are presumably using since your CUDA version is 11.4)


the CUDA Toolkit etc. are installed inside the container and not mounted by --runtime nvidia

So the host and the container have their own copies. I can always mount them myself (e.g. with docker run -v, or the equivalent for other container runtimes), but the NVIDIA Container Toolkit no longer takes ownership of mounting them in. Is that correct?

But doesn’t the host still need them? Without them, aren’t we missing the .so libraries needed by libnvidia-container, etc.?

That is correct - if you check l4t.csv, there are no files from /usr/local/cuda listed in there. There are, however, lower-level GPU drivers that get mounted from /usr/lib/aarch64-linux-gnu/tegra/
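For anyone skimming this later: the entries in l4t.csv are simple "type, path" pairs that the runtime turns into bind mounts. A rough sketch of what that translation amounts to, using a made-up two-line excerpt (the entry format is from memory and the paths are illustrative, not copied from a real l4t.csv), printing the equivalent docker command rather than running it:

```shell
set -e

# Hypothetical excerpt in the l4t.csv "type, path" style; the real file
# lives at /etc/nvidia-container-runtime/host-files-for-container.d/l4t.csv
cat > l4t-excerpt.csv <<'EOF'
lib, /usr/lib/aarch64-linux-gnu/tegra/libnvdc.so
lib, /usr/lib/aarch64-linux-gnu/tegra/libnvos.so
EOF

# Turn each "lib" entry into a docker -v bind-mount flag, then print the
# docker run command this would roughly correspond to (not executed here).
flags=$(awk -F', ' '$1 == "lib" { printf " -v %s:%s:ro", $2, $2 }' l4t-excerpt.csv)
echo "docker run --rm --runtime nvidia$flags l4t-base"
```

In practice --runtime nvidia does all of this for you; the sketch is only meant to demystify what "gets mounted from the host" means.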

Your issue with nvidia-container-cli aside, you should only need the CUDA Toolkit on the device if you need to use it outside a container (like compiling code with NVCC, etc.). FWIW, I haven’t used nvidia-container-cli and just stick with docker run --runtime nvidia

That was my issue. nvidia-container-cli doesn’t link against libnvidia-ml at all; it searches for it at runtime with dlopen. I found it in the source, defined here and loaded here. No idea why.