What does the CDI config generated via nvidia-ctk do, and why does it mount so many libs into the container?

I am trying to understand what the Nvidia CDI hooks do, as configured by nvidia-ctk cdi generate.

Having analyzed the generated nvidia.yaml, it mainly does the following (an abridged sketch is shown after the list):

  • mounts lots of devices - makes sense
  • mounts lots of library paths
  • creates lots of symlinks
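
For context, the spec is shaped roughly like this. This is an abridged, hand-written sketch, not the real file (which is over 3000 lines); the exact paths, device nodes, and versions are illustrative only:

```yaml
# Abridged sketch of a CDI spec in the shape `nvidia-ctk cdi generate` produces.
# Entries are illustrative, not copied from the real nvidia.yaml.
cdiVersion: "0.5.0"
kind: nvidia.com/gpu
devices:
  - name: all
    containerEdits:
      deviceNodes:
        # device nodes exposed to the container
        - path: /dev/nvhost-ctrl
        - path: /dev/nvmap
containerEdits:
  mounts:
    # host libraries bind-mounted read-only into the container
    - hostPath: /usr/lib/aarch64-linux-gnu/tegra/libnvos.so
      containerPath: /usr/lib/aarch64-linux-gnu/tegra/libnvos.so
      options: ["ro", "nosuid", "nodev", "bind"]
```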

First: the approach in more recent versions of the container images is not to mount lots of libs (like CUDA) in from the host, but rather to ship them inside the container, which is much more aligned with the container philosophy. So why is the CDI spec finding all of these hundreds of libraries on the host OS and mounting them in, as opposed to having them available as part of the container image?

For that matter, how can it even know that they exist? There is no guarantee that the base OS will be 100% JetPack.

Second, are all of the links sections just adding various links needed inside the container to many of those mounted libraries? Coming back to the “container philosophy”, why would those not already exist in the container image?
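
To make the question concrete: the links show up as create-symlinks hook entries, roughly like the following (again an illustrative sketch, not copied from my file; the real spec has many such --link arguments):

```yaml
# Illustrative create-symlinks hook entry as emitted by nvidia-ctk
containerEdits:
  hooks:
    - hookName: createContainer
      path: /usr/bin/nvidia-ctk
      args: ["nvidia-ctk", "hook", "create-symlinks",
             "--link", "tegra/libnvos.so::/usr/lib/aarch64-linux-gnu/libnvos.so"]
```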

Hi,

Do you have the list of the mounted libraries?

We suppose these are hardware-dependent libraries rather than user-space libs like CUDA.
These libraries are included in the OS (e.g. r35.4.1) and need to be mounted to ensure functionality.

Thanks.

Sure. I can make public a CDI file generated on a Xavier NX. I just ran nvidia-ctk cdi generate, and the results are from the nvidia.yaml file.

It is 3367 lines long, so I doubt I can paste it directly here, but I can put it in a gist.
See this link.

The particularly interesting section for mounts - besides devices, of course - is from lines 149 to 1684, or this link.

I just read the libnvidia-container architecture doc in depth. It explains some of the dependencies better, but now I am confused as to how the requirements described work with CDI.

Hi,

The libraries under /usr/lib/aarch64-linux-gnu/tegra/ are Jetson-specific libraries.
For example, libnvdla_compiler.so is related to the DLA hardware.

Thanks.

Meaning they are specific to Jetson in general? Or to this particular version of the hardware (in this case, a specific version of a Xavier NX board)?

Hi,

The library needs to be compatible with the OS (e.g. r35.4).
Some libraries also differ across platforms.

Thanks.

Is it compatible with the OS, or compatible with the specific GPU driver? I read the architecture doc, which implied the latter, but I am unclear.

Either way, I think you are saying, “if you want to use a Jetson, you need the device (obviously), the correct kernel and driver for the specific GPU on the device, and very specific versions of user-space libraries that match that specific GPU and specific kernel drivers.” Is that correct? If so, is there a maintained mapping somewhere that says, “device → kernel driver → userspace libs” for each one?

In development, it is somewhat reasonable to assume everything will just use the ready-to-run JetPack; in production, systems are generally very tightly controlled and not running these full-blown OSes and packages. We need some way to get the right elements on there.

Hi,

For L4T, the GPU driver and some hardware-related drivers are integrated into the OS.
These need to be compatible (i.e., from the same L4T branch), so mounting them from the host is a good way to ensure this.

But user-space libraries, like CUDA and cuDNN, might not have such constraints.
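
As an illustration only (the entries below are examples, not copied from your file), the difference would look roughly like this in the generated spec:

```yaml
containerEdits:
  mounts:
    # Driver-tied Tegra library: it ships with the flashed L4T release, so the
    # bind-mounted copy always matches the host kernel and GPU driver.
    - hostPath: /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so
      containerPath: /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so
      options: ["ro", "nosuid", "nodev", "bind"]
    # No equivalent mount is required for user-space libraries such as
    # libcudart or libcudnn; those can be installed inside the container image.
```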
Thanks.

"…the GPU driver and some hardware-related drivers are integrated into the OS. These need to be compatible (i.e., from the same L4T branch), so mounting them from the host is a good way to ensure this."

That makes sense. But what do you do when you aren’t using the full-blown JetPack distribution? Sure, it is great in development, but in secure production, lots of places are going to use their own hardened and custom-built OS. How do I get the right libraries for the specific kernel version, driver version, and hardware?

"…user-space libraries, like CUDA and cuDNN, might not have such constraints."

That part wasn’t so clear to me; the architecture doc seemed to suggest something different.

Hi,

We do provide L4T RootFileSystem so users can build their custom OS:
https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/SD/RootFileSystem.html

When you set up the device, there are two steps: “flash” and “install components”.
The packages installed at the “install components” stage are the user-space packages.

Thanks.

Hi, thanks for the answer.

Using a pre-existing root filesystem based on Ubuntu is not a “custom OS”; it is just Ubuntu with some package changes. What do you do when an enterprise has standardized on a specific build of Arch Linux or Alpine or RHEL that allows no customized filesystems, where you absolutely must use that filesystem+kernel (i.e. operating system) when deploying anything in production? Sure, there are ways to get specific files approved for adding to the “blessed golden image”, but an entire rootfs generated by some outside script? Not a chance.

Let’s address that scenario: I am an enterprise, I have my own custom OS build that will never be replaced by your root filesystem or kernel. I can install certain binaries and kernel drivers and userspace libraries, but that is it.

  • drivers are covered (at least as far as I can tell)
  • userspace binaries are covered (all are OSS on GitHub, I managed to build them)
  • userspace libraries: unsuccessful.

For this to work, you need to make available either the source for the userspace libraries that are required at the host level, or those libraries as standalone binaries with clear lists of their dependencies and of when/where they work.

Hi,

Unfortunately, such use cases are currently only supported through Yocto.
Below is an example from Yocto for your reference:

Thanks.

I am a little lost. Why the Yocto reference?

Hi,

We currently only support Linux-based custom OSes.
Yocto is a good sample of how to build with our rootfs.

Thanks.

The rootfs provided is Ubuntu-based, which is but a single distribution (“distro”) of the hundreds (thousands?) of Linux-based distros. Is it more correct to state, “we currently only support Ubuntu Linux-based custom OSes”?

Hi,
You are correct that the minimal rootfs is based on Ubuntu. You may build your rootfs based on it, or you may check with the contributor of the Yocto rootfs.

“…or you may check with the contributor of the Yocto rootfs.”

Do you mind expanding further on this? What do you mean?


avi24,
Your question seems to have become more general over the course of this forum issue: how can one use an alternate distro with Jetson?

Until JetPack 6, the main answer we’ve had for a custom Linux distro has been Yocto; we and Jetson partners have put effort into enabling Yocto support for Jetson. Other than that, NVIDIA simply offers a reference filesystem based on Ubuntu. The Jetson Linux page has the files available for download, e.g., the BSP package, which includes the conf files for the various combinations of Jetson reference carrier board + Jetson module.

JetPack 6 will enable customers and partners to bring their own kernel and will better enable custom distro support. We will continue to provide an Ubuntu-based reference filesystem and Debian packages. If you need to use a custom distro, part of the task will be, as you said above, getting specific files from our reference approved for repackaging and adding to your “blessed golden image.”

Digging into Yocto should be instructive, and of course the JetPack 6 Developer Preview is expected at the end of the month. Please open a new forum issue anytime with a specific question about whatever issue crops up.