I am trying to understand what the Nvidia CDI hooks do, as configured by nvidia-ctk cdi generate.
Having analyzed the generated nvidia.yaml, I see that the spec mainly does the following (see the sketch after this list):
- mounts lots of devices - makes sense
- mounts lots of library paths from the host
- creates lots of symlinks inside the container
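For concreteness, here is a minimal sketch of how I am reading the spec, assuming the standard CDI layout (containerEdits blocks containing deviceNodes, mounts, and hooks) and assuming the spec was written to /etc/cdi/nvidia.yaml; it just summarizes those three categories of edits:

```python
# Rough sketch: summarize what a generated CDI spec asks the runtime to do.
# Assumes the standard CDI layout (containerEdits blocks with deviceNodes,
# mounts, and hooks) and that the spec lives at /etc/cdi/nvidia.yaml.
import yaml

with open("/etc/cdi/nvidia.yaml") as f:
    spec = yaml.safe_load(f)

# Collect the top-level edits plus the per-device edits.
edits = [spec.get("containerEdits", {})]
edits += [d.get("containerEdits", {}) for d in spec.get("devices", [])]

device_nodes, mounts, hooks = [], [], []
for e in edits:
    device_nodes += [n["path"] for n in e.get("deviceNodes", [])]
    mounts += [m.get("hostPath") for m in e.get("mounts", [])]
    hooks += [" ".join(h.get("args", [])) for h in e.get("hooks", [])]

print(f"{len(device_nodes)} device nodes, {len(mounts)} host mounts, {len(hooks)} hooks")
for h in hooks:
    print("hook:", h)
```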
First, since the approach in more recent container images is not to mount lots of libraries (like CUDA) in from the host, but rather to have them inside the container (much more aligned with the container philosophy), why is the generator finding all of these hundreds of libraries on the host OS and mounting them in, as opposed to having them available as part of the container image?
For that matter, how can it even know that they exist? There is no guarantee that the host OS will be 100% JetPack.
Second, are all of the links sections just adding various links needed inside the container to many of those mounted libraries? Coming back to the “container philosophy”, why would those not already exist in the container image?
I suppose these are hardware-dependent libraries rather than user-space libraries like CUDA.
These libraries are included in the OS (e.g., L4T r35.4.1) and need to be mounted into the container to ensure functionality.
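On Jetson, as far as I can tell, the generator works in CSV mode from mount lists shipped with L4T (typically under /etc/nvidia-container-runtime/host-files-for-container.d/). A rough sketch, assuming the usual "<type>, <path>" line format (types like dev, lib, sym, dir), that shows which host files those lists declare:

```python
# Rough sketch: list the host files the Jetson CSV mount lists declare.
# Assumes lines of the form "<type>, <path>" (dev, lib, sym, dir) in CSV
# files under this directory; the location/format may vary by L4T release.
import collections
import glob
import os

CSV_DIR = "/etc/nvidia-container-runtime/host-files-for-container.d"

counts = collections.Counter()
for csv_file in sorted(glob.glob(os.path.join(CSV_DIR, "*.csv"))):
    with open(csv_file) as f:
        for line in f:
            if "," not in line:
                continue
            kind, path = (part.strip() for part in line.split(",", 1))
            counts[kind] += 1
            print(f"{kind:4s} {path}  ({os.path.basename(csv_file)})")

print(dict(counts))  # e.g. how many dev/lib/sym entries in total
```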
I just read the libnvidia-container architecture doc in depth. It explains some of the dependencies better, but now I am confused as to how the requirements it describes interact with CDI.
Is the compatibility tied to the OS, or to the specific GPU driver? I read the architecture doc, which implied the latter, but I am unclear.
Either way, I think you are saying, “if you want to use a Jetson, you need the device (obviously), the correct kernel and driver for the specific GPU on the device, and very specific versions of user-space libraries that match that specific GPU and specific kernel drivers.” Is that correct? If so, is there a maintained mapping somewhere that says, “device → kernel driver → userspace libs” for each one?
In development, it is somewhat reasonable to assume everything will just use the ready-to-run JetPack; in production, systems are generally very tightly controlled and not running these full-blown OSes and packages. We need some way to get the right elements on there.
For L4T, the GPU driver and some hardware-related drivers are integrated into the OS.
These need to be compatible (i.e., from the same L4T branch), so mounting them from the host is a good way to ensure this.
But user-space libraries, like CUDA and cuDNN, might not have such constraints.
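To make the "same L4T branch" point concrete, here is a minimal sketch (assuming the usual /etc/nv_tegra_release format, e.g. "# R35 (release), REVISION: 4.1, ...") that reports the host's L4T release so you can compare it against what a container image was built for:

```python
# Minimal sketch: read the host's L4T release so it can be compared with
# the L4T branch a container image was built against.
# Assumes the usual /etc/nv_tegra_release format, e.g.
#   "# R35 (release), REVISION: 4.1, GCID: ..., BOARD: ..."
import re

with open("/etc/nv_tegra_release") as f:
    first_line = f.readline()

match = re.search(r"R(\d+).*?REVISION:\s*([\d.]+)", first_line)
if match:
    major, revision = match.groups()
    print(f"Host L4T release: r{major}.{revision}")  # e.g. r35.4.1
else:
    print("Could not parse /etc/nv_tegra_release:", first_line.strip())
```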
Thanks.
“For L4T, the GPU driver and some hardware-related drivers are integrated into the OS. These need to be compatible (i.e., from the same L4T branch), so mounting them from the host is a good way to ensure this.”
That makes sense. But what do you do when you aren’t using the full-blown JetPack distribution? Sure, it is great in development, but in secure production, lots of places are going to use their own hardened and custom-built OS. How do I get the right libraries for the specific kernel version, driver version, and hardware?
“user-space libraries, like CUDA and cuDNN, might not have such constraints”
That part wasn’t so clear to me; based on the architecture doc, it may even be different.
When you set up the device, there are two steps: “flash” and “install components”.
The packages installed at the “install components” stage are the user-space packages.
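If it helps, one way to see what that “install components” step actually put on a stock JetPack system is to query the package database. A minimal sketch, assuming a Debian/Ubuntu-based rootfs and that the components are packaged under names starting with nvidia-, cuda-, or libcudnn:

```python
# Minimal sketch: list the JetPack/L4T user-space packages installed on the
# host, assuming a Debian/Ubuntu-based rootfs (dpkg) and that the components
# are packaged under names starting with "nvidia-", "cuda-", or "libcudnn".
import subprocess

out = subprocess.run(
    ["dpkg-query", "-W", "--showformat=${Package} ${Version}\n"],
    check=True, capture_output=True, text=True,
).stdout

for line in out.splitlines():
    if line.startswith(("nvidia-", "cuda-", "libcudnn")):
        print(line)
```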
Using a pre-existing root filesystem based on Ubuntu is not a “custom OS”; it is just Ubuntu with some package changes. What do you do when an enterprise has standardized on a specific build of ArchLinux or Alpine or RHEL and allows no customized filesystems? In that case you absolutely must use that filesystem + kernel (i.e., that operating system) when deploying anything in production. Sure, there are ways to get specific files approved for adding to the “blessed golden image”, but an entire rootfs generated by some outside script? Not a chance.
Let’s address that scenario: I am an enterprise, I have my own custom OS build that will never be replaced by your root filesystem or kernel. I can install certain binaries and kernel drivers and userspace libraries, but that is it.
- drivers: covered (at least as far as I can tell)
- user-space binaries: covered (all are OSS on GitHub; I managed to build them)
- user-space libraries: unsuccessful.
For this to work, you need to make available either the source for the user-space libraries that are required at the host level, or those libraries as standalone binaries with clear lists of dependencies and of when/where they work.
The rootfs provided is Ubuntu-based, which is but a single distribution (“distro”) of the hundreds (thousands?) of Linux-based distros out there. Is it more correct to state, “we currently only support Ubuntu-based custom OSes”?
avi24,
Your question seems to have become more general over the course of this forum issue: how can one use an alternate distro with Jetson?
Until JetPack 6, the main answer we’ve had for custom Linux distros has been Yocto; we and Jetson partners have put effort into enabling Yocto support for Jetson. Other than that, NVIDIA simply offers a reference filesystem based on Ubuntu. The Jetson Linux page has the files available for download, e.g., the BSP package, which includes conf files for the various combinations of Jetson reference carrier board + Jetson module.
JetPack 6 will enable customers and partners to bring their own kernel and will better enable custom distro support. We will continue to provide an Ubuntu-based reference filesystem and Debian packages. If you need to use a custom distro, part of the task will be, as you said above, getting specific files from our reference approved for repackaging and adding to your “blessed golden image.”
Digging into Yocto should be instructive, and of course the JetPack 6 Developer Preview is expected at the end of the month. Please open a new forum issue anytime with a specific question about whatever issue crops up.