How to find the library packages a specific board needs mounted into containers?

This is a continuation of this issue; I had to start a new one because that one was locked after going too long without answers. @prlawrence kindly jumped in and asked that we continue in the forum.

@prlawrence wrote:

Until JetPack 6, the main answer we’ve had for custom Linux distro is Yocto – we and Jetson partners have put effort into enabling Yocto support for Jetson. Other than that, NVIDIA simply offers a reference filesystem based on Ubuntu.

Neither of those is great if you have a custom OS build you are using in an enterprise.

The Jetson Linux page has the files available for download, e.g., the BSP package which includes conf files for the various combinations of Jetson reference carrier board + Jetson module.

I downloaded the BSP. The conf files (which we have been using) mainly reference xml files, e.g. here is p3509-0000+p3668-0001-qspi-emmc.conf, which matches my board:

source "${LDK_DIR}/p3668.conf.common";
EMMC_CFG=flash_l4t_t194_spi_emmc_p3668.xml;
DTB_FILE=tegra194-p3668-0001-p3509-0000.dtb;
EMMCSIZE=17179869184;
RECROOTFSSIZE=100MiB;

# Rootfs A/B:
if [[ "${ROOTFS_AB}" == 1 && "${ROOTFS_ENC}" == "" ]]; then
        rootfs_ab=1;
        EMMC_CFG=flash_l4t_t194_spi_emmc_p3668_rootfs_ab.xml;
# Disk encryption support:
elif [[ "${ROOTFS_AB}" == "" && "${ROOTFS_ENC}" == 1 ]]; then
        disk_enc_enable=1;
        EMMC_CFG=flash_l4t_t194_spi_emmc_p3668_enc_rfs.xml;
# Rootfs A/B + Disk encryption support:
elif [[ "${ROOTFS_AB}" == 1 && "${ROOTFS_ENC}" == 1 ]]; then
        rootfs_ab=1;
        disk_enc_enable=1;
        EMMC_CFG=flash_l4t_t194_spi_emmc_p3668_enc_rootfs_ab.xml;
fi;

If I look at that xml, there is a lot there, but I don’t see anything that says, “here are the libraries to install,” or even, “for this board, here is the repo and deb packages to install.” If I had that, I could probably map them, download the packages, extract the files, and then make them available to mount inside containers.

JetPack 6 will enable customers and partners to bring their own kernel and will better enable custom distro support. We will continue to provide an Ubuntu-based reference filesystem and Debian packages. If you need to use a custom distro, part of the task will be as you said above, getting specific files from our reference approved for repackaging and adding to your “blessed golden image.”

Yeah, I am really looking forward to seeing what I can do with that. If that helps, all the better.

BTW, I did look in the BSP binary package you linked to above. I see 45 .deb files, which together contain 277 files or directories at or under /usr/lib/aarch64-linux-gnu/, which is where all of the mounts in the CDI file appear to come from.

Of those, 228 are .so files (or symlinks to them, I didn’t go to that layer of resolution).

I guess first question is, is that all of the files? Is there anything more specific added later?

If so, second question is, I had thought these need to be tightly tied to the specific drivers and kernel, yet these are just in “Jetson Linux 35.4.1”, with a single BSP download for “Orin and Xavier modules and developer kits”. So are there not specific variants per board or per kernel+driver?

Hi,

1.
The list is in the file below:
/etc/nvidia-container-runtime/host-files-for-container.d/l4t.csv

2.
The rel 35.4.1 BSP supports both Orin and Xavier, so the files are identical.

Thanks.

Hi @AastaLLL

I see that file is in nv_tegra/l4t_deb_packages/nvidia-l4t-init_35.4.1-20230801124926_arm64.deb. Extracting that file, the contents show:

  • 40 devices from /dev/
  • 3 directories, all /lib/firmware/tegra*
  • 37 symlinks, almost all in /usr/lib/aarch64-linux-gnu/, with one /usr/share/glvnd/egl_vendor.d/10_nvidia.json and one /etc/vulkan/icd.d/nvidia_icd.json
  • 219 libraries, almost all /usr/lib/aarch64-linux-gnu/, 3 in /lib/firmware/tegra*/, and 1 in /etc/vulkansc/
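
A breakdown like the one above can be reproduced with a short script. This is a minimal sketch, assuming each non-empty line of l4t.csv has the form `type, path`, with the type being one of dev, dir, sym or lib (an assumption inferred from the four categories listed above; check it against your copy of the file). The function names and the sample paths are hypothetical and only illustrate the layout.

```python
from collections import Counter

def parse_mount_csv(text):
    """Parse 'type, path' lines into (type, path) tuples, skipping blanks and comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first comma only, so commas inside a path survive.
        kind, _, path = line.partition(",")
        entries.append((kind.strip(), path.strip()))
    return entries

def count_by_type(entries):
    """Count how many entries there are of each type."""
    return Counter(kind for kind, _ in entries)

# Illustrative sample only; these are not the real contents of l4t.csv.
sample = """\
dev, /dev/example-device
dir, /lib/firmware/example-dir
sym, /usr/lib/aarch64-linux-gnu/libexample.so
lib, /usr/lib/aarch64-linux-gnu/libexample.so.1.0
lib, /usr/lib/aarch64-linux-gnu/libother.so.1.0
"""

entries = parse_mount_csv(sample)
counts = count_by_type(entries)
print(dict(counts))
```

To run it against the real file, read l4t.csv from the extracted nvidia-l4t-init package and pass its text to `parse_mount_csv`.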

I think you are saying, if you find all of these various files in the various .deb packages and make them available to the containers (or in the right place so that CDI can find them, etc.), then everything should work?

How interesting. So these aren’t tied to the specific hardware or driver versions? Then why mount them at all? Why not have them part of the container?

Hi,

1.
Yes, the files listed in l4t.csv are the list of libraries to mount.

2.
Sorry, my earlier comment may not have been clear enough.
Orin and Xavier use the same BSP branch, but the driver might differ depending on the build command.

Are you looking for the information at the link below?
https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/AT/JetsonLinuxDevelopmentTools/DebuggingOnJetsonPlatforms.html?highlight=specific#device-specific-features-and-limitations

Thanks.

Good morning (or afternoon).

Excellent. So we can list the contents of the deb files, find l4t.csv, find the files listed there, extract them. This is not quite as good as having a machine-accessible online download, but we should be able to work with it.
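
The lookup step described here (which .deb ships which listed file) could be sketched as follows. This assumes `dpkg-deb -c` is available and prints tar-style listings whose sixth whitespace-separated field is the path; all function names are hypothetical, and the sample listing is made up for illustration.

```python
import subprocess

def parse_deb_listing(listing):
    """Extract absolute paths from `dpkg-deb -c` output (tar -tv style lines)."""
    paths = []
    for line in listing.splitlines():
        fields = line.split(None, 5)  # perms, owner, size, date, time, name
        if len(fields) < 6:
            continue
        name = fields[5].split(" -> ")[0]  # drop a symlink target, if any
        if name.startswith("./"):
            name = name[1:]
        name = name.rstrip("/")
        if name:
            paths.append(name)
    return paths

def list_deb(deb_path):
    """List the paths shipped by one .deb (requires dpkg-deb on the PATH)."""
    result = subprocess.run(["dpkg-deb", "-c", deb_path],
                            check=True, capture_output=True, text=True)
    return parse_deb_listing(result.stdout)

def index_packages(contents_by_deb):
    """Map each shipped path to the first .deb that contains it."""
    index = {}
    for deb, paths in contents_by_deb.items():
        for path in paths:
            index.setdefault(path, deb)
    return index

def locate(wanted_paths, index):
    """Split wanted paths into {path: deb} and a list of paths found in no package."""
    found = {p: index[p] for p in wanted_paths if p in index}
    missing = [p for p in wanted_paths if p not in index]
    return found, missing

# Made-up listing in the shape dpkg-deb -c prints.
sample_listing = """\
drwxr-xr-x root/root         0 2023-08-01 12:49 ./usr/lib/
-rw-r--r-- root/root      4096 2023-08-01 12:49 ./usr/lib/libexample.so.1.0
lrwxrwxrwx root/root         0 2023-08-01 12:49 ./usr/lib/libexample.so -> libexample.so.1.0
"""
index = index_packages({"example.deb": parse_deb_listing(sample_listing)})
found, missing = locate(["/usr/lib/libexample.so", "/dev/example-device"], index)
```

Device nodes under /dev will always land in `missing`, since they are created by the kernel and drivers rather than shipped in a package. Once a path has been mapped to its package, `dpkg-deb -x <package> <directory>` unpacks that package's files.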

I don’t understand what that means. Assuming I have several different models - Orin Nano, Orin NX, Orin AGX, Xavier NX, Xavier AGX, Nano - it sounds like you are saying they all use the same BSP? And therefore the same CUDA libraries (which are mostly what is mounted into the container)?

I am not sure what “driver might be different according to the build command” means.

I found it. I do not understand what it has to do with the conversation.

Thanks

Hey @AastaLLL and @prlawrence ; I think we are pretty close here. Just those above questions.

Hi,

Sorry for the late update.

The same BSP is used for different modules, but the build configuration is different.
SDK Manager automatically downloads the “Linux_for_Tegra” package for the selected target.

There might be some identical items, but we don’t have file-level separation.
This means we only provide the L4T package built for each target’s configuration.
We don’t have a way to distinguish shared files from target-specific files under the “Linux_for_Tegra” folder.

Thanks.

Hi @AastaLLL ; thanks for the update.

I don’t understand how to reconcile this with the earlier statement. There is just one L4T distribution, under the BSP page. Under that, there is just one l4t.csv in the .deb file I referenced earlier, that lists all of the things we will need to be mounted into containers.

When you write, “building configuration is different”, how is it different? What is the path it uses to determine which files are different? I see the .conf for the specific board, which references the .xml for the specific board. Is that how it determines? If so, how do I look at that file and figure out which of the files in l4t.csv (and which .deb package files they are in) are for which board?

The SDK manager clearly must, if it follows the process you describe above. What is the logic for doing so?

@prlawrence wrote:

Can I get some insight into how that will work in JP6?

Hi,

The answer is for “So these aren’t tied to the specific hardware or driver versions”.
If you are only asking about the user-space .deb packages, they are identical.

Thanks.

Yes, that is quite helpful, thank you.

So the reason we mount the userspace libraries in - as opposed to having them in the container - isn’t that they might differ between, say, a Xavier and an Orin, but that the same container might be run on a server with a discrete H100 or on a Xavier, and the libraries might differ between those two. Is that it?

Also, where can I get information on what is planned in this area for JP6?

@AastaLLL it occurs to me that I didn’t understand a part of this.

l4t.csv contains the list of files for the container. But nvidia-ctk cdi generate produces the file /etc/cdi/nvidia.yaml. What is the relationship between the two? Does nvidia-ctk just read l4t.csv to know what to put in the CDI file?

Hi,

So the reason we mount the userspace libraries in - as opposed to having them in the container - isn’t that they might differ between, say, a Xavier and an Orin, but that the same container might be run on a server with a discrete H100 or on a Xavier, and the libraries might differ between those two. Is that it?

The answer above is for iGPUs which all use L4T.
dGPU doesn’t use L4T so the behavior will be different.

We need to check with our internal team about the CDI question.
We will update you with more information later.

Thanks.

Yes, that is fine. I am looking to solve solely for the Jetsons (i.e. Tegra iGPUs). If they all are the same (from a userspace library perspective), then that part is solved.

I will await your response on the CDI part. Thanks again.

The l4t.csv, devices.csv, and drivers.csv files define the list of optional entities that are required in a container. When running nvidia-ctk cdi generate with --mode=csv (which is auto-detected on Tegra-based systems with no discrete GPU available), the nvidia-ctk tooling parses the l4t.csv, devices.csv, and drivers.csv files at /etc/nvidia-container-runtime/host-files-for-container.d and uses these as input for generating a CDI specification. Note that since the CSV files contain entries for multiple platforms, the entries are considered optional, and only entries that are located on the host are included in the CDI specification.

This means that once generated, the CDI specification will contain the CDI representations of the CSV file entries that are present on the host.
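
The "only entries that are located are included" behaviour can be approximated as below. This is a rough sketch of the idea, not the actual nvidia-ctk implementation: the function name is hypothetical, entries are assumed to be `(type, path)` pairs, and counting dangling symlinks as present is a choice made for this sketch only.

```python
import os
import tempfile

def located_entries(entries, root="/"):
    """Keep only (type, path) entries whose path exists under `root`."""
    kept = []
    for kind, path in entries:
        candidate = os.path.join(root, path.lstrip("/"))
        # lexists() is true for dangling symlinks too; this sketch counts them as present.
        if os.path.lexists(candidate):
            kept.append((kind, path))
    return kept

# Demo against a throwaway directory standing in for the host root.
with tempfile.TemporaryDirectory() as fake_root:
    os.makedirs(os.path.join(fake_root, "usr/lib"))
    open(os.path.join(fake_root, "usr/lib/libpresent.so"), "w").close()
    demo = located_entries(
        [("lib", "/usr/lib/libpresent.so"), ("lib", "/usr/lib/libabsent.so")],
        root=fake_root,
    )
print(demo)
```

Running a filter like this over the CSV entries on a given board gives a quick preview of which entries would be expected to appear in the generated specification for that board.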

Note that the nvidia-ctk cdi generate command does allow for CSV files to be explicitly specified as command line arguments. If this is the case, then the specified files are used instead of the default paths. This means that you could collect the required files in an alternate location and construct modified CSV files to use as inputs.
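
As an illustration of passing explicit CSV files, the sketch below only builds the command line and prints it. The --mode=csv and --csv.file options are the ones mentioned in this thread; the --output option and the /opt/jetson/csv location are assumptions, so check `nvidia-ctk cdi generate --help` on your version before relying on them.

```python
import shlex

def cdi_generate_command(csv_files, output="/etc/cdi/nvidia.yaml"):
    """Build an nvidia-ctk invocation that uses explicit CSV inputs."""
    cmd = ["nvidia-ctk", "cdi", "generate", "--mode=csv", f"--output={output}"]
    # One --csv.file argument per CSV file.
    cmd += [f"--csv.file={path}" for path in csv_files]
    return cmd

# /opt/jetson/csv is a hypothetical directory holding modified CSV files.
command = cdi_generate_command(["/opt/jetson/csv/l4t.csv"])
print(shlex.join(command))
```

Building the argument list separately makes it easy to inspect the command first and then hand the same list to `subprocess.run` when it looks right.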

One current limitation of the tooling is that the same path is used on the host as in the container, and manual adjustments are required to the CDI specification if the container paths need to be updated. We are working on streamlining this by providing the relevant config options or additional tooling.
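
Until such options exist, the manual adjustment might look like the sketch below, which strips a host-side prefix from each mount's containerPath. It assumes the specification has already been loaded into a dictionary (reading /etc/cdi/nvidia.yaml needs a YAML parser, which is not in the Python standard library) and that mounts carry hostPath and containerPath keys as in the CDI specification. The function name and the /opt/jetson/rootfs prefix are hypothetical, and hooks such as symlink creation are not handled.

```python
import copy

def strip_container_prefix(spec, prefix):
    """Return a copy of a CDI spec with `prefix` removed from every mount's containerPath."""
    spec = copy.deepcopy(spec)
    prefix = prefix.rstrip("/")
    # Mounts can appear in the top-level containerEdits and in each device's containerEdits.
    edit_sets = [spec.get("containerEdits") or {}]
    edit_sets += [dev.get("containerEdits") or {} for dev in spec.get("devices") or []]
    for edits in edit_sets:
        for mount in edits.get("mounts") or []:
            path = mount["containerPath"]
            if path.startswith(prefix + "/"):
                mount["containerPath"] = path[len(prefix):]
    return spec

# Illustrative spec fragment: host and container paths start out identical.
staged = "/opt/jetson/rootfs/usr/lib/aarch64-linux-gnu/libexample.so"
spec = {
    "cdiVersion": "0.5.0",
    "kind": "nvidia.com/gpu",
    "devices": [{
        "name": "all",
        "containerEdits": {"mounts": [{
            "hostPath": staged,
            "containerPath": staged,
            "options": ["ro", "nosuid", "nodev", "bind"],
        }]},
    }],
}
fixed = strip_container_prefix(spec, "/opt/jetson/rootfs")
mount = fixed["devices"][0]["containerEdits"]["mounts"][0]
print(mount["hostPath"], "->", mount["containerPath"])
```

The host path is left alone so the runtime still finds the staged file, while the container sees it at its usual location.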

This really does explain it, thank you. Since nvidia-ctk is open source, do you mind linking to the PR with the planned changes? I found the --library-search-path, --nvidia-ctk-path and --csv.file options; what else is planned?

Unfortunately we don’t have a publicly visible roadmap. One of the ease-of-use features that I can think of is the merge request “Add transformer for container roots” (!507) in the nvidia / container-toolkit / container-toolkit project on GitLab, which should allow the roots of container paths to be transformed using the nvidia-ctk CLI.

In any case, as far as I can tell from reading the nvidia-ctk source code, these all are executed relative to the root of the container, not the host, so unless the paths are different when mounted in, it should work, right?

Sorry for the delay in responding here.

What are you referring to when you ask:

these all are executed relative to the root of the container