Possible bug in NVIDIA's L4T Debian packages for CUDA

There is a peculiarity with the Debian packages included with the L4T r32.7.2 release:

The nvidia-l4t-cuda package installs libraries into /usr/lib/aarch64-linux-gnu/tegra but the nvidia-l4t-core package configures the shared library search path to be /usr/lib/tegra, a directory that does not exist in the sample root filesystem.

Is there a mistake in the nvidia-tegra.conf file?

Because of this, the symlink libcuda.so.1 -> libcuda.so.1.1 does not get created by ldconfig. Several tools seem to expect libcuda.so.1, including the NVIDIA Container Toolkit and deviceQuery if I build it from the CUDA Toolkit 10.2 samples.


The Debian Policy Manual says,

The run-time library package should include the symbolic link for the SONAME that ldconfig would create for the shared libraries. For example, the libgdbm3 package should include a symbolic link from /usr/lib/libgdbm.so.3 to libgdbm.so.3.0.0.

I’m not an expert on Debian packaging but it seems that the nvidia-l4t-cuda package doesn’t follow this rule; the SONAME of the library is libcuda.so.1 but the package does not contain a file by that name.
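One way to check this without installing anything is to list the archive with dpkg-deb -c and look for the SONAME path. The following is only a sketch: the helper name deb_listing_has_path is made up for illustration, and it assumes the usual tar-style listing in which the path is the sixth whitespace-separated field and starts with "./".

```shell
#!/bin/sh
# Hypothetical helper: read a `dpkg-deb -c` listing on stdin and succeed
# only if it contains an entry for the absolute path given as $1.
# Assumes the path is field 6 of each line and is prefixed with "./".
deb_listing_has_path() {
    awk -v p=".$1" '$6 == p { found = 1 } END { exit !found }'
}
```

For example, `dpkg-deb -c nvidia-l4t-cuda_*.deb | deb_listing_has_path /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1` would return non-zero if the package ships no file or link by that name.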

Hi,

Both libcuda.so.1 and libcuda.so.1.1 are present in our environment.
Could you please double-check your environment or settings?

$ ll /usr/lib/aarch64-linux-gnu/tegra/libcuda.so*
lrwxrwxrwx 1 root root       14 Feb 19  2022 /usr/lib/aarch64-linux-gnu/tegra/libcuda.so -> libcuda.so.1.1
lrwxrwxrwx 1 root root       14 Dec 10 09:15 /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 -> libcuda.so.1.1
-rw-r--r-- 1 root root 15870624 Feb 19  2022 /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1

Thanks.

Thank you for your reply.

I assume that programs like deviceQuery are working for you? Could you also please include the output of this command? It tells me what your library search paths are and which packages they came from.

for F in /etc/ld.so.conf /etc/ld.so.conf.d/*.conf; do
    dpkg -S "$F"; cat "$F"; echo
done

Your system may be able to pick up the libcuda.so.1 inside /usr/lib/aarch64-linux-gnu/tegra but I wonder how it is able to find this file, especially depending on what your /etc/ld.so.conf.d/nvidia-tegra.conf says.


Note that in your output, the libcuda.so.1 file has a modification date of Dec 10 while the other files are from Feb 19. Therefore I believe your file was created later and was not installed with the package, contrary to what the Debian Policy Manual recommends.

You can confirm this with the following command:

dpkg -S /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1

By the way, in the r35.1 sample rootfs I see that /etc/ld.so.conf.d/nvidia-tegra.conf now says /usr/lib/aarch64-linux-gnu/tegra. So it seems more likely that the file in r32.7.2 is just incorrect.

Are you saying only the “/usr/lib/aarch64-linux-gnu/tegra” is showing up, and not “/usr/lib/aarch64-linux-gnu”? What do you see from this:
ld --verbose | grep SEARCH_DIR | tr -s ' ;' \\012

No, I’m saying that (in addition to the distro defaults) only /usr/lib/tegra shows up, because that is literally the contents of the nvidia-l4t-core package’s ld.so.conf.d/nvidia-tegra.conf file. And this is wrong.

$ mkdir /tmp/contents
$ dpkg-deb -R Linux_for_Tegra_r32.7.2/nv_tegra/l4t_deb_packages/nvidia-l4t-core_32.7.2-20220417024839_arm64.deb /tmp/contents
$ cat /tmp/contents/etc/ld.so.conf.d/nvidia-tegra.conf
/usr/lib/tegra

That directory is incorrect because none of the NVIDIA libraries are installed there. The directory doesn’t get created by any of the r32.7.2 packages.
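The same check can be scripted for any ld.so.conf fragment. This is a rough sketch, not an NVIDIA tool: the function name check_ld_conf_dirs and the optional root-prefix argument are assumptions for illustration, and it assumes one directory per line (it skips comments, blank lines, and include directives).

```shell
#!/bin/sh
# Hypothetical helper: report whether each directory named in an
# ld.so.conf-style file exists.
# Usage: check_ld_conf_dirs CONF_FILE [ROOT_PREFIX]
check_ld_conf_dirs() {
    conf=$1
    root=${2:-}
    # Drop comments, blank lines, and "include" directives.
    grep -Ev '^[[:space:]]*(#|$|include[[:space:]])' "$conf" |
    while IFS= read -r dir; do
        if [ -d "$root$dir" ]; then
            echo "ok       $dir"
        else
            echo "missing  $dir"
        fi
    done
}
```

Run against the extracted file with the sample rootfs as the prefix, for example `check_ld_conf_dirs /tmp/contents/etc/ld.so.conf.d/nvidia-tegra.conf /path/to/rootfs`, where /path/to/rootfs is a placeholder for wherever the rootfs is unpacked.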

And furthermore, the nvidia-tegra.conf file from r35.1 contains the expected path of /usr/lib/aarch64-linux-gnu/tegra. There must have been a reason they changed it.


Regarding your command, I’ve patched my nvidia-tegra.conf so it’s not the same as a vanilla system, but I see the following. Curiously there’s no tegra directory?

$ ld --verbose | grep SEARCH_DIR | tr -s ' ;' \\012
SEARCH_DIR("=/usr/local/lib/aarch64-linux-gnu")
SEARCH_DIR("=/lib/aarch64-linux-gnu")
SEARCH_DIR("=/usr/lib/aarch64-linux-gnu")
SEARCH_DIR("=/usr/local/lib")
SEARCH_DIR("=/lib")
SEARCH_DIR("=/usr/lib")
SEARCH_DIR("=/usr/aarch64-linux-gnu/lib")

This is quite long, and not what I expected. You might not want to go down that rabbit hole unless you are fixing bugs related to this. TLDR: Linking and searching seem to be working as expected, but printing of default search paths via ld looks like it has a bug (a “normally” cosmetic only issue).

A short answer is that ldconfig creates “/etc/ld.so.cache” after consulting search paths such as that from “/etc/ld.so.conf.d/nvidia-tegra.conf”. This cache file is what the linker actually uses to find content. The cache file is generated correctly, but the option to print the linker’s path (as found in ld.so.cache) appears to have a bug and simply does not print the path.

I think the linker is important enough that it might be worth NVIDIA’s time to find out why it uses “/usr/lib/aarch64-linux-gnu/tegra/*” but fails to print that this path is actively searched. Other software could use the printed list as a search method and end up with bugs. Since other paths are not affected, the bug may matter only to people who put something in a path the linker does not print, which implies NVIDIA might find it useful to understand why their tegra/ directory is missing from the output. Since the path is searched, though, it might not be a high priority.

Now on to the long tale of printing the missing linker path.


The ld search path from nvidia-tegra.conf would only append. This might be wrong, but I see that the search path does include the base:

  • /usr/lib/aarch64-linux-gnu

Since you’ve looked at “/etc/ld.so.conf.d/nvidia-tegra.conf” and found “/usr/lib/aarch64-linux-gnu/tegra” I too would expect to see this in the search path. However, take a look at some of the library .so files in “/usr/lib/aarch64-linux-gnu/tegra”. I looked at this on an Xavier NX, found the default search path lacking just as you did, verified that the directory is listed in the nvidia-tegra.conf for ld, and then listed all libraries found via “ldconfig -p” and grep’d for one from that directory (my example was “libnvidia-glcore”; I did not check every library there, but every one I sampled was found). Oddly, the libraries of that directory are still found by ldconfig -p. So there are two places left which I know of to check.

One is that ld will consult libnamespec.so, followed by libnamespec.a, but these don’t exist on most systems (including Jetsons). On the other hand, there is one last place this can be found: In the environment variable “LD_LIBRARY_PATH”, or otherwise passed in the environment. I also don’t see this. I have no idea how “ldconfig -p” is listing these libraries from “/usr/lib/aarch64-linux-gnu/tegra”. Odd.

So I followed all of the symbolic links of ld until I found the actual hard link (on an NX with an R32.x L4T):
/usr/bin/aarch64-linux-gnu-ld.bfd

The above is provided by Ubuntu and is not from NVIDIA. I had wondered if perhaps this was from NVIDIA and modified for the tegra/ search location, but it is not. Confusing mystery.

I did find that the actual location which causes ld to search there is from “/etc/ld.so.cache”. This in turn is generated when “ldconfig” is run (which runs at boot or after a package installs a library). I ran strace (which monitors system calls to the kernel) and found that indeed ldconfig does consult “/etc/ld.so.conf.d/nvidia-tegra.conf”:
openat(AT_FDCWD, "/etc/ld.so.conf.d/nvidia-tegra.conf", O_RDONLY) = 4

This led to finding “/usr/lib/aarch64-linux-gnu/tegra”:

newfstatat(AT_FDCWD, "/usr/lib/aarch64-linux-gnu/tegra", {st_mode=S_IFDIR|0755, st_size=12288, ...}, 0) = 0

At this point it is obvious that ld is in fact doing what it should because ld.so.cache is correct, and the cache is correct because ldconfig created it after reading “/etc/ld.so.conf.d/*”.

I ran strace on “ld --verbose”, and indeed it reads from ld.so.cache. The tegra/ location never shows up in the trace file. Since the cache has that information it seems to be a bug of ld, at least a bug of the --verbose option. The tegra/ directory is indeed being found and cached and linked to, but the print of search locations fails to see this, while the printing and actual linking of default libraries found succeeds (the search path listing fails, individual contents do print).

Thanks for the thorough investigation.

Point of clarification: ld is the GNU linker that links object files into executables or libraries at compile time, versus ld.so which is the (Linux?) dynamic linker that loads libraries at run time. To my knowledge ldconfig is only related to the dynamic linker. (You say ld reads ld.so.cache but is that really ld reading it or is that ld.so reading it as it loads ld?) So the peculiarity of what ld --verbose outputs in this instance is maybe a red herring and not specifically what I’m calling out in this thread.

I am most interested in this part of your comment:

Since you’ve looked at /etc/ld.so.conf.d/nvidia-tegra.conf and found /usr/lib/aarch64-linux-gnu/tegra I too would expect to see this in the search path.

Is that what you see in nvidia-tegra.conf? Because mine contains literally the string /usr/lib/tegra which is what I’m trying to point out in this thread here. I don’t think that string is correct. If your file says something else, I’m curious what version of the nvidia-l4t-core package you have installed.

Yes, that is a bug in Jetpack 4.

Keep in mind this was from an Xavier NX and not AGX. Plus this was R32.4, and so there might be differences, but yes, my “/etc/ld.so.conf.d/nvidia-tegra.conf” says “/usr/lib/aarch64-linux-gnu/tegra”. If you want this directory, and for some reason need a workaround which won’t go away upon package update, then you could manually create this file:
/etc/ld.so.conf.d/custom-tegra.conf

Inside that file you would have this content:

/usr/lib/aarch64-linux-gnu/tegra

FYI, you can have multiple directories listed in one file with each directory on one line. Be careful though, you don’t want to add a directory which cannot be examined (you might end up flashing again).
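A cautious way to script this workaround is sketched below. The function name add_ld_conf_dir is invented for illustration; it refuses to add a directory that does not exist and avoids writing the same line twice. It only edits the conf file, so ldconfig still has to be run afterwards (as root) to rebuild the cache.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to an ld.so.conf.d fragment.
# Usage: add_ld_conf_dir DIR [CONF_FILE]
add_ld_conf_dir() {
    dir=$1
    conf=${2:-/etc/ld.so.conf.d/custom-tegra.conf}
    # Refuse to list a directory that does not exist.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # Append only if the exact line is not already in the file.
    grep -qxF -- "$dir" "$conf" 2>/dev/null || printf '%s\n' "$dir" >> "$conf"
}
```

As root, this would be called as `add_ld_conf_dir /usr/lib/aarch64-linux-gnu/tegra` followed by `ldconfig`.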

The “acid test” is to first run “ldconfig -p” and see if your library is found. Just as an example, you might search for libEGL_nvidia:
ldconfig -p | grep 'libEGL_nvidia'
(find a library at the location which you believe to be missing, and then check it with “ldconfig -p”)
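For an exact-name lookup, such as confirming that the cache resolves libcuda.so.1 and not just libcuda.so, the listing can be filtered on its first field instead of with a substring grep. This is a small sketch; the helper name cache_paths_for is made up, and it assumes the usual "name (flags) => path" layout of ldconfig -p output.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` output on stdin and print the
# resolved path of every cache entry whose name is exactly $1.
cache_paths_for() {
    awk -v name="$1" '$1 == name { print $NF }'
}
```

Usage would be `ldconfig -p | cache_paths_for libcuda.so.1`; no output means the dynamic linker's cache has no entry by that name.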

If that is indeed a problem, then NVIDIA could fix it in the package which provides that file, so that “/etc/ld.so.conf.d/” names that directory. Alternatively, one package could ship the (possibly empty) directory together with its ld.so.conf.d/ file regardless of whether any libraries are installed there, and the packages that install libraries into it could declare that package as a dependency.

Almost forgot: ld finds and links dynamic libraries for the system. ld.so is a library and not an executable. This has the same function as ld, but it is for use in programs, e.g., written in C. For example, the program “ld” itself is not static by default (it could be; a cross-linker tends to be static, but it isn’t a requirement). The command “ldd <program>” shows the linkage of an executable. Run this command:

# ldd /usr/bin/ld
        linux-vdso.so.1 (0x0000007f7c501000)
        libbfd-2.30-system.so => /usr/lib/aarch64-linux-gnu/libbfd-2.30-system.so (0x0000007f7c1ee000)
        libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f7c1d9000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f7c080000)
        /lib/ld-linux-aarch64.so.1 (0x0000007f7c4d5000)
        libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1 (0x0000007f7c053000)

Note that “ldconfig” (the generator of the cache) is itself not dynamically linked:

# ldd `which ldconfig`
        not a dynamic executable

I don’t know of specific programs linked to ld.so, but if you ever wonder, then run ldd against that program (or the interpreter executing a script).
