CUDA toolkit for Jetson Nano

Hi,

We are running our custom-built root filesystem and kernel on the Jetson Nano for our production hardware. We are at the point of enabling CUDA on the product, but we can’t find a download link for the Jetson Nano CUDA toolkit. Can nVidia please share the download link with us?

Thanks.


Hi jasaw81,

You can use SDK Manager to install the SDK components.
Before installing the SDK components, please check your image version first; the image version needs to match the CUDA version.
Example: R32.4.2 (JetPack-4.4) + CUDA 10.2
You can check the detailed version information in the L4T Archive.
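
For reference, one quick way to check which L4T release is on an already-flashed Nano is to read /etc/nv_tegra_release (present on standard L4T root filesystems):

# On the Nano: the first line shows the release, e.g. "# R32 (release), REVISION: 4.2, ..."
head -n 1 /etc/nv_tegra_release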

I have downloaded all the deb files using the SDK Manager, but it’s not exactly clear to me which deb file contains the libraries that I need to install on the Nano (I’m not using nVidia’s development SD image). I’m only interested in cross-compiling openCV with CUDA support on an Ubuntu host. (By the way, SDK Manager is not working on my Ubuntu 18.04. I end up with lots of deb files, which I can manually extract using the “ar” command.)

From the cross-compilation instructions, I can see that nvcc accepts a --compiler-bindir option for selecting the host compiler, but it’s not clear to me which CUDA files/libraries I need to install on the Nano.
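
For context, the kind of invocation I am aiming for looks roughly like this (a sketch only; the compiler path, the targets/aarch64-linux layout, and app.cu are placeholders for my actual setup):

# Cross-compile a CUDA source file on the x86 host with an aarch64 host compiler;
# the aarch64 target headers/libraries are assumed to live under targets/aarch64-linux.
/usr/local/cuda/bin/nvcc \
    --compiler-bindir /opt/linaro/bin/aarch64-linux-gnu-g++ \
    -I/usr/local/cuda/targets/aarch64-linux/include \
    -L/usr/local/cuda/targets/aarch64-linux/lib \
    -o app app.cu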

Can you please provide some assistance?

Hi,

For CUDA, you can install it with this script:

#!/bin/bash
# $1: cuda debian file name 
# $2: cuda_ver (example 6.0, 6.5)
# $3: cuda_dash_ver (6-0, 6-5)

if [ $# != 3 ] ; then
    echo "Incorrect arguments, use following command to install cuda"
    echo "$0 cuda_deb_file_name 6.5 6-5"
    exit 1
fi

while sudo fuser /var/lib/dpkg/lock > /dev/null 2>&1; do
    echo "Waiting for other apt-get command to finish"
    sleep 3
done

sudo dpkg --force-all -i ~/cuda-l4t/$1
sleep 5
sudo apt-key add /var/cuda-repo-*$3-local*/*.pub
sleep 2

sudo apt-get -y update
#fix bug 2405352 missing gnupg break auto install process
sudo apt install -y gnupg
sudo apt-get -y --allow-downgrades install cuda-toolkit-$3 libgomp1 libfreeimage-dev libopenmpi-dev openmpi-bin

grep -q "export PATH=.*/usr/local/cuda-$2/bin" ~/.bashrc || echo "export PATH=/usr/local/cuda-"$2"/bin:$PATH">>~/.bashrc

if dpkg --print-architecture | grep -q arm64; then
    lib_dir=lib64
else
    lib_dir=lib
fi
grep -q "export LD_LIBRARY_PATH=/usr/local/cuda-$2/$lib_dir" ~/.bashrc || echo "export LD_LIBRARY_PATH=/usr/local/cuda-"$2"/"$lib_dir":$LD_LIBRARY_PATH" >> ~/.bashrc
export LD_LIBRARY_PATH=/usr/local/cuda-$2/$lib_dir:$LD_LIBRARY_PATH
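
Run it on the Nano with the repo .deb already copied into ~/cuda-l4t/. A usage sketch (the script name is arbitrary, and the deb filename below is the CUDA 10.0 example from this thread; use the one matching your L4T release):

# Copy the repo .deb to where the script expects it, then run the script with
# <deb file name> <cuda version> <cuda dashed version>:
mkdir -p ~/cuda-l4t
cp cuda-repo-l4t-10-0-local-10.0.326_1.0-1_arm64.deb ~/cuda-l4t/
./install_cuda_deb.sh cuda-repo-l4t-10-0-local-10.0.326_1.0-1_arm64.deb 10.0 10-0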

Thanks, that’s a good start. That gives me an “x86_64-linux” directory under /usr/local/cuda/targets, but where is the “aarch64” target that I need to install into my custom-built Jetson Nano image?

I tried dpkg-installing the “arm64” variant of the deb files anyway using the --force-all option, but it doesn’t give me aarch64 target files anywhere on my development machine. I would expect “cuda-repo-l4t-10-0-local-10.0.326_1.0-1_arm64.deb” to give me the aarch64 libraries.

The aarch64/arm64 packages won’t install directly onto the PC. There can be an intermediate download of the “.deb” packages, which then get copied to the Jetson. The actual details vary, since the older L4T/JetPack/SDK Manager releases always did this through SDKM, whereas newer releases can also do this directly on the Jetson via the “apt” mechanism.

If you have at some point used SDKM to download files for install to the Jetson, then you might want to look at “~/Downloads/nvidia/sdkm_downloads/”. If you are using L4T R32.3.1 or newer (JetPack 4.3+), then here is an example of the commands on the Jetson (the example is from R32.4.2/JetPack 4.4):

apt search cuda
sudo apt-get install cuda-tools-10-2 cuda-libraries-10-2

I did use the SDKM to download all the L4T R32.3.1 (JetPack 4.3) deb files (amd64 and arm64), so I would expect all the aarch64 CUDA libraries to be in the arm64 deb files, since those deb files are huge.

Correct me if I’m wrong, but the way I see it, I have two options:

Option 1:
Run apt-get install cuda-toolkit-<version> on a Jetson Nano, tar up all the files in /usr/local/cuda, and transfer them to my build machine (rough sketch below, after the list of options). As part of my build process, I then copy the relevant CUDA libraries over to my image that gets loaded onto a Nano.

Option 2:
Use the ar command to extract all the arm64 deb files on my build machine, and hope that all the CUDA libraries are in the arm64 deb files. Can nVidia please confirm that this option will work?

Option 3 (maybe ???):
nVidia or someone provides a direct download link to the aarch64 L4T CUDA libraries that I can simply extract and copy into my image, similar to the rest of the L4T tarballs (kernel src, BSP, nvidia drivers, …).

Not an option:
Have my custom-built image pulled down from my build machine in the cloud just to run apt-get install cuda-toolkit on a Nano, then extract the image from the Nano and upload it back into the cloud as part of the build pipeline.
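
The manual steps I have in mind for Option 1 look roughly like this (hostnames, paths, and the CUDA version are placeholders for my actual setup):

# On the Nano, after apt-get install cuda-toolkit-<version>, pack up the toolkit:
tar czf cuda-aarch64.tar.gz -C /usr/local cuda-10.0

# Copy the tarball to the build machine (hostname and path are placeholders):
scp cuda-aarch64.tar.gz builduser@build-host:/opt/jetson/

# On the build machine, unpack it into the rootfs tree that becomes the image:
sudo tar xzf /opt/jetson/cuda-aarch64.tar.gz -C rootfs/usr/local/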

Question:
Can nVidia please confirm which of the above options will work, and which option is recommended?


Hi,

I’m confused about this issue.
To cross-compile an app for the Jetson platform, please make sure you have the following setup:

1. Host:
Install the CUDA toolkit from the sdkmanager.
This will make sure your toolkit includes the cross-compiling libraries.
Since this installation is for the host, the package name should contain amd64.

2. Device:
Install the CUDA toolkit from the sdkmanager.
Please uncheck the OS part in STEP 02. Installing the packages with sdkmanager is the easier way.
If you want to do this manually, please prepare the deb file and run the script on the Nano directly.

Thanks.


2. Device:
Install the CUDA toolkit from the sdkmanager.
Please uncheck the OS part in STEP 02. Installing the packages with sdkmanager is the easier way.
If you want to do this manually, please prepare the deb file and run the script on the Nano directly.

This is the part that is not clear to me, which is why I asked about the 3 options in my previous post.

Let me explain my image build process:

  1. On my x86 machine, install aarch64 Linaro toolchain and x86 CUDA toolkit.
  2. On my x86 machine, I expect to install aarch64 CUDA toolkit/libraries. The aarch64 CUDA libraries will be installed into the image that I’m building at a later step.
  3. On my x86 machine, cross-compile kernel, libraries, applications (including cross-compilation of CUDA applications, linking against aarch64 CUDA libraries).
  4. On my x86 machine, create an empty image and copy all compiled binaries and the aarch64 CUDA libraries into the image (see the sketch below the list).
  5. On my x86 machine, put the bootloader and partition information into the image using nVidia’s BSP tool, etc.
  6. My custom built image is ready to be loaded into Nano modules.
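
To make step 4 concrete, what I have in mind is roughly the following (the image size, mount point, and staging directory names are placeholders):

# Create an empty ext4 image and loopback mount it:
truncate -s 8G nano_rootfs.img
mkfs.ext4 -F nano_rootfs.img
sudo mkdir -p /mnt/nano-rootfs
sudo mount -o loop nano_rootfs.img /mnt/nano-rootfs

# Copy in the cross-compiled binaries and the aarch64 CUDA libraries:
sudo rsync -a rootfs-staging/ /mnt/nano-rootfs/
sudo rsync -a cuda-aarch64/cuda-10.0/ /mnt/nano-rootfs/usr/local/cuda-10.0/
sudo umount /mnt/nano-rootfs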

Notice that nowhere in my build process is Jetson Nano hardware involved, because all of the above steps happen in the cloud, on an x86 machine.

Hope that gives you a bit of background on what I’m trying to achieve here.

So now my question is, how do I achieve step 2?
Install the CUDA toolkit on a Jetson Nano like you mentioned, but in addition to your guide, tar up /usr/local/cuda on the device and transfer it to my x86 machine? Is this the only way to do it?


Hi @jasaw81
SDKManager can install all required packages on the host machine and the target hardware. It is designed for version control of all packages.
https://docs.nvidia.com/sdk-manager/install-with-sdkm-jetson/index.html

The suggestion is to install CUDA toolkit on your host machine, either Ubuntu 16.04 or 18.04. Manually installing the toolkit is not recommended.

Hi DaneLLL,

As mentioned in my previous post, SDKManager is broken on my host Ubuntu 18.04. It downloads the CUDA toolkit but won’t install it no matter what I try, which is why I had to install things manually.

SDKManager can install all required packages on host machine and target hardware.

Running SDKManager on the target is the problem here. I am NOT running Ubuntu nor nVidia’s sample image on my target, and my target root filesystem is locked down (not writable) for security purposes. In order to install the CUDA toolkit on my target, I had to:

  1. install it via SDKManager on nVidia’s sample image.
  2. extract the CUDA toolkit out of the sample image and insert it into my custom image.
  3. send my custom image to factory to burn into the target hardware. Any change to software is not possible after this point.

So now back to my questions:

  1. Is extracting the CUDA libraries out of the sample image the only way to get the aarch64 CUDA libraries?
  2. Extracting the CUDA libraries out of the sample image and inserting them into my custom image does NOT cause any licensing issues or violate the user agreement, right?
  3. Can nVidia provide a direct download link to aarch64 CUDA libraries?

Hi,

This looks to be an issue on the host machine. You may see if you can get another host PC to run SDKManager without problems.

For installing CUDA on the target, you can copy a deb file like
cuda-repo-l4t-10-0-local-10.0.166_1.0-1_arm64.deb
to your Nano and install it through the apt command. Then generate the custom image.

Packages for the wrong architecture will not install on the host PC, and the Jetson’s architecture is wrong for a PC. You have to install the PC version on the PC. Cross compile and cross development is a special case, and does not install in the usual way.

On a PC used for cross development you will have a toolchain which is a cross toolchain: the chain executes on the PC architecture, but understands arm64/aarch64/ARMv8-a code. This is simple for bare metal, as there is no linker involved and no libraries needed to link against. Cross toolchains are often available through the host PC’s native package tools since the PC can run the executable. Linaro also provides some of these.

If you want to link against a library in a cross development environment, then you need a “runtime” linker, and this linker is special. The cross linker (“runtime”), like the cross toolchain, runs on PC architecture, but works with 64-bit arm64/aarch64/ARMv8-a code. The host o/s cannot use the linked code, but it can perform a link. Cross linker runtime tools are often available through the host PC’s native package tools since the PC can run the executable. Linaro also provides some of these.

The code which the cross linker/runtime links against is purely 64-bit arm64/aarch64/ARMv8-a, and the host can never use this. Thus the host will refuse to install this as a normal library to be searched for in its own linker path. This is the “sysroot”, and is basically a skeleton of the libraries and library related content copied directly from the Jetson (or any 64-bit arm64/aarch64/ARMv8-a compatible version).

“Sysroot” libraries make life much more difficult and are where it gets complicated. Linaro and some other providers have available a very bare minimum environment, including libc and perhaps a tiny subset of what you might find on a fully running system. That content is enough to build other content, but we are talking about building everything. Often, to build some basic and simple library which your application uses there will be perhaps up to a hundred other packages you must build first. Libraries often link to other libraries, and so there is this big recursive dependency all the way back to libc. If just one of those touches X11, then you’ll probably end up building many hundreds of side projects just to get to the point where you can cross compile and link your simple program.

However, there is a shortcut around this. You can copy the content from a running Jetson if that content was installed there. You’d want to install the development packages with header files, and not just the libraries. After this you can either recursively copy the right library structures into your host PC within the correct directory for aarch64 to prevent mixing with the PC content, or you can clone the rootfs and loopback mount the clone. One can name the clone or a symbolic link to the clone without much effort, and suddenly you have the entire Jetson library support using the exact version of library the Jetson itself has.

Note that a clone can be mounted read-only, and that this is often not an issue for use in a cross development environment. SDK Manager provides arm64/aarch64 packages which can only install on the Jetson, and virtually nobody will provide those for install onto a PC. This isn’t to say you can’t do it, but unless you are an expert at this the odds are you’re just going to mess up your PC and complicate your life with a foreign architecture. After you go through all of that effort you will find that the libraries are the wrong version, or that one of the niche packages on the Jetson is not available that way.

You can also run a QEMU environment and install arm64/aarch64/ARMv8-a packages in this since the environment truly is not PC, but you run into the same issues of perhaps not having the Jetson’s version of those packages available, or having other issues.

For your locked root filesystem, is this basically a clone of the Jetson? If so, then you can probably name the path to the aarch64 files there as your sysroot. If this is an actual clone I consider it better, because then you can loopback mount it. If you have files directly copied to your system, then you have to either erase them or make different locations whenever a version changes. With a clone you name the aarch64 sysroot with a symbolic link, and when you loopback mount this, you magically get that exact and entire environment. If you have four versions and four clones, a loopback sysroot is trivial to switch between them in a matter of seconds.
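
To illustrate the loopback/symlink approach (device names, versions, mount points, and the CUDA paths below are only placeholders):

# Loopback mount a rootfs clone read-only:
sudo mkdir -p /mnt/clone_r32.4.2
sudo mount -o loop,ro clone_r32.4.2.img.raw /mnt/clone_r32.4.2

# Name a stable sysroot path that points at whichever clone you want to build against:
sudo ln -sfn /mnt/clone_r32.4.2 /opt/nano_sysroot

# Cross compile and link against the Jetson's own libraries via --sysroot, e.g.:
aarch64-linux-gnu-gcc --sysroot=/opt/nano_sysroot \
    -I/opt/nano_sysroot/usr/local/cuda-10.0/include \
    -L/opt/nano_sysroot/usr/local/cuda-10.0/lib64 \
    -o app app.c -lcudart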

For installing CUDA on the target, you can copy a deb file like
cuda-repo-l4t-10-0-local-10.0.166_1.0-1_arm64.deb
to your Nano and install it through the apt command. Then generate the custom image.

This is exactly what I’m trying to avoid because it is a manual step, unless I run the custom image in QEMU as pointed out by @linuxdev. Currently my build pipeline is fully automatic, i.e. I commit some code, and it cross-compiles everything and generates the final image. A deb file is essentially just an archive, so I extracted the cuda-repo-l4t-10-0-local-10.0.166_1.0-1_arm64.deb file, but it doesn’t seem to have the full CUDA headers and libraries. Maybe I’m missing another deb file?
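
For reference, what I did looks roughly like this; as far as I can tell the repo deb only carries a local apt repository of further .deb packages under ./var/, and it is those inner packages that hold the actual headers and libraries (paths and package names below are illustrative):

# Unpack the outer repo .deb (it is an ar archive wrapping control/data tarballs):
dpkg-deb -x cuda-repo-l4t-10-0-local-10.0.166_1.0-1_arm64.deb repo-extract/

# Its payload is a local apt repository of more .deb packages:
ls repo-extract/var/cuda-repo-*/

# Unpack each inner package to get the actual aarch64 headers and libraries:
mkdir -p cuda-aarch64
for deb in repo-extract/var/cuda-repo-*/*.deb; do
    dpkg-deb -x "$deb" cuda-aarch64/
done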

I already have everything (including CUDA) cross-compiled (using a sysroot, cross toolchain, etc.) and working on my custom image. I had to install CUDA on a Jetson Nano and extract the installed CUDA files back out to my host machine, which is way too manual in my opinion.

The more fundamental question is, am I violating nVidia’s user agreement or licensing by doing this? I’m developing this as a commercial product, so I want to be clear on this. Can nVidia please clarify?

If nVidia can provide direct download links to the aarch64 CUDA libraries, it would solve a whole lot of issues.

Hi,

Here is the json file generated by the sdkmanager for JetPack 4.4.
You should be able to find the corresponding download links and installation steps in it directly.

sdkml3_jetpack_l4t_44_dp.json.zip (13.6 KB)

Thanks.

The json mentioned by @AastaLLL contains a base URL, and then several packages you can get by appending to the base URL. Then wget should work.
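
Something along these lines should work (a sketch only; the field names inside the json may differ, so adjust the patterns and placeholders to what the file actually contains):

# List the .deb file names referenced by the json (pattern is a guess at the layout):
grep -oE '"[^"]*\.deb"' sdkml3_jetpack_l4t_44_dp.json | tr -d '"' | sort -u

# Fetch one of them by appending the name to the base URL taken from the same file:
BASE_URL="https://developer.download.nvidia.com/..."   # placeholder: copy the real base URL from the json
DEB_NAME="cuda-repo-l4t-10-2-local-10.2.89_1.0-1_arm64.deb"   # placeholder: use a name listed in the json
wget "${BASE_URL}/${DEB_NAME}"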

sdkml3_jetpack_l4t_44_dp.json.zip gives me exactly what I need, direct download URLs. I can confirm that hitting the URLs directly does work. Thank you for that.

I’m currently still on JetPack 4.3. You mentioned SDK Manager generates the json file. Can you please point me to the location of the generated json file?

Hi,

Good to hear this.

The .json file is located in the JetPack download folder.
For example, I can find the file in this directory:

${HOME}/Downloads/nvidia/sdkm_downloads

Attached JetPack4.3 GA json file for your reference:
sdkml3_jetpack_l4t_43_ga.json.zip (13.7 KB)

Thanks.


Hello,

In order to use the CUDA toolkit, do I have to have an NVIDIA GPU on the host PC?

Thank you.

Yes.
