Manually installing CUDA 11.0.2 on Jetson Xavier NX - Help!

I’m trying to get a working CUDA 11.0.2 installation on a Jetson Xavier NX. I installed only the base Jetson OS (JetPack 4.6) using the NVIDIA SDK Manager. I didn’t install CUDA at that time because SDK Manager’s CUDA install is version 10.2 and fills almost the entire 16 GB of eMMC with other stuff I don’t want. There was no option to install only the CUDA toolkit; if there were, I would have gladly chosen it.

So instead, I followed instructions from these two websites; the instructions are identical:

The steps all appeared to succeed. Afterwards, I added /usr/local/cuda/bin to my PATH and /usr/local/cuda/lib64 to my LD_LIBRARY_PATH. Executing nvcc --version gives me valid-looking output:

zeus@neptune:/usr/local/cuda/samples/bin/sbsa/linux/release$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:42_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0

I then tried to build and run the CUDA sample applications, as suggested in NVIDIA’s documentation here:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#mandatory-post

Most of the samples built (but not all). Fortunately one that did build was the deviceQuery sample. Running the deviceQuery sample, as suggested above, gives me this not-so-encouraging output:

zeus@neptune:/usr/local/cuda/samples/bin/sbsa/linux/release$ sudo ./deviceQuery 
./deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 999
-> unknown error
Result = FAIL

I also notice that my 16 GB of eMMC storage is now 87% full. It was only about 30% full with just the base Jetson OS installed. Does CUDA 11 really need 10+ GB of drive space?
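
For anyone trying to see where the space went, here is a rough sketch that ranks installed packages by size. It assumes a Debian-based image (as on the Jetson) where dpkg-query is available; largest_packages is a hypothetical helper name, not part of any NVIDIA tool, and Installed-Size is dpkg's own estimate in KiB rather than an exact on-disk figure.

```shell
# Reads "<installed-size-in-KiB><TAB><package>" lines on stdin and prints the
# N largest packages (default 10) with their sizes converted to MiB.
largest_packages() {
    sort -t "$(printf '\t')" -k1,1nr | head -n "${1:-10}" |
        awk -F '\t' '{ printf "%8.1f MiB  %s\n", $1 / 1024, $2 }'
}

# On a Debian-based system, feed the helper from dpkg's package database.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Installed-Size}\t${Package}\n' | largest_packages 15
fi

# Overall usage of the root filesystem, for comparison.
df -h /
```

If CUDA-related packages dominate the list, that would suggest the toolkit itself accounts for most of the growth.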

Any ideas for what I might have missed? Is there an easier means of getting CUDA 11 onto the Xavier NX?

Thanks!

Hi,

Please note that the CUDA libraries depend on the GPU driver.
On Jetson, the GPU driver is integrated into the OS.
Currently, Jetson only supports a driver that is compatible with CUDA 10.2.

You can still install CUDA manually from the link below, but please use version 10.2:
https://repo.download.nvidia.com/jetson/
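
Since the driver ships with the OS image, it can help to confirm which L4T release is actually on the board before picking packages. The sketch below assumes the first line of /etc/nv_tegra_release looks like "# R32 (release), REVISION: 6.1, ..."; l4t_version is a hypothetical helper name. On JetPack 4.6 I would expect it to print 32.6.1, but please verify that on your own device.

```shell
# Turns an /etc/nv_tegra_release header line on stdin into "major.revision",
# e.g. "# R32 (release), REVISION: 6.1, ..." becomes "32.6.1".
l4t_version() {
    sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*/\1.\2/p'
}

# The file only exists on Jetson images, so skip quietly elsewhere.
if [ -r /etc/nv_tegra_release ]; then
    head -n 1 /etc/nv_tegra_release | l4t_version
fi
```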

Thanks.

OK. I’m happy to use CUDA 10.2. On the download page you pointed me at, there are literally hundreds of .deb packages. I’m guessing that I would only want the “t194” and “common” packages involving CUDA, like these:

nvidia-cuda_4.6-b197_arm64.deb
nvidia-cuda_4.6-b199_arm64.deb
cuda-command-line-tools-10-2_10.2.460-1_arm64.deb
cuda-compiler-10-2_10.2.460-1_arm64.deb
cuda-cudart-10-2_10.2.300-1_arm64.deb
cuda-cudart-dev-10-2_10.2.300-1_arm64.deb
cuda-cuobjdump-10-2_10.2.300-1_arm64.deb
cuda-cupti-10-2_10.2.300-1_arm64.deb
cuda-cupti-dev-10-2_10.2.300-1_arm64.deb
cuda-documentation-10-2_10.2.300-1_arm64.deb
cuda-driver-dev-10-2_10.2.300-1_arm64.deb
cuda-gdb-10-2_10.2.300-1_arm64.deb
cuda-gdb-src-10-2_10.2.300-1_arm64.deb
cuda-libraries-10-2_10.2.460-1_arm64.deb
cuda-libraries-dev-10-2_10.2.460-1_arm64.deb
cuda-memcheck-10-2_10.2.300-1_arm64.deb
cuda-minimal-build-10-2_10.2.460-1_arm64.deb
cuda-nvcc-10-2_10.2.300-1_arm64.deb
cuda-nvdisasm-10-2_10.2.300-1_arm64.deb
cuda-nvgraph-10-2_10.2.300-1_arm64.deb
cuda-nvgraph-dev-10-2_10.2.300-1_arm64.deb
cuda-nvml-dev-10-2_10.2.300-1_arm64.deb
cuda-nvprof-10-2_10.2.300-1_arm64.deb
cuda-nvprune-10-2_10.2.300-1_arm64.deb
cuda-nvrtc-10-2_10.2.300-1_arm64.deb
cuda-nvrtc-dev-10-2_10.2.300-1_arm64.deb
cuda-nvtx-10-2_10.2.300-1_arm64.deb
cuda-samples-10-2_10.2.300-1_arm64.deb
cuda-toolkit-10-2_10.2.460-1_arm64.deb
cuda-tools-10-2_10.2.460-1_arm64.deb
cuda-visual-tools-10-2_10.2.460-1_arm64.deb

There are no instructions of any kind. Can you provide, or point me at, instructions to get CUDA 10.2 installed on a Xavier NX where only the Jetson OS (JetPack 4.6) has been pre-installed via SDK Manager? I’d rather not screw up again, and waste both your time and mine. Thanks!

You can install CUDA with sudo apt-get install nvidia-cuda, or with sudo apt-get install nvidia-jetpack.

Update: I think I have figured it out. The steps to install CUDA 10.2 were:

wget https://repo.download.nvidia.com/jetson/common/pool/main/c/cuda-toolkit-10-2/cuda-toolkit-10-2_10.2.460-1_arm64.deb
sudo apt install ./cuda-toolkit-10-2_10.2.460-1_arm64.deb

Then I manually added the following lines to my ~/.profile:

# add CUDA to PATH and LD_LIBRARY_PATH
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
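
A slightly more defensive variant of the same idea, as a sketch only: prepend_once is a hypothetical helper, not something the CUDA packages provide. It avoids adding the directory twice when the profile is sourced again, and it avoids a trailing colon when LD_LIBRARY_PATH starts out empty (an empty entry makes the loader search the current directory).

```shell
# Prints the colon-separated list in $1 with directory $2 prepended,
# unless $2 is already one of its entries.
prepend_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;          # already present, unchanged
        *)        printf '%s\n' "$2${1:+:$1}" ;;  # prepend, no trailing colon
    esac
}

# add CUDA to PATH and LD_LIBRARY_PATH
export PATH="$(prepend_once "$PATH" /usr/local/cuda/bin)"
export LD_LIBRARY_PATH="$(prepend_once "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)"
```

Log out and back in (or source the file) for the change to take effect in a new shell.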

Now, I can compile and run the deviceQuery sample:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
../../bin/aarch64/linux/release/deviceQuery

And I get the following (much more encouraging!) output:

zeus@uranus:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ../../bin/aarch64/linux/release/deviceQuery 
../../bin/aarch64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Xavier"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    7.2
  Total amount of global memory:                 7765 MBytes (8142626816 bytes)
  ( 6) Multiprocessors, ( 64) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            1109 MHz (1.11 GHz)
  Memory Clock rate:                             1109 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
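
For scripted checks (for example after reflashing), the two lines that matter in this report can be pulled out with standard tools. This is a sketch that assumes the report keeps the exact "Result = PASS" and "CUDA Driver Version / Runtime Version" wording shown above; devicequery_passed and devicequery_versions are hypothetical helper names.

```shell
# Exit status 0 only if the report on stdin contains the line "Result = PASS".
devicequery_passed() {
    grep -q '^Result = PASS$'
}

# Prints the "driver / runtime" version pair from the report on stdin.
devicequery_versions() {
    sed -n 's|^ *CUDA Driver Version / Runtime Version  *\(.*\)$|\1|p'
}

# Demo on two lines copied from the output above; on the device, pipe the
# real deviceQuery output into the functions instead.
sample='  CUDA Driver Version / Runtime Version          10.2 / 10.2
Result = PASS'
printf '%s\n' "$sample" | devicequery_versions
printf '%s\n' "$sample" | devicequery_passed && echo "deviceQuery passed"
```

On the board itself, piping ../../bin/aarch64/linux/release/deviceQuery into devicequery_passed gives an exit status that a script can test directly.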

My total disk usage has gone up from 5 GB to 8.4 GB, leaving about 4.7 GB free on the 16 GB eMMC. This is enough for me to continue working on the Xavier NX.

Let me know if there are any other steps or tests I should run before continuing.