nvgpu lost and "No devices were found" with nvidia-smi

On JetPack 6 GA (Jetson Orin NX 16GB), I was trying to install CUDA 12.4 and TensorRT 10.
Everything worked fine until I rebooted the device.

Then I got

$ nvidia-smi
No devices were found
$ ls /dev/
autofs                capture-vi-channel24  capture-vi-channel43  capture-vi-channel62  full          loop0           nvme0n1     pts     rtc1               tty14  tty33  tty52    ttyp3    v4l2-nvdec  vcsu5
block                 capture-vi-channel25  capture-vi-channel44  capture-vi-channel63  fuse          loop1           nvme0n1p1   ptyp0   shm                tty15  tty34  tty53    ttyp4    v4l2-nvenc  vcsu6
btrfs-control         capture-vi-channel26  capture-vi-channel45  capture-vi-channel64  gpiochip0     loop2           nvme0n1p10  ptyp1   snd                tty16  tty35  tty54    ttyp5    vcs         vfio
bus                   capture-vi-channel27  capture-vi-channel46  capture-vi-channel65  gpiochip1     loop3           nvme0n1p11  ptyp2   spidev0.0          tty17  tty36  tty55    ttyp6    vcs1        vga_arbiter
capture-vi-channel0   capture-vi-channel28  capture-vi-channel47  capture-vi-channel66  host1x-fence  loop4           nvme0n1p12  ptyp3   spidev0.1          tty18  tty37  tty56    ttyp7    vcs2        watchdog
capture-vi-channel1   capture-vi-channel29  capture-vi-channel48  capture-vi-channel67  hugepages     loop5           nvme0n1p13  ptyp4   spidev1.0          tty19  tty38  tty57    ttyp8    vcs3        watchdog0
capture-vi-channel10  capture-vi-channel3   capture-vi-channel49  capture-vi-channel68  hwrng         loop6           nvme0n1p14  ptyp5   spidev1.1          tty2   tty39  tty58    ttyp9    vcs4        zero
capture-vi-channel11  capture-vi-channel30  capture-vi-channel5   capture-vi-channel69  i2c-0         loop7           nvme0n1p15  ptyp6   stderr             tty20  tty4   tty59    ttypa    vcs5        zram0
capture-vi-channel12  capture-vi-channel31  capture-vi-channel50  capture-vi-channel7   i2c-1         loop-control    nvme0n1p2   ptyp7   stdin              tty21  tty40  tty6     ttypb    vcs6        zram1
capture-vi-channel13  capture-vi-channel32  capture-vi-channel51  capture-vi-channel70  i2c-10        mapper          nvme0n1p3   ptyp8   stdout             tty22  tty41  tty60    ttypc    vcsa        zram2
capture-vi-channel14  capture-vi-channel33  capture-vi-channel52  capture-vi-channel71  i2c-11        media0          nvme0n1p4   ptyp9   tee0               tty23  tty42  tty61    ttypd    vcsa1       zram3
capture-vi-channel15  capture-vi-channel34  capture-vi-channel53  capture-vi-channel8   i2c-2         mem             nvme0n1p5   ptypa   teepriv0           tty24  tty43  tty62    ttype    vcsa2       zram4
capture-vi-channel16  capture-vi-channel35  capture-vi-channel54  capture-vi-channel9   i2c-4         mqueue          nvme0n1p6   ptypb   tegra_camera_ctrl  tty25  tty44  tty63    ttypf    vcsa3       zram5
capture-vi-channel17  capture-vi-channel36  capture-vi-channel55  char                  i2c-5         net             nvme0n1p7   ptypc   tegra-soc-hwpm     tty26  tty45  tty7     ttyS0    vcsa4       zram6
capture-vi-channel18  capture-vi-channel37  capture-vi-channel56  console               i2c-7         ng0n1           nvme0n1p8   ptypd   tty                tty27  tty46  tty8     ttyS1    vcsa5       zram7
capture-vi-channel19  capture-vi-channel38  capture-vi-channel57  cpu_dma_latency       i2c-9         null            nvme0n1p9   ptype   tty0               tty28  tty47  tty9     ttyS2    vcsa6
capture-vi-channel2   capture-vi-channel39  capture-vi-channel58  cuse                  initctl       nvidia0         nvsciipc    ptypf   tty1               tty29  tty48  ttyAMA0  ttyS3    vcsu
capture-vi-channel20  capture-vi-channel4   capture-vi-channel59  disk                  input         nvidiactl       port        random  tty10              tty3   tty49  ttyGS0   ttyTCU0  vcsu1
capture-vi-channel21  capture-vi-channel40  capture-vi-channel6   dri                   kmsg          nvidia-modeset  pps0        rfkill  tty11              tty30  tty5   ttyp0    ttyTHS1  vcsu2
capture-vi-channel22  capture-vi-channel41  capture-vi-channel60  efi_capsule_loader    kvm           nvmap           printer     rtc     tty12              tty31  tty50  ttyp1    ttyTHS2  vcsu3
capture-vi-channel23  capture-vi-channel42  capture-vi-channel61  fd                    log           nvme0           ptmx        rtc0    tty13              tty32  tty51  ttyp2    urandom  vcsu4

I reverted to CUDA 12.2, but it didn’t help.

Is there any way to fix this without reflashing the device?
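
A quick first check in a situation like this is whether the GPU kernel driver is loaded at all. The sketch below assumes the driver is a kernel module named nvgpu, as the thread title suggests; `module_loaded` is a hypothetical helper, not an NVIDIA tool.

```shell
# Return success if the named module appears in a /proc/modules-style file.
# The second argument defaults to /proc/modules; it is a parameter only so
# the check can be tried against a sample file.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Assumption: the Jetson GPU driver module is called "nvgpu".
if module_loaded nvgpu; then
    echo "nvgpu is loaded"
else
    echo "nvgpu is not loaded"
fi
```

If the module is reported as not loaded, the kernel log (`dmesg`) is a reasonable next place to look.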

Hi,

Have you set up the environment package for CUDA 12.4?

We have tested CUDA 12.4 and TensorRT 10.x on Jetson,
and nvidia-smi works correctly.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:23:12_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
nvidia@tegra-ubuntu:~$ dpkg-query -W tensorrt
tensorrt        10.0.1.6-1+cuda12.4
nvidia@tegra-ubuntu:~$ nvidia-smi
Tue Jul  9 06:38:04 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.3.0                Driver Version: N/A          CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                  N/A  | N/A              N/A |                  N/A |
| N/A   N/A  N/A               N/A /  N/A | Not Supported        |     N/A          N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Thanks.

Yes, I did, but I removed all CUDA 12.2 related packages after the installation. It worked perfectly until I rebooted the device, so some module must have been removed.

Hi,

We will try to reproduce this locally.
Could you share how you installed CUDA and TensorRT so we can give it a try?

Thanks.

Hi

This may not be 100% accurate, but the steps were:

  • Install CUDA, I believe in this step the tensorrt
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4 cuda-compat-12-4

Then I removed everything related to CUDA 12.2:

sudo apt remove cuda-*-12-2
sudo apt remove cuda-toolkit-12-2-config-common
  • Install tensorrt
sudo apt install tensorrt

After the reboot, the GPU is lost.
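
Since a wildcard removal like the one above can pull out more than intended, one way to see what it would take with it is to simulate the removal first. This is only a sketch: `list_removals` is a hypothetical helper, and it relies on `apt-get -s` (simulate) printing one `Remv <package> ...` line per package it would remove.

```shell
# List the package names that a simulated apt-get run would remove.
# Reads `apt-get -s` output on stdin; removal lines start with "Remv".
list_removals() {
    awk '$1 == "Remv" { print $2 }'
}

# Usage on the device (-s only simulates, nothing is uninstalled):
#   apt-get -s remove 'cuda-*-12-2' | list_removals
```

Quoting the pattern keeps the shell from expanding it against files in the current directory before apt sees it.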

Hi,

Thanks for this info.
We will try to reproduce this issue and provide more info to you later.

Thanks

Hi,

We set up a device with JetPack 6 and upgraded CUDA and TensorRT following the steps in your comment.
nvidia-smi works correctly, but we need the command below to access the upgraded CUDA version:

$ export LD_LIBRARY_PATH=/usr/local/cuda-12.4/compat:$LD_LIBRARY_PATH
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:23:12_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

$ dpkg-query -W tensorrt
tensorrt	10.1.0.27-1+cuda12.4

$ nvidia-smi 
Mon Jul 15 02:26:15 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.3.0                Driver Version: N/A          CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                  N/A  | N/A              N/A |                  N/A |
| N/A   N/A  N/A               N/A /  N/A | Not Supported        |     N/A          N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
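
The export above only lasts for the current shell. A minimal sketch for making it repeatable, assuming the same compat path, is a small function that could be sourced from a shell profile; `prepend_lib_path` is a hypothetical name, not part of the CUDA packages.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH once,
# and only if the directory actually exists.
prepend_lib_path() {
    dir="$1"
    [ -d "$dir" ] || { echo "missing: $dir" >&2; return 1; }
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Path taken from the export line above; "|| true" keeps a missing
# directory from aborting a profile that sources this snippet.
prepend_lib_path /usr/local/cuda-12.4/compat || true
```

The existence check also doubles as a hint: if the compat directory is reported missing, the cuda-compat-12-4 package is probably not installed.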

Could you reflash the system and try it again?

Thanks.