Can't compile after apt upgrade

Hello,

As usual, I ran sudo apt-get update and then sudo apt-get upgrade before working on VPI with Nsight.
But now my code no longer works, and neither do the samples. I get this error when launching the executable:
VPI_ERROR_INTERNAL: (cudaErrorCompatNotSupportedOnDevice)

My code and the sample were working fine before this upgrade.

Here is what this CUDA error means:
cudaErrorCompatNotSupportedOnDevice = 804
This error indicates that the system was upgraded to run with forward compatibility but the visible hardware detected by CUDA does not support this configuration. Refer to the compatibility documentation for the supported hardware matrix or ensure that only supported hardware is visible during initialization via the CUDA_VISIBLE_DEVICES environment variable.

These are the packages that sudo apt-get upgrade listed as needing an upgrade:

libnvidia-cfg1-460 libnvidia-common-460 libnvidia-compute-460 libnvidia-decode-460 libnvidia-encode-460 libnvidia-extra-460 libnvidia-fbc1-460 libnvidia-gl-460 libnvidia-ifr1-460 nvidia-compute-utils-460 nvidia-dkms-460 nvidia-driver-460 nvidia-kernel-common-460 nvidia-kernel-source-460 nvidia-utils-460 xserver-xorg-video-nvidia-460

This is what I get after answering yes to the upgrade, once the .deb packages have been downloaded and unpacked:

Setting up nvidia-dkms-460 (460.56-0ubuntu0.18.04.1) …
update-initramfs: deferring update (trigger activated)
INFO:Enable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
Loading new nvidia-460.56 DKMS files…
Building for 5.4.0-66-generic
Building for architecture x86_64
Building initial module for 5.4.0-66-generic
Secure Boot not enabled on this system.
Done.

nvidia:
Running module version sanity check.

  • Original module
    • No original module exists within this kernel
  • Installation
    • Installing to /lib/modules/5.4.0-66-generic/updates/dkms/

nvidia-modeset.ko:
Running module version sanity check.

  • Original module
    • No original module exists within this kernel
  • Installation
    • Installing to /lib/modules/5.4.0-66-generic/updates/dkms/

nvidia-drm.ko:
Running module version sanity check.

  • Original module
    • No original module exists within this kernel
  • Installation
    • Installing to /lib/modules/5.4.0-66-generic/updates/dkms/

nvidia-uvm.ko:
Running module version sanity check.

  • Original module
    • No original module exists within this kernel
  • Installation
    • Installing to /lib/modules/5.4.0-66-generic/updates/dkms/

depmod…

DKMS: install completed.
Setting up nvidia-driver-460 (460.56-0ubuntu0.18.04.1) …
Processing triggers for libc-bin (2.27-3ubuntu1.4) …
Processing triggers for man-db (2.8.3-2ubuntu0.1) …
Processing triggers for initramfs-tools (0.130ubuntu3.11) …
update-initramfs: Generating /boot/initrd.img-5.4.0-66-generic
W: Possible missing firmware /lib/firmware/rtl_nic/rtl8125a-3.fw for module r8169
W: Possible missing firmware /lib/firmware/rtl_nic/rtl8168fp-3.fw for module r8169

Details of my graphics card

01:00.0 VGA compatible controller: NVIDIA Corporation GK110B [GeForce GTX 780 Ti] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd GK110B [GeForce GTX 780 Ti]
Flags: bus master, fast devsel, latency 0, IRQ 33
Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
Memory at e8000000 (64-bit, prefetchable) [size=128M]
Memory at f0000000 (64-bit, prefetchable) [size=32M]
I/O ports at e000 [size=128]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities:
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

If I am correct, this means that NVIDIA released libraries that my PC can no longer handle. So what can I do about this? Is there a way to downgrade those libraries? That would mean I could never upgrade again, which is probably not a good idea. What can I do to keep working, given that this clearly looks like an NVIDIA library compatibility issue?

Thanks in advance for your help !

Hi,

Suppose you meet this error on a desktop environment.
Please correct me if this is not correct.

Since VPI requires a CUDA 10.2 environment, could you check whether your library is still v10.2?

$ nvidia-smi

Please share the output log of nvidia-smi with us.
Thanks.
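
When reading the nvidia-smi output, note that the "CUDA Version" field in its header is the highest CUDA version the driver supports, not the version of the installed toolkit. A minimal sketch for reporting the two separately is given here. It assumes the toolkit is reachable through /usr/local/cuda and ships a CUDA 10.x style version.txt file; toolkit_version is a hypothetical helper name, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Sketch: report the driver version and the installed CUDA toolkit
# version separately.

# toolkit_version FILE: hypothetical helper. Assumes FILE follows the
# CUDA 10.x version.txt layout, e.g. "CUDA Version 10.2.89".
# Prints nothing if the file is missing or has another layout.
toolkit_version() {
    sed -n 's/^CUDA Version \([0-9.]*\).*/\1/p' "$1" 2>/dev/null
}

# Driver version as reported by nvidia-smi (empty if nvidia-smi fails).
driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null || true)
# Toolkit version behind the default /usr/local/cuda path (assumption).
toolkit=$(toolkit_version /usr/local/cuda/version.txt || true)

echo "driver version:  ${driver:-unknown}"
echo "toolkit version: ${toolkit:-unknown}"
```

If the toolkit line prints "unknown", the toolkit may live elsewhere or use a different version file, so treat that as "not checked" rather than "not installed".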

Hi AastaLLL,
Thanks for your reply,

Suppose you meet this error on a desktop environment.

Yes you are right, I am on Ubuntu 18.04.5 LTS x86_64, this is on my host PC.
Output log of nvidia-smi :
Mon Mar 8 08:41:28 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.56       Driver Version: 460.56       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 780 Ti  Off  | 00000000:01:00.0 N/A |                  N/A |
| 17%   32C    P8    N/A /  N/A |    198MiB /  3018MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Now everything works fine. I don’t understand why, since I didn’t do anything special.
Restarting my computer is the only thing I did that could have affected this issue, so my guess is that those libraries were not fully installed in my case until the reboot.
Do you have any idea why it wasn’t working before? I didn’t see any other updates to those libraries.

Suppose you meet this error on a desktop environment.

My desktop environment is ubuntu:GNOME, sorry for not telling you earlier.

FYI, nvidia-smi requires a PCI GPU. Jetsons instead integrate with the memory controller, so 100% of all packages using or working with nvidia-smi will fail on a Jetson. Don’t know if that is the cause, but if somehow this got mixed in on a Jetson it would cause a failure.

Hi linuxdev,
Thanks for the information, but this error happened on my host PC.
Can the Jetson libraries replace or modify the ones used with nvidia-smi on my host PC?

If you use JetPack/SDKM to install to the host, then it will always use the correct version (with nvidia-smi). Similarly, if you use JetPack/SDKM to install to the Jetson, then this too will always be the correct version (without nvidia-smi). However, the host PC can handle multiple versions beyond what JetPack/SDKM would install…whereas a Jetson must use just the version installed by that JetPack/SDKM release. It is possible that if you had a different release of software active on the host PC, and then you installed the version from JetPack/SDKM, it might change the one used as a default, but the other versions would still be there.

As an example, on the host PC, if you look at the files in “/usr/local”, then you might have more than one real directory with a numbered (versioned) CUDA name, e.g.:
/usr/local/cuda-10.2/
…plus a symbolic link pointing to that:
/usr/local/cuda -> /usr/local/cuda-10.2

If your software uses “/usr/local/cuda” to find CUDA content, then it is really finding the version the symbolic link points to. If you reference “/usr/local/cuda-10.2” instead of “/usr/local/cuda”, it will always use that exact version.

I don’t know if the host PC side is where you are having issues, but when things start misbehaving after installing CUDA on the host PC, one thing to verify is that the software is using the correct version, since there might be more than one. If this happens, the old version should still be there. If the version you want is missing, it is easy on a host PC to install another version without removing the old one. I tend to do this with the tarball packages for the PC: I install one version with SDKM, and then use the tarball for other versions so that the package system does not remove them. Packages won’t replace non-package content (if the other version was installed from a tarball rather than a “.deb”, the package system does not know about it and won’t interfere with it).
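
To see which versions are present and which one the symbolic link selects, something like the following sketch may help. It assumes the default “/usr/local” install prefix; list_cuda is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# list_cuda PREFIX: hypothetical helper. Lists versioned CUDA directories
# under PREFIX and shows where the PREFIX/cuda symbolic link resolves.
list_cuda() {
    prefix=${1:-/usr/local}
    for dir in "$prefix"/cuda-*; do
        # Skip the unexpanded pattern when no versioned directory exists.
        if [ -d "$dir" ]; then
            echo "installed: $dir"
        fi
    done
    if [ -L "$prefix/cuda" ]; then
        echo "default:   $(readlink -f "$prefix/cuda")"
    else
        echo "default:   none ($prefix/cuda is not a symbolic link)"
    fi
}

list_cuda /usr/local
```

If the "default" line does not show the version your software expects, that mismatch is worth ruling out before looking at the driver.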

Hi,

Good to know it works now.
After a GPU driver upgrade, you need to reboot the device so that the running driver and the newly installed libraries match.
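
One way to check whether a reboot is still pending is to compare the version of the kernel module that is currently loaded with the version installed on disk. This is only a sketch: it assumes the loaded module exposes its version in /sys/module/nvidia/version and that the module is named nvidia, and needs_reboot is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: detect a driver upgrade that still needs a reboot.

# needs_reboot LOADED INSTALLED: hypothetical helper that prints
# "yes", "no", or "unknown" (when either version is missing).
needs_reboot() {
    loaded=$1
    installed=$2
    if [ -z "$loaded" ] || [ -z "$installed" ]; then
        echo "unknown"
    elif [ "$loaded" = "$installed" ]; then
        echo "no"
    else
        echo "yes"
    fi
}

# Version of the module currently loaded in the kernel (assumed path).
loaded=$(cat /sys/module/nvidia/version 2>/dev/null || true)
# Version of the module installed on disk for the running kernel.
installed=$(modinfo -F version nvidia 2>/dev/null || true)

echo "loaded module:    ${loaded:-not loaded}"
echo "installed module: ${installed:-not found}"
echo "reboot needed:    $(needs_reboot "$loaded" "$installed")"
```

Right after an upgrade like the one in this thread, the two versions would be expected to differ until the machine is restarted.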

Thanks.