Nvidia-driver-460 on Ubuntu 20.04: NVIDIA driver is not loaded

That program does not compile. Output of make:
nvcc -c -arch sm_13 -DSM_13 -O3 -I. -I/usr/local/cuda/include -o cuda_memtest.o cuda_memtest.cu
nvcc fatal : Value ‘sm_13’ is not defined for option ‘gpu-architecture’
make: *** [Makefile:75: cuda_memtest.o] Error 1

Try sm_50 for your Maxwell GPU. sm_13 is no longer supported by modern CUDA versions.
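In case it helps, here is a minimal sketch of how the architecture flag follows from the compute capability (the major and minor numbers that cuda_memtest or deviceQuery print). The helper name cc_to_arch is made up for illustration, it is not part of nvcc or cuda_memtest.

```shell
# Hypothetical helper: build the nvcc architecture name from the
# compute capability major and minor numbers, e.g. 5 and 0 for a GeForce 940M.
cc_to_arch() {
    printf 'sm_%s%s\n' "$1" "$2"
}

arch=$(cc_to_arch 5 0)
echo "$arch"

# Assumed usage, mirroring the failing make command with only the arch changed:
#   nvcc -c -arch "$arch" -O3 -I. -I/usr/local/cuda/include -o cuda_memtest.o cuda_memtest.cu
```

The -DSM_13 define from the original command is left out of the commented example on purpose, since I don't know how the old Makefile uses it.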

Here’s an updated fork:
https://github.com/ComputationalRadiationPhysics/cuda_memtest

Well, output of make for the fork:
gcc versions greater than 8 are not supported!
I am trying to work around this, wait a moment.

Output of cuda_memtest:
[03/01/2021 14:24:45][Aspire-E5][0]:Running cuda memtest, version 1.2.3
[03/01/2021 14:24:46][Aspire-E5][0]:NVRM version: NVIDIA UNIX x86_64 Kernel Module 460.39 Thu Jan 21 21:54:06 UTC 2021
[03/01/2021 14:24:46][Aspire-E5][0]:num_gpus=1
[03/01/2021 14:24:46][Aspire-E5][0]:Device name=GeForce 940M, global memory size=4242604032, serial=unknown (NVML runtime error)
[03/01/2021 14:24:46][Aspire-E5][0]:major=5, minor=0
[03/01/2021 14:24:46][Aspire-E5][0]:Attached to device 0 successfully.
[03/01/2021 14:24:46][Aspire-E5][0]:Allocated 3862 MB
[03/01/2021 14:24:46][Aspire-E5][0]:Test0 [Walking 1 bit]
[03/01/2021 14:24:46][Aspire-E5][0]:ERROR: CUDA error: an illegal memory access was encountered, line 598, file /home/ugobindini/Software/cuda_memtest-dev/tests.cu
[03/01/2021 14:24:46][Aspire-E5][0]:ERROR: CUDA error: an illegal memory access was encountered, line 598, file /home/ugobindini/Software/cuda_memtest-dev/tests.cu

No idea why that’s failing, which cuda version did you try?

nvcc --version:
Cuda compilation tools, release 10.1, V10.1.243.

nvidia-smi:
CUDA Version: 11.2

Maybe this mismatch is the problem?

No, nvidia-smi just reports the maximum CUDA version supported by the driver. It should always be higher than or equal to the installed CUDA toolkit version. Maybe start with a system memory test using memtest86.
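If you want to check that rule for your two numbers, something like the following sketch would do. version_le is a made-up helper, and it assumes GNU sort with the -V (version sort) option, which Ubuntu ships.

```shell
# Hypothetical helper: succeeds if version $1 is lower than or equal to version $2.
# Relies on GNU sort -V to order the two version strings.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

toolkit=10.1   # from "nvcc --version"
driver=11.2    # "CUDA Version" shown by nvidia-smi

if version_le "$toolkit" "$driver"; then
    echo "toolkit $toolkit is within what the driver supports (up to $driver)"
else
    echo "toolkit $toolkit is newer than the driver supports ($driver)"
fi
```

With 10.1 and 11.2 the first branch is taken, so the version difference alone should not be the cause.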

Ok, I give up. Thank you very much for all your work.

I understand, I guess the nvidia gpu is just broken.

I also have a problem with Ubuntu 20.04 and nvidia-driver-460. After the reboot, the computer just stays at the "loading Ubuntu" sign and nothing happens. See the bug log attached.

nvidia-bug-report.log.gz (749.0 KB)

Seems you now have the nvidia driver uninstalled and are running on nouveau. There are some log leftovers where the nvidia driver couldn’t load, likely due to nouveau not being properly blacklisted. Please re-install the nvidia driver using Software & Updates, add the kernel parameter nouveau.modeset=0 and try to boot. If it still fails, please create a new nvidia-bug-report.log from recovery mode.
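One way to make the kernel parameter permanent is to add it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run sudo update-grub. Below is a rough sketch of that edit; add_kernel_param is a made-up helper, it assumes GNU sed, and it only touches a scratch file here so you can see the result before changing the real config.

```shell
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file, unless the parameter is already there.
# Assumes the parameter contains no "/" (it is pasted into a sed expression).
add_kernel_param() {
    file=$1
    param=$2
    if grep "^GRUB_CMDLINE_LINUX_DEFAULT=" "$file" | grep -qF -- "$param"; then
        return 0
    fi
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}

# Try it on a scratch file rather than the real /etc/default/grub.
demo=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$demo"
add_kernel_param "$demo" "nouveau.modeset=0"
cat "$demo"
```

On the real system the same edit would go into /etc/default/grub (with sudo), followed by sudo update-grub and a reboot. For a one-off test you can also press e in the GRUB menu and append the parameter to the line starting with linux.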

I did, still same problem. Please see if logs are more clear now.

nvidia-bug-report.log.gz (697.8 KB)

The driver was loading fine but I couldn’t see any Xserver starting. Unfortunately, the bug-report.log didn’t catch any config files. Please run

find /etc/X11 /etc/X11/xorg.conf.d /usr/share/X11/xorg.conf.d -name "*.conf" -print -exec cat '{}' \; >allconfig.txt

and attach allconfig.txt

Please see attached, and note that I am on nouveau again and nvidia is uninstalled, as I cannot boot if it is installed.

allconfig.txt (6.7 KB)

No unusual or breaking config. Please reinstall the nvidia driver, reboot until stuck, reboot to recovery and run
sudo journalctl -b-1 >journal.txt
and attach journal.txt.

Please see log attached. Thanks for help!

journal.txt (187.8 KB)

Anyhow, Ubuntu is just stuck here after the Nvidia driver install:

i915 0000:00:02.0: [drm] Cannot find any crtc or sizes

Odd, please try to boot into a 5.4 kernel, if available, or install the GA kernel:
https://wiki.ubuntu.com/Kernel/LTSEnablementStack

I was finally able to boot to the GUI using kernel 5.4 and nouveau.modeset=0, but the screen on the Nvidia GPU is not working yet (the main one on the Intel board does work).

A new journal and bug report are attached.

journal.txt (439.1 KB)
nvidia-bug-report.log.gz (82.6 KB)

The nvidia kernel modules are not yet installed for the 5.4 kernel, I guess you’ll just need to install the missing kernel headers. While running the 5.4 kernel:

sudo apt install linux-headers-$(uname -r)

and reboot.
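To see beforehand whether the headers for the running kernel are there, a quick check along these lines should work. It is only a sketch: headers_present is a made-up helper, and it assumes the headers package provides the build directory under /lib/modules/<release>, which as far as I know is the case on Ubuntu.

```shell
# Hypothetical helper: succeeds if kernel headers for the given release appear
# to be installed, judged by the "build" directory under the modules directory.
# The optional second argument (modules root) exists only to make testing easy.
headers_present() {
    release=$1
    modules_root=${2:-/lib/modules}
    [ -d "$modules_root/$release/build" ]
}

if headers_present "$(uname -r)"; then
    echo "headers found for $(uname -r)"
else
    echo "headers missing for $(uname -r), try: sudo apt install linux-headers-$(uname -r)"
fi
```

After installing the headers, dkms status should list the nvidia module as installed for the 5.4 kernel once it has been rebuilt.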
