That program does not compile. Output of make:
nvcc -c -arch sm_13 -DSM_13 -O3 -I. -I/usr/local/cuda/include -o cuda_memtest.o cuda_memtest.cu
nvcc fatal : Value ‘sm_13’ is not defined for option ‘gpu-architecture’
make: *** [Makefile:75: cuda_memtest.o] Error 1
Try sm_50 for your Maxwell GPU. sm_13 is no longer supported by modern CUDA versions.
Here’s an updated fork:
https://github.com/ComputationalRadiationPhysics/cuda_memtest
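As a rough sketch of the sm_50 suggestion: the value passed to -arch is "sm_" followed by the compute capability digits, and a Maxwell GeForce 940M has compute capability 5.0. The helper name arch_flag is made up for illustration and is not part of cuda_memtest or its Makefile.

```shell
#!/bin/sh
# Hypothetical helper: build an nvcc -arch value from a compute
# capability given as two arguments (major, minor).
arch_flag() {
    printf 'sm_%s%s\n' "$1" "$2"
}

# For compute capability 5.0 this yields the value to put in place of
# sm_13 in the Makefile's nvcc line, e.g.:
#   nvcc -c -arch "$(arch_flag 5 0)" ...
arch_flag 5 0
```

The remaining nvcc options are left elided on purpose; which defines the old Makefile still needs is not something this sketch can tell.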
Well, output of make for the fork:
gcc versions greater than 8 are not supported!
I am trying to work around this, wait a moment.
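One common way around this error (not necessarily what was done here) is to point nvcc at an older host compiler with its -ccbin option. A minimal sketch, assuming gcc-8 is installed at /usr/bin/gcc-8; the helper names gcc_major and gcc_supported are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a string such as
# "9.3.0" (also works if the string is just "9").
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: succeeds if gcc version $1 has a major version
# no greater than $2 (8 for the error message quoted above).
gcc_supported() {
    [ "$(gcc_major "$1")" -le "$2" ]
}

# Possible use (the /usr/bin/gcc-8 path is an assumption):
#   gcc_supported "$(gcc -dumpversion)" 8 || echo "try: nvcc -ccbin /usr/bin/gcc-8 ..."
```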
Output of cuda_memtest:
[03/01/2021 14:24:45][Aspire-E5][0]:Running cuda memtest, version 1.2.3
[03/01/2021 14:24:46][Aspire-E5][0]:NVRM version: NVIDIA UNIX x86_64 Kernel Module 460.39 Thu Jan 21 21:54:06 UTC 2021
[03/01/2021 14:24:46][Aspire-E5][0]:num_gpus=1
[03/01/2021 14:24:46][Aspire-E5][0]:Device name=GeForce 940M, global memory size=4242604032, serial=unknown (NVML runtime error)
[03/01/2021 14:24:46][Aspire-E5][0]:major=5, minor=0
[03/01/2021 14:24:46][Aspire-E5][0]:Attached to device 0 successfully.
[03/01/2021 14:24:46][Aspire-E5][0]:Allocated 3862 MB
[03/01/2021 14:24:46][Aspire-E5][0]:Test0 [Walking 1 bit]
[03/01/2021 14:24:46][Aspire-E5][0]:ERROR: CUDA error: an illegal memory access was encountered, line 598, file /home/ugobindini/Software/cuda_memtest-dev/tests.cu
[03/01/2021 14:24:46][Aspire-E5][0]:ERROR: CUDA error: an illegal memory access was encountered, line 598, file /home/ugobindini/Software/cuda_memtest-dev/tests.cu
No idea why that’s failing. Which CUDA version did you try?
nvcc --version:
Cuda compilation tools, release 10.1, V10.1.243.
nvidia-smi:
CUDA Version: 11.2
Maybe this mismatch is the problem?
No, nvidia-smi just reports the maximum CUDA version supported by the driver. It should always be higher than or equal to the installed CUDA toolkit version. Maybe start with a system memory test using memtest86.
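To illustrate that rule with the two numbers quoted above (11.2 from nvidia-smi, 10.1 from nvcc), here is a small sketch. The function name version_ge is made up for illustration, and it assumes a sort that supports -V (GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is greater than or equal
# to version $2. Uses version sort and checks that $2 sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Values from this thread: driver supports up to 11.2, toolkit is 10.1.
if version_ge 11.2 10.1; then
    echo "driver can run this toolkit"
else
    echo "driver too old for this toolkit"
fi
```

So the 11.2 versus 10.1 difference is the expected direction and not a mismatch.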
Ok, I give up. Thank you very much for all your work.
I understand, I guess the nvidia gpu is just broken.
I also have a problem with Ubuntu 20.04 and nvidia-driver-460. After the reboot, the computer just stays on the "loading Ubuntu" screen and nothing happens. See the bug log attached.
nvidia-bug-report.log.gz (749.0 KB)
Seems you now have the nvidia driver uninstalled and are running on nouveau. There are some log leftovers where the nvidia driver couldn’t load, likely due to nouveau not being properly blacklisted. Please re-install the nvidia driver using Software & Updates, add the kernel parameter nouveau.modeset=0 and try to boot. If it still fails, please create a new nvidia-bug-report.log from recovery mode.
I did, still the same problem. Please see if the logs are clearer now.
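For the kernel parameter step, one usual way on Ubuntu is to add it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run update-grub. A minimal sketch of the edit follows; the helper name add_kernel_param is made up for illustration, and it assumes the value is on one line in double quotes, as in a stock Ubuntu file.

```shell
#!/bin/sh
# Hypothetical helper: append kernel parameter $2 to the
# GRUB_CMDLINE_LINUX_DEFAULT line in file $1, unless it is already there.
add_kernel_param() {
    file=$1
    param=$2
    # Nothing to do if the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    # Rewrite the quoted value with the parameter appended.
    sed "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"/" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Possible use, as root and after making a backup copy:
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub nouveau.modeset=0
#   update-grub
```

For a one-off test the parameter can also be typed into the kernel line from the GRUB menu (press e on the boot entry) without changing any file.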
nvidia-bug-report.log.gz (697.8 KB)
The driver was loading fine but I couldn’t see any Xserver starting. Unfortunately, the bug-report.log didn’t catch any config files. Please run
find /etc/X11 /etc/X11/xorg.conf.d /usr/share/X11/xorg.conf.d -name "*.conf" -print -exec cat '{}' \; >allconfig.txt
and attach allconfig.txt
Please see attached, and note that I am on nouveau again and nvidia is uninstalled, as I cannot boot if it is installed.
allconfig.txt (6.7 KB)
No unusual or breaking config. Please reinstall the nvidia driver, reboot until stuck, reboot to recovery and run
sudo journalctl -b-1 >journal.txt
and attach journal.txt.
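Once journal.txt exists, it can help to look only at the lines that mention the graphics drivers before attaching it. A small sketch; the function name gpu_lines and the choice of search terms are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a saved journal ($1) that
# mention the nvidia, nouveau or i915 drivers (case-insensitive).
gpu_lines() {
    grep -iE 'nvidia|nouveau|i915' "$1"
}

# Possible use:
#   gpu_lines journal.txt | less
```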
Please see log attached. Thanks for help!
journal.txt (187.8 KB)
Anyhow, Ubuntu just gets stuck here after the Nvidia driver install:
i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
Odd, please try to boot into a 5.4 kernel, if available, or install the GA kernel:
https://wiki.ubuntu.com/Kernel/LTSEnablementStack
I was finally able to boot to the GUI using kernel 5.4 and nouveau.modeset=0, but the screen connected through Nvidia is not working yet (the main one on the Intel board does work).
A new journal and bug report are attached.
journal.txt (439.1 KB)
nvidia-bug-report.log.gz (82.6 KB)
The nvidia kernel modules are not yet installed for the 5.4 kernel. I guess you’ll just need to install the missing kernel headers. While running the 5.4 kernel:
sudo apt install linux-headers-$(uname -r)
and reboot.
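A quick way to see whether headers are already present for the running kernel is sketched below. It assumes the Ubuntu layout where the headers package installs /usr/src/linux-headers-<release>; the function name headers_installed and its optional root-prefix argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a headers directory exists for kernel
# release $1. Optional $2 is a root prefix (empty for the live system).
headers_installed() {
    [ -d "${2:-}/usr/src/linux-headers-$1" ]
}

# Report the state for the kernel that is currently running.
if headers_installed "$(uname -r)"; then
    echo "headers present for $(uname -r)"
else
    echo "headers missing; try: sudo apt install linux-headers-$(uname -r)"
fi
```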