Yesterday I tried updating my CUDA Toolkit from 10.1 to 11.3 using the CUDA install guide, and I seem to have messed up pretty badly. I am no longer able to run compiled CUDA code; the outputs are just gibberish.
I’ve tried many approaches that I found in these forums, but I’m at my wits’ end with this.
Here are some of my outputs:
OS: Ubuntu 20.04
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
ERROR: NVIDIA driver is not loaded
ERROR: Unable to load info from any available system
(nvidia-settings:27954): GLib-GObject-CRITICAL **: 13:30:10.150: g_object_unref: assertion ‘G_IS_OBJECT (object)’ failed
** Message: 13:30:10.155: PRIME: Requires offloading
** Message: 13:30:10.155: PRIME: is it supported? yes
** Message: 13:30:10.193: PRIME: Usage: /usr/bin/prime-select nvidia|intel|on-demand|query
** Message: 13:30:10.193: PRIME: on-demand mode: “1”
** Message: 13:30:10.193: PRIME: is “on-demand” mode supported? yes
This one is weird because under the NVIDIA X Server Settings app I am set to ‘NVIDIA (Performance Mode)’.
As I am a beginner when it comes to Linux, it took me a while to learn about
ubuntu-drivers devices. It recommends
Additionally, here is the nvidia-bug-report.log.gz (1.2 MB)
nvidiafb is loaded and blocking the nvidia driver. Please create
sudo update-initramfs -u
Thank you so, so much! Everything works again.
How can I make it up to you?
Also, if I may ask, what was my mistake? How did I mess the drivers up this way (so I can avoid doing that in the future)? I suspect it was that I hadn’t deactivated Secure Boot when I first updated CUDA. Could that be it? Or was the mistake updating to Toolkit version 11.3?
It’s recommended not to install the full ‘cuda’ metapackage on Ubuntu but only ‘cuda-toolkit’, and to use the driver from the normal repo, which comes with all the necessary settings.
May I ask for your help again? Ever since this fix, I have been unable to launch Device-side code. The Host-side code executes as expected, but neither my own kernels nor functions from CUDA libraries (such as cuBLAS, cuDNN) are executed.
I’ve tried adding printf() statements in my kernels as a rudimentary debug method, but they never write. Additionally, after using cudaMemcpy to copy the results from the Device back to the Host, these arrays are empty.
Do you know what could be causing this?
The cuda package should include a deviceQuery sample, please run it and post its output.
Here is the output of deviceQuery:
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: “NVIDIA GeForce GTX 960M”
CUDA Driver Version / Runtime Version 11.3 / 11.3
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 4046 MBytes (4242604032 bytes)
(005) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1176 MHz (1.18 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 65536 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.3, CUDA Runtime Version = 11.3, NumDevs = 1
Result = PASS
Please excuse the formatting, I tried correcting it as best I could.
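Since the printf() calls in the kernels never write and the copied arrays come back empty, one thing that may be worth checking is whether the kernel launch itself reports an error, because a failed launch is otherwise silent. Below is a minimal sketch of that check. The `CUDA_CHECK` macro and the `fill` kernel are hypothetical names made up for illustration and are not from the code discussed in this thread; only the CUDA runtime calls (`cudaMalloc`, `cudaGetLastError`, `cudaDeviceSynchronize`, `cudaMemcpy`, `cudaFree`, `cudaGetErrorString`) are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): print a readable message and
// leave main() with a non-zero exit code if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n",             \
                         __FILE__, __LINE__, #call,                    \
                         cudaGetErrorString(err_));                    \
            return 1;                                                  \
        }                                                              \
    } while (0)

// Hypothetical test kernel: each thread writes its global index.
__global__ void fill(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 256;
    int host[n] = {0};
    int *dev = nullptr;

    CUDA_CHECK(cudaMalloc(&dev, n * sizeof(int)));

    fill<<<(n + 127) / 128, 128>>>(dev, n);
    // A kernel launch returns no status itself, so query it explicitly.
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs

    CUDA_CHECK(cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));

    std::printf("host[0]=%d host[%d]=%d\n", host[0], n - 1, host[n - 1]);
    return 0;
}
```

If the check after the launch prints an error such as "no kernel image is available for execution on the device", one possibility (an assumption, not a confirmed diagnosis) is that the binary was not built for this GPU's compute capability, which deviceQuery reports as 5.0. In that case, compiling with `nvcc -arch=sm_50` would be one thing to try.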