I followed the instructions on the official webpage to deploy a cuDNN system, and ran into the problems below:
(1) Hardware: NVIDIA Quadro P620; nvidia drivers 470 and 495 both installed successfully.
(2) CUDA versions 11.3 and 11.5 were tested; the sample NVIDIA_CUDA-11.3(5)-Samples/3_Imaging/SobelFilter always fails, crashing with cudaErrorUnknown.
(3) PyTorch installs fine but cannot find the GPU; torch.cuda.is_available() is always False.
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2234      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      3089      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:01:00.0
I tested several examples inside NVIDIA_CUDA-11.5_Samples; they always failed.
It looks like my video card does not support the installed CUDA toolkits (both 11.3 and 11.5 failed; 10.2 is not yet tested because it requires an older gcc version).
Whenever I install PyTorch and test torch.cuda.is_available(), it returns False regardless of which CUDA version (>11) is installed.
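To narrow down whether the False comes from a CPU-only PyTorch wheel or from the driver, the standard torch.cuda calls can be checked step by step. Below is a minimal diagnostic sketch; the helper name `diagnose_cuda` is mine, and it degrades gracefully when torch is not importable:

```python
# Minimal CUDA-availability diagnostic for PyTorch (a sketch; the helper
# name diagnose_cuda is hypothetical, the torch.cuda calls are standard).
def diagnose_cuda():
    """Return a dict describing why CUDA may be unavailable to PyTorch."""
    info = {}
    try:
        import torch
    except ImportError:
        info["torch_installed"] = False
        return info
    info["torch_installed"] = True
    # torch.version.cuda is None on a CPU-only build of PyTorch.
    info["built_with_cuda"] = torch.version.cuda is not None
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_count"] = torch.cuda.device_count()
        info["device_name"] = torch.cuda.get_device_name(0)
    return info

if __name__ == "__main__":
    for key, value in diagnose_cuda().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is False, the wheel itself has no CUDA support and reinstalling the cu11x build is the fix; if it is True but `cuda_available` is False, the problem is on the driver/runtime side.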
--> example A:
CUDA Sobel Edge-Detection Starting...
GPU Device 0: "Pascal" with compute capability 6.1
Reading image: lena.pgm
CUDA error at SobelFilter.cpp:314 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo_buffer, cudaGraphicsMapFlagsWriteDiscard)"
--> example B:
[CUDA FFT Ocean Simulation]
Left mouse button - rotate
Middle mouse button - pan
Right mouse button - zoom
'w' key - toggle wireframe
[CUDA FFT Ocean Simulation]
GPU Device 0: "Pascal" with compute capability 6.1
CUDA error at oceanFFT.cpp:296 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_heightVB_resource, heightVertexBuffer, cudaGraphicsMapFlagsWriteDiscard)"
Segmentation fault (core dumped)
--> example C:
Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
GPU Device 0: "Pascal" with compute capability 6.1
Compute 6.1 CUDA device: [Quadro P620]
CUDA error at bodysystemcuda_impl.h:186 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&m_pGRes[i], m_pbo[i], cudaGraphicsMapFlagsNone)"
For reference, the failing samples above live at: example 1, NVIDIA_CUDA-11.5_Samples/5_Simulations/nbody, and example 2, NVIDIA_CUDA-11.5_Samples/5_Simulations/oceanFFT, which print the same errors as examples C and B above.
example 3 --> NVIDIA_CUDA-11.5_Samples/5_Simulations/smokeParticles
CUDA Smoke Particles Starting...
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
The following required OpenGL extensions missing:
GL_ARB_multitexture
GL_ARB_vertex_buffer_object
GL_EXT_geometry_shader4
nvidia-smi prints:
| NVIDIA-SMI 495.44 Driver Version: 495.44 CUDA Version: 11.5
| 0 Quadro P620 Off | 00000000:01:00.0 Off | N/A |
| 34% 34C P8 N/A / N/A | 6MiB / 2000MiB | 0% Default
WARNING: infoROM is corrupted at gpu 0000:01:00.0
The samples shipped inside cuda_11.5.1_495.29.05_linux.run cannot run either; SobelFilter and smokeParticles fail with the same errors shown above.
It seems there is no X server running on the NVIDIA GPU, so the CUDA/GL interop samples fail. Are you doing this over ssh? Please try running the deviceQuery sample; it checks for general CUDA availability.
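The two preconditions the reply points at (a reachable X display, and OpenGL actually served by the NVIDIA driver) can be checked programmatically. This is a sketch only; the helper name is mine, and it assumes `glxinfo` (from mesa-utils) may or may not be installed, returning None instead of crashing:

```python
# Check the CUDA/GL interop preconditions: is DISPLAY set, and which
# vendor provides OpenGL? (Sketch; check_gl_interop_preconditions is a
# hypothetical helper, glxinfo is the standard mesa-utils tool.)
import os
import shutil
import subprocess

def check_gl_interop_preconditions():
    """Return (display, gl_vendor); either may be None."""
    display = os.environ.get("DISPLAY")  # usually unset over plain ssh
    gl_vendor = None
    if display and shutil.which("glxinfo"):
        out = subprocess.run(["glxinfo"], capture_output=True, text=True)
        for line in out.stdout.splitlines():
            if line.startswith("OpenGL vendor string:"):
                gl_vendor = line.split(":", 1)[1].strip()
                break
    return display, gl_vendor

if __name__ == "__main__":
    display, vendor = check_gl_interop_preconditions()
    print(f"DISPLAY={display!r}, OpenGL vendor={vendor!r}")
```

If DISPLAY is unset, or the vendor string is Mesa/llvmpipe rather than NVIDIA Corporation, cudaGraphicsGLRegisterBuffer has no NVIDIA GL context to register against, which matches both the error-999 failures and the missing GL_ARB_* extensions above.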
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Quadro P620"
CUDA Driver Version / Runtime Version 11.5 / 11.5
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 2000 MBytes (2097479680 bytes)
(004) Multiprocessors, (128) CUDA Cores/MP: 512 CUDA Cores
GPU Max Clock rate: 1354 MHz (1.35 GHz)
Memory Clock rate: 2505 MHz
Memory Bus Width: 128-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 98304 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.5, CUDA Runtime Version = 11.5, NumDevs = 1
Result = PASS