Does NOT work: NVIDIA Quadro P620 + NVIDIA driver 470 + CUDA 11.3 + Ubuntu 20.04 LTS + GCC 9.3

Hi, all,

I followed the instructions on the official webpage to deploy a cuDNN system and ran into several problems:
(1) Hardware: NVIDIA Quadro P620; NVIDIA drivers 470 and 495 were both installed successfully.
(2) CUDA versions 11.3 and 11.5 were both tested; the sample NVIDIA_CUDA-11.3(5)-Samples/3_Imaging/SobelFilter always failed, crashing with an unknown error.
(3) PyTorch installed OK but fails to find the GPU; torch.cuda.is_available() always returns False.

OS: Ubuntu 20.04 LTS, GCC version 9.3.0.

Thanks for your help.

regards.

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

"nvidia-smi" prints:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P620        Off   | 00000000:01:00.0 Off |                  N/A |
| 34%   27C    P8    N/A /  N/A |     11MiB /  2000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2234      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      3089      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:01:00.0

I tested some examples inside NVIDIA_CUDA-11.5_Samples; they always failed.
It looks like my video card does NOT support the installed CUDA toolkits (both 11.3 and 11.5 were tested and failed; 10.2 has not been tested yet because it requires an older GCC version).
If I install PyTorch and test torch.cuda.is_available(), it always returns False, whatever CUDA version (>11) is installed.
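One thing worth ruling out alongside the torch.cuda.is_available() check: a PyTorch wheel built without CUDA support also returns False here, regardless of driver state. A minimal diagnostic sketch (assuming python3 is on PATH; the try/except guards against torch not being installed):

```shell
# Print PyTorch build info before concluding the driver/GPU is at fault.
# If torch.version.cuda prints None, the installed wheel is CPU-only and
# torch.cuda.is_available() will always be False.
python3 - <<'EOF'
try:
    import torch
    print("torch:", torch.__version__)
    print("built with CUDA:", torch.version.cuda)  # None => CPU-only wheel
    print("cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch not installed")
EOF
```

If "built with CUDA" is None, reinstalling a CUDA-enabled wheel from the official PyTorch install selector is the fix, independent of the driver issue.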

->example A:
CUDA Sobel Edge-Detection Starting...
GPU Device 0: "Pascal" with compute capability 6.1
Reading image: lena.pgm
CUDA error at SobelFilter.cpp:314 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo_buffer, cudaGraphicsMapFlagsWriteDiscard)"

->example B:
[CUDA FFT Ocean Simulation]
Left mouse button - rotate
Middle mouse button - pan
Right mouse button - zoom
'w' key - toggle wireframe
[CUDA FFT Ocean Simulation]
GPU Device 0: "Pascal" with compute capability 6.1
CUDA error at oceanFFT.cpp:296 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_heightVB_resource, heightVertexBuffer, cudaGraphicsMapFlagsWriteDiscard)"
Segmentation fault (core dumped)

->example C:

Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
GPU Device 0: "Pascal" with compute capability 6.1
Compute 6.1 CUDA device: [Quadro P620]
CUDA error at bodysystemcuda_impl.h:186 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&m_pGRes[i], m_pbo[i], cudaGraphicsMapFlagsNone)"

thank you in advance.
regards

nvidia-smi prints the same output as shown in my previous post.

I installed PyTorch and tested torch.cuda.is_available(); it always returns False.
I tested some examples under NVIDIA_CUDA-11.5_Samples; they always failed.
Here is some of the failing output:

example 1 -> NVIDIA_CUDA-11.5_Samples/5_Simulations/nbody

Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
GPU Device 0: "Pascal" with compute capability 6.1
Compute 6.1 CUDA device: [Quadro P620]
CUDA error at bodysystemcuda_impl.h:186 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&m_pGRes[i], m_pbo[i], cudaGraphicsMapFlagsNone)"

example 2 -> NVIDIA_CUDA-11.5_Samples/5_Simulations/oceanFFT
[CUDA FFT Ocean Simulation]
Left mouse button - rotate
Middle mouse button - pan
Right mouse button - zoom
'w' key - toggle wireframe
[CUDA FFT Ocean Simulation]
GPU Device 0: "Pascal" with compute capability 6.1
CUDA error at oceanFFT.cpp:296 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_heightVB_resource, heightVertexBuffer, cudaGraphicsMapFlagsWriteDiscard)"
Segmentation fault (core dumped)

example 3 -> NVIDIA_CUDA-11.5_Samples/5_Simulations/smokeParticles
CUDA Smoke Particles Starting...
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
The following required OpenGL extensions missing:
GL_ARB_multitexture
GL_ARB_vertex_buffer_object
GL_EXT_geometry_shader4.

thank you in advance.
regards.

Hope these messages are helpful.

nvidia-smi prints (excerpt):
| NVIDIA-SMI 495.44 Driver Version: 495.44 CUDA Version: 11.5
| 0 Quadro P620 Off | 00000000:01:00.0 Off | N/A |
| 34% 34C P8 N/A / N/A | 6MiB / 2000MiB | 0% Default
WARNING: infoROM is corrupted at gpu 0000:01:00.0

PyTorch is installed, and torch.cuda.is_available() always returns False.

The examples inside cuda_11.5.1_495.29.05_linux.run can NOT run; for example:
1. SobelFilter:
Reading image: lena.pgm
CUDA error at SobelFilter.cpp:314 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo_buffer, cudaGraphicsMapFlagsWriteDiscard)"
2.smokeParticles:
The following required OpenGL extensions missing:
GL_ARB_multitexture
GL_ARB_vertex_buffer_object
GL_EXT_geometry_shader4.

thank you in advance.
regards

It seems there's no X server running on the NVIDIA GPU, so the CUDA/GL interop samples fail. Are you doing this over SSH? Please try running the deviceQuery sample; it checks for general CUDA availability.

Thank you. Please refer to the following output:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro P620"
CUDA Driver Version / Runtime Version 11.5 / 11.5
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 2000 MBytes (2097479680 bytes)
(004) Multiprocessors, (128) CUDA Cores/MP: 512 CUDA Cores
GPU Max Clock rate: 1354 MHz (1.35 GHz)
Memory Clock rate: 2505 MHz
Memory Bus Width: 128-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 98304 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.5, CUDA Runtime Version = 11.5, NumDevs = 1
Result = PASS

Obviously, CUDA is running fine. Since you seem to be on a hybrid graphics system, the GL interop samples should work if you prepend

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia

to the command line. See https://download.nvidia.com/XFree86/Linux-x86_64/495.44/README/primerenderoffload.html
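For convenience, the two offload variables can be wrapped in a small shell function. The `nvrun` name is just an illustration; the environment variables themselves are the documented PRIME render offload settings:

```shell
# nvrun: run a command with rendering offloaded to the NVIDIA GPU
# on a PRIME (hybrid graphics) system. The env-var prefix applies
# only to the launched command, not to the whole shell session.
nvrun() {
    __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"
}

# Usage, e.g. from the sample's build directory:
#   nvrun ./oceanFFT
```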


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.