and follow the wizard there. Select the ODE setting. It will take you to a 440.31 driver (the current version as of this writing; it may be newer in the future) for your Quadro P2000, which satisfies the 435.21-or-newer requirement.
The driver bundled with the latest CUDA toolkit is not guaranteed to be the latest driver for your GPU.
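As a quick sanity check, you can compare the loaded driver against the installed toolkit. Both commands below are standard; the version.txt path is the default for a CUDA 10.x install:

nvidia-smi --query-gpu=driver_version --format=csv,noheader   # driver actually loaded
nvcc --version                                                # toolkit's compiler version
cat /usr/local/cuda/version.txt                               # toolkit version file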
I have the same problem, and the provided answer doesn’t solve the issue. I’m experiencing it on an Ubuntu 16.04 installation, where nvidia-418 is the latest driver offered once CUDA is installed alongside it. Yes, it’s possible to install 440.31 after purging everything 418-related and all of CUDA, but then it’s no longer possible to build a working ffmpeg using CUDA. It doesn’t seem possible to reinstall CUDA after installing 440.31 (without going back to 418.87.00).
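For reference, the relevant apt difference as I understand it (package names as in NVIDIA’s Ubuntu repository; the version is an example):

sudo apt-get install cuda               # metapackage: toolkit plus the packaged driver, which can replace a runfile-installed one
sudo apt-get install cuda-toolkit-10-1  # toolkit only; leaves the installed driver alone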
I actually figured it out early this morning - and this is exactly what I did.
First, I uninstalled all drivers and CUDA from my 18.04 server.
Then I installed the newest NVIDIA driver downloaded from the nvidia.com site and rebooted.
Then I checked the driver with the nvidia-smi command.
Then I installed the CUDA toolkit using the runfile installer (the .deb file did not work for me), deselecting the option to install the bundled driver.
The CUDA installation guide was helpful for the pre-install and post-install tasks, for making the proper PATH alterations, and for post-install testing. A sketch of the whole sequence follows below.
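For anyone repeating this, here is a minimal sketch of that sequence. The runfile names, versions, and purge globs are the ones from my setup, not canonical, so adjust them to yours:

# 1. Remove any existing driver and CUDA packages (apt-based install):
sudo apt-get --purge remove "*nvidia*" "*cuda*"
sudo apt-get autoremove
sudo reboot

# 2. Install the driver runfile downloaded from nvidia.com, then reboot:
sudo sh ./NVIDIA-Linux-x86_64-440.31.run
sudo reboot

# 3. Verify the driver is loaded:
nvidia-smi

# 4. Install the CUDA toolkit runfile, deselecting the bundled driver
#    (untick "Driver" interactively, or non-interactively:)
sudo sh ./cuda_10.1.243_418.87.00_linux.run --silent --toolkit

# 5. Post-install PATH changes, as in the CUDA installation guide:
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}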
Unfortunately, for my Ubuntu 16.04 system this doesn’t result in a working solution. I removed (purged) all installed NVIDIA and CUDA packages, rebooted the system, and verified no NVIDIA drivers were active. Then I installed the 440 driver by running NVIDIA-Linux-x86_64-440.31.run and rebooted again. Then I re-installed cuda-toolkit-10-1 through apt-get.
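For completeness, this is roughly how I verified the purge had taken effect before running the driver installer (the grep patterns are examples; dpkg and lsmod are standard):

dpkg -l | grep -i nvidia    # should list nothing, or only removed ("rc") entries
lsmod | grep nvidia         # no output means no NVIDIA kernel modules are loaded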
nvidia-smi provides this output:
root@fractal:~# nvidia-smi
Wed Nov 13 10:04:15 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.31       Driver Version: 440.31       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 40%   39C    P0    N/A /  N/A |      0MiB /   979MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
… but my custom-built ffmpeg can’t open a CUDA device:
[AVHWDeviceContext @ 0x3ae97c0] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
[h264_cuvid @ 0x3aeb5c0] Error creating a CUDA device
cuvid hwaccel requested for input stream #0:0, but cannot be initialized.
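For anyone wanting to reproduce, a command of roughly this shape triggers the error here (input.mp4 stands in for any H.264 source file):

ffmpeg -hwaccel cuvid -c:v h264_cuvid -i input.mp4 -f null -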
(The CUDA samples are likely to simply throw the same error - CUDA_ERROR_UNKNOWN.)
To fully confirm lack of support, i.e. a Fermi-class GPU, you would need to purge the system of all NVIDIA software and install CUDA 8 instead. If the deviceQuery from CUDA 8 shows a cc2.x GPU, then your GPU is too old to be used with any of this.
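A sketch of building and running deviceQuery from the installed samples, assuming they sit in the default location (for CUDA 8, substitute /usr/local/cuda-8.0):

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery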
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 710"
CUDA Driver Version / Runtime Version 10.2 / 10.1
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 979 MBytes (1026883584 bytes)
( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
GPU Max Clock rate: 954 MHz (0.95 GHz)
Memory Clock rate: 800 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS
Forgot to run vectorAdd earlier, but that also runs successfully.
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Errr, okay… I don’t know what changed, but I am now able to run my custom ffmpeg as well. The issue has disappeared, although I’m not aware of making any changes in the meantime. Thanks for your efforts.