ffmpeg CUDA fails

Trying to use a Quadro P2000 for hardware-accelerated ffmpeg processing. It says I need an updated driver, but I just installed all updated software…

Anyone know how I can get this working?

Thanks - SDC

sdcoston@CFS:/mnt/NFS/CFS_tank/CFS_Media/DVR_TV/Treadstone (2019)/Season 01$ sudo ffmpeg -vsync 0 -hwaccel cuvid -c:v h264_cuvid -i 'Treadstone (2019) - S01E03 - The Berlin Proposal.ts' -c:a copy -c:v h264_nvenc -b:v 5M 'Treadstone.2019-S01E03.mkv'
[sudo] password for sdcoston:
ffmpeg version N-95673-g007e03348d Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
configuration: --enable-nonfree --enable-cuda-sdk --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64
libavutil 56. 35.101 / 56. 35.101
libavcodec 58. 60.100 / 58. 60.100
libavformat 58. 34.101 / 58. 34.101
libavdevice 58. 9.100 / 58. 9.100
libavfilter 7. 66.100 / 7. 66.100
libswscale 5. 6.100 / 5. 6.100
libswresample 3. 6.100 / 3. 6.100
[h264 @ 0x56099a755780] Increasing reorder buffer to 2
Input #0, mpegts, from 'Treadstone (2019) - S01E03 - The Berlin Proposal.ts':
Duration: 01:05:56.71, start: 1.400000, bitrate: 4131 kb/s
Program 1
Metadata:
service_name : Service01
service_provider: FFmpeg
Stream #0:0[0x100]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(progressive), 1280x720 [SAR 1:1 DAR 16:9], Closed Captions, 59.94 fps, 59.94 tbr, 90k tbn, 119.88 tbc
Stream #0:1[0x101]: Audio: ac3 ([129][0][0][0] / 0x0081), 48000 Hz, 5.1(side), fltp, 384 kb/s
Stream #0:2[0x102]: Audio: ac3 ([129][0][0][0] / 0x0081), 48000 Hz, stereo, fltp, 192 kb/s
File 'Treadstone.2019-S01E03.mkv' already exists. Overwrite? [y/N] y
Stream mapping:
Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[h264_nvenc @ 0x56099a814500] Driver does not support the required nvenc API version. Required: 9.1 Found: 9.0
[h264_nvenc @ 0x56099a814500] The minimum required Nvidia driver for nvenc is 435.21 or newer
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
Conversion failed!
sdcoston@CFS:/mnt/NFS/CFS_tank/CFS_Media/DVR_TV/Treadstone (2019)/Season 01$ nvidia-smi
Sat Nov 9 14:28:54 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P2000        On   | 00000000:81:00.0  On |                  N/A |
| 47%   35C    P8     5W /  75W |    182MiB /  5057MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3688      G   /usr/lib/xorg/Xorg                            85MiB |
|    0      4114      G   /usr/bin/gnome-shell                          94MiB |
+-----------------------------------------------------------------------------+
sdcoston@CFS:/mnt/NFS/CFS_tank/CFS_Media/DVR_TV/Treadstone (2019)/Season 01$

It says you need driver 435.21 or newer

You have 418.87.00

That won’t work. To get the latest driver for your GPU, go to:

http://www.nvidia.com/drivers

and follow the wizard there. Select the ODE setting. It will take you to a 440.31 driver (currently - may be newer in the future) for your Quadro P2000, which will satisfy the 435.21 or newer requirement.

The driver bundled with the latest CUDA toolkit is not guaranteed to be the latest driver for your GPU.
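
As a quick way to check this before running ffmpeg, here is a minimal sketch that compares the installed driver version against the minimum quoted in the nvenc error (435.21). The helper name version_ge is made up for illustration, and the comparison assumes a sort that supports -V (GNU coreutils does). The nvidia-smi query is skipped if nvidia-smi is not installed.

```shell
#!/bin/sh
# version_ge A B: succeeds when dotted version A is >= B.
# Hypothetical helper; relies on `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="435.21"   # minimum driver quoted in the nvenc error message

# Only query the driver when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "$installed" "$required"; then
        echo "driver $installed is new enough for nvenc (needs >= $required)"
    else
        echo "driver $installed is too old for nvenc (needs >= $required)"
    fi
fi
```

With the 418.87.00 driver from the log above, the comparison fails; with 440.31 it passes.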

I have the same problem and the provided answer doesn’t solve the issue. I’m experiencing the issue on an Ubuntu 16.04 installation, which provides nvidia-418 as the latest driver when combined with an installation of CUDA. Yes, it’s possible to install 440.31 after purging everything 418-related and all of CUDA, but then it’s no longer possible to build a working ffmpeg using CUDA. It doesn’t seem possible to reinstall CUDA after installing 440.31 (without going back to 418.87.00).

After installing the 440.31 driver, install only the CUDA toolkit, using either of these methods:

package manager: install the cuda-toolkit package instead of the cuda package. You may need to be specific: cuda-toolkit-10-1

runfile installer: deselect the option to install the bundled driver

I suggest reading the CUDA Linux install guide.

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
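
The two options above might look roughly like this. It is only a sketch: the function names are made up for illustration, the package name cuda-toolkit-10-1 is the one mentioned above, and the runfile path is whatever you downloaded (passed in as an argument, not assumed here). The --silent and --toolkit flags are the runfile installer's non-interactive way of selecting the toolkit without the bundled driver; check the install guide for your CUDA version before relying on them.

```shell
#!/bin/sh
# Option 1 (package manager): install the toolkit-only package,
# not the "cuda" metapackage, which also pulls in a driver.
install_toolkit_apt() {
    sudo apt-get update
    sudo apt-get install -y cuda-toolkit-10-1
}

# Option 2 (runfile): install only the toolkit, skipping the bundled driver.
# $1 is the path to the downloaded CUDA runfile.
install_toolkit_runfile() {
    sudo sh "$1" --silent --toolkit
}

# Nothing is installed until one of the functions is called, e.g.:
#   install_toolkit_apt
#   install_toolkit_runfile /path/to/your/cuda_runfile.run
```

Either way, run nvidia-smi afterwards to confirm the driver version did not get rolled back.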

Thanks Robert!

I actually figured it out early this morning, and this is exactly what I did.
First, I uninstalled all drivers and CUDA from my 18.04 server,
then installed the newest NVIDIA driver downloaded from the nvidia.com site and rebooted,
then checked the driver using the nvidia-smi command,
then installed cuda-toolkit using the runfile installer (the .deb file did not work for me), deselecting the option to install the bundled driver.

The CUDA installation guide was helpful for the pre-install and post-install tasks, for making the proper PATH changes, and for post-install testing.
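
For anyone following along, the PATH changes from the post-install section of the guide look roughly like this. This is a sketch that assumes the toolkit landed in /usr/local/cuda-10.1 (the usual default for CUDA 10.1; adjust CUDA_HOME if yours differs). CUDA_HOME is just a convenience variable here.

```shell
#!/bin/sh
# Assumed install location; change this if the toolkit was installed elsewhere.
CUDA_HOME=/usr/local/cuda-10.1

# Prepend the toolkit's bin and lib64 directories without leaving a
# trailing colon when the variable was previously empty.
export PATH="$CUDA_HOME/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Quick sanity check that the compiler is now reachable.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found on PATH; check CUDA_HOME ($CUDA_HOME)"
fi
```

Put the two export lines in your shell profile if you want them to survive a new login.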

Hi Robert,

Unfortunately, for my Ubuntu 16.04 system this doesn’t result in a working solution. I’ve removed (purged) all installed nvidia and cuda packages, rebooted the system and verified no nvidia drivers were active. Then I installed the 440 driver by running NVIDIA-Linux-x86_64-440.31.run and rebooted again. Then I re-installed cuda-toolkit-10-1 through apt-get.

nvidia-smi provides this output:

root@fractal:~# nvidia-smi
Wed Nov 13 10:04:15 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.31       Driver Version: 440.31       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 40%   39C    P0    N/A /  N/A |      0MiB /   979MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

… but my custom built ffmpeg can’t open a cuda device:

[AVHWDeviceContext @ 0x3ae97c0] cu->cuInit(0) failed -> CUDA_ERROR_UNKNOWN: unknown error
[h264_cuvid @ 0x3aeb5c0] Error creating a CUDA device
cuvid hwaccel requested for input stream #0:0, but cannot be initialized.

Your GeForce GT 710 appears to be a Fermi device.

https://www.techpowerup.com/gpu-specs/geforce-gt-710.c2614

Fermi GPUs have not been supported for CUDA since CUDA 8, and the last driver that officially supported them was 390.xx

Your GPU is too old to be used with any modern software.

To confirm this, build and run the CUDA deviceQuery and vectorAdd sample codes.

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#verify-installation

(they are likely to simply throw the same error - CUDA error unknown)

To fully confirm lack of support, i.e. Fermi, you would need to purge the system of all NVIDIA software and reload CUDA 8. If the deviceQuery in CUDA 8 shows a cc2.x GPU, then your GPU is too old to be used with any of this.

Running deviceQuery works and shows this output:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 710"
  CUDA Driver Version / Runtime Version          10.2 / 10.1
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 979 MBytes (1026883584 bytes)
  ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
  GPU Max Clock rate:                            954 MHz (0.95 GHz)
  Memory Clock rate:                             800 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

To my knowledge, my card is based on the Kepler architecture: https://www.zotac.com/us/product/graphics_card/geforce®-gt-710-1gb-pcie-x-1
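
For reference, the compute capability that deviceQuery prints maps onto the architecture name by its major version, so 3.5 is Kepler and only 2.x would be Fermi. A minimal sketch of that lookup follows. Both function names are made up for illustration, and the table only covers architectures up to Turing.

```shell
#!/bin/sh
# Pull the "Major/Minor" compute capability out of deviceQuery output (stdin).
cc_from_devicequery() {
    awk -F: '/CUDA Capability Major\/Minor version number/ {
        gsub(/[[:space:]]/, "", $2); print $2; exit
    }'
}

# Map a compute capability such as "3.5" to its architecture name.
arch_for_cc() {
    case "$1" in
        2.*) echo "Fermi" ;;
        3.*) echo "Kepler" ;;
        5.*) echo "Maxwell" ;;
        6.*) echo "Pascal" ;;
        7.5) echo "Turing" ;;
        7.*) echo "Volta" ;;
        *)   echo "unknown" ;;
    esac
}

# Example usage (run from the directory containing the built sample):
#   arch_for_cc "$(./deviceQuery | cc_from_devicequery)"
```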

Forgot to run vectorAdd, but that also runs successfully.

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

Errrr, okay… I don’t know what changed, but I am now able to run my custom ffmpeg as well. The issue has disappeared, although I’m not aware of having made any changes in the meantime. Thanks for your efforts.