FFmpeg hardware encoding failing on GeForce 3080 Ti laptop in Ubuntu 22.04

I’m developing a custom video encode/decode application for a client that uses the NVIDIA Video Codec SDK encode and decode libraries. The application works on a number of platforms and cards (Ubuntu 18.04, 20.04, 22.04, Windows; GeForce 1080, 2080) but fails on a recently purchased GeForce 3080 Ti laptop GPU running Ubuntu 22.04 with driver 515. It does run on an AWS instance with the same OS and driver but an older GPU. In our application, the error is: /NV_Video_Codec_SDK_7.0.1/src/NvHWEncoder.cpp line 908: lock bitstream function failed, code 8

I’ve checked the specifications for this card, and as expected they are a superset of those for earlier cards.

I’ve been able to reproduce an encoding/decoding failure on this system with FFmpeg, using commands that work on an Ubuntu 18.04 system with a GeForce 1080. The input is a video downloaded from the “MP4 (H.264) | Test Videos” page.

The following command works on both systems:
ffmpeg -i Big_Buck_Bunny_1080_10s_1MB.mp4 -c:v h264_nvenc output.mp4

The following command fails on the laptop:
ffmpeg -hwaccel cuvid -hwaccel_output_format cuda -i Big_Buck_Bunny_1080_10s_1MB.mp4 -c:v h264_nvenc output.mp4

Here is the relevant portion of its output:
Output #0, mp4, to ‘output.mp4’:
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.76.100
Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), cuda(progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 2000 kb/s, 60 fps, 15360 tbn (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
encoder : Lavc58.134.100 h264_nvenc
Side data:
cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000 vbv_delay: N/A
[h264 @ 0x556d4f1cdbc0] No decoder surfaces left0:00:00.00 bitrate=N/A speed= 0x
[h264 @ 0x556d4f1eaa80] No decoder surfaces left
[h264 @ 0x556d4f207940] No decoder surfaces left
[h264 @ 0x556d4f224800] No decoder surfaces left
[h264 @ 0x556d4f0aca40] No decoder surfaces left
Error while decoding stream #0:0: Invalid data found when processing input
Last message repeated 1 times
[h264 @ 0x556d4f1cdbc0] No decoder surfaces left
Error while decoding stream #0:0: Invalid data found when processing input
[h264 @ 0x556d4f1eaa80] No decoder surfaces left
Error while decoding stream #0:0: Invalid data found when processing input
[h264 @ 0x556d4f207940] No decoder surfaces left
Impossible to convert between the formats supported by the filter ‘Parsed_null_0’ and the filter ‘auto_scaler_0’
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
Conversion failed!

Here is the output of nvidia-smi -q:

==============NVSMI LOG==============

Timestamp : Wed Oct 12 15:32:56 2022
Driver Version : 515.65.01
CUDA Version : 11.7

Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce RTX 3080 Ti Laptop GPU
Product Brand : GeForce
Product Architecture : Ampere
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-41e83eb6-b8ee-dd98-ed89-c6241b0284b3
Minor Number : 0
VBIOS Version : 94.03.19.00.44
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Module ID : 0
Inforom Version
Image Version : G001.0000.03.03
OEM Object : 2.0
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x246010DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x0B281028
GPU Link Info
PCIe Generation
Max : 4
Current : 4
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 16384 MiB
Reserved : 258 MiB
Used : 5 MiB
Free : 16119 MiB
BAR1 Memory Usage
Total : 16384 MiB
Used : 3 MiB
Free : 16381 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 42 C
GPU Shutdown Temp : 99 C
GPU Slowdown Temp : 96 C
GPU Max Operating Temp : 86 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : N/A
Power Draw : 35.25 W
Power Limit : N/A
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1455 MHz
SM : 1455 MHz
Memory : 8000 MHz
Video : 1282 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 8001 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 812.500 mV
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2245
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 4 MiB

The root cause appears to be a failure of an earlier call to cudaGraphicsGLRegisterBuffer(). This call fails in two different portions of our custom application, which seems consistent with the error reported by FFmpeg.

Recompiling using the current packages on the laptop itself rather than running a binary executable produces the same results but with a change in error code from 30 to 999 (which still means unknown internal error).
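
The change from 30 to 999 fits the renumbering of the CUDA runtime error enum: as far as I know, cudaErrorUnknown was 30 in toolkits before roughly CUDA 10.1 and is 999 in later ones. The sketch below only illustrates that mapping. It assumes the prebuilt binary was built against an older toolkit, and the helper names are made up for this example.

```python
# Numeric values that cudaErrorUnknown has had in the CUDA runtime API.
# Assumption: 30 comes from a build against a toolkit older than ~10.1,
# 999 from a build against the current toolkit on the laptop.
CUDA_ERROR_UNKNOWN_CODES = {
    30: "cudaErrorUnknown (older runtime numbering)",
    999: "cudaErrorUnknown (newer runtime numbering)",
}


def describe_cuda_error(code: int) -> str:
    """Return a label for the code if it is one of the 'unknown error' values."""
    return CUDA_ERROR_UNKNOWN_CODES.get(code, f"other error code: {code}")


def same_failure(code_a: int, code_b: int) -> bool:
    """True if both codes are 'unknown error' under one of the two numberings."""
    return code_a in CUDA_ERROR_UNKNOWN_CODES and code_b in CUDA_ERROR_UNKNOWN_CODES


print(describe_cuda_error(30))
print(describe_cuda_error(999))
print(same_failure(30, 999))
```

In other words, the recompile did not change the failure itself, only the number used to report it.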

Installing the CUDA toolkit on the laptop rolled its drivers back to 510. The bug happens in both 510 and 515 but only on a GeForce 3080 Ti (we’ve only tried on this particular 3080 card).

The minimal PBO example program on GitHub, titled “(Minimal) OpenGL to CUDA PBO example, purpose of this example is to evaluate why depth transfer is so slow”, also fails on the same call with an unknown error. It should be a good way to reproduce the problem.

This program works on an Ubuntu 18.04 machine with a GeForce 1080.

We were able to test this code on a desktop with a 30 series card (3060 or 3070) on Ubuntu 18.04 with the 515 driver, and it runs fine. So the problem looks specific either to this combination of OS, driver and card, or to the 3080 Ti, or to running on a laptop. The laptop is an Alienware m17 R5.

Turning off Optimus in the BIOS made it work.
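
One plausible explanation, which is an assumption on my part and not something confirmed in this thread, is that with Optimus enabled the OpenGL context was created on the integrated GPU, so CUDA could not register its buffers. A quick way to check which GPU OpenGL is using is to look at the vendor string printed by glxinfo (from the mesa-utils package). The helper names below are hypothetical.

```python
import re
import subprocess
from typing import Optional


def glx_field(glxinfo_output: str, field: str) -> Optional[str]:
    """Extract the value of an 'OpenGL <field> string:' line, or None if absent."""
    pattern = rf"^OpenGL {re.escape(field)} string:\s*(.+)$"
    match = re.search(pattern, glxinfo_output, re.MULTILINE)
    return match.group(1).strip() if match else None


def renders_on_nvidia(glxinfo_output: str) -> bool:
    """True if the OpenGL vendor string reported by glxinfo names NVIDIA."""
    vendor = glx_field(glxinfo_output, "vendor") or ""
    return "nvidia" in vendor.lower()


def live_check() -> bool:
    """Run `glxinfo -B` on this machine and test its output (needs glxinfo installed)."""
    result = subprocess.run(
        ["glxinfo", "-B"], capture_output=True, text=True, check=True
    )
    return renders_on_nvidia(result.stdout)


# Illustrative sample text, not output captured from the laptop in this thread.
sample = (
    "OpenGL vendor string: NVIDIA Corporation\n"
    "OpenGL renderer string: NVIDIA GeForce RTX 3080 Ti Laptop GPU/PCIe/SSE2\n"
)
print(renders_on_nvidia(sample))
```

Calling live_check() in the same session that runs the application shows whether OpenGL is on the NVIDIA GPU. If it returns False while Optimus is enabled, that would be consistent with the cudaGraphicsGLRegisterBuffer() failure described above.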
