H264 decoding error with cuda 11.4

Hi all,

I get the following error in this environment: Ubuntu 20.04.4, NVIDIA driver 515.48.07, CUDA 11.4.2, A30 GPU, FFmpeg 3.3.3

[h264_cuvid @ 0x55d83aea81c0] ctx->cvdl->cuvidCreateDecoder(&cudec, &cuinfo) failed -> CUDA_ERROR_OUT_OF_MEMORY: out of memory
ERROR - Failed to open decoder context! error - Generic error in an external library

when running this code (all sanity checks omitted):

void initDecoder() {
  AVCodec* mDecoder;
  AVCodecContext* mDecoderCtx;

  mDecoder = avcodec_find_decoder_by_name("h264_cuvid");

  mDecoderCtx = avcodec_alloc_context3(mDecoder);
  if (mDecoder->capabilities & AV_CODEC_CAP_TRUNCATED) {
    // Capability bits and context flags are different constants: set the FLAG, not the CAP.
    mDecoderCtx->flags |= AV_CODEC_FLAG_TRUNCATED;
  }
  mDecoderCtx->flags2 |= AV_CODEC_FLAG2_CHUNKS;
  mDecoderCtx->thread_count = 2;


  av_opt_set(mDecoderCtx->priv_data, "gpu", std::to_string(mCudaDeviceId).c_str(), 0);
  av_opt_set_int(mDecoderCtx->priv_data, "surfaces", 16, 0);
  int error = avcodec_open2(mDecoderCtx, mDecoder, nullptr);
  if (error < 0)
  {
     std::cout << "Failed to open decoder context! error - " << err2str(error) << std::endl;
     exit(-1);
  }

}

Interestingly, the same code works on an A10 GPU with an otherwise identical setup: Ubuntu 20.04.4, NVIDIA driver 515.48.07, CUDA 11.4.2, FFmpeg 3.3.3.

Any idea what causes this error?
Is an FFmpeg upgrade needed?

Hi @jackie.dinh ,
Could you share the repo with us?

Thanks!

Hi @mchi ,
I have set up a minimal example to reproduce the issue at the following link:

Thanks!

Hi @jackie.dinh
Thanks for the repo!
I tried your repo code, but I can’t reproduce the issue on A30.

1. Apply the changes below to print the GPU name

diff --git a/demo.cpp b/demo.cpp
index 2c8a1f2..01fde4f 100644
--- a/demo.cpp
+++ b/demo.cpp
@@ -1,5 +1,6 @@
 #include <string>
 #include <iostream>
+#include <cuda_runtime.h>

 extern "C"
 {
@@ -46,7 +47,15 @@ void createDecoder() {
   std::cout << "Created decoder succesfully." << std::endl;
 }

-int main() {
+int main(int argc, char *argv[]) {
+    int dev = atoi(argv[1]);
+
+    cudaSetDevice(dev);
+    cudaDeviceProp deviceProp;
+    cudaGetDeviceProperties(&deviceProp, dev);
+
+    printf("\nDevice %d: \"%s\"\n", dev, deviceProp.name);
+
    av_register_all();
    createDecoder();
 }
diff --git a/run.sh b/run.sh
index b5793bf..8c19aeb 100755
--- a/run.sh
+++ b/run.sh
@@ -1 +1,5 @@
-g++ -I./install/ffmpeg/include demo.cpp -o demo -L./install/ffmpeg/lib -lavutil -lavcodec -lswresample -lavformat && LD_LIBRARY_PATH=./install/ffmpeg/lib ./demo
+#!/bin/bash
+
+nvcc -I./install/ffmpeg/include demo.cpp -o demo -L./install/ffmpeg/lib -lavutil -lavcodec -lswresample -lavformat && LD_LIBRARY_PATH=./install/ffmpeg/lib ./demo $1

2. Test steps
// launch DS docker
$ docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -v /Datadiska/$user/:/Datadiska/$user/ -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream nvcr.io/nvidia/deepstream:6.1-devel
// in docker
2.1 Build the libs as described in https://github.com/jackiedinh8/demo_decoder
2.2 Run the sample
[image: output of the sample run]

3. My Test Env
[image: test environment]
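
One note on the diff in step 1: it passes argv[1] straight to atoi, so running ./demo without an argument dereferences a null pointer. Below is a minimal sketch of a safer way to read the device index. parseDeviceIndex is a hypothetical helper name, not something from the demo repo, and it only validates the argument; it does not check that the index refers to an existing GPU.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not part of the demo repo): reads a non-negative
// device index from argv[1]. Returns false if the argument is missing,
// is not a plain decimal number, or does not fit into an int.
bool parseDeviceIndex(int argc, const char* const argv[], int& device) {
    if (argc < 2 || argv[1] == nullptr || argv[1][0] == '\0') {
        return false;  // no argument given
    }
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(argv[1], &end, 10);
    if (errno != 0 || *end != '\0') {
        return false;  // overflow or trailing non-digit characters
    }
    if (value < 0 || value > INT_MAX) {
        return false;  // negative or too large for an int
    }
    device = static_cast<int>(value);
    return true;
}
```

In main this could be called before cudaSetDevice, printing a usage message and exiting when it returns false.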

Hi @mchi,

Thanks for checking it.

I haven’t tested the modified code yet. The NVIDIA driver version is not the same as ours, and I don’t think we use nvcc to compile our applications.

Anyway, I will try to run it on our setup and let you know the result.

Thanks!

Running the modified code gives the same error:

Device 0: "NVIDIA A30"
[h264_cuvid @ 0x5610460eaa20] ctx->cvdl->cuvidCreateDecoder(&cudec, &cuinfo) failed -> CUDA_ERROR_OUT_OF_MEMORY: out of memory
Failed to open decoder context! error - Generic error in an external library
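
In case it helps with triage: the second line ("Generic error in an external library") only says that avcodec_open2 failed, while the actual cause is the CUDA_ERROR_* name in the h264_cuvid line above it. Here is a rough sketch for pulling that name out of a captured log line. extractCudaError is a hypothetical helper, and it assumes the error appears in the log exactly in the CUDA_ERROR_NAME form shown above.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns the CUDA error identifier found in a log
// line (for example "CUDA_ERROR_OUT_OF_MEMORY"), or an empty string if
// the line does not contain one.
std::string extractCudaError(const std::string& line) {
    const std::string prefix = "CUDA_ERROR_";
    const std::size_t start = line.find(prefix);
    if (start == std::string::npos) {
        return "";
    }
    // Extend the match over the identifier characters after the prefix.
    std::size_t end = start + prefix.size();
    while (end < line.size()) {
        const unsigned char c = static_cast<unsigned char>(line[end]);
        if (!std::isupper(c) && !std::isdigit(c) && c != '_') {
            break;
        }
        ++end;
    }
    return line.substr(start, end - start);
}
```

Applied to the h264_cuvid line above it returns CUDA_ERROR_OUT_OF_MEMORY; applied to the "Failed to open decoder context!" line it returns an empty string.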

I tried downgrading the driver to 510.x, but hit another issue:

Env: ubuntu 20.04.4 LTS, kernel 5.4.0-122-generic

[    8.315287] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  510.60.02  Wed Mar 16 11:24:05 UTC 2022
[    8.406654] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  510.60.02  Wed Mar 16 11:17:28 UTC 2022
[    8.409030] [drm] [nvidia-drm] [GPU ID 0x00001700] Loading driver
...
[   11.695135] Initializing XFRM netlink socket
[   15.235804] rfkill: input handler disabled
[  420.179075] nvidia-uvm: Loaded the UVM driver, major device number 511.
[ 1215.600175] rfkill: input handler enabled
[ 1215.620308] BUG: kernel NULL pointer dereference, address: 0000000000000040
[ 1215.620324] #PF: supervisor read access in kernel mode
[ 1215.620333] #PF: error_code(0x0000) - not-present page
[ 1215.620342] PGD 0 P4D 0
[ 1215.620349] Oops: 0000 [#1] SMP NOPTI
[ 1215.620356] CPU: 30 PID: 2983 Comm: nvidia-sleep.sh Tainted: P           OE     5.4.0-122-generic #138-Ubuntu

Finally, I was able to run the demo on Ubuntu 20.04.4 with NVIDIA driver 510.73.05.
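
Since three driver versions came up in this thread (515.48.07, 510.60.02, 510.73.05), here is a small sketch for comparing such version strings, for example to log a warning when the installed driver differs from the one that worked here. parseDriverVersion is a hypothetical helper, and it assumes the usual major.minor.patch layout. The only thing known from this thread is that 510.73.05 worked on this particular A30 setup; it is not a general compatibility rule.

```cpp
#include <array>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: splits "major.minor.patch" into three integers.
// Missing or non-numeric fields are left as 0.
std::array<int, 3> parseDriverVersion(const std::string& text) {
    std::array<int, 3> parts = {{0, 0, 0}};
    std::istringstream in(text);
    std::string field;
    for (std::size_t i = 0; i < parts.size() && std::getline(in, field, '.'); ++i) {
        parts[i] = std::atoi(field.c_str());
    }
    return parts;
}
```

std::array compares lexicographically, so parseDriverVersion("510.60.02") < parseDriverVersion("510.73.05") is true, and an equality check against the known-good version works the same way.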

Thanks @mchi
