HW decoder/encoder failure

• Hardware Platform (Jetson / GPU) NVIDIA A2
• DeepStream Version 6.1.1
• NVIDIA GPU Driver Version (valid for GPU only) 530.41.03
• How to reproduce the issue? (This is for bugs. Include which sample app is being used, the configuration file contents, the command line used, and other details for reproducing.)

gst-launch-1.0 -v videotestsrc ! queue ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420,width=1920,height=1080' ! nvv4l2h264enc bitrate=800 ! rtspclientsink location=rtsp://localhost:8554/stream protocols=tcp profiles=avp tls-validation-flags=insecure

We are getting errors when we try to hardware decode and encode (nvv4l2h264enc) on the A2 GPU. The errors are as follows.

ERROR: from element /GstPipeline:pipeline0/nvv4l2h264enc:nvv4l2h264enc0: Failed to process frame.
Additional debug info:
gstv4l2videoenc.c(1398): gst_v4l2_video_enc_handle_frame (): /GstPipeline:pipeline0/nvv4l2h264enc:nvv4l2h264enc0:
Maybe be due to not enough memory or failing driver
Execution ended after 0:00:01.205040711
Setting pipeline to NULL ...
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Freeing pipeline ...

GPU information:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A2                       Off| 00000000:C4:00.0 Off |                    0 |
|  0%   42C    P8                9W /  60W|      9MiB / 15356MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1123      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      2354      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

We got a similar error with driver versions 530, 525, and 515.
But when we downgraded the driver to version 470, the pipeline worked without errors.
Since we are using DeepStream 6.1.1, we need a more up-to-date CUDA version, so we cannot stay on driver version 470.

We need help finding the cause of these errors and a solution.

Thanks.

Not sure, I have no experience with your platform, but the bitrate seems very low to me. nvv4l2h264enc may expect the bitrate in bits/s, while x264enc or other encoders may take it in kbit/s. Try bitrate=800000.
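For example, your pipeline with the bitrate given in bits/s would look like this (untested on your setup):

gst-launch-1.0 -v videotestsrc ! queue ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420,width=1920,height=1080' ! nvv4l2h264enc bitrate=800000 ! rtspclientsink location=rtsp://localhost:8554/stream protocols=tcp profiles=avp tls-validation-flags=insecure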

@Honey_Patouceul
I tried nvv4l2h264enc with bitrate=800000, but unfortunately the result did not change and I still get the following error.

nvbufsurface: NvBufSurfaceSysToHWCopy: failed in mem copy
nvbufsurface: NvBufSurfaceCopy: failed to copy
nvbufsurface: NvBufSurfaceSysToHWCopy: failed in mem copy
nvbufsurface: NvBufSurfaceCopy: failed to copy
ERROR in BufSurfacecopy 
0:00:07.192074749   198 0x561c14517520 ERROR         v4l2bufferpool gstv4l2bufferpool.c:2388:gst_v4l2_buffer_pool_process:<nvv4l2h264enc0:pool:sink> failed to prepare data
ERROR: from element /GstPipeline:pipeline0/nvv4l2h264enc:nvv4l2h264enc0: Failed to process frame.
Additional debug info:
gstv4l2videoenc.c(1398): gst_v4l2_video_enc_handle_frame (): /GstPipeline:pipeline0/nvv4l2h264enc:nvv4l2h264enc0:
Maybe be due to not enough memory or failing driver
Execution ended after 0:00:07.066906214
Setting pipeline to NULL ...
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Freeing pipeline ...

By the way, this pipeline runs smoothly (without any error) on an RTX 2070 Super with the 520 driver, a GTX 1080 Ti with the 525 driver, and a T4 with the 530 driver installed. So the problem only occurs on the A2 GPU.

We will look into this issue and will be back once there is any progress.

By the way, can you upgrade DeepStream to 6.2?
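If you are on the Triton container, upgrading should just be a matter of pulling the newer image (tag assumed here, please check NGC for the exact 6.2 tag):

docker pull nvcr.io/nvidia/deepstream:6.2-triton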

Thanks for your advice @junshengy

I also tried DeepStream 6.2 with driver version 530.41.03, but the result did not change; we still get the same error.

nvbufsurface: NvBufSurfaceSysToHWCopy: failed in mem copy
nvbufsurface: NvBufSurfaceCopy: failed to copy
nvbufsurface: NvBufSurfaceSysToHWCopy: failed in mem copy
nvbufsurface: NvBufSurfaceCopy: failed to copy
ERROR in BufSurfacecopy 
0:00:06.463089905   160 0x556520b8a980 ERROR         v4l2bufferpool gstv4l2bufferpool.c:2388:gst_v4l2_buffer_pool_process:<nvv4l2h264enc0:pool:sink> failed to prepare data
ERROR: from element /GstPipeline:pipeline0/nvv4l2h264enc:nvv4l2h264enc0: Failed to process frame.
Additional debug info:
gstv4l2videoenc.c(1489): gst_v4l2_video_enc_handle_frame (): /GstPipeline:pipeline0/nvv4l2h264enc:nvv4l2h264enc0:
Maybe be due to not enough memory or failing driver
Execution ended after 0:00:06.366975802
Setting pipeline to NULL ...

As the latest update, I tried the same pipeline on an A10 GPU with driver version 525 and it worked fine (no errors). But with the same driver version and pipeline, we still get the error on the A2 GPU.

A10 (error-free, working successfully) GPU and driver information:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         Off  | 00000000:00:1E.0 Off |                    0 |
|  0%   33C    P0    58W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

We can connect to the stream published by the pipeline; the ffprobe result is as follows.

Input #0, rtsp, from 'rtsp://localhost:8554/test':
  Metadata:
    title           : Stream
  Duration: N/A, start: 0.772378, bitrate: N/A
    Stream #0:0: Video: h264 (Constrained Baseline), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 90k tbn, 60 tbc

We are still waiting for a solution for the A2 GPU.

Thanks for your help.

Maybe the driver has a problem.

To confirm this, can you try the NVIDIA Video Codec SDK demo?

DeepStream just wraps the Codec SDK into GStreamer plugins.
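For example, one of the encode samples shipped with the Video Codec SDK could be run directly on the A2 to see whether NVENC itself fails outside of GStreamer (sample name and flags quoted from memory, and the raw YUV input file is your own; please check the SDK’s README for the exact options):

./AppEncCuda -i input_1080p.yuv -s 1920x1080 -o out.h264 -gpu 0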

Hello @junshengy

We do our development and testing with the official NVIDIA Docker image, nvcr.io/nvidia/deepstream:6.1.1-triton. We use GStreamer, not FFmpeg. GStreamer uses nvv4l2h264dec and nvv4l2h264enc, which are developed by NVIDIA and included in the Docker image, as the decoder/encoder.
While the same pipeline causes no problems on GPUs such as the A10, T4, 2070 Super, and 1080 Ti, it unfortunately fails on the A2. We will also test with the A30, but I don’t expect any problems there.

Did you try running a DeepStream pipeline with the HW encoder/decoder on an A2 GPU?

I don’t have an A2 GPU; I have tried it on T4 and A3000 GPUs, and even Jetson Orin.
I think it’s a bug in the codec driver.
DeepStream can’t resolve it, but I have reported it to the codec driver team.
Maybe you can get some help here.

Does your device have multiple GPUs?

I tried on an A2 GPU with driver version 525. It’s OK.

gst-launch-1.0 -v videotestsrc ! queue ! nvvideoconvert gpu-id=2 ! 'video/x-raw(memory:NVMM),format=I420,width=1920,height=1080' ! nvv4l2h264enc bitrate=1000000 gpu-id=2 ! filesink location=out.h264

If I only set the gpu-id of nvv4l2h264enc, it outputs the following log:

Maybe be due to not enough memory or failing driver

But nvvideoconvert and nvv4l2h264enc need to work on the same GPU, so gpu-id needs to be set on both plugins.

@junshengy
Thanks for your reply.

I ran the pipeline below on the A2 GPU.

gst-launch-1.0 -v videotestsrc ! queue ! nvvideoconvert gpu-id=0 ! 'video/x-raw(memory:NVMM),format=I420,width=1920,height=1080' ! nvv4l2h264enc bitrate=1000000 gpu-id=0 ! filesink location=out.h264

And I kept getting the error output below.

Setting pipeline to PLAYING ...
New clock: GstSystemClock
nvbufsurface: NvBufSurfaceSysToHWCopy: failed in mem copy
nvbufsurface: NvBufSurfaceCopy: failed to copy
ERROR in BufSurfacecopy 
0:00:06.304596854   124 0x55e79a768aa0 ERROR         v4l2bufferpool gstv4l2bufferpool.c:2388:gst_v4l2_buffer_pool_process:<nvv4l2h264enc0:pool:sink> failed to prepare data
ERROR: from element /GstPipeline:pipeline0/nvv4l2h264enc:nvv4l2h264enc0: Failed to process frame.
Additional debug info:
gstv4l2videoenc.c(1398): gst_v4l2_video_enc_handle_frame (): /GstPipeline:pipeline0/nvv4l2h264enc:nvv4l2h264enc0:
Maybe be due to not enough memory or failing driver
Execution ended after 0:00:04.647833737
Setting pipeline to NULL ...
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Cuda failure: status=702
nvbufsurface: Error(-1) in releasing cuda memory
Freeing pipeline ...

The GPU and driver version we use are as follows.

Tue Jul  4 10:55:03 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A2           Off  | 00000000:C4:00.0 Off |                    0 |
|  0%   38C    P8     5W /  60W |      9MiB / 15356MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

We do these operations in a docker container using the nvcr.io/nvidia/deepstream:6.1.1-triton image.
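For reference, the container is started roughly like this (a sketch; the exact volume mounts and display options we use are omitted):

docker run --gpus all -it --rm --net=host nvcr.io/nvidia/deepstream:6.1.1-triton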

@junshengy - Any other ideas about what may be causing this? Have you tried this in the Docker container?

I have tried nvcr.io/nvidia/deepstream:6.1.1-triton with driver version 525.

It works normally.

Do you have another A2 GPU? Can you try a different one if you do?

@junshengy - I don’t. This is the only unit we have to test with right now. Is there a script we can run to check that the hardware is good?

There is no easy way to test.

NVVS (the NVIDIA Validation Suite) is a complex solution.

Trying another A2, or plugging this A2 into another machine, is a quicker way.
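If you still want a software check first, one option is the DCGM diagnostics, which wrap NVVS (assuming DCGM is installed on the host; run level 2 is a medium-length test):

dcgmi diag -r 2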

I ran the pipeline below with driver version 525 in the container we created from the nvcr.io/nvidia/deepstream:6.1.1-triton Docker image, but I kept getting the errors, so nothing changed.
But when I downgraded the driver version from 525 to 470, it worked without any error.

Pipeline:

gst-launch-1.0 -v videotestsrc ! queue ! nvvideoconvert gpu-id=0 ! 'video/x-raw(memory:NVMM),format=I420,width=1920,height=1080' ! nvv4l2h264enc bitrate=1000000 gpu-id=0 ! filesink location=out.h264

GPU:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A2           Off  | 00000000:C4:00.0 Off |                    0 |
|  0%   38C    P8     5W /  60W |      9MiB / 15356MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Can you run nvcc --version in your Docker container?

Maybe a specific combination of CUDA version and driver version is causing this problem.

In my 6.1.1 Docker container, the output is as follows:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

Or try

deepstream-app --version-all

Maybe you can keep the driver version at 470.

If you want to use an up-to-date CUDA version, try:

sudo apt-get install -y cuda-compat-12-0
export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH
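To confirm the forward-compatibility libraries are in place before re-running the pipeline, you could check the path used in the export above and the reported versions (package layout assumed; adjust the path if the package installs elsewhere):

ls /usr/local/cuda/compat/
deepstream-app --version-all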

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Can driver version 470 with cuda-compat-12-0 work around the issue for you?

Thanks.
