Nvv4l2decoder fails when running nvidia-cuda-mps-server

I have been using DeepStream 4.0 for some time now for the purpose of utilizing the "nvv4l2decoder" GStreamer plugin. So far it has been working well. However, I am working on performance optimizations that include multi-processing and running nvidia-cuda-mps-server to improve concurrent utilization. I have successfully implemented multiprocessing and see a 2-3x throughput improvement when nvidia-cuda-mps-server is running. This is where I ran into a problem: my GStreamer video decoding pipeline fails when cuda-mps is running. Simply disabling cuda-mps resolves the problem.

Hardware Platform
GTX 1080 Ti

DeepStream Version
4.0.2

TensorRT Version
6.0.1.5

NVIDIA GPU Driver Version
440.33.01 w/ CUDA 10.2

Issue Type (questions, new requirements, bugs)
Bug

How to reproduce the issue?

Tested with following command:
gst-launch-1.0 rtspsrc location="rtsp://192.168.1.72/axis-media/media.amp" ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvideoconvert ! autovideosink

Without enabling cuda-mps the pipeline works as expected. However after enabling cuda-mps via:

sudo nvidia-cuda-mps-control -d

The pipeline blocks forever and is unrecoverable. To get it working again, all I have to do is stop the cuda-mps-server. I also tried changing the I/O and memory parameters for the plugin without any luck.

I have attached two logs resulting from running the above pipeline with GST_DEBUG=4 in both scenarios.

gst-failure.log (111.9 KB)
gst-success.log (221.6 KB)

If I had to guess, this has something to do with the GPU architecture, but it is not clear to me why it is failing. Everything else behaves as expected.

First, DeepStream 4.0 is no longer supported. Can you upgrade to the latest DeepStream 5.0.1? https://developer.nvidia.com/deepstream-getting-started

The following pipeline can work with "nvidia-cuda-mps-control -d":
gst-launch-1.0 --gst-debug=v4l2videodec:5 rtspsrc location="rtsp://xxxxx" ! rtph264depay ! nvv4l2decoder ! nvvideoconvert ! nveglglessink

Thank you for the suggestions. I have upgraded to TensorRT 7 and DeepStream 5. I am still seeing the same behavior as before. I have attached two logs resulting from the same test I initially ran, except using your modified pipeline instead. With cuda-mps enabled, the pipeline blocks forever while handling frame 0.

deepstream5-failure.log (5.3 KB)
deepstream5-success.log (20.0 KB)

What will happen if you use fakesink instead of autovideosink?

The behavior does not change. The pipeline never reaches the sink when cuda-mps is running.

We can reproduce the problem with a GTX 1080 Ti and will investigate it. We will report back when there is progress.

Thank you for looking into this. I have some deadlines coming up that require these optimizations to increase throughput. Could you please provide information on which GPUs this configuration will currently work on?

I think the list includes a T4 and a V100. Is it possible for you to test this on a Quadro RTX6000?

Officially, DeepStream supports T4/P4 and Jetson; see Quickstart Guide — DeepStream 6.1.1 Release documentation.

Also please note that similar issues occur when using the following plugins:

nvdec
nvh264enc

It appears to impact all decoding and encoding.

As noted in chapter 2.3.2 of the Multi-Process Service documentation (Multi-Process Service :: GPU Deployment and Management Documentation): "The NVIDIA Video Codec SDK (NVIDIA VIDEO CODEC SDK | NVIDIA Developer) is not supported under MPS on pre-Volta MPS clients."
You cannot use nvidia-cuda-mps-server with DeepStream on a GTX 1080 Ti.

Thanks for all of the information. I have a better understanding of what is happening now. It would seem that any architecture at Volta or later implements Volta MPS and will work.

I am going to add a check to enable MPS when the CUDA Compute Capability is >= 7.0. Otherwise I will not enable MPS. Generally speaking, deployments will use newer GPUs, but I ran into this issue on a couple of my development platforms that do not have new GPUs.
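That gating logic could be sketched roughly as follows (a minimal sketch, not DeepStream code; the function names are my own, and I am assuming the `nvidia-smi --query-gpu=compute_cap` query, which is only available in newer drivers — older drivers would need a different query method such as the CUDA device API):

```python
# Sketch: only enable MPS when every GPU can be a Volta MPS client,
# i.e. CUDA Compute Capability >= 7.0. Pre-Volta clients (e.g. the
# GTX 1080 Ti at 6.1) do not support the Video Codec SDK under MPS.
import subprocess

VOLTA_CC = (7, 0)  # minimum compute capability for Volta MPS

def parse_compute_cap(text: str) -> tuple:
    """Parse a compute-capability string such as '6.1' into (6, 1)."""
    major, minor = text.strip().split(".")
    return (int(major), int(minor))

def mps_supported(cc: tuple) -> bool:
    """True when a device with capability `cc` is a Volta MPS client."""
    return cc >= VOLTA_CC

def should_enable_mps() -> bool:
    """Query all GPUs via nvidia-smi; enable MPS only if all are >= 7.0.

    Assumes a driver new enough to support the `compute_cap` query field.
    """
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        text=True,
    )
    caps = [parse_compute_cap(line) for line in out.splitlines() if line.strip()]
    return bool(caps) and all(mps_supported(cc) for cc in caps)
```

On a mixed-GPU development box this conservatively leaves MPS off if any device is pre-Volta, which matches the failure mode in this thread.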

Hello,
May I ask whether Ampere (e.g., RTX 3090) supports MPS?

@gpu_developer Please refer to the gpu list CUDA GPUs | NVIDIA Developer , if the “compute capability” value is higher than or equal to 7.0, it is a Volta or after-Volta MPS client. Or else, it is a “pre-Volta MPS client”.

@Fiona.Chen Thank you for the reply. So, since the RTX series' compute capability is higher than 7.0, it supports hardware-based MPS. Is that right?
I was confused since the architecture of the RTX series is "Ampere".

The "compute capability" value is a clearer indicator than the architecture name "Ampere".