I have been using Deepstream 4.0 for some time now for the purpose of utilizing the “nvv4l2decoder” gstreamer plugin. So far it has been working well. However, I am working on performance optimizations which include multi-processing and running the nvidia-cuda-mps-server to improve concurrent utilization. I have successfully implemented multiprocessing and see a 2-3x throughput improvement when nvidia-cuda-mps-server is running. This is where I ran into a problem. My gstreamer video decoding pipeline fails when cuda-mps is running. Simply disabling cuda-mps resolves the problem.
Hardware Platform
GTX 1080 Ti
DeepStream Version
4.0.2
TensorRT Version
6.0.1.5
NVIDIA GPU Driver Version
440.33.01 w/ CUDA 10.2
Issue Type (questions, new requirements, bugs)
Bug
How to reproduce the issue?
Tested with following command:
gst-launch-1.0 rtspsrc location="rtsp://192.168.1.72/axis-media/media.amp" ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvideoconvert ! autovideosink
Without enabling cuda-mps the pipeline works as expected. However after enabling cuda-mps via:
sudo nvidia-cuda-mps-control -d
The pipeline blocks forever and is unrecoverable. To get it working again, all I have to do is stop the cuda-mps-server. I also tried changing the io/memory parameters for the plugin, without any luck.
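For reference, this is how I toggle MPS on and off between test runs (a sketch assuming the default MPS pipe directory; the `quit` command is sent to the control daemon's command pipe):

```shell
# Start the MPS control daemon; it spawns nvidia-cuda-mps-server on demand
sudo nvidia-cuda-mps-control -d

# ... run the gst-launch-1.0 pipeline here ...

# Shut MPS down cleanly (this is what "disabling cuda-mps" means above)
echo quit | sudo nvidia-cuda-mps-control
```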
I have attached two logs resulting from running the above pipeline with GST_DEBUG=4 in both scenarios.
The following pipeline can work with “nvidia-cuda-mps-control -d”:
gst-launch-1.0 --gst-debug=v4l2videodec:5 rtspsrc location="rtsp://xxxxx" ! rtph264depay ! nvv4l2decoder ! nvvideoconvert ! nveglglessink
Thank you for the suggestions. I have upgraded to TensorRT 7 and Deepstream 5. I am still seeing the same behavior as before. I have attached two logs resulting from the same test I initially ran, except I used your modified pipeline instead. The pipeline blocks forever when it is handling frame 0 with cuda-mps enabled.
Thank you for looking into this. I have some deadlines coming up that require these optimizations to increase throughput. Could you please provide information on which GPUs this configuration will currently work on?
I think the list includes a T4 and a V100. Is it possible for you to test this on a Quadro RTX6000?
Thanks for all of the information. I have a better understanding of what is happening now. It seems that any architecture >= Volta will work, since those GPUs implement Volta MPS.
I am going to add a check that enables MPS only when the CUDA compute capability is >= 7.0; otherwise I will leave MPS disabled. Generally speaking, deployments will use newer GPUs, but I ran into this issue on a couple of my development platforms that do not have newer GPUs.
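The check I have in mind looks roughly like the sketch below. The `supports_volta_mps` helper is my own name, and I am assuming the compute capability string is obtained elsewhere (e.g. from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` on recent drivers, or from `cudaGetDeviceProperties`); only the decision logic is shown:

```python
def supports_volta_mps(compute_cap: str) -> bool:
    """Return True if a GPU with the given "major.minor" compute
    capability is Volta (7.0) or newer, i.e. a Volta MPS client."""
    major, minor = (int(part) for part in compute_cap.strip().split("."))
    return (major, minor) >= (7, 0)

# GTX 1080 Ti is Pascal (6.1): pre-Volta MPS client, so don't enable MPS.
print(supports_volta_mps("6.1"))  # False
# V100 (7.0) and T4 (7.5) implement Volta MPS.
print(supports_volta_mps("7.0"))  # True
print(supports_volta_mps("7.5"))  # True
```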
@gpu_developer Please refer to the GPU list at CUDA GPUs | NVIDIA Developer. If the "compute capability" value is higher than or equal to 7.0, it is a Volta or post-Volta MPS client. Otherwise, it is a "pre-Volta MPS client".
@Fiona.Chen Thank you for the reply. So, since the compute capability of the RTX series is higher than 7.0, it supports hardware-based MPS. Is that right?
I was confused because the architecture of the RTX series is "Ampere".