We’re using the EGL Stream API to share video frames from a single input with multiple applications. One process runs as a video server and is responsible for capturing and sharing frames. The server runs constantly in the background as a systemd service. In addition, up to 3 processes can be started at any time as video clients.
Issue - On the server (producer) side, DMAbuf file descriptors leak after clients disconnect and reconnect.
Setup - Jetson TX2-4GB running L4T 32.4.4 with CUDA 10.2.
Test setup
I used the EGLStream_CUDA_CrossGPU sample code from the cuda-samples repo (latest master branch) as a reference.
I slightly modified the Makefile to fix the build for CUDA 10.2 and made quick modifications to main.cpp: the processes no longer stop after one run but keep running until terminated, to emulate many reconnections to the producer. The changes are attached as prod-cons.patch (3.5 KB).
Build command - cd Samples/2_Concepts_and_Techniques/EGLStream_CUDA_CrossGPU && make CUDA_PATH=/usr/local/cuda-10.2/ dbg=1. The compiled binary EGLStream_CUDA_CrossGPU (804.2 KB) is attached.
Steps to reproduce:
Start the consumer first - ./EGLStream_CUDA_CrossGPU
The consumer waits for the producer.
In a second terminal, start the producer - ./EGLStream_CUDA_CrossGPU -proctype prod
Run the script to collect FD usage statistics for both processes - ./cuda_test_fd.sh.
The script cuda_test_fd.sh (398 Bytes) is attached.
Press Enter in the consumer terminal to repeat the iteration. Collect the statistics again and check that DMAbuf FDs are leaking.
According to the code, the stream and frames should be destroyed (and consequently the DMAbuf FDs closed) after each iteration, but in fact the number of FDs grows very fast.
Where could the issue be? Is something wrong with the stream termination sequence in the sample code?
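For reference, here is a minimal sketch of how the per-process FD statistics can be gathered. It is not the attached cuda_test_fd.sh (whose contents are not reproduced here). The helper name count_fds is hypothetical, and the sketch assumes that DMAbuf descriptors appear in /proc/<pid>/fd with "dmabuf" in the symlink target (typically anon_inode:dmabuf), which may differ between kernels.

```cpp
#include <dirent.h>
#include <unistd.h>

#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: counts the entries in `fd_dir` (e.g. "/proc/1234/fd")
// whose symlink target contains `needle`. An empty needle counts every FD.
// Returns 0 if the directory cannot be opened.
std::size_t count_fds(const std::string& fd_dir, const std::string& needle = "") {
    DIR* dir = opendir(fd_dir.c_str());
    if (dir == nullptr) return 0;

    std::size_t count = 0;
    while (dirent* entry = readdir(dir)) {
        const std::string path = fd_dir + "/" + entry->d_name;
        char target[PATH_MAX];
        // readlink() fails for "." and "..", and for FDs closed mid-scan.
        const ssize_t len = readlink(path.c_str(), target, sizeof(target));
        if (len < 0) continue;
        // readlink() does not null-terminate, so pass the length explicitly.
        const std::string link(target, static_cast<std::size_t>(len));
        if (link.find(needle) != std::string::npos) ++count;
    }
    closedir(dir);
    return count;
}
```

Calling count_fds("/proc/<producer pid>/fd", "dmabuf") before and after each consumer iteration should show whether the count returns to its previous value. Reading another process's /proc/<pid>/fd requires running as the same user or as root.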
Any help is appreciated.
Thanks!
Please check whether you can make a patch against this setup so that we can run it and reproduce the issue. It would also be great if you could upgrade to JetPack 4.6.2 and try again.
I’m aware of this demo, but our case is different. We have a CUDA producer/consumer instead of the OpenGL producer/consumer used in eglstreamcube/ctree/gears. That’s why I’m using the EGLStream_CUDA_CrossGPU demo from NVIDIA’s official cuda-samples repo. It matches our case.
The patch and the steps to reproduce are attached to the original post under the “Test setup” section. Just in case: the “Test setup” section is collapsed for convenience. All technical details are there, and with them you can easily reproduce the issue on your side.
I tried the latest L4T 32.7.2 on my setup and observe the same issue. I used the same code and steps as described in my original post under the “Test setup” section.
Hi,
No. We checked the code, and it is a bit strange that the producer/consumer is destroyed and re-initialized in a loop while the process stays alive. It would be more reasonable not to destroy the producer/consumer while the process is alive, and to reuse the same producer and consumer instead.
I checked this user guide. It’s the same one used in the L4T documentation for the 32.4.4 release. I wasn’t able to find anything new there.
We’re following these guidelines, and it looks like NVIDIA’s GitHub samples follow them as well. The difference is that in our case, and in the EGLStream_CUDA_CrossGPU case, a CUDA consumer/producer is used instead of OpenGL ones.
In my test setup (based on EGLStream_CUDA_CrossGPU) the consumer was created first, as you suggest.
I did that intentionally to simulate behavior closer to our real case. We can’t keep the consumers always running because they’re desktop applications. Users of the system can open and close them at will at any time.
Do you mean that the EGL Stream implementation can’t properly support dynamic creation and destruction of multiple streams?
Are there any guidelines or documentation on using a CUDA consumer/producer with EGL Streams?
Hi,
The document describes the functions, and the sample is for demonstration. In the sample code, the consumer/producer is initialized and destroyed along with the process. After applying the patch, the processes stay alive and the consumer/producer is initialized and destroyed in a loop. This is not the same as the sample and may not work properly.
For your use case, one possible solution is to have a consumer/producer daemon: whenever you need a consumer-producer connection, it can fork one child process for the consumer and one child process for the producer. Once the connection is finished, you can destroy the consumer/producer and exit those processes.
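The fork-per-connection idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the sample: run_session_in_child and the session callback are hypothetical names, and the callback stands in for whatever sets up, runs, and tears down one producer or consumer connection. The sketch also assumes the daemon itself never initializes CUDA or EGL before forking, since a CUDA context created in the parent cannot be used in a forked child.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <functional>

// Hypothetical helper: runs one connection "session" in a forked child and
// returns the child's exit code, or -1 if fork/wait failed or the child was
// killed by a signal. `session` is a placeholder for the real stream logic.
int run_session_in_child(const std::function<int()>& session) {
    const pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        // Child: every FD opened from here on (including any DMAbuf FDs)
        // is closed by the kernel when the process exits.
        _exit(session());
    }

    // Parent (daemon): wait for the session to finish, retrying on EINTR.
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this layout the long-lived daemon holds no stream resources of its own, so descriptors that a session fails to release cannot accumulate across reconnections. A real daemon would likely run sessions concurrently and reap children asynchronously instead of blocking in waitpid.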