DMAbuf file descriptor leak in EGL Stream API after disconnect

Hello,

We're using the EGL Stream API to share video frames from a single input with multiple applications. One process runs as a video server, responsible for capturing and sharing frames; it runs constantly in the background as a systemd service. In addition, up to 3 client processes can be started at any time as video clients.
(Attached diagram: prod-cons)

Issue: on the server (producer) side there is a DMAbuf file descriptor leak after clients disconnect and reconnect.

Setup: Jetson TX2 4GB running L4T 32.4.4 with CUDA 10.2.

Test setup

I used the EGLStream_CUDA_CrossGPU sample from the cuda-samples repo (latest master branch) as a reference.
I slightly modified the Makefile to fix the build for CUDA 10.2 and made quick modifications to main.cpp so that the processes do not exit but keep running until terminated, to emulate many reconnections to the producer; a sketch of the resulting flow is shown below. The changes prod-cons.patch (3.5 KB) are attached.
Build command: cd Samples/2_Concepts_and_Techniques/EGLStream_CUDA_CrossGPU && make CUDA_PATH=/usr/local/cuda-10.2/ dbg=1. The compiled binary EGLStream_CUDA_CrossGPU (804.2 KB) is attached.
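
For clarity, here is a minimal sketch (not the actual patch contents) of the producer-side flow the patch creates: connect, stream a number of frames, tear everything down, and repeat while the process stays alive. createAndShareStream and makeFrame are hypothetical placeholders for the sample's own setup code, and kWidth/kHeight/kFrames are illustrative values.

```cpp
// Sketch of the per-iteration connect/teardown loop on the producer side.
#define EGL_EGLEXT_PROTOTYPES 1   // expose eglDestroyStreamKHR prototype
#include <cuda.h>
#include <cudaEGL.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

// Hypothetical helpers standing in for the sample's stream setup and frame
// allocation code; they are not part of the CUDA or EGL APIs.
EGLStreamKHR createAndShareStream(EGLDisplay dpy);
CUeglFrame   makeFrame(void);

static const EGLint kWidth  = 1920;  // illustrative frame size
static const EGLint kHeight = 1080;
static const int    kFrames = 100;   // frames per connection

void producerLoop(EGLDisplay dpy)
{
    for (;;) {  // the patched process never exits; it reconnects forever
        EGLStreamKHR stream = createAndShareStream(dpy);

        CUeglStreamConnection conn;
        cuEGLStreamProducerConnect(&conn, stream, kWidth, kHeight);

        for (int i = 0; i < kFrames; ++i) {
            CUeglFrame frame = makeFrame();
            cuEGLStreamProducerPresentFrame(&conn, frame, NULL);
            cuEGLStreamProducerReturnFrame(&conn, &frame, NULL);
        }

        // Per-iteration teardown: after these calls the DMAbuf FDs backing
        // the stream are expected to be closed on the producer side.
        cuEGLStreamProducerDisconnect(&conn);
        eglDestroyStreamKHR(dpy, stream);
    }
}
```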

Steps to reproduce:
1. Start the consumer first: ./EGLStream_CUDA_CrossGPU. The consumer waits for the producer.
2. In a second terminal start the producer: ./EGLStream_CUDA_CrossGPU -proctype prod
3. Run the script to collect FD usage statistics for both processes: ./cuda_test_fd.sh. The script cuda_test_fd.sh (398 Bytes) is attached; see the sketch after this list for what it boils down to.
4. Press Enter in the consumer terminal to repeat the iteration, then collect the statistics again and check that DMAbuf FDs are leaking.

According to the code, the stream and frames should be destroyed after each iteration (and consequently the DMAbuf FDs should be closed), but in fact the number of open FDs grows very quickly.
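
The attached cuda_test_fd.sh is a small shell script that collects the FD usage statistics; the C++ sketch below shows the same kind of check (it is not the script itself). It counts the open FDs of a process under /proc/<pid>/fd; note that the exact readlink target used to identify DMA-buf FDs ("dmabuf", "anon_inode:dmabuf", ...) depends on the kernel version, so treat that filter as an assumption.

```cpp
// Count open FDs of a process and how many of them look like DMA-buf handles.
#include <dirent.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <string>

int main(int argc, char **argv)
{
    if (argc < 2) { std::fprintf(stderr, "usage: %s <pid>\n", argv[0]); return 1; }

    std::string dir = std::string("/proc/") + argv[1] + "/fd";
    DIR *d = opendir(dir.c_str());
    if (!d) { std::perror("opendir"); return 1; }

    int total = 0, dmabuf = 0;
    for (struct dirent *e; (e = readdir(d)) != nullptr; ) {
        if (e->d_name[0] == '.') continue;
        ++total;

        char target[256] = {0};
        std::string link = dir + "/" + e->d_name;
        // The link text for dma-buf FDs varies by kernel; "dmabuf" is an assumption.
        if (readlink(link.c_str(), target, sizeof(target) - 1) > 0 &&
            std::strstr(target, "dmabuf") != nullptr)
            ++dmabuf;
    }
    closedir(d);

    std::printf("total fds: %d, dmabuf-like fds: %d\n", total, dmabuf);
    return 0;
}
```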

Where could the issue be? Is something wrong in the sample code with the sequence of stream termination?
Any help is appreciated.
Thanks!

Hi,
The demo of EGL producer/consumer is shared in this post:
Problems getting EGL Stream transferred to another process on same machine - #7 by DaneLLL

Please check if you can make a patch on this setup so that we can run and reproduce it. It would also be great if you could upgrade to JetPack 4.6.2 and try.

Hi,
Thanks for quick reply.

I'm aware of this demo, but our case is different: we have a CUDA producer/consumer instead of the OpenGL producer/consumer used in eglstreamcube/ctree/gears. That's why I'm using the demo from NVIDIA's official cuda-samples repo, EGLStream_CUDA_CrossGPU, which matches our case.
The patch and the steps to reproduce are attached to the original post under the "Test setup" section (note that the "Test setup" section is collapsed for convenience). All the technical details are there; using them you can easily reproduce the issue on your side.

I also tried the latest L4T 32.7.2 on my setup and observed the same issue, using the same code and steps described in my original post under the "Test setup" section.

Thanks,
Vlad.

Hi,
We will set up and try to replicate the issue first, and then do further investigation.

Hi,
Please check this user guide. It is for the Drive platform, but it is similar on Jetson platforms. Generally we create the consumer first and then the producer, and the consumer can stay alive to wait for producers. It is the same as in this demo:
Problems getting EGL Stream transferred to another process on same machine - #7 by DaneLLL

Please check if you can adapt to this standard way.

Hi,
Have you been able to replicate the issue on your side following my steps?

Hi,
No. We checked the code, and it is a bit strange that the producer/consumer is destroyed and re-initialized in a loop while the process stays alive. It would be more reasonable not to destroy the producer/consumer while the process is alive, and to re-use the same producer and consumer instead.

Thanks for the pointer.

I checked this user guide. It's the same as the one in the L4T documentation for the 32.4.4 release; I wasn't able to find anything new there.

We're following these guidelines, and it looks like the NVIDIA GitHub samples follow them as well. The difference is that in our case, as in EGLStream_CUDA_CrossGPU, a CUDA consumer/producer is used instead of the OpenGL ones.
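
For context, this is roughly what the CUDA consumer side looks like with the driver-API EGL interop calls (a simplified sketch, not the sample's exact code; error handling and the actual frame processing are omitted):

```cpp
// Simplified CUDA consumer: connect to the stream, acquire/release frames,
// then disconnect.
#include <cuda.h>
#include <cudaEGL.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

void consumeFrames(EGLStreamKHR stream, int numFrames)
{
    CUeglStreamConnection conn;
    cuEGLStreamConsumerConnect(&conn, stream);

    for (int i = 0; i < numFrames; ++i) {
        CUgraphicsResource resource = NULL;
        cuEGLStreamConsumerAcquireFrame(&conn, &resource, NULL,
                                        CUDA_EGL_INFINITE_TIMEOUT);

        CUeglFrame frame;
        cuGraphicsResourceGetMappedEglFrame(&frame, resource, 0, 0);
        // ... process frame.frame.pPitch[0] / frame.frame.pArray[0] with CUDA ...

        cuEGLStreamConsumerReleaseFrame(&conn, resource, NULL);
    }

    cuEGLStreamConsumerDisconnect(&conn);
}
```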

In my test setup (based on EGLStream_CUDA_CrossGPU) the consumer was created first, as you suggest.

Thanks,
Vlad.

I did that intentionally to simulate behavior closer to our real use case. We can't keep the consumers running all the time, because they are desktop applications that users of the system can open and close at will.

  1. Do you mean that the EGL Stream implementation can't properly support dynamic creation/destruction of multiple streams?
  2. Are there any guidelines or documentation on using a CUDA consumer/producer with EGL Streams?

Thanks!

Hi,
The document describes the functions, and the sample is for demonstration. In the sample code, the consumer/producer is initialized and destroyed along with the process. After applying the patch, the processes stay alive and the consumer/producer is initialized and destroyed in a loop. This is not the same as the sample and may not work properly.

For your use case, one possible solution is to have a consumer/producer daemon: when you need a consumer-producer connection, fork one child process for the consumer and one child process for the producer. Once the consumer-producer connection is done, you can destroy the consumer/producer and exit the processes, roughly as sketched below.
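
A minimal sketch of that fork-per-connection idea, assuming plain POSIX fork/waitpid; runConsumerProcess and runProducerProcess are hypothetical placeholders for the existing consumer and producer code:

```cpp
// The long-running daemon holds no EGL/CUDA state; each consumer-producer
// connection lives and dies with short-lived child processes, so all their
// FDs are reclaimed by the kernel when the children exit.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int runConsumerProcess(void);   // hypothetical: full consumer lifecycle
int runProducerProcess(void);   // hypothetical: full producer lifecycle

// Called by the daemon whenever a new consumer-producer connection is needed.
void spawnConnection(void)
{
    pid_t consumer = fork();
    if (consumer == 0)
        _exit(runConsumerProcess());   // child: init, run, destroy, exit

    pid_t producer = fork();
    if (producer == 0)
        _exit(runProducerProcess());   // child: init, run, destroy, exit

    // Parent daemon: wait for both children; their EGL/CUDA resources and
    // any leaked FDs vanish with the processes.
    waitpid(consumer, NULL, 0);
    waitpid(producer, NULL, 0);
}
```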
