Multiple instances of NvEglRenderer

Hello

I am using a Jetson TX2 for H.264 video decoding and rendering with the Tegra Multimedia API (JetPack 4.3, Ubuntu 18.04), and I have a question about the correct use of NvEglRenderer.

I need to render 3 NvBuffers into 3 different X windows in 1 process, so I create 3 separate rendering threads. Inside each thread I create an NvEglRenderer with createEglRenderer() and set the fps to 25. But I only get about 20 fps.

I tried to reproduce this with the 00_video_decode sample. When I run 3 separate processes of the application I get 25 fps easily. I even ran 9 processes at the same time and they work well. I added 2 more NvEglRenderers to the context_t struct of the 00_video_decode sample like this:

NvEglRenderer *renderer;
NvEglRenderer *renderer1;
NvEglRenderer *renderer2;

created them in the function static void query_and_set_capture(context_t * ctx):

ctx->renderer = NvEglRenderer::createEglRenderer("renderer0", window_width, window_height, ctx->window_x, ctx->window_y);
ctx->renderer1 = NvEglRenderer::createEglRenderer("renderer1", window_width, window_height, ctx->window_x, ctx->window_y);
ctx->renderer2 = NvEglRenderer::createEglRenderer("renderer2", window_width, window_height, ctx->window_x, ctx->window_y);

set the fps:

ctx->renderer->setFPS(ctx->fps);
ctx->renderer1->setFPS(ctx->fps);
ctx->renderer2->setFPS(ctx->fps);

and added NvEglRenderer::render(int fd) calls for each new NvEglRenderer where needed.

I get about 20 fps again.

I added profiling inside NvEglRenderer.cpp in my application. It shows that with 1 NvEglRenderer in 1 process, the renderer spends most of its time waiting on a condition variable. But with 3 NvEglRenderers in 3 separate threads in 1 process, it spends most of its time in EGL and GL calls: glEGLImageTargetTexture2DOES, eglCreateSyncKHR, eglClientWaitSyncKHR, etc.

It looks like NvEglRenderer::createEglRenderer() makes it possible to use multiple NvEglRenderers in 1 process, so it is not obvious to me why I can’t get 25 fps. What am I doing wrong? Is it possible to use more than 1 NvEglRenderer in 1 process?

Hi,
It looks like you create three renderers in one thread. We suggest creating three threads, each with a single renderer.
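
For readers following along, the suggested layout can be sketched roughly as below. This is only an illustration of "one renderer per thread": StubRenderer and render_in_threads are hypothetical names, and the stub stands in for NvEglRenderer so the sketch compiles without the Jetson headers. In a real application each thread would call NvEglRenderer::createEglRenderer() and NvEglRenderer::render(int fd) instead.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for NvEglRenderer (assumption: only the render(fd)
// call matters for showing the threading layout).
struct StubRenderer {
    int frames_rendered = 0;
    int render(int fd) {
        assert(fd >= 0);      // a real renderer would take an NvBuffer fd
        ++frames_rendered;
        return 0;
    }
};

// Starts num_renderers threads; each thread creates and uses exactly one
// renderer. Returns how many frames each renderer handled.
static std::vector<int> render_in_threads(int num_renderers, int frames) {
    std::vector<int> counts(num_renderers, 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < num_renderers; ++i) {
        workers.emplace_back([i, frames, &counts] {
            StubRenderer renderer;           // owned by this thread only
            for (int f = 0; f < frames; ++f) {
                renderer.render(f);          // placeholder fd value
            }
            counts[i] = renderer.frames_rendered;  // each thread writes its own slot
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counts;
}
```

Calling render_in_threads(3, 10) would run three independent render loops, one per renderer.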

Thanks for the reply.

Yes. For the 00_video_decode sample I created all NvEglRenderers in 1 thread just as a quick example. As I said above, in my other application I definitely created each NvEglRenderer in a separate thread. I will implement per-thread rendering in the 00_video_decode sample tomorrow and provide the code.

So I have a second question. Why is it impossible to use multiple NvEglRenderers in 1 thread?
NvEglRenderer has its own thread inside its implementation. All EGL state is initialized in void* NvEglRenderer::renderThread(void *arg). This method is launched by pthread_create(&render_thread, NULL, renderThread, this); from the NvEglRenderer constructor. NvEglRenderer calls pthread_cond_timedwait(&render_cond, &render_lock, &last_render_time); inside int NvEglRenderer::renderInternal(). It waits for an absolute timestamp, not for a time period. The NvEglRenderer rendering loop runs in its own thread, so multiple NvEglRenderers should not interfere with each other.
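
To make the "timestamp, not period" point concrete, here is a minimal sketch of pacing with an absolute deadline. It is not the NvEglRenderer code: add_ns, wait_until and pace_frames are hypothetical helpers, and it assumes the condition variable uses the default CLOCK_REALTIME clock. The point it shows is that pthread_cond_timedwait returns ETIMEDOUT at once when the deadline is already in the past, so a renderer that is running late never sleeps in this call.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <ctime>
#include <pthread.h>

// Hypothetical helper: advance an absolute timespec by ns nanoseconds.
static timespec add_ns(timespec t, int64_t ns) {
    assert(ns >= 0);
    t.tv_sec += ns / 1000000000LL;
    t.tv_nsec += ns % 1000000000LL;
    if (t.tv_nsec >= 1000000000L) {   // normalize the nanosecond field
        t.tv_sec += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}

// Blocks until the absolute deadline. Nobody signals cond in this sketch,
// so a return value of 0 can only be a spurious wakeup and we wait again.
static int wait_until(pthread_cond_t* cond, pthread_mutex_t* lock,
                      const timespec* deadline) {
    pthread_mutex_lock(lock);
    int rc = 0;
    while (rc == 0) {
        rc = pthread_cond_timedwait(cond, lock, deadline);
    }
    pthread_mutex_unlock(lock);
    return rc;   // ETIMEDOUT once the deadline has passed
}

// Paces `frames` iterations at `fps` and returns the elapsed milliseconds.
// Each deadline is derived from the previous deadline, not from "now".
static double pace_frames(int fps, int frames) {
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    timespec start;
    clock_gettime(CLOCK_REALTIME, &start);
    timespec deadline = start;
    for (int i = 0; i < frames; ++i) {
        deadline = add_ns(deadline, 1000000000LL / fps);
        wait_until(&cond, &lock, &deadline);
    }
    timespec end;
    clock_gettime(CLOCK_REALTIME, &end);
    return (end.tv_sec - start.tv_sec) * 1e3 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}
```

With this scheme, if the work between two waits takes longer than one frame period, the next deadline is already behind the clock and the wait costs almost nothing.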

I made two examples from 00_video_decode and attached the code: samples.zip (103.8 KB).
You can unzip it into the tegra_multimedia_api directory and run make in ./samples/00_video_decode_profile and in ./samples/00_video_decode_multi_render_single_thread. I launched both applications with this line:
./video_decode H264 -ww 640 -wh 480 -wx 0 -wy 0 --stats ./sample_outdoor_car_1080p_10fps.h264

The first example is 00_video_decode_profile. I added some functions for profiling NvEglRenderer.cpp. 00_video_decode_profile creates a json file, video_decode_original.json. You can view it in the Chrome browser: open chrome://tracing and click the Load button.
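
For anyone who wants similar traces without the attachment, a scoped timer that emits "complete" events (ph "X", with ts and dur in microseconds) in the Trace Event Format that chrome://tracing loads could look roughly like this. It is a sketch, not the code from samples.zip: TraceLog and ScopedTrace are hypothetical names, pid is fixed to 1, and event names are assumed to contain no characters that need JSON escaping.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct TraceEvent {
    std::string name;   // e.g. "eglSwapBuffers"
    int64_t ts_us;      // start time in microseconds
    int64_t dur_us;     // duration in microseconds
    int tid;            // caller-chosen thread id shown as a row in the viewer
};

// Thread-safe collector that serializes events for chrome://tracing.
class TraceLog {
public:
    void add(TraceEvent e) {
        assert(e.dur_us >= 0);
        std::lock_guard<std::mutex> guard(mu_);
        events_.push_back(std::move(e));
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mu_);
        return events_.size();
    }
    std::string to_json() const {
        std::lock_guard<std::mutex> guard(mu_);
        std::ostringstream out;
        out << "{\"traceEvents\":[";
        for (std::size_t i = 0; i < events_.size(); ++i) {
            const TraceEvent& e = events_[i];
            if (i != 0) out << ",";
            out << "{\"name\":\"" << e.name << "\",\"ph\":\"X\""
                << ",\"ts\":" << e.ts_us << ",\"dur\":" << e.dur_us
                << ",\"pid\":1,\"tid\":" << e.tid << "}";
        }
        out << "]}";
        return out.str();
    }
private:
    mutable std::mutex mu_;
    std::vector<TraceEvent> events_;
};

// Records one event covering the lifetime of the object.
class ScopedTrace {
public:
    ScopedTrace(TraceLog& log, std::string name, int tid)
        : log_(log), name_(std::move(name)), tid_(tid),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTrace() {
        using std::chrono::duration_cast;
        using std::chrono::microseconds;
        auto end = std::chrono::steady_clock::now();
        log_.add(TraceEvent{
            name_,
            duration_cast<microseconds>(start_.time_since_epoch()).count(),
            duration_cast<microseconds>(end - start_).count(),
            tid_});
    }
private:
    TraceLog& log_;
    std::string name_;
    int tid_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping a call in a block such as { ScopedTrace t(log, "eglSwapBuffers", 0); /* call here */ } and writing log.to_json() to a file at exit gives one bar per call in the viewer.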

The second example is 00_video_decode_multi_render_single_thread. It runs 3 NvEglRenderers in 1 thread. You will see 3 X windows: the top-left corner of the first is at (0, 0), the second at (0 + window_width, 0), the third at (0 + window_width * 2, 0). It also creates a profiling json file.

I attach screenshots of these files because I can’t upload .json.


When I launch 1 process with 1 thread and 1 NvEglRenderer, eglSwapBuffers takes 5.5 ms. It stays the same when I launch 3 processes, each with 1 thread and 1 NvEglRenderer.

But when I launch 1 process with 1 thread and 3 NvEglRenderers, eglSwapBuffers takes 14.5 ms.
Why?
I think this case is very close to 1 process with 3 threads and 1 NvEglRenderer in each.

Hi,
Please check the source code of NvEglRenderer in:

/usr/src/jetson_multimedia_api/samples/common/classes/NvEglRenderer.cpp

It calculates the wait time from the fps setting and waits in this call:

        pthread_cond_timedwait(&render_cond, &render_lock,
                &last_render_time);

So if you run three renderers in one thread, they may affect each other.

Please look more closely at my pictures. I have added some notes.

The first picture.


This is the case of 1 process with 1 thread and 1 NvEglRenderer. You can see that the pthread_cond_timedwait call takes 25.9 ms (inside the red rectangle). That is fine. The eglSwapBuffers call takes 5.5 ms, which I will accept as fine.

The second picture.


This is the case of 1 process with 1 thread and 3 NvEglRenderers. You can see that the pthread_cond_timedwait call takes 0.025 ms, so pthread_cond_timedwait is not the cause of the slowdown. It is not waiting for anything. The cause of the slowdown is the eglSwapBuffers call, which takes 14.635 ms. The NvEglRenderers do no work in parallel here, so I see no reason why the eglSwapBuffers call should become 3 times slower.

I will make a multithreaded example today.

I have made a multithreaded example from 00_video_decode.
It is called 00_video_decode_multi_render_multi_thread. The new version of the samples is attached: samples_04_08.zip (126.8 KB)
I use this line to launch it:
./video_decode H264 -ww 640 -wh 480 -wx 0 -wy 0 --stats ./sample_outdoor_car_1080p_10fps.h264
You will get a json profiling file, video_decode_mr_mt.json, when the application exits.

I don’t synchronize the decoder thread with the renderer threads. The decoder runs at maximum speed.
These are the resulting stats:

----------- Element = dec0 -----------
Total Profiling time = 1.46047
Average FPS = 403.98
Total units processed = 591
-------------------------------------
----------- Element = 0 -----------
Total Profiling time = 1.39747
Average FPS = 20.0362
Total units processed = 29
Num. of late units = 28
-------------------------------------
----------- Element = 1 -----------
Total Profiling time = 1.34926
Average FPS = 19.2698
Total units processed = 27
Num. of late units = 26
-------------------------------------
----------- Element = 2 -----------
Total Profiling time = 1.35011
Average FPS = 20.7391
Total units processed = 29
Num. of late units = 28
-------------------------------------

Elements 0, 1 and 2 are the NvEglRenderers. The fps was set to 30 but I only get about 20 fps.
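
As a quick check on the numbers, the reported averages match (units processed - 1) / profiling time, for example 28 / 1.39747 ≈ 20.04 for Element 0. That formula is inferred from the figures above, not taken from the profiler source, so treat it as an assumption. The helper names below are hypothetical.

```cpp
#include <cassert>

// Assumption inferred from the stats above: the average counts the
// intervals between units, i.e. (units - 1) / seconds.
static double average_fps(int units, double seconds) {
    assert(units > 1 && seconds > 0.0);
    return (units - 1) / seconds;
}

// Time available per frame at a given rate, in milliseconds.
static double frame_period_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

At the requested 30 fps each frame has about 33.3 ms, while the measured rate of roughly 20 fps corresponds to about 50 ms per frame.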

This is the profiling picture:


It shows that all NvEglRenderers work in parallel. They don’t wait in pthread_cond_timedwait. They are slow because of the EGL calls.

Hi,
Thanks for sharing the sample. We will check and update.

Hello,
Are there any updates on my issue?

Hi,
We can reproduce the issue. It is under investigation. We will update when there is a new finding.

Hi,
We have fixed it in r32.5. Please give it a try.