Why can two NvEglRenderer instances on two threads no longer run at maximum fps on Jetson Linux 35.2.1 (it worked in 35.1.0)?

Hi,
According to this old post it should be possible to create 2 threads with each thread creating and running an instance of NvEglRenderer:

And I had it working on Jetson Linux 35.1.0, with each thread achieving 60 fps.
But on Jetson Linux 35.2.1 they run at about 30 fps each.

Below is the smallest working example that demonstrates the problem.

How to restore the good old functionality?

Thank you

/*
Usage
DISPLAY=:0 ./egl_renderer pos_x offset_x frame_count num_threads

For example, run one thread:

DISPLAY=:0 ./egl_renderer 0 0 100 1
prints
100 frames in 1653 ms fps: 60.485637

Run 2 threads, one thread per display:

DISPLAY=:0 ./egl_renderer 0 2000 100 2
On Jetson 35.2.1 it prints:
pos_x 0 100 frames in 2987 ms fps: 33.475959
pos_x 2000 100 frames in 3324 ms fps: 30.083336
On Jetson 35.1.0 it prints:
pos_x 0 100 frames in 1655 ms fps: 60.416307
pos_x 2000 100 frames in 1651 ms fps: 60.537270
*/

#include "Error.h"
#include "Thread.h"

#include <Argus/Argus.h>
#include <EGLStream/EGLStream.h>
#include <EGLStream/NV/ImageNativeBuffer.h>

#include "NvBufSurface.h"
#include <NvEglRenderer.h>

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <memory>
#include <vector>

using namespace Argus;
using namespace EGLStream;

#define FPS 60
#define WIDTH 500
#define HEIGHT 500

int num_frames = 0;

class RendererThread : public ArgusSamples::Thread
{
public:
RendererThread(int pos_x) : m_pos_x(pos_x) {}

bool threadExecute()
{
    NvEglRenderer* renderer = NULL;
    renderer = NvEglRenderer::createEglRenderer("renderer", WIDTH, HEIGHT, m_pos_x, 0);
    renderer->setFPS((float)FPS);

    NvBufSurf::NvCommonAllocateParams input_params = {0};
    input_params.memType = NVBUF_MEM_SURFACE_ARRAY;
    input_params.memtag = NvBufSurfaceTag_NONE;
    input_params.width = WIDTH;
    input_params.height = HEIGHT;
    input_params.layout = NVBUF_LAYOUT_BLOCK_LINEAR;
    input_params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
    int dmabuf_fd;
    NvBufSurf::NvAllocate(&input_params, 1, &dmabuf_fd);

    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    long long time_start_ns = ts.tv_sec * 1000*1000*1000LL + ts.tv_nsec;

    for(int i = 0; i < num_frames; i++)
    {
        renderer->render(dmabuf_fd);
    }

    clock_gettime(CLOCK_MONOTONIC, &ts);
    long long time_end_ns = ts.tv_sec * 1000*1000*1000LL + ts.tv_nsec;
    printf("pos_x %d %d frames in %d ms fps: %f\n",
           m_pos_x,
           num_frames, (int)((time_end_ns - time_start_ns)/1E6),
            num_frames * 1E9 / (time_end_ns - time_start_ns)
    );
    // Release the dmabuf and the renderer before the thread exits.
    NvBufSurf::NvDestroy(dmabuf_fd);
    delete renderer;
    requestShutdown();
    return true;
}
bool threadInitialize() { return true; }
bool threadShutdown() { return true; }

int m_pos_x;

};

int main(int argc, char *argv[])
{
int pos_x = argc > 1? strtol(argv[1], NULL, 10) : 0;
int offset_x = argc > 2? strtol(argv[2], NULL, 10) : 1000;
num_frames = argc > 3? strtol(argv[3], NULL, 10) : 1000;
int num_threads = argc > 4? strtol(argv[4], NULL, 10) : 2;

std::vector<std::unique_ptr<RendererThread>> threads;
for(int i = 0; i < num_threads; i++)
{
    threads.push_back(std::make_unique<RendererThread>(pos_x + i*offset_x));
}
for(auto & thread : threads)
{
    thread->initialize();
}
for(auto & thread : threads)
{
    thread->shutdown();
}
return 0;

}

Hi,
We expect a single NvEglRenderer per process. If you have multiple video sources, please composite the sources into a single video plane through the NvBufSurface APIs. It is expected that the rendering rate drops when multiple NvEglRenderer instances are rendering video frames simultaneously.
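
To make the compositing suggestion concrete, here is a minimal, untested sketch that places two width x height sources side by side in one (2*width) x height destination buffer through the NvBufSurf::NvTransform wrapper from NvBufSurface.h (the same wrapper the benchmark above uses for allocation). The helper name is made up, and the field and enum names are taken from that wrapper and may differ between releases, so treat it as an illustration rather than reference code. A single NvEglRenderer would then render dst_fd once per frame.

#include "NvBufSurface.h"

// Hypothetical helper: copy two width x height sources side by side into one
// (2*width) x height destination buffer; a single NvEglRenderer then renders dst_fd.
// Field and enum names follow the NvBufSurface.h migration wrapper and may differ
// between releases -- illustration only.
static int composite_side_by_side(int src_fd[2], int dst_fd, int width, int height)
{
    for (int i = 0; i < 2; i++)
    {
        NvBufSurf::NvCommonTransformParams p = {0};
        p.src_top = 0;
        p.src_left = 0;
        p.src_width = width;
        p.src_height = height;
        p.dst_top = 0;
        p.dst_left = i * width;   // each source gets its own half of the plane
        p.dst_width = width;
        p.dst_height = height;
        p.flag = (NvBufSurfTransform_Transform_Flag)
                 (NVBUFSURF_TRANSFORM_CROP_SRC | NVBUFSURF_TRANSFORM_CROP_DST | NVBUFSURF_TRANSFORM_FILTER);
        p.flip = NvBufSurfTransform_None;
        p.filter = NvBufSurfTransformInter_Nearest;
        if (NvBufSurf::NvTransform(&p, src_fd[i], dst_fd) != 0)
            return -1;
    }
    return 0;
}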

Compositing makes no sense here, since I need to show each camera on a separate display.
Compositing would waste a lot of time merging the separate camera frame buffers into one large buffer,
and then the system would immediately have to decompose it back into separate display frame buffers.

Are you saying that I have no choice but to run each camera/display in a separate process?
Is it possible to pass a frame buffer descriptor from one process to another?
Is it possible to restore the behavior of 35.1.0 and earlier, which supported multiple renderers in one process?

Hi,

For multiple sources in a single process, we suggest compositing the sources into a single video plane for rendering. This is the solution we use and demonstrate in jetson_multimedia_api and the DeepStream SDK.

This is a feature we are evaluating for support in a future release. The possible solution is to pass the NvBufSurface from one process to the other processes.
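
For reference, the generic Linux mechanism for handing a dmabuf file descriptor to another process is ancillary-data passing (SCM_RIGHTS) over a Unix domain socket. Whether the received fd can be wrapped back into an NvBufSurface on a given release is exactly what is being evaluated above, so this hedged sketch only shows the fd transfer itself (socket setup and the receiving side are omitted):

#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

// Send a dmabuf fd to a peer over a connected Unix domain socket.
// The receiver uses recvmsg() and CMSG_DATA() symmetrically.
static int send_dmabuf_fd(int sock, int dmabuf_fd)
{
    char dummy = 'x';                          // at least one byte of real data is required
    struct iovec iov = { &dummy, sizeof(dummy) };
    char ctrl[CMSG_SPACE(sizeof(int))] = {0};

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;              // pass an open file descriptor
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &dmabuf_fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}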

This is not supported, since we have seen certain issues in this use case. It is more stable to have a single NvEglRenderer in each process.
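
Under the one-NvEglRenderer-per-process constraint, an interim workaround is to launch one process per display, each with its own renderer. A minimal launcher sketch that reuses the benchmark binary from this thread (binary name and arguments are just the ones from the Usage comment above, so adjust as needed):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    // One window position per child process; DISPLAY is inherited from the environment.
    const char *pos_x[] = { "0", "2000" };

    for (int i = 0; i < 2; i++)
    {
        if (fork() == 0)
        {
            // Child: run the benchmark with num_threads = 1
            // (arguments: pos_x, offset_x, frame_count, num_threads).
            execl("./egl_renderer", "egl_renderer", pos_x[i], "0", "100", "1", (char *)NULL);
            _exit(1); // only reached if exec fails
        }
    }

    // Parent: wait for both children to finish.
    while (wait(NULL) > 0)
        ;
    return 0;
}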

Yes, I am aware of the 13_argus_multi_camera example, which does that.
But it makes no sense to copy two images into a single buffer and then immediately copy them back apart into separate display buffers. This unnecessary copy greatly increases latency and wastes bandwidth. Please correct me if I am wrong.
For our low-latency pipelines we need to keep the different camera buffers apart, both in space and in time.

Is it possible to do this now or in the future?

Is this a limitation of the NvEglRenderer sample class or of the underlying EGL API?