Performance of a DS pipeline with back-to-back primary detectors

My question is about using back-to-back primary detectors in a DS pipeline. When I do so, based on (ref: https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/master/back-to-back-detectors), I see the total time the pipeline takes to complete double. I am using the same model configuration, based on yolo_V3, in both detectors. To capture the pipeline playing time:

profile_start();
gst_element_set_state(pipeline, GST_STATE_PLAYING);
g_main_loop_run(loop);

/* Out of the main loop, clean up nicely */
g_print("Returned, stopping playback\n");
profile_end();
gst_element_set_state(pipeline, GST_STATE_NULL);

The profile_start() and profile_end() are from the deepstream perf demo sample:

deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-perf-demo

My pipeline is set up as follows:

nvstreammux -> gst-dsexample -> nvinfer(pgie) -> nvinfer(pgie) -> nvtiler -> nvvidconv -> nvosd -> fakesink
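For reference, the same topology in gst-launch-style syntax would be roughly the following. This is a sketch, not my exact launch line: the element factory names are my assumption based on DeepStream 4.x, and the config-file paths are placeholders.

```
nvstreammux name=mux batch-size=1 width=1280 height=720 ! \
  dsexample ! \
  nvinfer config-file-path=pgie1_config.txt ! \
  nvinfer config-file-path=pgie2_config.txt ! \
  nvmultistreamtiler ! nvvideoconvert ! nvdsosd ! fakesink
```

The key point is that the two nvinfer elements sit in series on the same branch, so every buffer passes through both engines one after the other.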

Is this behavior expected in this scenario, given that the nvinfer elements cannot run asynchronously in primary mode?

Is this behavior expected in this scenario, given that the nvinfer elements cannot run asynchronously in primary mode?

My reading of the documentation (correct me if I’m wrong, Nvidia) is that yes, this would be expected. I think the idea is that you have a single model doing detection, and potentially multiple elements downstream doing classification and such asynchronously.

You could try doing it in parallel using tees and queues, but even if you can do that, I don’t think you can mux the metadata back together without writing your own custom plugin to resync and handle that. That would be hard, this configuration is untested, and others have tried it unsuccessfully.

Since nvinfer is open source, it might be easier to hack the nvinfer element to support async in primary mode, or to run multiple models. That’s a lot of work, however, and in the end it’s probably not the “right way” compared to having a single model.

Thank you for the response. I did try running the models with a tee and two queues, one queue for each model but the total pipeline playing time remains the same.

Since nvinfer is open source, it might be easier to hack the nvinfer element to support async in primary mode, or to run multiple models.

I do agree that trying to hack nvinfer to get it to work in async mode is a lot of work.

Yeah, the configuration is unsupported and untested, so if it doesn’t work, you’ll have to modify the plugin, I think. I just had a read through the source, and if you know some CUDA and C++ you might be able to manage, but it’s still a fair amount of work.

Anyway, here is another thread with somebody trying the same thing, along with some suggestions from Nvidia rep DaneLLL. His suggestion, however, is what you’ve already tried (running the two detectors in series).