I’m finishing my Holoscan application and I’m in the testing phase. I noticed that the Holoscan latency is very high and the real-time performance is badly compromised. Below I have attached the per-operator timings. I thought one Holoscan operator handled one frame independently of the other operators, so that all operators were kept busy, but what I’m seeing is that the entire pipeline handles one frame at a time, which makes this a bad pipeline for my real-time purpose. My visualizer is only displaying ~15 fps and that’s not my goal. If my pipeline has 70 ms of execution time, will it only handle one frame every 70 ms? Can you please give more information about the use of schedulers (Schedulers - NVIDIA Docs) and whether they can solve my problem?
Timestamp differences per operator:
Replayer: 24.145 ms
ImageProcessing: 18.289 ms
Preprocessor: 1.213 ms
Inference: 23.861 ms
Postprocessor: 0.275 ms
PostImageProcessing: 2.695 ms
Viz: 1.575 ms
If my pipeline has 70ms execution time, will it only handle one frame every 70ms?
No, it is possible to start processing the next frame while a previous frame is still being processed by other operators; the multi-threaded scheduler (MTS) and the event-based scheduler (EBS) enable that.
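As a minimal sketch, assuming your Application subclass is called MyApp (a placeholder) and with illustrative thread counts, selecting one of these schedulers looks roughly like this:

```python
from holoscan.schedulers import EventBasedScheduler, MultiThreadScheduler

app = MyApp()  # MyApp is a placeholder for your own Application subclass

# Multi-threaded scheduler: a fixed pool of worker threads polls operators
# for readiness and executes the ones that are ready to run.
app.scheduler(MultiThreadScheduler(app, name="mts", worker_thread_number=4))

# Event-based scheduler: worker threads are woken up by readiness events
# instead of polling.
# app.scheduler(EventBasedScheduler(app, name="ebs", worker_thread_number=4))

app.run()
```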
Note, however, that MTS and EBS cannot improve the end-to-end latency of an application; they can only improve its throughput. End-to-end latency is the time a frame takes to travel from the source to the sink of the application, while throughput is the number of frames coming out of the application per unit of time.
At the moment MTS and EBS have some known issues, which are being fixed, that may lead to higher latency than the greedy scheduler.
Operators do not run independently in a pipeline. They are connected by double-buffered queues and by scheduling conditions on those queues, so adjacent operators in the application graph have data dependencies on each other.
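To illustrate (with a hypothetical pass-through operator, not one of yours): with the default port setup below, Holoscan implicitly attaches a MessageAvailableCondition to the input port and a DownstreamMessageAffordableCondition to the output port, so the operator only executes when its predecessor has produced a frame and its successor still has queue space.

```python
from holoscan.core import Operator, OperatorSpec


class PassthroughOp(Operator):
    """Hypothetical operator, shown only to illustrate the default conditions."""

    def setup(self, spec: OperatorSpec):
        # Default conditions: the input port gets a MessageAvailableCondition and
        # the output port gets a DownstreamMessageAffordableCondition, so compute()
        # runs only when "in" holds a message and the downstream queue has room.
        spec.input("in")
        spec.output("out")

    def compute(self, op_input, op_output, context):
        op_output.emit(op_input.receive("in"), "out")
```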
I am using the older Holoscan SDK 1.0.3. I tested MTS and evaluated the performance with the Nsight Systems profiler; attached below are the outputs for the greedy scheduler and for MTS. My result was that using MTS increased the latency, as expected, but did not increase the throughput. Why is the greedy scheduler using all 12 cores? MTS never overlaps any operators, which I was not expecting.
Looking at the pipeline from your first message, your application graph is a linear path, and in that case this is expected. When a pipeline has multiple parallel branches, the MTS can deliver more throughput and lower latency than the greedy scheduler.
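For comparison, here is a minimal runnable sketch (with toy operators, not your pipeline) of a graph with two parallel branches, which is the kind of topology where MTS/EBS can overlap work:

```python
from holoscan.conditions import CountCondition
from holoscan.core import Application, Operator, OperatorSpec
from holoscan.schedulers import MultiThreadScheduler


class SourceOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.output("out")

    def compute(self, op_input, op_output, context):
        op_output.emit(1, "out")


class WorkOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.input("in")
        spec.output("out")

    def compute(self, op_input, op_output, context):
        op_output.emit(op_input.receive("in") + 1, "out")


class SinkOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.input("in_a")
        spec.input("in_b")

    def compute(self, op_input, op_output, context):
        print(op_input.receive("in_a"), op_input.receive("in_b"))


class BranchedApp(Application):
    def compose(self):
        source = SourceOp(self, CountCondition(self, count=10), name="source")
        branch_a = WorkOp(self, name="branch_a")
        branch_b = WorkOp(self, name="branch_b")
        sink = SinkOp(self, name="sink")

        # Two branches fan out from the source; with MTS or EBS, branch_a and
        # branch_b can execute concurrently. A purely linear chain leaves no
        # branches to overlap.
        self.add_flow(source, branch_a)
        self.add_flow(source, branch_b)
        self.add_flow(branch_a, sink, {("out", "in_a")})
        self.add_flow(branch_b, sink, {("out", "in_b")})


app = BranchedApp()
app.scheduler(MultiThreadScheduler(app, name="mts", worker_thread_number=2))
app.run()
```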
This looks strange: how did the inference time become ~59-70 ms with MTS when it was ~20 ms with the greedy scheduler? Was the scheduler the only thing that changed?
Yes, the graph of my application is a linear path, but I thought that while a frame is in Inference, for example, the next frame could start being processed in ImageProcessing instead of waiting until the first frame reaches the last operator. So that’s not possible, right? I am stuck at a throughput of 22 fps.
GXF’s nvidia::gxf::BroadcastCodelet has a “round-robin” mode that replaces the broadcast behavior with sending each input message to a different output port in turn. As of Holoscan 2.1 it can be used via the new GXFCodeletOp.
This pattern is not available with off-the-shelf operators today, but you could implement the operators yourself, e.g. RoundRobinOp(Operator) and GatherOp(Operator). An additional operator may be necessary to handle messages from the RoundRobinOp if an individual InferenceOp is not yet ready to receive the next message. You would also need to pay attention to the configuration of the input/output ports: specifically, ConditionType.NONE needs to be used at all the input ports, otherwise the compute method is not executed until a message is available at every port.
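A rough sketch of what such operators might look like is below; the class names, port counts, and the assumption that receive() returns None on an empty port configured with ConditionType.NONE are all mine, so adapt as needed:

```python
from holoscan.core import ConditionType, Operator, OperatorSpec


class RoundRobinOp(Operator):
    """Hypothetical operator: forwards each input message to one of its output
    ports in turn, so parallel downstream branches work on alternating frames."""

    def __init__(self, fragment, *args, num_outputs=2, **kwargs):
        self.num_outputs = num_outputs
        self.index = 0
        # The base class constructor must be called last.
        super().__init__(fragment, *args, **kwargs)

    def setup(self, spec: OperatorSpec):
        spec.input("in")
        for i in range(self.num_outputs):
            spec.output(f"out{i}")

    def compute(self, op_input, op_output, context):
        msg = op_input.receive("in")
        op_output.emit(msg, f"out{self.index}")
        self.index = (self.index + 1) % self.num_outputs


class GatherOp(Operator):
    """Hypothetical operator: merges messages from several input ports back into
    a single stream. ConditionType.NONE on the inputs lets compute() run even
    when only one branch has produced a message."""

    def __init__(self, fragment, *args, num_inputs=2, **kwargs):
        self.num_inputs = num_inputs
        super().__init__(fragment, *args, **kwargs)

    def setup(self, spec: OperatorSpec):
        for i in range(self.num_inputs):
            # Without ConditionType.NONE, compute() would wait until every
            # input port has a message, which never happens with round-robin.
            spec.input(f"in{i}").condition(ConditionType.NONE)
        spec.output("out")

    def compute(self, op_input, op_output, context):
        for i in range(self.num_inputs):
            msg = op_input.receive(f"in{i}")
            if msg is not None:
                op_output.emit(msg, "out")
```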
Feel free to test this approach and give us feedback here. There will be an example of this in the future, but I cannot say exactly when yet.