I’m finishing my Holoscan application and I’m in the testing phase. I noticed that the Holoscan latency is very high and the real-time performance is badly compromised. Below I have attached the per-operator timings. I thought one Holoscan operator handled one frame independently of the other operators, so that all operators were kept busy, but what I’m seeing is that the entire pipeline handles one frame at a time, which makes this a bad pipeline for my real-time purpose. My visualizer is only displaying ~15 fps and that’s not my goal. If my pipeline has 70 ms of execution time, will it only handle one frame every 70 ms? Can you please give more information about the use of schedulers (Schedulers - NVIDIA Docs) and whether they can solve my problem?
Timestamp differences per operator:
Replayer: 24.145 ms
ImageProcessing: 18.289 ms
Preprocessor: 1.213 ms
Inference: 23.861 ms
Postprocessor: 0.275 ms
PostImageProcessing: 2.695 ms
Viz: 1.575 ms
If my pipeline has 70ms execution time, will it only handle one frame every 70ms?
No, it is possible to start processing the next frame while a previous frame is still being processed by other operators; the multi-threaded scheduler (MTS) and the event-based scheduler (EBS) enable that.
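As a minimal sketch, assuming your Application subclass is called MyApp (a placeholder) and with illustrative thread counts, selecting one of these schedulers looks roughly like this:

```python
from holoscan.schedulers import EventBasedScheduler, MultiThreadScheduler

app = MyApp()  # MyApp is a placeholder for your own Application subclass

# Multi-threaded scheduler: a fixed pool of worker threads polls operators
# for readiness and executes the ones that are ready to run.
app.scheduler(MultiThreadScheduler(app, name="mts", worker_thread_number=4))

# Event-based scheduler: worker threads are woken up by readiness events
# instead of polling.
# app.scheduler(EventBasedScheduler(app, name="ebs", worker_thread_number=4))

app.run()
```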
Note, however, that MTS and EBS cannot improve the end-to-end latency of an application; they can only improve its throughput. End-to-end latency is the time a frame takes to travel from the source to the sink of the application, while throughput is the number of frames coming out of the application per unit of time.
At the moment MTS and EBS have some known issues, which are being fixed, that may lead to higher latency than the greedy scheduler.
Operators do not run independently in a pipeline. They are connected by double-buffered queues and by scheduling conditions on those queues, so adjacent operators in the application graph have data dependencies on each other.
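To illustrate (with a hypothetical pass-through operator, not one of yours): with the default port setup below, Holoscan implicitly attaches a MessageAvailableCondition to the input port and a DownstreamMessageAffordableCondition to the output port, so the operator only executes when its predecessor has produced a frame and its successor still has queue space.

```python
from holoscan.core import Operator, OperatorSpec


class PassthroughOp(Operator):
    """Hypothetical operator, shown only to illustrate the default conditions."""

    def setup(self, spec: OperatorSpec):
        # Default conditions: the input port gets a MessageAvailableCondition and
        # the output port gets a DownstreamMessageAffordableCondition, so compute()
        # runs only when "in" holds a message and the downstream queue has room.
        spec.input("in")
        spec.output("out")

    def compute(self, op_input, op_output, context):
        op_output.emit(op_input.receive("in"), "out")
```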
I am using the older Holoscan SDK 1.0.3. I tested MTS and evaluated the performance with the Nsight Systems profiler; attached below are the outputs for the greedy scheduler and for MTS. My result was that using MTS increased the latency, as expected, but did not increase the throughput. Why is the greedy scheduler using all 12 cores? MTS never overlaps any operators, which I was not expecting.
Looking at the pipeline from your first message, your application graph is a linear path, and in that case this is expected. When a pipeline has multiple parallel branches, the MTS can deliver more throughput and lower latency than the greedy scheduler.
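For comparison, here is a minimal runnable sketch (with toy operators, not your pipeline) of a graph with two parallel branches, which is the kind of topology where MTS/EBS can overlap work:

```python
from holoscan.conditions import CountCondition
from holoscan.core import Application, Operator, OperatorSpec
from holoscan.schedulers import MultiThreadScheduler


class SourceOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.output("out")

    def compute(self, op_input, op_output, context):
        op_output.emit(1, "out")


class WorkOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.input("in")
        spec.output("out")

    def compute(self, op_input, op_output, context):
        op_output.emit(op_input.receive("in") + 1, "out")


class SinkOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.input("in_a")
        spec.input("in_b")

    def compute(self, op_input, op_output, context):
        print(op_input.receive("in_a"), op_input.receive("in_b"))


class BranchedApp(Application):
    def compose(self):
        source = SourceOp(self, CountCondition(self, count=10), name="source")
        branch_a = WorkOp(self, name="branch_a")
        branch_b = WorkOp(self, name="branch_b")
        sink = SinkOp(self, name="sink")

        # Two branches fan out from the source; with MTS or EBS, branch_a and
        # branch_b can execute concurrently. A purely linear chain leaves no
        # branches to overlap.
        self.add_flow(source, branch_a)
        self.add_flow(source, branch_b)
        self.add_flow(branch_a, sink, {("out", "in_a")})
        self.add_flow(branch_b, sink, {("out", "in_b")})


app = BranchedApp()
app.scheduler(MultiThreadScheduler(app, name="mts", worker_thread_number=2))
app.run()
```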
This looks strange: how did the inference time become ~59-70 ms with MTS when it was ~20 ms with the greedy scheduler? Was the scheduler the only thing that changed?
Yes, the graph of my application is a linear path, but I thought that while a frame is in Inference, for example, the next frame could start being processed in ImageProcessing instead of waiting until the first frame reaches the last operator. So that’s not possible, right? I am stuck at a throughput of 22 fps.
GXF’s nvidia::gxf::BroadcastCodelet has a “round-robin” mode that replaces the broadcast behavior with sending each input message to a different output port in turn. As of Holoscan 2.1 it can be used via the new GXFCodeletOp.
This pattern is not available with off-the-shelf operators today, but you could implement the operators yourself, e.g. RoundRobinOp(Operator) and GatherOp(Operator). An additional operator may be necessary to handle messages from the RoundRobinOp if an individual InferenceOp is not yet ready to receive the next message. You would also need to pay attention to the configuration of the input/output ports: specifically, ConditionType.NONE needs to be used at all the input ports, otherwise the compute method is not executed until a message is available at every port.
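A rough sketch of what such operators might look like is below; the class names, port counts, and the assumption that receive() returns None on an empty port configured with ConditionType.NONE are all mine, so adapt as needed:

```python
from holoscan.core import ConditionType, Operator, OperatorSpec


class RoundRobinOp(Operator):
    """Hypothetical operator: forwards each input message to one of its output
    ports in turn, so parallel downstream branches work on alternating frames."""

    def __init__(self, fragment, *args, num_outputs=2, **kwargs):
        self.num_outputs = num_outputs
        self.index = 0
        # The base class constructor must be called last.
        super().__init__(fragment, *args, **kwargs)

    def setup(self, spec: OperatorSpec):
        spec.input("in")
        for i in range(self.num_outputs):
            spec.output(f"out{i}")

    def compute(self, op_input, op_output, context):
        msg = op_input.receive("in")
        op_output.emit(msg, f"out{self.index}")
        self.index = (self.index + 1) % self.num_outputs


class GatherOp(Operator):
    """Hypothetical operator: merges messages from several input ports back into
    a single stream. ConditionType.NONE on the inputs lets compute() run even
    when only one branch has produced a message."""

    def __init__(self, fragment, *args, num_inputs=2, **kwargs):
        self.num_inputs = num_inputs
        super().__init__(fragment, *args, **kwargs)

    def setup(self, spec: OperatorSpec):
        for i in range(self.num_inputs):
            # Without ConditionType.NONE, compute() would wait until every
            # input port has a message, which never happens with round-robin.
            spec.input(f"in{i}").condition(ConditionType.NONE)
        spec.output("out")

    def compute(self, op_input, op_output, context):
        for i in range(self.num_inputs):
            msg = op_input.receive(f"in{i}")
            if msg is not None:
                op_output.emit(msg, "out")
```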
Feel free to test this approach and give us feedback here. There will be an example of this in the future, but I cannot say exactly when yet.