TensorRT frame-processing speed increases with the number of frames

I am noticing something strange when running inference from a TensorRT graph. As I run inference on more frames in series, the overall time per frame drops. The data is as follows:

1 frame - 6 sec - 0.1 FPS
3 frames - 12 sec - 0.25 FPS
30 frames - 6 sec - 5 FPS
100 frames - 7.25 sec - 13.7 FPS
1000 frames - 31.337 sec - 32 FPS
10000 frames - 175.118 sec - 57 FPS
100000 frames - 1664.778 sec - 60 FPS

I have also calculated the time while ignoring the first 15 inference calls, and it follows the same pattern. So this rules out the time taken to initialize the graph on the first few inferences.

The model is a simple MobileNetV2 running on a Jetson Nano 4GB.

Code snippet of the inference:

import time

start_time = time.time()
for i in range(n_frames):
    output = frozen_func(get_img_tensor(i))[0].numpy()
end_time = time.time()
print("time taken - ", end_time - start_time)
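For completeness, here is a self-contained sketch of how I exclude the warm-up calls from the timing. The `run_inference` function is a dummy stand-in for the real `frozen_func(get_img_tensor(i))` call, just so the snippet runs on its own:

```python
import time

def run_inference(i):
    # Dummy stand-in for frozen_func(get_img_tensor(i)); the real call
    # runs the TensorRT graph on frame i.
    time.sleep(0.001)
    return i

def timed_fps(n_frames, warmup=15):
    """Time n_frames inference calls, excluding the first `warmup` calls."""
    for i in range(warmup):
        run_inference(i)  # warm-up calls, excluded from timing
    start_time = time.perf_counter()
    for i in range(warmup, warmup + n_frames):
        run_inference(i)
    end_time = time.perf_counter()
    elapsed = end_time - start_time
    return elapsed, n_frames / elapsed

elapsed, fps = timed_fps(100)
print("time taken -", elapsed, "fps -", fps)
```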

Keen to know what is happening here; do let me know if additional info is required.

Thank You


The difference comes from whether the FPS reflects latency or throughput.

For the single-frame use case, TensorRT needs to run the model layer by layer, since each layer's input depends on the previous layer's output.
So the FPS score reflects the end-to-end latency of your MobileNetV2 model.

However, if a single inference doesn't occupy all of the GPU resources,
TensorRT can run the inference for different frames (inputs) in parallel.
This increases the throughput FPS result.
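The latency/throughput distinction can be illustrated with simple arithmetic. The numbers below are made up for illustration, not measured on the Nano:

```python
# Hypothetical numbers for illustration only.
latency_s = 0.1   # time for one frame, start to finish
overlap = 4       # frames the GPU can keep in flight concurrently

# Latency-bound FPS: one frame at a time, no overlap.
latency_fps = 1.0 / latency_s        # 10 FPS

# Throughput FPS: with `overlap` frames in flight, a frame completes
# every latency_s / overlap seconds once the pipeline is full.
throughput_fps = overlap / latency_s  # 40 FPS

print(latency_fps, throughput_fps)
```

The per-frame latency never changes; only the rate at which frames complete does, which is why the measured FPS grows with the number of frames until the GPU saturates.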

You can validate this by monitoring GPU utilization with tegrastats.
The utilization percentage should be proportional to the number of in-flight frames.

$ sudo tegrastats


I am running the frames in series: only after one inference finishes does the next frame enter, so I don't see how any parallel processing can take place.
I have also monitored the GPU usage; it keeps fluctuating between 65-80% during inference and is always the same irrespective of the number of frames being inferenced.


So it seems the bottleneck is on the CPU side.
I'm not sure which library you use for pre-processing,
but it's known that some image libraries (e.g. OpenCV) are relatively slow on Jetson.
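One way to check is to time the pre-processing and the inference call separately. The sketch below uses dummy stand-ins for `get_img_tensor` and `frozen_func`, since the real implementations aren't shown in this thread:

```python
import time

def get_img_tensor(i):
    # Dummy stand-in for the real pre-processing (decode/resize/normalize).
    time.sleep(0.002)
    return i

def frozen_func(x):
    # Dummy stand-in for the TensorRT inference call.
    time.sleep(0.001)
    return [x]

pre_total = infer_total = 0.0
for i in range(20):
    t0 = time.perf_counter()
    tensor = get_img_tensor(i)
    t1 = time.perf_counter()
    frozen_func(tensor)
    t2 = time.perf_counter()
    pre_total += t1 - t0
    infer_total += t2 - t1

print("pre-processing:", pre_total, "inference:", infer_total)
```

If the pre-processing total dominates, the GPU is idle while the CPU prepares each frame, which would explain the plateau in utilization.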

Maybe you can give our DeepStream library a try.