Slow inference using tensorrt sampleFasterRCNN, 320ms/frame

hrsht.sarma · February 12, 2018, 2:09pm

Hi everyone

I finally managed to run the TensorRT FasterRCNN sample provided in /usr/src/tensorrt on a Jetson TX2. However, execution time per frame is around 320 ms (about 3 FPS), which is far slower than the ~20 FPS I was hoping for.

I also tried running jetson_clock.sh to see whether it helps, but there was no improvement.

To measure the time taken, I am using the following:

// run inference and time it with a monotonic clock (needs <chrono> and <iostream>)
auto time_start = std::chrono::steady_clock::now();
doInference(*context, data, imInfo, bboxPreds, clsProbs, rois, N);
auto time_end = std::chrono::steady_clock::now();
auto elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(time_end - time_start).count();
std::cout << "Total time = " << elapsed_ms << " ms" << std::endl;
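
A single timed call can be noisy, since the first run often includes one-time setup costs. A minimal sketch of averaging over several calls is shown here; `averageMillis` and `framesPerSecond` are hypothetical helper names, not part of the TensorRT sample, and `batchSize` is assumed to correspond to the `N` passed to doInference.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of the TensorRT sample): runs `fn` a few
// times untimed as warm-up, then returns the mean wall-clock time per call
// in milliseconds over `iterations` timed calls.
double averageMillis(const std::function<void()>& fn, int warmup, int iterations) {
    assert(iterations > 0);
    for (int i = 0; i < warmup; ++i) {
        fn();
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return elapsed.count() / iterations;
}

// Hypothetical helper: converts time per call into images per second.
// Assumes each call processes `batchSize` images.
double framesPerSecond(double millisPerCall, int batchSize) {
    assert(millisPerCall > 0.0 && batchSize > 0);
    return 1000.0 * batchSize / millisPerCall;
}
```

For example, `averageMillis([&] { doInference(*context, data, imInfo, bboxPreds, clsProbs, rois, N); }, 3, 20)` would report the mean over 20 calls after 3 warm-up runs, and `framesPerSecond(320.0, 1)` gives 3.125 images per second for the 320 ms figure above.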

It would be great if someone could tell me what needs to be done to achieve even 10 FPS using TensorRT on the given example.
FYI: I installed JetPack 3.1.

hrsht.sarma · February 13, 2018, 5:59am

I also tried running jetson_clock.sh, but inference time per frame is the same as before: 320 ms (about 3 FPS), which is well below the expected 10-20 FPS.

AastaLLL · February 13, 2018, 7:11am

Hi,

Please use our DetectNet sample to get a 10 FPS object detection pipeline:
https://github.com/dusty-nv/jetson-inference#locating-object-coordinates-using-detectnet

The purpose of sampleFasterRCNN is to demonstrate how to implement the plugin API.
Please use DetectNet for better performance.

Thanks.

hrsht.sarma · February 13, 2018, 7:34am

Can we fine-tune the provided FasterRCNN to achieve better FPS?

AastaLLL · February 21, 2018, 6:16am

Hi,

1. Please note that sampleFasterRCNN sets batchSize=2 by default.

2. The doInference() function includes memory allocation, the host-to-device buffer copy, inference, the device-to-host buffer copy, and memory release.
Usually, memory allocation and release only need to happen once, at initialization.
In a zero-copy pipeline, e.g. the MMAPI sample, you don’t need to transfer data between host and device.

So, try setting batchSize=1 and optimizing the pipeline along the lines of the MMAPI zero-copy sample.
Thanks.
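
The "allocate once at initialization" advice can be sketched as follows. This is only an illustration under stated assumptions: `PersistentBuffers` is a hypothetical class, not part of the sample, and the allocator is passed in as a callable so the sketch compiles without CUDA. On the Jetson one would presumably supply wrappers around cudaMalloc and cudaFree instead of the host allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical RAII holder (not part of sampleFasterRCNN): allocates every
// binding buffer once at construction and frees them all at destruction,
// so the per-frame path performs no allocation or release.
class PersistentBuffers {
public:
    using Alloc = std::function<void*(std::size_t)>;
    using Free = std::function<void(void*)>;

    // `alloc` and `release` are assumptions: e.g. thin wrappers around
    // cudaMalloc/cudaFree on the device, or std::malloc/std::free on the host.
    PersistentBuffers(const std::vector<std::size_t>& sizesInBytes,
                      Alloc alloc, Free release)
        : release_(std::move(release)) {
        for (std::size_t bytes : sizesInBytes) {
            void* p = alloc(bytes);
            assert(p != nullptr);
            buffers_.push_back(p);
        }
    }

    ~PersistentBuffers() {
        for (void* p : buffers_) {
            release_(p);
        }
    }

    // Non-copyable: the object owns the raw allocations.
    PersistentBuffers(const PersistentBuffers&) = delete;
    PersistentBuffers& operator=(const PersistentBuffers&) = delete;

    // Pointer array in the order the sizes were given, reusable every frame.
    void** data() { return buffers_.data(); }
    std::size_t count() const { return buffers_.size(); }

private:
    std::vector<void*> buffers_;
    Free release_;
};
```

With an object like this created once before the frame loop, each iteration would only copy the input, run inference, and copy the output, reusing the same pointers from `data()`.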
