Why can't I get 40 FPS for TLT YOLOv3 ResNet18 FP16 at 320x320?

rostislav.etc · November 9, 2020, 1:26pm

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson Nano B01
• DeepStream Version 5.0.1
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs) question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hi, I'm referring to the "Performance" page of the DeepStream 6.1.1 Release documentation.

There, YOLOv3 ResNet18 in FP16 on Jetson Nano is listed at 11 FPS at 960x544 resolution. I checked this on my Jetson Nano: I converted the provided models and really do get ~11 FPS.

But when I train YOLOv3 ResNet18 with Transfer Learning Toolkit at 320x320 and convert it on the Jetson Nano, I get only 20 FPS in FP16 and 11 FPS in FP32. Why?

How does inference resolution affect model performance?
What is the maximum performance of YOLOv3 ResNet18 on Jetson Nano at 320x320?
How can I get the maximum performance from YOLOv3 ResNet18 on Jetson Nano at 320x320?

mchi · November 10, 2020, 2:28am

A lower resolution should give an almost proportional increase in inference FPS, but it may need a higher batch size.

Could you try a higher batch size?
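As a rough check of what "almost proportional" would mean here, the sketch below scales the 11 FPS reference figure by the pixel-count ratio between 960x544 and 320x320. Linear scaling with pixel count is an assumption for illustration, not a measured property of this model, and the function names are made up for this example.

```python
# Reference point from the original post: 11 FPS at 960x544 (FP16, Jetson Nano).
REF_FPS = 11.0
REF_RES = (960, 544)
NEW_RES = (320, 320)


def pixel_ratio(ref_res, new_res):
    """Return how many times more pixels ref_res has than new_res."""
    return (ref_res[0] * ref_res[1]) / (new_res[0] * new_res[1])


def proportional_fps(ref_fps, ref_res, new_res):
    """Estimate FPS if inference time scaled linearly with pixel count.

    This linear scaling is an assumption; real networks rarely scale
    this cleanly because some layers have resolution-independent cost.
    """
    return ref_fps * pixel_ratio(ref_res, new_res)


ratio = pixel_ratio(REF_RES, NEW_RES)
estimate = proportional_fps(REF_FPS, REF_RES, NEW_RES)
print(f"pixel ratio: {ratio:.2f}x")
print(f"proportional estimate: {estimate:.1f} FPS")
```

Under that assumption the estimate is about 56 FPS, so the ~20 FPS actually observed is well short of proportional scaling.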

rostislav.etc · November 10, 2020, 1:52pm

No, I haven't tried a higher batch size, because the referenced model doesn't run with batching.

I think I need to try pruning. Can you confirm that the models in the referenced doc are pruned? If so, which threshold and which method were used?

mchi · November 10, 2020, 1:59pm

Could you please try higher batch and check the total fps?

The model is from TLT. TLT automatically prunes the model during training based on the accuracy threshold you set.

rostislav.etc · November 16, 2020, 8:47am

OK, I've tried a higher batch size.

I used trtexec to measure performance, with the following arguments:
--fp16 --batch=X --useSpinWait

Here are the results:
Model converted with max batch=1: 51.5673 ms
Model converted with max batch=2, and --batch=1: 53.2138 ms
Model converted with max batch=2, and --batch=2: 98.3157 ms
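To compare these runs as throughput rather than latency, the sketch below converts each figure to frames per second. It assumes each number is the latency of one inference call over the whole batch, and that the first run used --batch=1; neither is stated explicitly in the post.

```python
def throughput_fps(latency_ms, batch_size):
    """Frames per second, given the latency of one batched inference call."""
    return batch_size * 1000.0 / latency_ms


# (label, latency in ms, batch size) taken from the results above.
# Batch size 1 for the first row is an assumption.
measurements = [
    ("max batch=1, --batch=1", 51.5673, 1),
    ("max batch=2, --batch=1", 53.2138, 1),
    ("max batch=2, --batch=2", 98.3157, 2),
]

results = {label: throughput_fps(ms, batch) for label, ms, batch in measurements}

for label, fps in results.items():
    print(f"{label}: {fps:.2f} FPS")
```

With these numbers, batch 2 comes out around 20.3 FPS against roughly 19.4 FPS for batch 1, so batching adds only about 5% throughput.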

We've also tried pruning and got the following results:
Model converted with max batch=1 and pruning threshold -pth=0.1: 24.7978 ms
Model converted with max batch=1 and pruning threshold -pth=0.2: 22.2143 ms
Model converted with max batch=1 and pruning threshold -pth=0.3: 18.4939 ms
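For reference, the sketch below expresses the pruned latencies as a speedup over the unpruned 51.5673 ms baseline and as batch-1 FPS. It assumes all four figures were measured the same way with trtexec at batch size 1.

```python
BASELINE_MS = 51.5673  # unpruned model, max batch=1

# Pruning threshold (-pth) -> measured latency in ms, from the results above.
pruned_latency_ms = {
    0.1: 24.7978,
    0.2: 22.2143,
    0.3: 18.4939,
}


def speedup(baseline_ms, latency_ms):
    """How many times faster a run is than the baseline."""
    return baseline_ms / latency_ms


def fps(latency_ms):
    """Batch-1 frames per second for a given latency."""
    return 1000.0 / latency_ms


summary = {
    pth: (speedup(BASELINE_MS, ms), fps(ms))
    for pth, ms in pruned_latency_ms.items()
}

for pth, (gain, rate) in summary.items():
    print(f"-pth={pth}: {gain:.2f}x faster, {rate:.1f} FPS")
```

Even the lightest threshold roughly doubles throughput to just over 40 FPS, which is only useful if the pruned model still produces correct detections.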

Despite the speedup, we ran into a problem with the model output after converting the pruned model to TensorRT: the converted model produces random bounding boxes with random classes at random coordinates.

Could you (@2024a) give recommendations for accelerating YOLOv3 ResNet18 in FP16 on Jetson Nano to match the referenced doc?

mchi · November 18, 2020, 2:04am

Is it possible to share your model and the steps you used to measure performance?

mchi · November 19, 2020, 1:14pm

I just checked the YOLOv3 network structure. It uses an NMS layer that is a TRT plugin, and that layer should consume most of the inference time.
You can add "--dumpProfile" to the trtexec command to check. I think you don't see the expected FPS because the time spent in this NMS layer does not shrink much from 960x544 to 320x320.
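The reasoning above can be sketched as a two-part latency model: one part that shrinks with pixel count and one fixed part (such as a plugin layer) that does not. The fixed-cost shares below are hypothetical values chosen only to show the trend; the real share would have to be read from the per-layer timings that --dumpProfile prints.

```python
REF_LATENCY_MS = 1000.0 / 11.0            # 11 FPS at 960x544, from the thread
PIXEL_RATIO = (960 * 544) / (320 * 320)   # 5.1x fewer pixels at 320x320


def scaled_latency_ms(ref_latency_ms, fixed_share, pixel_ratio):
    """Latency at the smaller resolution under a simple two-part model.

    fixed_share is the fraction of the reference latency that does NOT
    shrink with resolution. The remainder is assumed to scale linearly
    with pixel count. Both are modelling assumptions.
    """
    fixed_ms = ref_latency_ms * fixed_share
    scalable_ms = ref_latency_ms * (1.0 - fixed_share)
    return fixed_ms + scalable_ms / pixel_ratio


# Hypothetical fixed-cost shares, for illustration only.
fps_by_share = {
    share: 1000.0 / scaled_latency_ms(REF_LATENCY_MS, share, PIXEL_RATIO)
    for share in (0.0, 0.25, 0.5)
}

for share, rate in fps_by_share.items():
    print(f"fixed share {share:.0%}: {rate:.1f} FPS at 320x320")
```

In this model a fixed share of zero gives the fully proportional ~56 FPS, while a fixed share of one half already pulls the result below 20 FPS, which is the kind of gap reported in this thread.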
