Jetson Orin Nano FP16/INT8 performance

Hi admin,
The Jetson Orin Nano uses the same GPU architecture as the RTX 3060 Ti. In both CUDA core count and clock speed the Orin Nano falls short of the 3060 Ti, yet its FP16 compute is rated at about 17 TFLOPS, which is comparable to the 3060 Ti's.
However, in my actual testing with yolov8s, the Orin Nano does not perform as well as the 3060 Ti: its frame rate is roughly half that of the 3060 Ti.

1. What causes this gap?
2. How is the Orin Nano's 17 TFLOPS figure calculated, and is it fully available to me when I run a yolov8s model?
3. The Orin Nano's FP16 is rated at 17 TFLOPS and its INT8 at 33 TOPS, but when I use an INT8 model the frame rate is only about 1.5 times that of FP16. Is this normal?


I look forward to your answers. Thank you.

Hi,

Could you share how you run the benchmark?

We expect it to be tested with TensorRT or the CUTLASS library.
Please note that you can maximize the Orin Nano's performance with the following commands (Super mode):

$ sudo nvpmodel -m 2
$ sudo jetson_clocks
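
If possible, please also run trtexec on the same model so we can compare against a common baseline, for example (the ONNX path is a placeholder):

$ /usr/src/tensorrt/bin/trtexec --onnx=yolov8s.onnx --fp16 --saveEngine=yolov8s_fp16.engine
$ /usr/src/tensorrt/bin/trtexec --onnx=yolov8s.onnx --fp16 --int8 --saveEngine=yolov8s_int8.engine

trtexec does not need calibration data for a pure performance test, and the reported throughput and GPU compute time give a quick FP16 vs. INT8 comparison independent of your C++ pipeline.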

Thanks.

$ sudo nvpmodel -m 2
$ sudo jetson_clocks
These two are already set.
Inference uses the TensorRT C++ API.
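
For reference, the engine is built roughly with the standard TensorRT C++ flow below (a simplified sketch assuming TensorRT 8.x from JetPack and an ONNX export of yolov8s; file names are placeholders, not the exact code):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <iostream>

// Forward TensorRT warnings and errors to stdout.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    auto builder = nvinfer1::createInferBuilder(gLogger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile("yolov8s.onnx",
                          static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    config->setFlag(nvinfer1::BuilderFlag::kFP16);    // FP16 engine
    // For the INT8 engine, INT8 is enabled as well (with a calibrator or a Q/DQ ONNX model):
    // config->setFlag(nvinfer1::BuilderFlag::kINT8);

    auto serialized = builder->buildSerializedNetwork(*network, *config);
    std::cout << "Serialized engine: " << serialized->size() << " bytes" << std::endl;
    return 0;
}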