Inference time in FP16 and FP32 is the same

I am using a TX2 NX to build and run a TensorRT engine. I have exported my ONNX model and am converting it into a .trt engine file. The model is a basic MobileNetV2.

The commands to build the FP16 and FP32 models:

FP16

trtexec --onnx=onnx_model.onnx --saveEngine=TRTBS1.trt --explicitBatch --fp16

The verbose output during the engine build showed:

[I] Precision: FP32 + FP16

FP32

trtexec --onnx=onnx_model.onnx --saveEngine=TRTBS1.trt --explicitBatch

The verbose output during the engine build showed:

[I] Precision: FP32

The inference time for both models is exactly the same, which basically means both engines are effectively FP32. Am I right?
If so, how do I improve performance further by moving the model to FP16 precision? I have a little accuracy to spare, but not much compute.
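
In case it helps, here is roughly how I understand the equivalent build through the TensorRT Python API (just a sketch based on the TensorRT 8.0 docs, not something I have verified on the TX2 NX; the FP16 flag is the part I am asking about):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

# Parse the exported MobileNetV2 ONNX file
with open("onnx_model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30              # 1 GB workspace (TRT 8.0 API)
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)        # allow FP16 kernels, like --fp16

engine = builder.build_engine(network, config)   # deprecated in later releases, valid in 8.0
with open("TRTBS1.trt", "wb") as f:
    f.write(engine.serialize())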

Hi,
Please refer to the links below for the custom plugin implementation and sample:

While the IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively, we recommend that you write new plugins or refactor existing ones to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.

Thanks!

Hey @NVES, I do not understand why I would want to alter the ONNX graph, since I have no custom layers; it is just the standard MobileNetV2.
I need to know whether the .trt engine can be made faster by moving to FP16 precision instead of FP32. If so, what would the process be?

Hi,

Which version of TensorRT are you using?
This can happen if many layers end up falling back to FP32: TensorRT automatically chooses the fastest kernel among the allowed precisions.
Please check the verbose logs, and share the ONNX model and verbose logs with us.
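
For example, a rough way to count how many layers ended up in each precision is to scan the verbose log for the tensor formats printed in the engine layer information (this is only a heuristic; it assumes those lines contain "Layer(" together with Half/Float format strings, so adjust the keywords to whatever your log actually prints):

from collections import Counter

counts = Counter()
with open("log.txt") as f:            # trtexec --verbose output
    for line in f:
        if "Layer(" in line:          # engine layer information lines (assumed format)
            if "Half" in line:
                counts["fp16"] += 1
            elif "Float" in line:
                counts["fp32"] += 1

print(counts)  # many fp32 entries in an --fp16 build indicate fallback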

Thank you.

TensorRT 8.0.1.6.
Sure, I will share the ONNX model and the verbose log.


Here are the ONNX model and the verbose log:
log.txt (1.0 MB)
onnx_model.onnx (8.5 MB)

Hi,

On our side, FP16 runs faster than FP32.

FP16 (internally TRT uses FP16+FP32) logs:

=== Performance summary ===
[07/21/2022-14:10:40] [I] Throughput: 2251.24 qps
[07/21/2022-14:10:40] [I] Latency: min = 0.530518 ms, max = 1.86304 ms, mean = 0.567591 ms, median = 0.565918 ms, percentile(99%) = 0.594482 ms
[07/21/2022-14:10:40] [I] End-to-End Host Latency: min = 0.569336 ms, max = 1.93237 ms, mean = 0.788129 ms, median = 0.821899 ms, percentile(99%) = 0.861084 ms

FP32 logs:

=== Performance summary ===
[07/21/2022-14:54:35] [I] Throughput: 948.788 qps
[07/21/2022-14:54:35] [I] Latency: min = 1.1499 ms, max = 1.37012 ms, mean = 1.17979 ms, median = 1.17896 ms, percentile(99%) = 1.2085 ms
[07/21/2022-14:54:35] [I] End-to-End Host Latency: min = 1.17493 ms, max = 2.19727 ms, mean = 1.95792 ms, median = 1.9646 ms, percentile(99%) = 2.04865 ms

As you can see, the latency with FP16 is better than with FP32.

Thank you.

Hi, this is strange, because I get much higher times, around 7-9 ms. Are you sure this was run on the TX2 NX?

@spolisetty, can you share the trtexec build commands so I can verify?

Hi,

We have not verified this on the TX2 NX. Please try the following commands on the TX2 NX.
If you still face this issue, we would like to move this post to the TX2 NX forum so you can get better help.

FP16
/opt/tensorrt/bin/trtexec --onnx=onnx_model.onnx --verbose --workspace=5000 --fp16

FP32
/opt/tensorrt/bin/trtexec --onnx=onnx_model.onnx --verbose --workspace=5000

Thank you.

Sure.

FP16

[07/22/2022-14:45:38] [I] === Performance summary ===
[07/22/2022-14:45:38] [I] Throughput: 175.887 qps
[07/22/2022-14:45:38] [I] Latency: min = 5.52954 ms, max = 7.27454 ms, mean = 5.6736 ms, median = 5.65527 ms, percentile(99%) = 6.5177 ms

FP32

[07/22/2022-14:53:46] [I] === Performance summary ===
[07/22/2022-14:53:46] [I] Throughput: 164.369 qps
[07/22/2022-14:53:46] [I] Latency: min = 5.95654 ms, max = 8.1095 ms, mean = 6.07347 ms, median = 6.0498 ms, percentile(99%) = 6.71985 ms

Is this all there is to gain?

Hi,

We do see some improvement with FP16. For further gains, you can try increasing the workspace and also INT8 precision.
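
For example, before spending time on an INT8 build, you can check what the builder reports for the platform (TensorRT Python API; note that a real INT8 build also needs a calibrator or a QAT model, which is not shown here):

import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
print("fast FP16 support:", builder.platform_has_fast_fp16)
print("fast INT8 support:", builder.platform_has_fast_int8)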

Thank you.

Sure, I will try and let you know.

Hey @spolisetty, increasing the workspace has not made any gain; this might be because we had already allocated more than the maximum needed. I believe the TX2 NX GPU does not support INT8, which is why there was no time improvement when I ran the build with INT8.

Hi,

As you said, TX2 doesn’t support INT8 operation.
Only FP32 and FP16 are available.

Have you maximized device performance before profiling?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Also, we recommend upgrading to TensorRT 8.2, which is included in JetPack 4.6.2.
Thanks.

Hey @AastaLLL, yes, the TX2 is running at max power and jetson_clocks was enabled. Unfortunately, the JetPack 4.6.2 BSP is not available from the carrier board manufacturer. Is there an alternative?

Hi,

Does JetPack 4.6.1 work for you?
Or do you need JetPack 4.6 for the device?

Thanks.

The latest one available, and the one I am running, is 4.6.

Please contact the board vendor and ask them to update to a newer JetPack. Thanks.