Low ViT Performance Gain on Jetson Thor Using FP8 vs FP16

Hello,

According to the documentation, enabling FP8 operations requires some ONNX surgery (inserting Q/DQ nodes at specific locations) to trigger the right MHA (Multi-Head Attention) fusion in FP8 precision.

However, the performance improvement is quite low for the base ViT model (~20% latency reduction). It is even worse on the EfficientSAM encoder, with basically no gain.

Judging by the profiling and layer info from TensorRT, FP8 does seem to be in use (even though some tactic names are quite cryptic, especially the gmm_mha_v2_#weirdbitstream ones).

Environment

  • TensorRT Version: 10.13.3

  • NVIDIA GPU: Thor (Jetson DevKit)

  • NVIDIA Driver Version: 580.00

  • CUDA Version: 13

Relevant Files

Steps To Reproduce

Model Optimizer commit

ViT-Base FP8 onnx generation:

python3 -m modelopt.onnx.quantization --onnx_path=./vit_base_patch8_224_Opset17.onnx --quantize_mode=fp8 --output_path=./vitb_fp8.onnx

EfficientSAM-S FP8 onnx generation:

python3 -m modelopt.onnx.quantization --onnx_path=./efficientsam_s_encoder.onnx --quantize_mode=fp8 --output_path=./sam_s_fp8.onnx

ViT-Base FP8 engine generation:

trtexec --stronglyTyped --onnx=./vitb_fp8.onnx --saveEngine=./vitb_fp8.engine

EfficientSAM-S FP8 engine generation:

trtexec --stronglyTyped --onnx=./sam_s_fp8.onnx --saveEngine=./sam_s_fp8.engine

efficientsam_s_encoder_fp8.profile.txt (14.3 KB)

efficientsam_s_encoder_fp16.profile.txt (14.0 KB)

vit_base_patch8_224_Opset17_fp8.profile.txt (13.7 KB)

vit_base_patch8_224_Opset17_fp16.profile.txt (13.3 KB)

profiles_and_layerinfo.zip (28.9 KB)

Hi,

Thanks for reporting this.
We will try it locally and update you with more information.

Hi,

In our test, for EfficientSAM:

FP16: 140.689 qps
FP8: 166.332 qps

Does this align with your experiment?
We set the device to maximum performance before testing:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.

Does this align with your experiment?

Yes, pretty much.

For EfficientSAM (S), with the MAXN profile I have:

  • 147 fps for the fp16 engine
  • 172 fps for the fp8 engine

The gain is ~17% on my part and ~18% on yours.
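
For reference, here is a minimal Python sketch of how those percentages can be derived from the figures quoted in this thread. The function names are just illustrative, and the latency conversion assumes latency is simply the inverse of throughput, which may not hold exactly depending on how the benchmark overlaps work.

```python
def throughput_gain_pct(baseline_qps: float, new_qps: float) -> float:
    """Relative throughput gain of the new engine over the baseline, in percent."""
    return (new_qps / baseline_qps - 1.0) * 100.0


def latency_reduction_pct(baseline_qps: float, new_qps: float) -> float:
    """Equivalent latency reduction in percent.

    Assumption: latency == 1 / throughput (no overlap between inferences).
    """
    return (1.0 - baseline_qps / new_qps) * 100.0


# (FP16, FP8) throughput figures quoted in this thread for the EfficientSAM (S) encoder.
measurements = {
    "NVIDIA test (qps)": (140.689, 166.332),
    "MAXN profile (fps)": (147.0, 172.0),
}

for label, (fp16, fp8) in measurements.items():
    gain = throughput_gain_pct(fp16, fp8)
    reduction = latency_reduction_pct(fp16, fp8)
    print(f"{label}: +{gain:.1f}% throughput, -{reduction:.1f}% latency")
```

Note that the two metrics are not interchangeable: under the same assumption, a 20% latency reduction corresponds to a 25% throughput gain (1 / 0.8 = 1.25), so the ViT-Base figure above and these EfficientSAM figures should be compared in the same unit.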

Hi,

Thanks for the update.

We are gathering more information about this issue with our internal team.
We will share an update later.
