Why are the weights of the engine model INT8, while the bias is FLOAT32?

I converted an .onnx model to an engine file, but when I checked the graph JSON file, I saw that the layer weights are in INT8 format while the bias is in float format, as follows:

Weights: {'Type': 'Int8', 'Count': 864}
Bias: {'Type': 'Float', 'Count': 32}

I think that within a single generated layer, the weights and bias should have the same format. How can I force the bias to be generated in INT8 format to increase the speed of the engine model?
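For anyone who wants to check this across the whole engine rather than one layer, here is a minimal sketch that tallies the (weights type, bias type) pairs in a graph JSON. It assumes the file has a top-level "Layers" list whose entries carry "Weights" and "Bias" objects shaped like the two lines quoted above; that layout, the function name `tally_param_types`, and the layer names in the sample are assumptions for illustration, not taken from the attached file.

```python
import json
from collections import Counter


def tally_param_types(graph):
    """Count how many layers store each (weights type, bias type) pair."""
    counts = Counter()
    # Assumption: layers live under a top-level "Layers" key.
    for layer in graph.get("Layers", []):
        # Entries without detailed info may be plain strings; skip them.
        if not isinstance(layer, dict):
            continue
        weights = layer.get("Weights")
        bias = layer.get("Bias")
        if not isinstance(weights, dict) or not isinstance(bias, dict):
            continue
        counts[(weights.get("Type"), bias.get("Type"))] += 1
    return counts


# Hypothetical two-layer excerpt; the first entry mirrors the values quoted above.
sample = json.loads("""
{"Layers": [
  {"Name": "conv_a", "Weights": {"Type": "Int8", "Count": 864},
   "Bias": {"Type": "Float", "Count": 32}},
  {"Name": "conv_b", "Weights": {"Type": "Int8", "Count": 18432},
   "Bias": {"Type": "Float", "Count": 64}}
]}
""")

for (weights_type, bias_type), n in tally_param_types(sample).items():
    print(f"weights={weights_type} bias={bias_type}: {n} layer(s)")
```

To run it on a real file, replace the `sample` string with `json.load(open(path))`, where `path` points to your own graph JSON.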


Could you please share the complete verbose logs from building the engine, along with the graph JSON file, so that we can help better?

Thank you.

@spolisetty I have attached the verbose logs from building the engine and the graph.json file. I use this repo to build the engine; it uses the TensorRT Python API. It seems that all biases are FP32 in the engine model. Thanks
graph.json (239.4 KB)

log.md (7.2 KB)

Please share the ONNX model and the script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:

import onnx

filename = "yourONNXmodel.onnx"  # placeholder: path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an error if the model is invalid

  2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.

I use this repo (TensorRT Python API), GitHub - Linaom1214/TensorRT-For-YOLO-Series: tensorrt for yolo series (YOLOv8, YOLOv7, YOLOv6, YOLOv5), nms plugin support, to convert .onnx to .engine. If I enable -v for verbose output in the repo, the log in the terminal is very long and I cannot save all of it. Currently, I am not using trtexec to convert the .onnx model to FP32 and INT8 engines.

@AakankshaS I ran this code and it produced no output, which means that my ONNX model is valid.

@spolisetty @AakankshaS
Thanks. I used this repo, GitHub - Linaom1214/TensorRT-For-YOLO-Series: tensorrt for yolo series (YOLOv8, YOLOv7, YOLOv6, YOLOv5), nms plugin support (TensorRT Python API, to convert .onnx to an INT8 model), and I was able to capture all of the verbose output. I have attached the .onnx file and the graph.json file (weights are INT8 and bias is FLOAT within the same layer). How can I fix this with the TensorRT Python API?
graph_new.json (238.5 KB)

log.txt (12.0 MB)

Because the .onnx file is large (>100 MB), I am attaching a Drive link to it here: yolov7_cach3_KOdynamic.onnx - Google Drive. You can download it from there.

@spolisetty @AakankshaS
Sorry, but is there any update? I am waiting for your guidance.


This is expected behavior. INT8 conv/gemm kernels use an FP32 bias. Forcing the bias into INT8 does not increase speed, because the bias is already fused into the conv/gemm kernels.
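To make the arithmetic behind this answer concrete, here is a simplified pure-Python sketch of a generic symmetric INT8 scheme: INT8 values are multiplied and summed in an integer accumulator, the accumulator is rescaled to a float, and the float bias is added in that same step. This is an illustration only, not TensorRT's actual kernel code; the function names, scales, and input values are all made up for the example.

```python
def quantize(values, scale):
    """Symmetric INT8 quantization: round(v / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int8_dot_with_float_bias(x_q, w_q, x_scale, w_scale, bias):
    # Integer multiply-accumulate over the quantized inputs and weights.
    acc = sum(a * b for a, b in zip(x_q, w_q))
    # Rescale the integer accumulator to a float, then add the float bias.
    return acc * (x_scale * w_scale) + bias


# Made-up example values (hypothetical, chosen so they quantize cleanly).
x = [0.5, -1.0, 0.25]
w = [0.2, 0.1, -0.4]
bias = 0.003
x_scale, w_scale = 0.01, 0.005

x_q = quantize(x, x_scale)
w_q = quantize(w, w_scale)

y = int8_dot_with_float_bias(x_q, w_q, x_scale, w_scale, bias)
reference = sum(a * b for a, b in zip(x, w)) + bias

# For comparison: the same bias squeezed onto the INT8 weight grid.
bias_on_int8_grid = quantize([bias], w_scale)[0] * w_scale

print("quantized result:", y)
print("float reference: ", reference)
print("bias on INT8 grid:", bias_on_int8_grid, "vs original", bias)
```

In this sketch the bias costs one addition per output value after the integer work is done, so it has no INT8 multiply-accumulate to speed up, and keeping it in float avoids the rounding error that the last line shows.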

Thank you.


Thank you so much.
