I think that within a generated layer, the weights and bias have to share the same format. How can I force TensorRT to generate the bias in INT8 format to increase the speed of the engine?
Thanks
@spolisetty I attached the verbose logs from building the engine, along with graph.json. I used this repo to build the engine; it uses the TensorRT Python API. It seems that all biases are FP32 in the engine model. Thanks. graph.json (239.4 KB)
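For context, the build follows the usual TensorRT Python API pattern, roughly like this simplified sketch (model.onnx, model.engine, and the calibrator are placeholders, not the repo's actual names):

import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)     # request INT8 kernels
# config.int8_calibrator = my_calibrator  # needed unless the model carries Q/DQ nodes

serialized = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(serialized)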
Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:
1) Validate your model with the below snippet:
check_model.py
import sys
import onnx

filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
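Run it as, for example, python check_model.py model.onnx; it raises an error if the model is malformed and returns silently otherwise.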
2) Try running your model with the trtexec command.
In case you are still facing the issue, we request you to share the trtexec "--verbose" log for further debugging.
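For reference, a typical invocation looks like the following (model.onnx is a placeholder path):

trtexec --onnx=model.onnx --int8 --verbose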
Thanks!
This is expected behavior. INT8 conv/GEMM kernels use an FP32 bias. Forcing the bias into INT8 would not increase speed, because the bias addition is already fused into the conv/GEMM kernels.
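As a toy illustration of why the FP32 bias costs nothing extra (a NumPy sketch with made-up scales, not TensorRT internals): the INT8 matmul accumulates in INT32, and the bias is folded into the dequantize epilogue that runs on every output element anyway.

import numpy as np

x_q = np.random.randint(-128, 128, (4, 8), dtype=np.int8)   # quantized input
w_q = np.random.randint(-128, 128, (8, 3), dtype=np.int8)   # quantized weights
bias = np.random.randn(3).astype(np.float32)                # FP32 bias

s_x, s_w = 0.05, 0.02  # made-up quantization scales

# INT8 math with an INT32 accumulator, as in a real INT8 GEMM kernel
acc = x_q.astype(np.int32) @ w_q.astype(np.int32)

# The dequantize epilogue touches every output element regardless,
# so adding the FP32 bias here is free; an INT8 bias would save nothing.
y = acc.astype(np.float32) * (s_x * s_w) + bias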