QAT int8 TRT engine slower than fp16

Thank you for your reply, but my problem is not about performing int8 inference with TensorRT; it's about the generation of the int8 engine. I can build and use the engine without any issue; my point is that the int8 inference time is slower than fp16. I've seen that this is a reported problem, so please let me know if you have any advice.
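For context, here is a minimal sketch of how such a QAT int8 engine build typically looks with the TensorRT Python API (assuming the QAT model is exported to ONNX with Q/DQ nodes; the file names are placeholders, not my exact setup):

```python
import tensorrt as trt

# Build an int8 engine from a QAT (explicit-quantization) ONNX model.
# "model_qat.onnx" and "model_int8.engine" are placeholder file names.
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model_qat.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)   # honor the Q/DQ nodes from QAT
config.set_flag(trt.BuilderFlag.FP16)   # allow fp16 fallback for unquantized layers

serialized = builder.build_serialized_network(network, config)
with open("model_int8.engine", "wb") as f:
    f.write(serialized)
```

Enabling FP16 alongside INT8 in the builder config is the usual setting for QAT models, so layers without Q/DQ nodes can fall back to fp16 instead of fp32; the slowdown I'm seeing happens even with this configuration.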
Thanks