What is the official suggestion for using weight-only quantization / SmoothQuant in TensorRT?


Weight-only quantization and SmoothQuant are widely used for LLMs and are supported by FasterTransformer and TRTLLM. Is there any way to use them directly in TensorRT? After all, TRTLLM is not supposed to be the solution for everything: it supports neither the ONNX parser nor older GPUs, and it requires a complex environment. I have noticed that TRT has an explicit quantization mode, so would setting all weights to kINT8 be a valid way to get weight-only quantization?
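For context, "weight-only quantization" here means storing weights in INT8 with a per-channel scale and dequantizing them on the fly, while activations stay in floating point. A minimal numeric sketch of that idea in plain Python (no TensorRT API involved; function names are illustrative only):

```python
# Hypothetical sketch: symmetric per-channel INT8 weight-only quantization.
# Each output channel (row) gets its own scale; zero-point is 0 (symmetric).

def quantize_per_channel(weights):
    """Quantize each row of a weight matrix to INT8.

    Returns (list of int8 rows, list of per-channel scales).
    """
    q_rows, scales = [], []
    for row in weights:
        amax = max(abs(w) for w in row) or 1.0   # avoid division by zero
        scale = amax / 127.0                     # map [-amax, amax] -> [-127, 127]
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Recover approximate FP weights: w ~= q * scale (done on the fly at runtime)."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.1, -2.0]]
    q, s = quantize_per_channel(w)
    w_hat = dequantize(q, s)
    max_err = max(abs(a - b)
                  for ra, rb in zip(w, w_hat)
                  for a, b in zip(ra, rb))
    print(max_err < 0.01)  # reconstruction error bounded by half a quantization step
```

The per-row error stays below half a quantization step (scale / 2), which is why weight-only INT8 is usually close to lossless for LLM weights.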


Hi @1055057679 ,
SmoothQuant support and related performance optimizations have been added to the latest TRT release and can be used via the ONNX path.
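For readers unfamiliar with what SmoothQuant actually does before the engine build: it applies a per-channel scale that migrates activation outliers into the weights, so both tensors become easier to quantize, while the layer's output is mathematically unchanged. A plain-Python sketch of that transform (independent of any TensorRT API; alpha = 0.5 as in the SmoothQuant paper, function names are illustrative):

```python
# Hypothetical sketch of the SmoothQuant smoothing transform:
#   Y = X @ W  ==  (X / s) @ (s * W)   with one scale s_j per input channel.

def smooth_scales(act_absmax, w_absmax, alpha=0.5):
    """s_j = max|X_j|**alpha / max|W_j|**(1 - alpha), per input channel j."""
    return [(a ** alpha) / (w ** (1 - alpha))
            for a, w in zip(act_absmax, w_absmax)]

def apply_smoothing(x_row, weights, scales):
    """Return the equivalent smoothed pair: X' = X / s, W' = s * W (row-wise).

    weights is laid out with one row per input channel, so scaling row j of W
    by s_j exactly cancels dividing activation channel j by s_j.
    """
    x_smoothed = [x / s for x, s in zip(x_row, scales)]
    w_smoothed = [[s * w for w in row] for s, row in zip(scales, weights)]
    return x_smoothed, w_smoothed

def matvec(x, weights):
    """y_k = sum_j x_j * W[j][k] -- reference matmul to check equivalence."""
    return [sum(x[j] * weights[j][k] for j in range(len(x)))
            for k in range(len(weights[0]))]
```

After smoothing, the activation's dynamic range shrinks (the outlier channel is divided by a large scale), which is what makes INT8 activation quantization viable; the weights absorb the difference.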

Hi, do you mean TRT 9.1? May I ask for documentation on its SmoothQuant support? Or is that expected to come out soon?

And I’m afraid TRT 9.1 doesn’t support the P100, does it?

I’m currently exploring the possibility of implementing weight-only quantization for my models. While I’ve come across information about the FasterTransformer and TRTLLM libraries supporting such quantization techniques, I’m inclined to pursue a solution directly within TensorRT due to compatibility concerns and the complexity of the environment required by TRTLLM.