TensorRT 5/6 FC layer does not support INT8 quantization.

Hi,

Can you provide the following information so we can better help?

Provide details on the platforms you are using:
Linux distro and version
GPU type
Nvidia driver version
CUDA version
CUDNN version
Python version [if using python]
Tensorflow version
TensorRT version
If Jetson, OS, hw versions

Files

Include any logs, source, models (.uff, .pb, etc.) that would be helpful to diagnose the problem.

If relevant, please include the full traceback.


Reproducibility

Please provide a minimal test case that reproduces your error.

Linux distro and version: Ubuntu 16.04
GPU type: P40
Nvidia driver version: 396.26
CUDA version: 9.0
CUDNN version: 7.6.3
Python version: 3.6.8
Tensorflow version: 1.12
TensorRT version: 6.0.1.5

Case 1: Build an INT8 engine from any network that contains a fully connected layer; the following warning occurs:

[TensorRT] WARNING: No implementation of layer FC obeys the requested constraints in strict mode. No conforming implementation was found i.e. requested layer computation precision and output precision types are ignored, using the fastest implementation.
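For context, this warning appears when strict type constraints are enabled and the requested per-layer precision cannot be satisfied. Roughly, the settings involved look like this (a sketch against the TensorRT 6 Python API; builder and layer stand in for an existing network definition):

import tensorrt as trt

# Sketch only: builder and layer are assumed to come from a real network.
builder.strict_type_constraints = True   # enforce the requested precisions
layer.precision = trt.int8               # requested computation precision
layer.set_output_type(0, trt.int8)       # requested output precision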

Case 2: If I replace the fully connected layer with a 1x1 convolutional layer using the same weights, quantization succeeds.

Also, the INT8 engine file produced in Case 2 is much smaller than the one from Case 1, which suggests the fully connected layer is not actually being quantized. A rough sketch of both cases is below.
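The following is a minimal sketch of both cases (untested, written against the TensorRT 6 Python API; the weights are random and the INT8 calibrator is a placeholder that a real script would have to implement):

import numpy as np
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_int8_engine(use_fc, calibrator):
    # calibrator: an IInt8EntropyCalibrator2 implementation (not shown here).
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()
    # Cx1x1 input so the FC layer and the 1x1 conv can share identical weights.
    inp = network.add_input("input", trt.float32, (256, 1, 1))
    w = np.random.randn(10, 256).astype(np.float32)
    b = np.zeros(10, dtype=np.float32)
    if use_fc:
        layer = network.add_fully_connected(inp, 10, w, b)       # Case 1
    else:
        layer = network.add_convolution(inp, 10, (1, 1), w, b)   # Case 2
    network.mark_output(layer.get_output(0))

    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 28
    builder.int8_mode = True
    builder.int8_calibrator = calibrator
    builder.strict_type_constraints = True  # surfaces the strict-mode warning
    return builder.build_cuda_engine(network)

Building with use_fc=True corresponds to Case 1; use_fc=False corresponds to Case 2.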

Hi,

I can help you more quickly if you provide a minimal script that reproduces your error, similar to the one you provided here: https://devtalk.nvidia.com/default/topic/1064929/tensorrt/resizebilinear/post/5392890/#5392890

Please provide (1) a simple script that fails to quantize a network with an FC layer and (2) a simple script that successfully quantizes a network with a 1x1 conv layer.

Thanks,
NVIDIA Enterprise Support

How can I check whether my PyTorch layers contain INT64 tensors?
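One way to do this (a minimal sketch; the helper name is made up) is to walk the model's parameters and buffers and check their dtypes:

import torch

def find_int64_tensors(model):
    # Return the names of all parameters and buffers stored as torch.int64.
    hits = []
    for name, tensor in list(model.named_parameters()) + list(model.named_buffers()):
        if tensor.dtype == torch.int64:
            hits.append(name)
    return hits

# Example usage: print(find_int64_tensors(my_model))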