Hello,
Can anyone let me know whether it is possible to calculate or visualize the quantization error in TensorRT 2.1 when using half-precision (FP16) or INT8 quantization?
Thanks for your help!
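As far as I know, TensorRT does not report quantization error directly; a common approach is to run the same inputs through the FP32 engine and the reduced-precision (FP16/INT8) engine and compare the outputs yourself. The sketch below illustrates the idea offline, without TensorRT: it simulates symmetric per-tensor INT8 quantization (round-to-nearest with a max-abs scale, which is an assumption about the scheme, not TensorRT's exact calibration) and computes two common error metrics, MSE and SNR. The function names are illustrative, not part of any TensorRT API.

```python
import math
import random

def quantize_int8(xs):
    """Symmetric per-tensor INT8 quantization: scale from max |x| (illustrative scheme)."""
    scale = max(abs(v) for v in xs) / 127.0
    qs = [max(-127, min(127, round(v / scale))) for v in xs]
    return qs, scale

def dequantize(qs, scale):
    """Map INT8 codes back to floats so we can measure the round-trip error."""
    return [q * scale for q in qs]

def quantization_error(xs):
    """MSE and SNR (dB) between the original tensor and its INT8 round trip."""
    qs, scale = quantize_int8(xs)
    xh = dequantize(qs, scale)
    mse = sum((a - b) ** 2 for a, b in zip(xs, xh)) / len(xs)
    sig = sum(a ** 2 for a in xs) / len(xs)
    snr_db = 10.0 * math.log10(sig / mse) if mse > 0 else float("inf")
    return mse, snr_db

# Example: Gaussian "activations" as a stand-in for a real layer's output.
random.seed(0)
acts = [random.gauss(0.0, 1.0) for _ in range(1000)]
mse, snr = quantization_error(acts)
print(f"MSE: {mse:.6g}, SNR: {snr:.1f} dB")
```

For a real TensorRT comparison you would replace the simulated round trip with the outputs of the FP32 and INT8 engines on the same calibration/test batch, then feed both arrays into the same metric; plotting a histogram of the per-element error is an easy way to visualize it.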