Performing INT4/FP4 quantization on Thor for YOLOv7

Hi,

Can anyone point me to an example of how to perform INT4/FP4 quantization? So far I have been able to perform INT8 quantization.

Hi,

Which frameworks do you use?

For example, you can set the quantization to INT4 for an MLC use case as shown below:

https://elinux.org/Jetson/L4T/Jetson_AI_Stack#MLC

mlc_llm convert_weight models/Qwen3-30B-A3B-Instruct-2507/ \
    --quantization q4bf16_1 \
    --model-type qwen3_moe \
    --device cuda \
    --source-format huggingface-safetensor \

Thanks.

Hi,

I am using PyTorch models and TensorRT 10 for quantization. If we do INT4 quantization, how will the calibration cache be generated? My models are for OD/IS.

Hi,

Please try to use our TensorRT Model Optimizer to quantize the model:
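For a PyTorch model, the Model Optimizer flow can be sketched roughly as below. This is a minimal sketch, not a verified recipe for YOLOv7 on Thor: it assumes the `nvidia-modelopt` package is installed, and the wrapper name `quantize_with_modelopt` and its arguments are made up for illustration. Config names such as `INT8_DEFAULT_CFG` should be checked against the installed release, and whether the INT4/FP4 configs apply to convolution layers is not confirmed here. In this flow, calibration runs in PyTorch through the forward loop, so as far as I understand the scales travel with the exported model rather than in a TensorRT calibration cache file.

```python
def quantize_with_modelopt(model, calib_batches, config_name="INT8_DEFAULT_CFG"):
    """Calibrate and quantize a PyTorch model with TensorRT Model Optimizer.

    Assumption: `config_name` is an attribute of modelopt.torch.quantization
    in the installed release (for example "INT8_DEFAULT_CFG").
    """
    # Imported lazily so this helper can be defined without the packages present.
    import torch
    import modelopt.torch.quantization as mtq

    config = getattr(mtq, config_name)

    def forward_loop(m):
        # Model Optimizer calls this to collect calibration statistics.
        with torch.no_grad():
            for batch in calib_batches:
                m(batch)

    return mtq.quantize(model, config, forward_loop)


# Hypothetical usage; `load_yolov7` and `calib_images` are placeholders:
# model = load_yolov7("yolov7.pt").cuda().eval()
# model = quantize_with_modelopt(model, calib_images, "INT8_DEFAULT_CFG")
```

The quantized model would then be exported to ONNX and built with TensorRT as usual; the export details for INT4/FP4 differ by release, so please check the Model Optimizer documentation for your version.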

Thanks.

Will it be supported on NVIDIA Thor? Also, is there an example of how to perform INT4 quantization? I could not find one.

Hi,

Could you share what kind of model you want to quantize so we can find a corresponding one for you?

Do you want to apply quantization for an LLM or VLM?

Thanks.

Hi @AastaLLL

I am trying to run the YOLOv7 model (GitHub: WongKinYiu/yolov7, the implementation of the paper "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors"). I have run it with INT8 quantization and it works fine. I now want to try INT4 and FP4 quantization.

Hi,

Usually, INT4/FP4 quantization is applied to LLMs.
For CNNs, it is more common to use INT8 to preserve accuracy.
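To illustrate why fewer bits cost accuracy, here is a small pure-Python sketch that applies symmetric per-tensor quantization at 8 and 4 bits and compares the rounding error. The weights are random numbers standing in for a convolution kernel, not values from YOLOv7, and real toolchains use finer-grained scaling (per-channel or per-block), so treat this as an illustration only.

```python
import random


def quantize_symmetric(values, bits):
    """Round values to a signed integer grid of `bits` bits, then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    result = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # clamp to the integer range
        result.append(q * scale)
    return result


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# Made-up stand-in for a conv kernel: 4096 small Gaussian weights.
weights = [random.gauss(0.0, 0.05) for _ in range(4096)]

err_int8 = mean_abs_error(weights, quantize_symmetric(weights, 8))
err_int4 = mean_abs_error(weights, quantize_symmetric(weights, 4))

print(f"INT8 mean abs error: {err_int8:.6f}")
print(f"INT4 mean abs error: {err_int4:.6f}")
```

With only 15 representable levels at 4 bits, the rounding error is noticeably larger than at 8 bits.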

You can find a CNN quantization example with TensorRT Model Optimizer below:

For INT4, there are some examples in the link below:

Thanks.