Problem with structured sparsity and explicit quantization (PTQ) on Tiny-Yolov7


Hi, I am trying to use TensorRT to run a Tiny-Yolov7 model with structured sparsity (2:4) and explicit quantization (PTQ). I have successfully trained a sparse network and deployed it on TensorRT following this page :

Then I followed this tutorial to perform the PTQ step on my sparse model : yolo_deepstream/yolov7_qat at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub

With "--sparsity=enable", no sparse implementations are picked (see 'log_sparse_ptq.txt'). With "--sparsity=force", an error is reported, yet the engine is still generated and evaluated (see 'log_force.txt'). Why? The error is :

[04/19/2023-11:08:04] [E] Error[3]:[convolutionLayer.h::setKernelWeights::30] Error Code 3: API Usage Error (Parameter check failed at: /_src/build/x86_64-gnu/release/optimizer/api/layers/convolutionLayer.h::setKernelWeights::30, condition: kernelWeights.values != nullptr

If I inspect the ONNX file with Netron, the weights feeding the QuantizeLinear layers have zeros in the same positions as in my sparsified-only model, so the 2:4 pattern appears to be preserved. I don't know what went wrong. Any help would be highly appreciated ^^
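
The visual check in Netron could also be done programmatically. Below is a minimal sketch, not taken from the tutorial, that tests whether a convolution weight array follows the 2:4 pattern. It assumes the weights are in (out_channels, in_channels, kH, kW) layout and that the pattern is required along the input-channel dimension, i.e. at least two zeros in every group of four consecutive input channels. The function name `is_2to4_sparse` is hypothetical. Extracting the arrays from the ONNX file (for example with `onnx.numpy_helper.to_array`) is left out here.

```python
import numpy as np


def is_2to4_sparse(weights: np.ndarray) -> bool:
    """Return True if every group of 4 consecutive input channels
    contains at least 2 zeros.

    Assumption: `weights` is a conv kernel in (out_ch, in_ch, kH, kW) layout.
    """
    if weights.ndim != 4 or weights.shape[1] % 4 != 0:
        # Layers whose input-channel count is not a multiple of 4
        # cannot satisfy the pattern as defined here.
        return False
    # Move input channels to the last axis so that reshape groups
    # consecutive input channels in blocks of 4.
    groups = np.moveaxis(weights, 1, -1).reshape(-1, 4)
    zeros_per_group = np.count_nonzero(groups == 0, axis=1)
    return bool(np.all(zeros_per_group >= 2))


# Toy example with synthetic weights (not the real model).
dense = np.ones((8, 8, 3, 3), dtype=np.float32)
sparse = dense.copy()
sparse[:, 0::4] = 0.0  # zero the 1st input channel of each group of 4
sparse[:, 1::4] = 0.0  # zero the 2nd input channel of each group of 4

print(is_2to4_sparse(dense))
print(is_2to4_sparse(sparse))
```

Running such a check on both the sparsified-only model and the PTQ model would show whether the quantization step altered the pattern in any layer, which is hard to confirm by eye.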


TensorRT Version: 8.5.3
GPU Type: RTX A5000-24GB
Nvidia Driver Version: 520.61.05
CUDA Version: 11.8
CUDNN Version: 8.6.0
Operating System + Version: Ubuntu 22.04 LTS
Python Version (if applicable): 3.10.6
TensorFlow Version (if applicable): not applicable
PyTorch Version (if applicable): 2.0.0
Baremetal or Container (if container which image + tag): not applicable

Relevant Files

ptq-sparse-640.onnx (24.0 MB)
log_sparse_ptq.txt (5.2 MB)
log_force.txt (5.2 MB)

Steps To Reproduce

trtexec --onnx=ptq-sparse-640.onnx --saveEngine=ptq-tiny-yolov7-sparse-640.trt --int8 --fp16 --sparsity=enable --useCudaGraph --verbose
trtexec --onnx=ptq-sparse-640.onnx --saveEngine=ptq-tiny-yolov7-sparse-640.trt --int8 --fp16 --sparsity=force --useCudaGraph --verbose


This looks like a DeepStream-related issue. We will move this post to the DeepStream forum.


Thanks for the quick answer.

It's clearly a TensorRT problem. The GitHub repository has several parts, and I'm focusing on the TensorRT part and the PTQ/QAT techniques.


Hi @TakeThat42 ,
Apologies for the delayed response. I am checking on this and will update you soon.


Hi @AakankshaS,

Thanks for looking into this. I will be waiting for your answer.


Hi @AakankshaS,

Do you have any update on my problem? I'm still waiting for your answer.