Description
Hi, I am trying to use TensorRT to run a Tiny-YOLOv7 model with structured sparsity (2:4) and explicit quantization (PTQ). I successfully trained a sparse network and deployed it with TensorRT by following this page:
Then, I followed this tutorial to perform the PTQ step on my sparse model: https://github.com/NVIDIA-AI-IOT/yolo_deepstream/tree/main/yolov7_qat
If I use --sparsity=enable, the log shows that no sparse implementations were picked (see 'log_sparse_ptq.txt'). With --sparsity=force, an error is reported, yet the engine is still built and evaluated (see 'log_force.txt'). Why? The error is:
[04/19/2023-11:08:04] [E] Error[3]: [convolutionLayer.h::setKernelWeights::30] Error Code 3: API Usage Error (Parameter check failed at: /_src/build/x86_64-gnu/release/optimizer/api/layers/convolutionLayer.h::setKernelWeights::30, condition: kernelWeights.values != nullptr)
If I inspect the ONNX with Netron, the weights feeding the QuantizeLinear layers show the same zero pattern as in my sparsified-only model, so the 2:4 structure seems to have survived the QAT/PTQ export. I don't know what went wrong. Any help would be highly appreciated ^^
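For what it's worth, this is roughly how I verify the 2:4 pattern on the exported weights. This is a minimal sketch, not an official NVIDIA tool: the helper name `is_2_4_sparse` is mine, and I assume the pattern is checked in groups of 4 consecutive values of the flattened per-output-channel kernel (for a real model, `w` would come from each Conv initializer via `onnx.numpy_helper.to_array`).

```python
import numpy as np

def is_2_4_sparse(w: np.ndarray) -> bool:
    """True if every group of 4 consecutive values along the flattened
    per-output-channel axis has at most 2 non-zeros (2:4 pattern)."""
    flat = w.reshape(w.shape[0], -1)              # (out_ch, in_ch*kH*kW)
    if flat.shape[1] % 4:
        return False                              # not groupable by 4
    groups = flat.reshape(flat.shape[0], -1, 4)
    return bool((np.count_nonzero(groups, axis=-1) <= 2).all())

# Demo on a synthetic "conv kernel": zero out the 2 smallest of every
# 4 consecutive weights, which produces a valid 2:4 pattern.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16, 3, 3))
groups = w.reshape(8, -1, 4)
smallest = np.argsort(np.abs(groups), axis=-1)[..., :2]
np.put_along_axis(groups, smallest, 0.0, axis=-1)
w = groups.reshape(8, 16, 3, 3)

print(is_2_4_sparse(w))                                    # True
print(is_2_4_sparse(rng.standard_normal((8, 16, 3, 3))))   # False (dense)
```

On my exported model, the Conv weight initializers pass this kind of check, which is why the "kernelWeights.values != nullptr" failure is surprising to me.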
Environment
TensorRT Version: 8.5.3
GPU Type: RTX A5000-24GB
Nvidia Driver Version: 520.61.05
CUDA Version: 11.8
CUDNN Version: 8.6.0
Operating System + Version: Ubuntu 22.04 LTS
Python Version (if applicable): 3.10.6
TensorFlow Version (if applicable): not applicable
PyTorch Version (if applicable): 2.0.0
Baremetal or Container (if container which image + tag): not applicable
Relevant Files
ptq-sparse-640.onnx (24.0 MB)
log_sparse_ptq.txt (5.2 MB)
log_force.txt (5.2 MB)
Steps To Reproduce
trtexec --onnx=ptq-sparse-640.onnx --saveEngine=ptq-tiny-yolov7-sparse-640.trt --int8 --fp16 --sparsity=enable --useCudaGraph --verbose
or
trtexec --onnx=ptq-sparse-640.onnx --saveEngine=ptq-tiny-yolov7-sparse-640.trt --int8 --fp16 --sparsity=force --useCudaGraph --verbose