Description
Hi, I am trying to use TensorRT to run a Tiny-YOLOv7 model with structured sparsity (2:4) and explicit quantization (PTQ). I successfully trained a sparse network and deployed it with TensorRT by following this page:
Then, I followed this tutorial to perform the PTQ step on my sparse model: https://github.com/NVIDIA-AI-IOT/yolo_deepstream/tree/main/yolov7_qat
If I use --sparsity=enable, the log shows that no sparse implementations were picked (see 'log_sparse_ptq.txt'). With --sparsity=force, an error is reported, yet the engine is still built and evaluated (see 'log_force.txt'). Why? The error is:
[04/19/2023-11:08:04] [E] Error[3]: [convolutionLayer.h::setKernelWeights::30] Error Code 3: API Usage Error (Parameter check failed at: /_src/build/x86_64-gnu/release/optimizer/api/layers/convolutionLayer.h::setKernelWeights::30, condition: kernelWeights.values != nullptr)
If I inspect the ONNX with Netron, the weights feeding the QuantizeLinear layers show the same zero pattern as in my sparsified-only model, so the 2:4 structure seems to have survived the QAT/PTQ export. I don't know what went wrong. Any help would be highly appreciated ^^
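For what it's worth, this is roughly how I verify the 2:4 pattern on the exported weights. This is a minimal sketch, not an official NVIDIA tool: the helper name `is_2_4_sparse` is mine, and I assume the pattern is checked in groups of 4 consecutive values of the flattened per-output-channel kernel (for a real model, `w` would come from each Conv initializer via `onnx.numpy_helper.to_array`).

```python
import numpy as np

def is_2_4_sparse(w: np.ndarray) -> bool:
    """True if every group of 4 consecutive values along the flattened
    per-output-channel axis has at most 2 non-zeros (2:4 pattern)."""
    flat = w.reshape(w.shape[0], -1)              # (out_ch, in_ch*kH*kW)
    if flat.shape[1] % 4:
        return False                              # not groupable by 4
    groups = flat.reshape(flat.shape[0], -1, 4)
    return bool((np.count_nonzero(groups, axis=-1) <= 2).all())

# Demo on a synthetic "conv kernel": zero out the 2 smallest of every
# 4 consecutive weights, which produces a valid 2:4 pattern.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16, 3, 3))
groups = w.reshape(8, -1, 4)
smallest = np.argsort(np.abs(groups), axis=-1)[..., :2]
np.put_along_axis(groups, smallest, 0.0, axis=-1)
w = groups.reshape(8, 16, 3, 3)

print(is_2_4_sparse(w))                                    # True
print(is_2_4_sparse(rng.standard_normal((8, 16, 3, 3))))   # False (dense)
```

On my exported model, the Conv weight initializers pass this kind of check, which is why the "kernelWeights.values != nullptr" failure is surprising to me.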
Environment
TensorRT Version: 8.5.3
GPU Type: RTX A5000-24GB
Nvidia Driver Version: 520.61.05
CUDA Version: 11.8
CUDNN Version: 8.6.0
Operating System + Version: Ubuntu 22.04 LTS
Python Version (if applicable): 3.10.6
TensorFlow Version (if applicable): not applicable
PyTorch Version (if applicable): 2.0.0
Baremetal or Container (if container which image + tag): not applicable
Relevant Files
ptq-sparse-640.onnx (24.0 MB)
log_sparse_ptq.txt (5.2 MB)
log_force.txt (5.2 MB)
Steps To Reproduce
trtexec --onnx=ptq-sparse-640.onnx --saveEngine=ptq-tiny-yolov7-sparse-640.trt --int8 --fp16 --sparsity=enable --useCudaGraph --verbose
or
trtexec --onnx=ptq-sparse-640.onnx --saveEngine=ptq-tiny-yolov7-sparse-640.trt --int8 --fp16 --sparsity=force --useCudaGraph --verbose