Structured sparsity not working with explicit quantization

Description

I am trying to use TensorRT to execute a ResNet50 model with structured sparsity (2:4) and explicit quantization. However, I cannot get TensorRT to pick a sparse implementation for any of the layers. Is structured sparsity supported along with explicit quantization in general?
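
For reference, here is roughly how the 2:4 pattern in the exported weights can be checked offline (a minimal sketch only; the FP32 initializers, the KCRS weight layout, and the grouping along the input-channel dimension are assumptions on my part, not something confirmed against the TensorRT documentation):

import numpy as np
import onnx
from onnx import numpy_helper

# Minimal sketch of an offline 2:4 pattern check on the exported model.
# Assumption: conv weights are stored as 4-D initializers in KCRS layout and
# the 2:4 pattern applies along the input-channel (C) dimension in groups of 4.
model = onnx.load("resnet50_quant_sparse.onnx")

for init in model.graph.initializer:
    w = numpy_helper.to_array(init)
    if w.ndim != 4 or w.shape[1] % 4 != 0:
        continue  # skip non-conv weights and channel counts not divisible by 4
    # Move C last so each row of the reshaped array is one group of 4 channels.
    groups = np.transpose(w, (0, 2, 3, 1)).reshape(-1, 4)
    ok = bool(np.all(np.count_nonzero(groups, axis=1) <= 2))
    print(f"{init.name}: 2:4 pattern {'satisfied' if ok else 'VIOLATED'}")

If some weights do not satisfy the pattern, TensorRT would be expected to skip sparse tactics for those layers, so this is mainly a sanity check on the export.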

Environment

TensorRT Version : 8.2.3-1+cuda11.4
GPU Type : A100-SXM4-40GB
Nvidia Driver Version : 460.32.03
CUDA Version : 11.6
CUDNN Version : 8.3
Operating System + Version : Ubuntu 20.04.2 LTS
Python Version (if applicable) : 3.8.10
TensorFlow Version (if applicable) : not applicable
PyTorch Version (if applicable) : not applicable
Baremetal or Container (if container which image + tag) : tensorrt:22.02-py3 (NGC catalog)

Relevant Files

resnet50_quant_sparse.onnx (97.8 MB)

Steps To Reproduce

trtexec --onnx=resnet50_quant_sparse.onnx --int8 --sparsity=force --shapes=input:128x3x224x224

or

trtexec --onnx=resnet50_quant_sparse.onnx --int8 --sparsity=enable --shapes=input:128x3x224x224
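
For completeness, here is a rough Python-API equivalent of the --int8 --sparsity=enable build (a sketch from memory for TensorRT 8.2, not a verified script; as far as I understand, --sparsity=force has no direct builder flag, since in that mode trtexec rewrites the weights itself before building):

import tensorrt as trt

# Rough Python-API equivalent of:
#   trtexec --onnx=resnet50_quant_sparse.onnx --int8 --sparsity=enable
logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("resnet50_quant_sparse.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)            # explicit Q/DQ quantization
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)  # allow 2:4 sparse tactics

# If the batch dimension is dynamic, an optimization profile pinned to
# 128x3x224x224 (input name "input" assumed) would also be needed here.
engine = builder.build_serialized_network(network, config)
with open("resnet50_quant_sparse.engine", "wb") as f:
    f.write(engine)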

Hi @alexandre_marques

Could you please share the logs for the above commands run with the --verbose option, so we can debug this further?

Thank you.

Thanks again for debugging this issue. Here are the log files:

enable_log.txt (45.4 KB)
force_log.txt (58.9 KB)

Sorry, the logs above were not generated with --verbose. Here are the correct ones:

enable_log.txt (3.0 MB)
force_log.txt (3.0 MB)

Hi,

From force_log.txt (the --sparsity=force run):

[03/25/2022-13:23:43] [I] [TRT] (Sparsity) Layers eligible for sparse math: sections.0.0.conv1.module.weight + QuantizeLinear_24_quantize_scale_node + Conv_28 + Relu_30, sections.0.0.identity.conv.module.weight + QuantizeLinear_68_quantize_scale_node + Conv_72, sections.0.0.conv2.module.weight + QuantizeLinear_39_quantize_scale_node + Conv_43 + Relu_45, sections.0.0.conv3.module.weight + QuantizeLinear_54_quantize_scale_node + Conv_58 + Add_80 + Relu_81, sections.0.1.conv1.module.weight + QuantizeLinear_90_quantize_scale_node + Conv_94 + Relu_96, sections.0.1.conv2.module.weight + QuantizeLinear_105_quantize_scale_node + Conv_109 + Relu_111, sections.0.1.conv3.module.weight + QuantizeLinear_120_quantize_scale_node + Conv_124 + Add_132 + Relu_133, sections.0.2.conv1.module.weight + QuantizeLinear_142_quantize_scale_node + Conv_146 + Relu_148, sections.0.2.conv2.module.weight + QuantizeLinear_157_quantize_scale_node + Conv_161 + Relu_163, sections.0.2.conv3.module.weight + QuantizeLinear_172_quantize_scale_node + Conv_176 + Add_184 + Relu_185, sections.1.0.conv1.module.weight + QuantizeLinear_194_quantize_scale_node + Conv_198 + Relu_200, sections.1.0.identity.conv.module.weight + QuantizeLinear_238_quantize_scale_node + Conv_242, sections.1.0.conv2.module.weight +
[03/25/2022-13:23:43] [I] [TRT] (Sparsity) TRT inference plan picked sparse implementation for layers: sections.0.1.conv1.module.weight + QuantizeLinear_90_quantize_scale_node + Conv_94 + Relu_96, sections.0.2.conv1.module.weight + QuantizeLinear_142_quantize_scale_node + Conv_146 + Relu_148,

We can see that some layers are using sparse tactics, for example this sparse_conv tactic:

[03/25/2022-13:23:42] [V] [TRT] sections.0.1.conv1.module.weight + QuantizeLinear_90_quantize_scale_node + Conv_94 + Relu_96 Set Tactic Name: sm80_xmma_fprop_sparse_conv_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x128x128_stage3_warpsize1x4x1_g1_sptensor16x8x64_t1r1s1_no_preds Tactic: 7886967395128926382
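
As a side note, a quick way to see which layers ended up with sparse tactics is to filter the verbose log for the "Set Tactic Name" lines (a small sketch; the log format is taken from the excerpt above and may change between TensorRT versions):

import re
import sys

# List which layers picked a sparse tactic in a trtexec --verbose log.
# The "Set Tactic Name" line format is taken from the excerpt above.
sparse, dense = [], []
with open(sys.argv[1]) as f:
    for line in f:
        m = re.search(r"\[TRT\] (.+?) Set Tactic Name: (\S+)", line)
        if not m:
            continue
        layer, tactic = m.groups()
        (sparse if "sparse" in tactic else dense).append(layer)

print(f"layers with a sparse tactic: {len(sparse)}")
for name in sparse:
    print(" ", name)
print(f"layers with a dense tactic:  {len(dense)}")

Running it as, for example, python list_sparse_tactics.py force_log.txt (the script name is just an example) should roughly match the two layers listed in the sparsity summary above.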

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.