Structured 2:4 sparsity does not improve inference performance on Jetson Orin

Description

Running trtexec with --int8 and --sparsity=enable (or force) on a YOLOv7 ONNX model that was 2:4 pruned gives the same latency as without sparsity, ~22 ms. The Jetson Orin specs list 137 TOPS dense vs. 275 TOPS sparse, i.e. roughly 2x, so it is not clear how to actually get a speedup from the sparse model?

Environment

TensorRT Version: 8.5.2
GPU Type: Tegra
Nvidia Driver Version: JetPack 5.1.1
CUDA Version: 11.4
CUDNN Version: 8.6.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 2.0.0
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

Steps To Reproduce

git clone https://github.com/WongKinYiu/yolov7
cd yolov7
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6.pt
gedit export.py

Between L47 and L48 of export.py, insert:

# Zero every other weight element along the last dimension, so any four
# consecutive elements contain two zeros (crude pruning for a latency test).
for name, module in model.named_modules():
    if hasattr(module, 'weight') and module.weight is not None:
        with torch.no_grad():
            if len(module.weight.shape) == 2:
                module.weight[:, ::2] = 0
            elif len(module.weight.shape) == 3:
                module.weight[:, :, ::2] = 0
            elif len(module.weight.shape) == 4:
                module.weight[:, :, :, ::2] = 0
            elif len(module.weight.shape) == 5:
                module.weight[:, :, :, :, ::2] = 0
    if hasattr(module, 'bias') and module.bias is not None:
        with torch.no_grad():
            module.bias[::2] = 0
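As a sanity check (an extra step, not needed for the repro): if I read the TensorRT docs correctly, the 2:4 pattern for convolutions is required along the input-channel (C) dimension, not the last (kernel-width) dimension that the snippet above zeroes, so it may be worth verifying the pattern per layer before exporting. A minimal sketch:

import torch

def is_2to4(w, dim):
    # Move the checked dimension last, then inspect consecutive groups of 4.
    w = w.detach().movedim(dim, -1)
    n = w.shape[-1] - w.shape[-1] % 4   # ignore a ragged tail (e.g. C not divisible by 4)
    if n == 0:
        return True
    groups = w[..., :n].reshape(*w.shape[:-1], -1, 4)
    return bool(((groups == 0).sum(dim=-1) >= 2).all())

for name, module in model.named_modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        # Conv2d weights are KCRS, Linear weights are (out, in): dim 1 is the input dim.
        print(name, is_2to4(module.weight, dim=1))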

python export.py --weights yolov7-w6.pt --grid --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 1280 1280 --max-wh 1280

Set the power mode to 50 W (e.g., via nvpmodel), then lock the clocks:
sudo jetson_clocks

trtexec --onnx=yolov7-w6.onnx --saveEngine=yolov7-w6.trt --int8
Latency: mean = 22 ms
trtexec --onnx=yolov7-w6.onnx --saveEngine=yolov7-w6-0.trt --int8 --sparsity=force
Latency: mean = 22 ms
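To see whether sparse tensor-core kernels are actually selected, one option (not part of the runs above) is to rebuild with --verbose and filter the build log; TensorRT 8.x should report which layers were eligible for sparse math and which ones actually picked a sparse implementation:

trtexec --onnx=yolov7-w6.onnx --int8 --sparsity=force --verbose 2>&1 | grep -i sparsity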

Hi,

This looks like a Jetson issue. Please refer to the samples below in case they are useful.

For any further assistance, we will move this post to the Jetson-related forum.

Thanks!

Hi,

Not sure these links are directly related; there are also a couple of other posts about dGPUs and TensorRT 2:4 sparsity. Specifically, this topic is about the difference between TensorRT without and with sparsity, which does not seem to make any difference in latency. Based on the Jetson Orin and other dGPU specs, 2:4 sparsity should improve dense TOPS by 100%, but this is not evident from the trtexec runs?

Thanks.

Hi @AakankshaS,

As mentioned, this looks like a TensorRT issue, not just a Jetson one, so this post is better moved back to TensorRT - NVIDIA Developer Forums.
Executing on an A5000 with TRT 8.5.0 exhibits similar behavior: the difference between the dense and 2:4-sparse files is minimal, and forcing sparsity on the sparse file makes no difference. Is there a contemporary object detection example that demonstrates the TRT 2:4 advantage? Thanks.

Dense file:
trtexec --onnx=yolov7-w6.onnx --saveEngine=yolov7-w6.trt --int8
Latency: mean = 4.73 ms

Sparse file:
trtexec --onnx=yolov7-w6-0.onnx --saveEngine=yolov7-w6-0.trt --int8
Latency: mean = 4.44 ms

Sparse file, sparsity forced:
trtexec --onnx=yolov7-w6-0.onnx --saveEngine=yolov7-w6-00.trt --int8 --sparsity=force
Latency: mean = 4.41 ms

Hi,
Can this be moved back to TensorRT - NVIDIA Developer Forums, since it is not specific to Jetson Orin, so it can get further assistance? Thanks.

Is there a TensorRT sample that demonstrates the claimed 2x acceleration of 2:4 sparsity over dense? Thanks.

Using apex (ASP) 2:4 structured sparsity during training, the latency stays the same. Is there any object detection model that demonstrates a 2x acceleration with TensorRT?
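For context, the standard apex ASP recipe is roughly the following (a sketch with a toy model standing in for the real network; the actual training setup differs):

import torch
from apex.contrib.sparsity import ASP

# Toy stand-in for the trained dense model (sketch only).
model = torch.nn.Sequential(torch.nn.Conv2d(64, 64, 3, padding=1)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Prunes eligible weights to 2:4 and installs masks so that subsequent
# fine-tuning steps keep the pattern.
ASP.prune_trained_model(model, optimizer)

# ... fine-tune, then re-export to ONNX and rebuild the engine as above ...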