Description
Running trtexec with --int8 and --sparsity=enable (or force) on a YOLOv7 ONNX model that was 2:4 pruned gives the same latency as without sparsity, ~22 ms. The Jetson Orin spec lists 137 TOP/s dense vs. 275 TOP/s sparse, i.e. about 2x. It is not clear how to get any speedup from sparsity — what needs to change?
Environment
TensorRT Version: 8.5.2
GPU Type: Tegra
Nvidia Driver Version: JetPack 5.1.1
CUDA Version: 11.4
CUDNN Version: 8.6.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 2.0.0
Baremetal or Container (if container which image + tag): Baremetal
Relevant Files
Steps To Reproduce
git clone https://github.com/WongKinYiu/yolov7
cd yolov7
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6.pt
gedit export.py
between lines 47 and 48 of export.py insert the following, which zeroes every other weight element along the last axis so each contiguous group of four contains two zeros:
for name, module in model.named_modules():
    if hasattr(module, 'weight'):
        with torch.no_grad():
            # zero every other element along the last weight axis
            if len(module.weight.shape) == 2:
                module.weight[:, ::2] = 0
            elif len(module.weight.shape) == 3:
                module.weight[:, :, ::2] = 0
            elif len(module.weight.shape) == 4:
                module.weight[:, :, :, ::2] = 0
            elif len(module.weight.shape) == 5:
                module.weight[:, :, :, :, ::2] = 0
    if hasattr(module, 'bias') and module.bias is not None:
        with torch.no_grad():
            module.bias[::2] = 0
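As a sanity check on the masking above, here is a standalone NumPy sketch (not part of export.py; the helper name `is_2_to_4_sparse` is made up for illustration) confirming that zeroing every other element leaves at most two nonzeros in every contiguous group of four along the masked axis:

```python
import numpy as np

def is_2_to_4_sparse(w: np.ndarray) -> bool:
    """Check that every contiguous group of 4 elements along the
    last axis contains at most 2 nonzeros (the 2:4 pattern)."""
    flat = w.reshape(-1, w.shape[-1])
    # pad the last axis with zeros up to a multiple of 4
    pad = (-flat.shape[1]) % 4
    flat = np.pad(flat, ((0, 0), (0, pad)))
    groups = flat.reshape(flat.shape[0], -1, 4)
    return bool((np.count_nonzero(groups, axis=-1) <= 2).all())

w = np.random.randn(8, 16)
w[:, ::2] = 0  # same masking as in the patch above
print(is_2_to_4_sparse(w))  # True
```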
python export.py --weights yolov7-w6.pt --grid --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 1280 1280 --max-wh 1280
set the power mode to 50 W (via nvpmodel)
sudo jetson_clocks
trtexec --onnx=yolov7-w6.onnx --saveEngine=yolov7-w6.trt --int8
Latency: mean = 22 ms
trtexec --onnx=yolov7-w6.onnx --saveEngine=yolov7-w6-0.trt --int8 --sparsity=force
Latency: mean = 22 ms
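One thing I am not sure about (an assumption on my part, not confirmed): TensorRT may require the 2:4 pattern along the input-channel (C) axis of KCRS conv weights, whereas the patch above masks the last spatial axis of 4D weights, which could leave the conv layers ineligible for the sparse kernels. A NumPy sketch of masking along C instead, with a check that each group of four consecutive input channels ends up with two nonzeros:

```python
import numpy as np

# Hypothetical conv weight in KCRS layout: 64 output channels,
# 32 input channels, 3x3 kernel.
w = np.random.randn(64, 32, 3, 3)
w[:, ::2, :, :] = 0  # mask dim 1 (C) rather than the last axis

# Move C to the end and split it into groups of 4 consecutive
# input channels, then count nonzeros per group.
g = np.moveaxis(w, 1, -1).reshape(-1, w.shape[1] // 4, 4)
print(bool((np.count_nonzero(g, axis=-1) <= 2).all()))  # True
```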