Running trtexec with --int8 and --sparsity=enable or --sparsity=force on a YOLOv7 ONNX model that was 2:4 pruned gives the same latency as running without sparsity, ~22 ms. Jetson Orin is listed at 137 TOPS dense and 275 TOPS sparse, i.e. 2x. How can performance be improved for the sparse model?
Environment
TensorRT Version: 8.5.2
GPU Type: Tegra
Nvidia Driver Version: JetPack 5.1.1
CUDA Version: 11.4
CUDNN Version: 8.6.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 2.0.0
Baremetal or Container (if container which image + tag): Baremetal
trtexec --onnx=yolov7-w6.onnx --saveEngine=yolov7-w6.trt --int8
Latency: mean = 22 ms
trtexec --onnx=yolov7-w6.onnx --saveEngine=yolov7-w6-0.trt --int8 --sparsity=force
Latency: mean = 22 ms
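One thing worth ruling out before comparing the two runs is whether the exported weights really follow the 2:4 pattern, i.e. at most two non-zero values in every group of four. The following is a minimal sketch of such a check in plain Python. The helper names `is_2_4_sparse` and `sparse_fraction` are hypothetical, and the sketch assumes each row has already been flattened along the input-channel (reduction) dimension, which is where the pattern is expected to hold; extracting the rows from the ONNX file is not shown.

```python
def is_2_4_sparse(row, group=4, max_nonzero=2):
    """Return True if every group of `group` consecutive values in `row`
    holds at most `max_nonzero` non-zero entries.

    Assumption: `row` is laid out along the input-channel dimension.
    """
    if len(row) % group != 0:
        # The pattern is only defined for lengths that are a multiple of 4.
        return False
    for start in range(0, len(row), group):
        nonzero = sum(1 for v in row[start:start + group] if v != 0)
        if nonzero > max_nonzero:
            return False
    return True


def sparse_fraction(rows):
    """Fraction of rows that satisfy the 2:4 pattern (0.0 for no rows)."""
    if not rows:
        return 0.0
    return sum(is_2_4_sparse(r) for r in rows) / len(rows)


if __name__ == "__main__":
    # Toy rows for illustration only; these are not weights from yolov7-w6.
    pruned = [0.5, 0.0, 0.0, -1.2, 0.0, 0.0, 0.3, 0.1]
    dense = [1.0, 2.0, 3.0, 0.0]
    print(is_2_4_sparse(pruned), is_2_4_sparse(dense))
    print(sparse_fraction([pruned, dense]))
```

If a large share of the convolution weights fail this check after export, the sparse kernels would have nothing to work with, regardless of the --sparsity setting.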
I am not sure those links are directly related; there are also a couple of other posts about dGPU and TensorRT 2:4 sparsity. This topic is specifically about the difference between running TensorRT without and with sparsity, which does not seem to exist in practice. Based on the Jetson Orin and dGPU specs, 2:4 sparsity should double the dense TOPS, but that is not evident from the trtexec runs. Why?
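A rough way to frame the expectation: the 137 vs 275 TOPS figures are peak throughput numbers, so the end-to-end gain presumably depends on how much of the 22 ms is spent in layers that actually run sparse kernels. The sketch below is a back-of-the-envelope, Amdahl-style estimate. `expected_latency_ms` is a hypothetical helper, and the fractions in the loop are illustrative placeholders, not measurements from this model.

```python
DENSE_TOPS = 137.0         # Jetson Orin dense figure quoted above
SPARSE_TOPS = 275.0        # Jetson Orin sparse figure quoted above
MEASURED_DENSE_MS = 22.0   # mean latency reported by trtexec


def expected_latency_ms(dense_ms, sparse_fraction, kernel_speedup):
    """Estimate latency when `sparse_fraction` of the dense runtime is
    accelerated by `kernel_speedup` and the rest is unchanged."""
    if not 0.0 <= sparse_fraction <= 1.0:
        raise ValueError("sparse_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return dense_ms * ((1.0 - sparse_fraction) + sparse_fraction / kernel_speedup)


peak_ratio = SPARSE_TOPS / DENSE_TOPS  # roughly 2x

if __name__ == "__main__":
    print(f"peak sparse/dense ratio: {peak_ratio:.2f}")
    # Assumed fractions of runtime spent in sparse-capable layers.
    for fraction in (0.0, 0.5, 1.0):
        ms = expected_latency_ms(MEASURED_DENSE_MS, fraction, 2.0)
        print(f"fraction={fraction:.1f} -> {ms:.1f} ms")
```

Under these assumptions, an unchanged 22 ms corresponds to a fraction near zero, which would suggest that few or none of the layers ended up on sparse kernels.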
As mentioned, this looks like a TensorRT issue rather than a Jetson-only one, so the post would be better moved back to TensorRT - NVIDIA Developer Forums.
Running on an A5000 with TRT 8.5.0 shows similar behavior: the latency difference between the dense and 2:4 sparse files is minimal, and forcing sparsity on the sparse file makes no difference. Is there a contemporary object detection example that demonstrates the TRT 2:4 advantage? Thanks.
Hi,
Can this be moved back to TensorRT - NVIDIA Developer Forums, since it is not specific to Jetson Orin, so that it can get further assistance? Thanks.
Using APEX 2:4 structured sparsity during training, latency stays the same. Is there any object detection model that demonstrates a 2x speedup with TensorRT?