Same resnext101 model size for dense and sparse

I am following the instructions in the NVIDIA Technical Blog post "Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT",

but after downloading the dense and sparse models with:
ngc registry model download-version nvidia/resnext101_32x8d_sparse_onnx:1
ngc registry model download-version nvidia/resnext101_32x8d_dense_onnx:1

Surprisingly, the two downloaded ONNX models have exactly the same file size:
354782502 Dec 20 17:57 resnext101_32x8d_pyt_torchvision_sparse.onnx
354782502 Dec 20 18:00 resnext101_32x8d_pyt_torchvision_dense.onnx

I expected the sparse model to be smaller, but both files are exactly the same size. Could this be the reason some people have reported no performance difference?

Hi,

Could you share the checksums of both models?

$ md5sum resnext101_32x8d_pyt_torchvision_sparse.onnx
$ md5sum resnext101_32x8d_pyt_torchvision_dense.onnx

Thanks.

$ md5sum resnext101_32x8d_pyt_torchvision_dense.onnx
49beb2920f6f6e42eb20b874a30eab98

$ md5sum resnext101_32x8d_pyt_torchvision_sparse.onnx
c962aeafd8a7000f3c72bbfcd2165572

Hi,

Have you tried running inference with TensorRT (for example, trtexec)?
The sparse and dense models are likely saved with the same storage layout, so the file sizes are identical.
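
In ONNX, every weight tensor is stored densely, so a 2:4-pruned model keeps its pruned weights as explicit zeros and the file size does not change. You can confirm this with a quick script along these lines (a sketch using the onnx and numpy Python packages; the file names are taken from your post):

# Sketch: compare the fraction of zero-valued weights in the two models.
# Assumes the `onnx` package is installed and both files are in the
# current directory.
import numpy as np
import onnx
from onnx import numpy_helper

for path in ("resnext101_32x8d_pyt_torchvision_dense.onnx",
             "resnext101_32x8d_pyt_torchvision_sparse.onnx"):
    model = onnx.load(path)
    total = zeros = 0
    for init in model.graph.initializer:
        w = numpy_helper.to_array(init)
        total += w.size
        zeros += int(np.count_nonzero(w == 0))
    print(f"{path}: {zeros / total:.1%} zero-valued weights")

# Expectation: the sparse model shows roughly 50% zeros in the pruned
# layers (the 2:4 pattern), yet both files are the same size because
# every element is stored, zero or not.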

Thanks.

Yes, here are the results:

  1. with resnext101_32x8d_pyt_torchvision_sparse.onnx
    Throughput: 146.487 qps
    Total Host Walltime: 3.01733 s
    Total GPU Compute Time: 3.00983 s

  2. with resnext101_32x8d_pyt_torchvision_dense.onnx
    Throughput: 116.562 qps
    Total Host Walltime: 3.02844 s
    Total GPU Compute Time: 3.01938 s

It seems there is about a 25% throughput improvement with sparsity enabled (146.487 qps / 116.562 qps ≈ 1.26).
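
For anyone reproducing this: structured sparsity must be requested when the TensorRT engine is built. With trtexec that is the --sparsity=enable flag; through the TensorRT Python API the equivalent is roughly the sketch below (an illustration assuming TensorRT 8.x, not the exact commands used for the numbers above):

# Sketch: build an engine with structured-sparsity kernels enabled.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
if not parser.parse_from_file("resnext101_32x8d_pyt_torchvision_sparse.onnx"):
    raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)  # allow 2:4 sparse tactics
config.set_flag(trt.BuilderFlag.FP16)            # sparsity is typically paired with FP16/INT8

engine_bytes = builder.build_serialized_network(network, config)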

A follow-up question: for the ResNeXt101 model at least, about a 25% performance improvement is observed on Jetson AGX Orin when the model runs on the GPU. What about DLA? Is there any performance improvement with sparsity enabled when the model runs on DLA?

Hi,

It depends on the model.

Although DLA can increase overall inference throughput, it supports only a limited set of layer types.
If a model has to fall back to the GPU frequently, the data-transfer overhead can slow down performance.
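
If you want to measure it yourself, trtexec can target DLA with the --useDLACore=0 and --allowGPUFallback options. Through the Python API, the equivalent configuration is roughly this sketch (an illustration assuming TensorRT 8.x on Jetson; DLA requires FP16 or INT8 precision):

# Sketch: build an engine that runs on DLA and falls back to the GPU
# for unsupported layers.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
if not parser.parse_from_file("resnext101_32x8d_pyt_torchvision_sparse.onnx"):
    raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)            # DLA needs FP16 or INT8
config.default_device_type = trt.DeviceType.DLA  # prefer DLA for all layers
config.DLA_core = 0                              # AGX Orin has two DLA cores: 0 and 1
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)    # unsupported layers go to the GPU

engine_bytes = builder.build_serialized_network(network, config)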

Thanks.
