Hi all,
I am working with TensorRT 8.4.0.6. One of the features TensorRT 8 introduces is sparsity support. Does that mean the sparser the model is, the faster it runs?
I pruned my model (about 70% of the weights are zeros). If I convert it to TensorRT, will it be faster than the unpruned model?
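For context, my understanding is that TensorRT's sparsity acceleration (on Ampere and later GPUs) targets 2:4 *structured* sparsity, not arbitrary unstructured zeros, so a 70% unstructured-pruned model may not qualify. A small sketch of what that pattern means (the checker function here is just a hypothetical illustration, not a TensorRT API):

```python
# Illustration of the 2:4 structured-sparsity pattern that TensorRT's
# sparse kernels expect: in every group of 4 consecutive weights,
# at least 2 must be zero. `is_2_4_sparse` is a hypothetical helper,
# not part of the TensorRT API.

def is_2_4_sparse(weights):
    """Return True if every complete group of 4 values has >= 2 zeros."""
    for i in range(0, len(weights) - len(weights) % 4, 4):
        group = weights[i:i + 4]
        if sum(1 for w in group if w == 0) < 2:
            return False
    return True

# Both rows below are 50% zeros, but only one follows the 2:4 pattern.
unstructured = [0, 0, 0, 0, 1, 1, 1, 0]  # group [1, 1, 1, 0] has only 1 zero
structured   = [0, 1, 0, 1, 0, 1, 0, 0]  # every group of 4 has >= 2 zeros

print(is_2_4_sparse(unstructured))  # False
print(is_2_4_sparse(structured))    # True
```

So even a heavily pruned model would (as far as I understand) need its zeros rearranged into this 2:4 layout before TensorRT's sparse kernels can exploit them.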
Thank you