Recently, I’ve been using ASP in Apex to sparsify my network model. It reduces inference time by over 40% on a 3060 GPU, but on Orin it only reduces inference time by 18%. Why is that?
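For context, the sparsification follows the standard `apex.contrib.sparsity` recipe. A minimal sketch of what I'm doing (the model, optimizer, and fine-tuning step here are placeholders, not my actual network):

```python
import torch
from apex.contrib.sparsity import ASP

# Placeholder model and optimizer; the real network and training setup differ.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Apply 2:4 structured sparsity masks to the trained weights.
# prune_trained_model() initializes the masks on the model and optimizer
# and computes the sparse masks in one call.
ASP.prune_trained_model(model, optimizer)

# ... fine-tune the pruned model here, then export to ONNX for TensorRT ...
```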
Hi,
Which frameworks do you use for inference?
Is it TensorRT?
Thanks.
It’s TensorRT.
Sorry, I just found that other factors were inflating the pre-sparsification timing on the GPU. After accounting for them, ASP also reduces inference time by about 18% on the GPU.
Is an 18% reduction a normal range for ASP on Orin?
Thanks.
Hi,
The speedup ratio depends on the model architecture.
Which precision did you use for inference?
It’s recommended to try INT8 or FP16.
Thanks.
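In case it helps, sparse tactics are only used when the engine is built with sparse weights enabled. With the TensorRT Python API that looks roughly like this (a minimal sketch; network construction, ONNX parsing, and INT8 calibration are omitted):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Allow TensorRT to pick sparse (2:4) tactics for eligible layers.
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)

# Reduced precision usually gives the largest speedup on Orin.
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)  # requires a calibrator or a Q/DQ network

# ... create the network, parse the ONNX model, then build:
# engine = builder.build_serialized_network(network, config)
```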
Thanks, I used INT8.
Hi,
Could you run trtexec with --verbose and share the log with us?
It should contain the sparsity information.
Thanks.
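For example, rebuilding the engine with a command along the lines of `trtexec --onnx=model.onnx --int8 --sparsity=enable --verbose` (the model path is a placeholder; adjust the flags to your workflow) produces a verbose build log that shows whether sparse tactics were actually selected for your layers.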
Could you please share some empirical results you have obtained previously from sparsifying networks such as ResNet or others?
Thanks!
Hi,
We don’t have a dense vs. sparse performance comparison for the GPU.
But there is some data for DLA:
Thanks.