Does weight pruning help improve the inference speed of pruned models on TX2?

Hello, I would like to prune my models and run them on the TX2. I plan to use weight pruning, i.e. making the model weights as sparse as possible.
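For context, here is a minimal sketch of the kind of weight pruning I mean, using PyTorch's built-in `torch.nn.utils.prune` (the layer sizes and the 50% amount are just illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(10, 10)

# zero out the 50% of weights with the smallest absolute value
prune.l1_unstructured(layer, name="weight", amount=0.5)

# half of the 10x10 weight entries are now exactly zero
sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.0%}")
```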

From what I have read, whether pruning actually improves inference speed depends on several factors.

On the software side, a sparse PyTorch model does not necessarily run faster than a dense one, but a sparse ONNX model could.
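One way to see why sparsity alone does not speed up PyTorch: pruning only writes zeros into the weight tensor, which is still stored in the dense (strided) layout, so the same dense kernels run and every zero is still multiplied. A small illustration (sizes are arbitrary):

```python
import torch

w = torch.randn(256, 256)
w[torch.rand_like(w) < 0.9] = 0.0   # ~90% of the values are now zero

# the tensor is still dense: zeros are stored explicitly, so this matmul
# does the same number of multiply-adds as it would with a dense weight
x = torch.randn(1, 256)
y = x @ w
```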

It also seems to depend on the hardware. Could a sparse model run faster on the TX2?


This depends on the software you use.

Taking TensorRT as an example, weight pruning may not noticeably improve performance: we don't check for sparsity before inference, so the same kernels are launched either way.

You could try layer pruning instead, which reduces the amount of computation and therefore improves performance directly.
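As a sketch of what layer pruning looks like (the model and the choice of which block to drop are purely illustrative): removing a whole layer means its kernel is never launched at all, so the speedup shows up regardless of backend.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),   # candidate block to prune away
    nn.Linear(128, 10),
)

# layer pruning: drop modules 2-3 entirely; input/output shapes still match
# because the removed block maps 128 features to 128 features
pruned = nn.Sequential(*(m for i, m in enumerate(model) if i not in (2, 3)))
out = pruned(torch.randn(4, 128))
```

In practice you would fine-tune the pruned model afterward to recover accuracy.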