Hello everyone.
I have created several models at different pruning thresholds.
I was wondering whether TensorRT applies any optimization when it encounters weights that are set to zero. Does anyone know? Thank you.
Hi,
In general, TensorRT selects its optimizations based on layer type.
However, it also measures real inference time when choosing which algorithm to deploy.
This means it takes the weight values and the hardware environment into account indirectly.
For example, in a convolution layer, it might not apply FFT acceleration if almost all the weight values are zero.
Please also note that we do have an optimization dedicated to sparsity.
However, it is only available on Ampere GPUs right now:
https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/
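One practical point for threshold-pruned models: the Ampere sparsity feature described in that post targets a 2:4 structured pattern (at most two non-zero values in every group of four consecutive weights), which plain magnitude-threshold pruning does not guarantee. Below is a rough sketch for checking a weight matrix against that pattern. The helper names are hypothetical and not part of TensorRT, and it assumes the groups of four run along each row; which axis matters in practice depends on the layer layout.

```python
# Hypothetical helpers (not a TensorRT API): check whether pruned weights
# follow the 2:4 structured sparsity pattern.

def is_two_four_sparse(row, group=4, max_nonzero=2):
    """Return True if every consecutive group of `group` values in `row`
    contains at most `max_nonzero` non-zero entries."""
    if len(row) % group != 0:
        return False  # assumption: the row must split into whole groups
    for start in range(0, len(row), group):
        nonzeros = sum(1 for v in row[start:start + group] if v != 0)
        if nonzeros > max_nonzero:
            return False
    return True


def matrix_is_two_four_sparse(weights):
    """Apply the row check to every row of a 2-D list of weights."""
    return all(is_two_four_sparse(row) for row in weights)


def sparsity_ratio(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for v in row if v == 0)
    return zeros / total if total else 0.0


# Both matrices are 50% zeros, but only the second follows the 2:4 pattern.
threshold_pruned = [[0.0, 0.0, 0.0, 0.9, 0.5, 0.7, 0.3, 0.0]]
structured_pruned = [[0.0, 0.9, 0.0, 0.4, 0.5, 0.0, 0.0, 0.3]]

print(sparsity_ratio(threshold_pruned), matrix_is_two_four_sparse(threshold_pruned))
print(sparsity_ratio(structured_pruned), matrix_is_two_four_sparse(structured_pruned))
```

So two models with the same overall fraction of zeros can differ in whether they qualify for the sparsity optimization. The blog post above covers how to enable it when building the engine.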
Thanks.