How does TensorRT behave?

forflafor · January 19, 2022, 2:26pm

Hello everyone.
I have created several models at different pruning thresholds.
I was wondering whether TensorRT performs any optimization when it encounters weights that are set to zero. Does anyone know? Thank you.
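For context, pruning "at different thresholds" usually means magnitude pruning: every weight whose absolute value falls below the threshold is set to zero. The sketch below assumes that scheme (the helper `prune_by_threshold` is hypothetical, not part of any library) and shows that the zeros end up scattered wherever small weights happened to be. The layer keeps the same shape and the same number of weights; only their values change.

```python
# Hypothetical helper: unstructured magnitude pruning by threshold.
# Weights with |w| < threshold are zeroed; the layer's shape is unchanged.
from typing import List, Tuple


def prune_by_threshold(weights: List[float], threshold: float) -> Tuple[List[float], float]:
    """Return (pruned_weights, fraction_of_weights_that_are_zero)."""
    pruned = [0.0 if abs(w) < threshold else w for w in weights]
    if not pruned:
        return pruned, 0.0
    zeros = sum(1 for w in pruned if w == 0.0)
    return pruned, zeros / len(pruned)


if __name__ == "__main__":
    weights = [0.8, -0.05, 0.02, -0.6, 0.3, 0.01, -0.02, 0.9]
    for t in (0.01, 0.05, 0.5):
        pruned, sparsity = prune_by_threshold(weights, t)
        # The zeros land at arbitrary positions, i.e. the sparsity is unstructured.
        print(f"threshold={t}: sparsity={sparsity:.3f} -> {pruned}")
```

Higher thresholds produce more zeros. However, the number of weights, and therefore the amount of work a dense kernel is asked to do, stays the same.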

AastaLLL · January 20, 2022, 3:50am

Hi,

In general, TensorRT chooses its optimizations based on the layer type.

However, it also measures the actual inference time of the candidate implementations (kernels) to decide which algorithm to deploy.
This means that the weight values and the hardware environment are taken into account indirectly.
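The timing-based selection described above can be illustrated with a small autotuning sketch. It assumes a simplified model: several interchangeable implementations of the same operation (here a dot product standing in for a layer) are each timed on the actual weights, and the fastest one is kept. All function names are hypothetical stand-ins for kernels, and this is not TensorRT's API. The zero-skipping candidate is included only to show how weight values can influence which implementation wins a timing run.

```python
# A minimal autotuning sketch: time interchangeable candidates, keep the fastest.
# The candidates are hypothetical stand-ins for kernels; this is not TensorRT's API.
import time
from typing import Callable, Dict, Optional, Sequence


def dot_loop(w: Sequence[float], x: Sequence[float]) -> float:
    """Plain loop over every weight, zero or not."""
    total = 0.0
    for wi, xi in zip(w, x):
        total += wi * xi
    return total


def dot_generator(w: Sequence[float], x: Sequence[float]) -> float:
    """Same computation expressed with sum() over a generator."""
    return sum(wi * xi for wi, xi in zip(w, x))


def dot_skip_zeros(w: Sequence[float], x: Sequence[float]) -> float:
    """Skips zero weights, so its cost depends on the weight values."""
    return sum(wi * xi for wi, xi in zip(w, x) if wi != 0.0)


def pick_fastest(candidates: Dict[str, Callable], args: tuple, repeats: int = 5) -> Optional[str]:
    """Time each candidate on the given arguments and return the fastest one's name."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        fn(*args)  # warm-up run, not timed
        start = time.perf_counter()
        for _ in range(repeats):
            fn(*args)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


if __name__ == "__main__":
    weights = [0.0] * 9990 + [1.0] * 10  # almost all weights are zero
    inputs = [2.0] * 10000
    candidates = {"loop": dot_loop, "generator": dot_generator, "skip_zeros": dot_skip_zeros}
    print("selected:", pick_fastest(candidates, (weights, inputs)))
```

The selection rule never inspects the weights directly. It only compares measured times, which is why weight values and the hardware affect the outcome indirectly. Timings vary from run to run and from machine to machine, so the selected name is not fixed.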

For example, in a convolution layer, it might not apply FFT acceleration if almost all of the weight values are zero.

Please also note that we do have a sparsity-specific optimization (2:4 structured sparsity), but it is currently only available on Ampere GPUs.
https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/
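The sparsity feature in the linked post relies on a 2:4 structured pattern: within every group of four consecutive weights, at least two must be zero. Threshold pruning like the kind described in the question does not guarantee this pattern, even at 50% overall sparsity. The sketch below uses hypothetical helper names and works on a flat row for simplicity. The dimension the groups actually run along is determined by the kernel and is not modeled here. It checks the pattern and shows one way to enforce it: keep the two largest-magnitude values in each group of four.

```python
# Hypothetical helpers illustrating the 2:4 structured-sparsity pattern.
# Grouping is done along a flat row; real kernels define the grouped dimension.
from typing import List


def is_2_4_sparse(row: List[float]) -> bool:
    """True if every group of 4 consecutive values contains at least 2 zeros."""
    if len(row) % 4 != 0:
        return False
    return all(
        sum(1 for v in row[i:i + 4] if v == 0.0) >= 2
        for i in range(0, len(row), 4)
    )


def prune_2_4(row: List[float]) -> List[float]:
    """Keep the 2 largest-magnitude values in each group of 4; zero the others."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    out: List[float] = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(group[j] if j in keep else 0.0 for j in range(4))
    return out


if __name__ == "__main__":
    # 50% zeros overall, but the second group has only one zero: not 2:4.
    unstructured = [0.0, 0.0, 0.0, 0.5, 0.4, 0.3, 0.2, 0.0]
    print(is_2_4_sparse(unstructured))
    dense = [0.9, -0.1, 0.05, 0.7, 0.0, 0.3, -0.8, 0.2]
    pruned = prune_2_4(dense)
    print(pruned, is_2_4_sparse(pruned))
```

This is one reason a model pruned with arbitrary thresholds does not automatically benefit from the Ampere sparsity path: the zeros must follow the 2:4 layout, not just reach a given overall ratio.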

Thanks.

system · February 9, 2022, 5:57am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
