I just discovered the TensorRT tool and I have a question.
During the network optimization process, is it possible to ask TensorRT to prune small weights in order to reduce the network's memory footprint and inference time?
Or should I prune the network myself before running the TensorRT optimization?
In that case, is manually setting small weights to zero enough?
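To be concrete, here is the kind of magnitude-based zeroing I have in mind, as a minimal NumPy sketch applied to a layer's weight array (the threshold value is an arbitrary assumption, not something from TensorRT or Caffe):

```python
import numpy as np

def prune_small_weights(weights, threshold=1e-3):
    """Zero out weights whose magnitude falls below the threshold.

    `threshold` is an arbitrary illustrative value; in practice it
    would need to be tuned (e.g. per layer, against a validation set).
    """
    mask = np.abs(weights) >= threshold
    return weights * mask

# Example on a toy weight matrix: small entries become exactly zero,
# large ones are unchanged.
w = np.array([[0.5, 1e-4],
              [-2e-4, -0.8]])
pruned = prune_small_weights(w)
```

The resulting array still has the same shape and dtype, so it stores the same number of values; the question is whether TensorRT can exploit these exact zeros to save memory or time.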
In my first experiments, the optimized network and the raw Caffe network appear to use the same amount of memory. Is that normal?
Thank you for your help :)