Before I try measuring performance, I want to ask for advice on TensorRT optimization.
I used the pruning method below to prune my model, and I believe this method relies on sparse layers to accelerate inference.
So, does TensorRT support computation on sparse convolution layers, so that my pruning work will actually accelerate inference in TensorRT? Or is this done automatically inside TensorRT, so my work won't improve speed?
RTX 3050 laptop GPU
I use Pop!_OS 21.04 and I don't know which driver Pop!_OS uses to control the NVIDIA GPU. For example, Ubuntu uses nouveau to control NVIDIA cards by default, but I don't think that gets the best performance out of an RTX GPU. Can you help me with another choice?
If you channel-prune models in the right way (and then compress them), you will get an increase in speed in TensorRT, because the pruned layers become smaller dense layers. Otherwise, GPUs are simply very good at dense math, so unless the sparsity is appropriately structured or the weights are very sparse, sparse computations are unlikely to improve performance.
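To make "appropriately structured" concrete: Ampere-class GPUs (which includes the RTX 3050) accelerate the 2:4 structured sparsity pattern, where at most 2 of every 4 consecutive weights along a row are non-zero. Below is a minimal NumPy sketch (the function name `is_2to4_sparse` is just for illustration) that checks whether a weight matrix follows this pattern — unstructured magnitude pruning at the same overall density will generally fail this check:

```python
import numpy as np

def is_2to4_sparse(weights: np.ndarray) -> bool:
    """Return True if every group of 4 consecutive values along the
    last axis contains at most 2 non-zeros (the 2:4 pattern that
    Ampere sparse tensor cores can accelerate)."""
    if weights.shape[-1] % 4 != 0:
        return False
    groups = weights.reshape(-1, 4)
    nonzeros_per_group = np.count_nonzero(groups, axis=1)
    return bool(np.all(nonzeros_per_group <= 2))

# 50% sparse AND structured: 2 non-zeros in each group of 4.
structured = np.array([[1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 4.0]])
# Also 50% sparse overall, but one group has 3 non-zeros.
unstructured = np.array([[1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 4.0]])

print(is_2to4_sparse(structured))    # True
print(is_2to4_sparse(unstructured))  # False
```

If your weights do satisfy this pattern, TensorRT can exploit it when sparsity is enabled at build time (for example via `trtexec --sparsity=enable`); otherwise the engine will run the layers as dense math.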