I pruned my model, ran QAT, and then converted it to a TensorRT engine. However, the pruned engine is slower than the engine built from the original model. I am not sure what the root cause is. Are there any constraints on the number of channels in a Conv layer for it to be best supported in TensorRT?
Thanks.