After reviewing the TensorRT verbose log and profiling results, I’m not satisfied with the performance of some layers, and I’ve implemented faster CUDA kernels for them. Can I use my kernels as external tactics when building the TensorRT engine?
Please refer to the links below on custom plugin implementation and the related sample:
I hope this helps. Please let us know if you have any further questions.