Description
We have implemented a C++ application that runs inference on an object detection CNN.
Some of the activations following the convolutions are custom and implemented using IPluginV2Ext.
We would like these custom plugins to support both FP16 and FP32 inference, depending on whether or not the builder is configured for FP16.
What is the best way to implement this so that the best tactic is also selected when the engine is built?
I've done some googling but cannot find a definitive answer, so I'm hoping to get some clarification here.
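For reference, here is a minimal sketch of the direction I'm considering, covering only the precision-related IPluginV2Ext overrides. The class name MyActivationPlugin and the launcher launchMyActivation are placeholders, and the remaining plugin boilerplate (clone, serialization, getOutputDimensions, the creator, etc.) is omitted, so the class stays abstract as written:

```cpp
#include <NvInfer.h>
#include <cuda_fp16.h>
#include <cstdint>

using namespace nvinfer1;

// Hypothetical CUDA kernel launcher, templated on element type
// (instantiated for float and __half in a separate .cu file).
template <typename T>
void launchMyActivation(const T* in, T* out, int64_t count, cudaStream_t stream);

class MyActivationPlugin : public IPluginV2Ext
{
public:
    // Advertise both precisions so the builder can time FP16 tactics as well.
    bool supportsFormat(DataType type, PluginFormat format) const override
    {
        return (type == DataType::kFLOAT || type == DataType::kHALF)
            && format == PluginFormat::kLINEAR;
    }

    // The output type simply follows the input type.
    DataType getOutputDataType(int index, const DataType* inputTypes,
                               int nbInputs) const override
    {
        return inputTypes[0];
    }

    // Remember which type the builder actually selected for this layer.
    void configurePlugin(const Dims* inputDims, int nbInputs,
                         const Dims* outputDims, int nbOutputs,
                         const DataType* inputTypes, const DataType* outputTypes,
                         const bool* inputIsBroadcast, const bool* outputIsBroadcast,
                         PluginFormat floatFormat, int maxBatchSize) override
    {
        mType = inputTypes[0];
        mVolume = 1;
        for (int i = 0; i < inputDims[0].nbDims; ++i)
            mVolume *= inputDims[0].d[i];
    }

    // Dispatch to the kernel instantiation matching the configured type.
    int enqueue(int batchSize, const void* const* inputs, void** outputs,
                void* workspace, cudaStream_t stream) override
    {
        const int64_t count = static_cast<int64_t>(batchSize) * mVolume;
        if (mType == DataType::kHALF)
        {
            launchMyActivation(static_cast<const __half*>(inputs[0]),
                               static_cast<__half*>(outputs[0]), count, stream);
        }
        else
        {
            launchMyActivation(static_cast<const float*>(inputs[0]),
                               static_cast<float*>(outputs[0]), count, stream);
        }
        return 0;
    }

private:
    DataType mType{DataType::kFLOAT};
    int64_t mVolume{0};
};
```

On the build side I'm simply enabling FP16 with config->setFlag(BuilderFlag::kFP16) and leaving the precision choice to the builder. Is this enough for it to time both FP16 and FP32 tactics through the plugin, or do I also need kSTRICT_TYPES / per-layer setPrecision?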
Environment
TensorRT Version: 7.2.3.4
GPU Type: RTX 3060 Mobile, RTX 3060 SUPER Desktop
Nvidia Driver Version: 516.59
CUDA Version: 11.2
CUDNN Version: Unknown
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): NVIDIA Docker runtime