FP16 engine gives correct outputs in Python, but no outputs or NaNs/zeros in C++.
I have an ONNX file that I converted to an engine file using trtexec, building in FP16 and setting the input and output precisions to FP16. I get the right outputs in Python, but the same engine gives no outputs in C++. As far as I can tell, I am creating the execution context and buffers in C++ the same way as in Python. I also tried using the half data type from "cuda_fp16.h" for the input/output data and the layers; with that I instead get illegal-memory-access errors. When I use plain float, I get no outputs. I believe TensorRT does an internal data-type conversion at the bindings, but I could be wrong. Any suggestions would be appreciated. Thanks.
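For context, here is a stripped-down sketch of how I am setting up the context and buffers in C++ (error handling omitted; the input name "image" and the 1x1x512x512 shape come from the trtexec shapes below, and `engine` is the already-deserialized ICudaEngine):

#include <NvInfer.h>
#include <cuda_fp16.h>
#include <cuda_runtime_api.h>
#include <cstring>
#include <vector>

void infer(nvinfer1::ICudaEngine* engine, const std::vector<float>& imageF32)
{
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // Dynamic-shape engine (min/opt/max shapes), so the input shape must be
    // set before buffer sizes are queried or enqueueV3 is called.
    context->setInputShape("image", nvinfer1::Dims4{1, 1, 512, 512});

    // The engine was built with --inputIOFormats=fp16:chw, so the input
    // buffer holds raw __half values, converted on the host here.
    std::vector<__half> imageF16(imageF32.size());
    for (size_t i = 0; i < imageF32.size(); ++i)
        imageF16[i] = __float2half(imageF32[i]);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    std::vector<void*> buffers(engine->getNbIOTensors(), nullptr);
    void* inputPtr = nullptr;
    for (int i = 0; i < engine->getNbIOTensors(); ++i)
    {
        const char* name = engine->getIOTensorName(i);
        nvinfer1::Dims dims = context->getTensorShape(name);
        size_t count = 1;
        for (int d = 0; d < dims.nbDims; ++d)
            count *= static_cast<size_t>(dims.d[d]);

        // Size each buffer by the tensor's declared I/O type:
        // kHALF is 2 bytes, kINT32 (the first output) is 4 bytes.
        size_t elemSize =
            engine->getTensorDataType(name) == nvinfer1::DataType::kHALF ? 2 : 4;
        cudaMalloc(&buffers[i], count * elemSize);
        context->setTensorAddress(name, buffers[i]);
        if (std::strcmp(name, "image") == 0)
            inputPtr = buffers[i];
    }

    cudaMemcpyAsync(inputPtr, imageF16.data(),
                    imageF16.size() * sizeof(__half),
                    cudaMemcpyHostToDevice, stream);
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);
    // ... then copy each output back with cudaMemcpyAsync and, for the FP16
    // outputs, convert __half back to float on the host.
}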
My trtexec command:
export CUDA_MODULE_LOADING=LAZY && ./trtexec \
--onnx=superpoint.onnx \
--builderOptimizationLevel=5 \
--useSpinWait \
--useRuntime=full \
--useCudaGraph \
--precisionConstraints=obey \
--allowGPUFallback \
--tacticSources=+CUBLAS,+CUDNN,+JIT_CONVOLUTIONS,+CUBLAS_LT,+EDGE_MASK_CONVOLUTIONS \
--saveEngine=superpoint.trt \
--minShapes=image:1x1x256x256 \
--optShapes=image:1x1x512x512 \
--maxShapes=image:1x1x640x640 \
--workspace=5120 \
--layerOutputTypes=fp16 \
--layerPrecisions=fp16 \
--sparsity=enable \
--inputIOFormats=fp16:chw \
--outputIOFormats=int32:chw,fp16:chw,fp16:chw \
--fp16
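To double-check what the engine actually expects at its I/O boundary (rather than assuming an internal conversion), I print the tensor types after deserializing. As far as I understand, with explicit --inputIOFormats/--outputIOFormats there is no implicit cast at the bindings, so the C++ side has to supply and read these exact types. A small sketch, assuming `engine` as above:

#include <NvInfer.h>
#include <iostream>

// Print each I/O tensor's name, direction, and datatype as TensorRT sees
// them; these are the types the application buffers must match.
void printIOTypes(const nvinfer1::ICudaEngine& engine)
{
    for (int i = 0; i < engine.getNbIOTensors(); ++i)
    {
        const char* name = engine.getIOTensorName(i);
        bool isInput =
            engine.getTensorIOMode(name) == nvinfer1::TensorIOMode::kINPUT;
        std::cout << (isInput ? "input  " : "output ") << name << ": type="
                  << static_cast<int>(engine.getTensorDataType(name)) << "\n";
    }
}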
Environment
TensorRT Version: 8.6.1
GPU Type: RTX 4050 Laptop GPU
Nvidia Driver Version: 530.30.02
CUDA Version: 12.1
CUDNN Version: 8.9
Operating System + Version: Ubuntu 22.04
Python Version: 3.10
C++ Version: 17
Relevant Files
superpoint.onnx
superpoint.trt
superpoint_trt.py
main.cpp
CMakeLists.txt