FP16 engine gives correct outputs in Python, but no outputs or NaNs/zeros in C++.
I have an ONNX file that I converted to an engine file using trtexec, building in FP16 and setting the input and output precisions to FP16. I get the right outputs in Python, but the same engine gives no outputs in C++. As far as I can tell, I am creating the execution context and buffers in C++ the same way as in Python. I also tried using the half data type from "cuda_fp16.h" for the input/output data and the layers; with that I instead get illegal-memory-access errors. When I use plain float, I get no outputs. I believe TensorRT does an internal data-type conversion at the bindings, but I could be wrong. Any suggestions would be appreciated. Thanks.
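For context, here is a stripped-down sketch of how I am setting up the context and buffers in C++ (error handling omitted; the input name "image" and the 1x1x512x512 shape come from the trtexec shapes below, and `engine` is the already-deserialized ICudaEngine):

#include <NvInfer.h>
#include <cuda_fp16.h>
#include <cuda_runtime_api.h>
#include <cstring>
#include <vector>

void infer(nvinfer1::ICudaEngine* engine, const std::vector<float>& imageF32)
{
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // Dynamic-shape engine (min/opt/max shapes), so the input shape must be
    // set before buffer sizes are queried or enqueueV3 is called.
    context->setInputShape("image", nvinfer1::Dims4{1, 1, 512, 512});

    // The engine was built with --inputIOFormats=fp16:chw, so the input
    // buffer holds raw __half values, converted on the host here.
    std::vector<__half> imageF16(imageF32.size());
    for (size_t i = 0; i < imageF32.size(); ++i)
        imageF16[i] = __float2half(imageF32[i]);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    std::vector<void*> buffers(engine->getNbIOTensors(), nullptr);
    void* inputPtr = nullptr;
    for (int i = 0; i < engine->getNbIOTensors(); ++i)
    {
        const char* name = engine->getIOTensorName(i);
        nvinfer1::Dims dims = context->getTensorShape(name);
        size_t count = 1;
        for (int d = 0; d < dims.nbDims; ++d)
            count *= static_cast<size_t>(dims.d[d]);

        // Size each buffer by the tensor's declared I/O type:
        // kHALF is 2 bytes, kINT32 (the first output) is 4 bytes.
        size_t elemSize =
            engine->getTensorDataType(name) == nvinfer1::DataType::kHALF ? 2 : 4;
        cudaMalloc(&buffers[i], count * elemSize);
        context->setTensorAddress(name, buffers[i]);
        if (std::strcmp(name, "image") == 0)
            inputPtr = buffers[i];
    }

    cudaMemcpyAsync(inputPtr, imageF16.data(),
                    imageF16.size() * sizeof(__half),
                    cudaMemcpyHostToDevice, stream);
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);
    // ... then copy each output back with cudaMemcpyAsync and, for the FP16
    // outputs, convert __half back to float on the host.
}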
My trtexec command:
export CUDA_MODULE_LOADING=LAZY && ./trtexec \
--onnx=superpoint.onnx \
--builderOptimizationLevel=5 \
--useSpinWait \
--useRuntime=full \
--useCudaGraph \
--precisionConstraints=obey \
--allowGPUFallback \
--tacticSources=+CUBLAS,+CUDNN,+JIT_CONVOLUTIONS,+CUBLAS_LT,+EDGE_MASK_CONVOLUTIONS \
--saveEngine=superpoint.trt \
--minShapes=image:1x1x256x256 \
--optShapes=image:1x1x512x512 \
--maxShapes=image:1x1x640x640 \
--workspace=5120 \
--layerOutputTypes=fp16 \
--layerPrecisions=fp16 \
--sparsity=enable \
--inputIOFormats=fp16:chw \
--outputIOFormats=int32:chw,fp16:chw,fp16:chw \
--fp16
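To double-check what the engine actually expects at its I/O boundary (rather than assuming an internal conversion), I print the tensor types after deserializing. As far as I understand, with explicit --inputIOFormats/--outputIOFormats there is no implicit cast at the bindings, so the C++ side has to supply and read these exact types. A small sketch, assuming `engine` as above:

#include <NvInfer.h>
#include <iostream>

// Print each I/O tensor's name, direction, and datatype as TensorRT sees
// them; these are the types the application buffers must match.
void printIOTypes(const nvinfer1::ICudaEngine& engine)
{
    for (int i = 0; i < engine.getNbIOTensors(); ++i)
    {
        const char* name = engine.getIOTensorName(i);
        bool isInput =
            engine.getTensorIOMode(name) == nvinfer1::TensorIOMode::kINPUT;
        std::cout << (isInput ? "input  " : "output ") << name << ": type="
                  << static_cast<int>(engine.getTensorDataType(name)) << "\n";
    }
}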
Environment
TensorRT Version: 8.6.1
GPU Type: RTX 4050 Laptop GPU
Nvidia Driver Version: 530.30.02
CUDA Version: 12.1
CUDNN Version: 8.9
Operating System + Version: Ubuntu 22.04
Python Version: 3.10
C++ Version: 17
Relevant Files
superpoint.onnx
superpoint.trt
superpoint_trt.py
main.cpp
CMakeLists.txt