I am trying to use TensorRT through its C++ API. The model is trained in TensorFlow Keras and saved in h5 format. According to the official documentation, the model must be converted to UFF format before it can be parsed in a C++ program. However, the output from the C++ version differs significantly from the Python version; as a reference, I use the output obtained by loading the frozen pb graph and running inference in TensorFlow. Initially, I suspected the error came from FP16 optimization, but the mismatch remains unchanged after I turn off FP16 optimization. I would appreciate any suggestions for finding the root cause.
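For reference, below is a minimal sketch of how I build the engine, assuming TensorRT 5-style UFF APIs. The tensor names (`input_1`, `dense_1/Softmax`), dimensions, and file path are placeholders standing in for my actual setup; note that FP16 is explicitly disabled.

```cpp
#include <iostream>
#include "NvInfer.h"
#include "NvUffParser.h"

using namespace nvinfer1;
using namespace nvuffparser;

// Minimal logger required by the TensorRT builder.
class Logger : public ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity <= Severity::kWARNING)
            std::cerr << msg << std::endl;
    }
} gLogger;

ICudaEngine* buildEngine() {
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    IUffParser* parser = createUffParser();

    // Placeholder tensor names and dimensions; the real values come
    // from the frozen graph. The input is registered as NCHW.
    parser->registerInput("input_1", Dims3(3, 224, 224), UffInputOrder::kNCHW);
    parser->registerOutput("dense_1/Softmax");
    parser->parse("model.uff", *network, DataType::kFLOAT);

    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 28);
    builder->setFp16Mode(false); // FP16 off, yet the mismatch persists

    ICudaEngine* engine = builder->buildCudaEngine(*network);
    parser->destroy();
    network->destroy();
    builder->destroy();
    return engine;
}
```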
By the way, I noticed that the UFF model expects its input in NCHW format, while my data arrives in NHWC, so I currently perform this transposition manually and explicitly before inference. However, profiling shows that this step has become the bottleneck of the whole pipeline.
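Concretely, the conversion I do looks like the naive host-side loop below (the function name and the single-float-image assumption are mine); this is the loop that dominates in my profile.

```cpp
// Naive host-side NHWC -> NCHW transpose for a single float image.
// h, w, c are the height, width, and channel count of the input.
void nhwcToNchw(const float* src, float* dst, int h, int w, int c) {
    for (int k = 0; k < c; ++k)
        for (int i = 0; i < h; ++i)
            for (int j = 0; j < w; ++j)
                dst[(k * h + i) * w + j] = src[(i * w + j) * c + k];
}
```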
I would like to know:
- How does the NHWC format affect inference performance?
- Is it possible to fuse the transpose/reshape operation into the model optimized by TensorRT, so that the engine exposes the same NHWC interface as the original model? (A rough sketch of what I mean follows this list.)
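For the second question, what I have in mind is something like the sketch below: registering the input with `UffInputOrder::kNHWC` so the parser puts the transpose inside the engine, where TensorRT can optimize it. I am not sure this is the right approach, and the tensor name and dimensions are the same placeholders as above.

```cpp
#include "NvInfer.h"
#include "NvUffParser.h"

using namespace nvinfer1;
using namespace nvuffparser;

// Idea: ask the UFF parser to accept NHWC input directly, so the
// NHWC -> NCHW transpose lives inside the engine instead of on the
// host. If I read the docs correctly, the dimensions are still
// specified in CHW order even when the order is kNHWC.
void registerNhwcInput(IUffParser& parser) {
    parser.registerInput("input_1", Dims3(3, 224, 224), UffInputOrder::kNHWC);
}
```

If that works, the C++ engine could consume the same NHWC buffers as the TensorFlow version, and the manual transpose loop above could be dropped entirely.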