Different inference output when loading from UFF and PB files

I am trying to use TensorRT through its C++ API. The model is trained in TensorFlow Keras and saved in h5 format. According to the official documentation, the model has to be converted to UFF format before it can be parsed in a C++ program. However, the output from C++ differs greatly from the Python version. As the reference answer, I use the output obtained by loading the pb file and running inference in TensorFlow. Initially, I suspected the error came from FP16 optimization, but the mismatch remains the same after I turn FP16 optimization off. Could you give me some suggestions on how to find the root cause?
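Before hunting for the root cause, it can help to put a number on "a huge difference". A minimal sketch, assuming both outputs (TensorRT and the TensorFlow pb reference) have already been dumped as flat lists in the same element order; `compare_outputs` is a hypothetical helper, not part of TensorRT or TensorFlow:

```python
import math
from typing import Dict, Sequence


def compare_outputs(reference: Sequence[float], candidate: Sequence[float]) -> Dict[str, float]:
    """Summarize how far `candidate` (e.g. TensorRT output) is from `reference`
    (e.g. TensorFlow output from the .pb file). Both are flat, same element order."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    abs_diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    norm_r = math.sqrt(sum(r * r for r in reference))
    norm_c = math.sqrt(sum(c * c for c in candidate))
    dot = sum(r * c for r, c in zip(reference, candidate))
    return {
        "max_abs_diff": max(abs_diffs),
        "mean_abs_diff": sum(abs_diffs) / len(abs_diffs),
        # Relative L2 error: ||ref - cand|| / ||ref||
        "rel_l2_error": (math.sqrt(sum(d * d for d in abs_diffs)) / norm_r
                         if norm_r else float("nan")),
        # Cosine similarity: 1.0 means the two vectors point the same way
        "cosine_similarity": dot / (norm_r * norm_c) if norm_r and norm_c else float("nan"),
    }
```

As a rough heuristic, a cosine similarity near 1 together with a large absolute difference may hint at a scaling problem, while a low cosine similarity is more consistent with an input mismatch (layout, channel order, normalization).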

By the way, I noticed that the UFF model expects its input in NCHW format, so I do this conversion explicitly by hand. However, profiling shows that this step has become the bottleneck of the whole process.
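For reference, a manual NHWC -> NCHW conversion boils down to the index mapping below. It is a plain-Python sketch for readability (the C++ loop over the image buffer would have the same structure); `nhwc_to_nchw` is a hypothetical name. A naive per-element loop like this reads the source with a stride of C, which is one plausible reason such a step can show up in a profile:

```python
from typing import List, Sequence


def nhwc_to_nchw(data: Sequence[float], n: int, h: int, w: int, c: int) -> List[float]:
    """Reorder a flat NHWC buffer into a flat NCHW buffer."""
    if len(data) != n * h * w * c:
        raise ValueError("buffer size does not match the given shape")
    out = [0.0] * len(data)
    for ni in range(n):
        # Looping over channels outside (h, w) makes the writes contiguous;
        # the reads jump by `c` elements each step.
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * h + hi) * w + wi) * c + ci  # NHWC offset
                    dst = ((ni * c + ci) * h + hi) * w + wi  # NCHW offset
                    out[dst] = data[src]
    return out
```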

I would like to know:

  1. How does the NHWC format affect inference performance?
  2. Is it possible to fuse the reshape operation into the model optimized by TensorRT, so that it provides an identical interface?


I guess there is something different in the camera format or preprocessing. Could you check that first?

1. A converter for NHWC -> NCHW will be added automatically.
2. You can let TensorRT add the reshape layer for you.


I solved my original problem. The difference came from the OpenCV channel order: when I switch to BGR instead of RGB, the output is consistent with the Python version. However, the reshaping issue remains unsolved.
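Since `cv::imread` returns pixels in BGR order by default, the channel order the network expects has to match whatever the Python side fed it during training and reference inference. A minimal sketch of the channel swap on an interleaved 3-channel HWC buffer; `swap_bgr_rgb` is a hypothetical helper, and because the swap is its own inverse it converts in either direction:

```python
from typing import List, Sequence


def swap_bgr_rgb(pixels: Sequence[int], h: int, w: int) -> List[int]:
    """Swap channels 0 and 2 of an interleaved HWC image (BGR <-> RGB)."""
    if len(pixels) != h * w * 3:
        raise ValueError("expected an interleaved 3-channel image")
    out = list(pixels)
    for i in range(0, len(out), 3):
        # Each pixel occupies 3 consecutive values; swap the first and third.
        out[i], out[i + 2] = out[i + 2], out[i]
    return out
```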

Do you mean that I don’t have to do anything and TensorRT will handle the reshaping? If so, that is not what I have observed.
I would like to make sure my understanding of “automatically” is correct.

The model keeps the NHWC input format in h5 and pb. After convert-to-uff is applied, the converted model preserves the same interface as the original, but the actual computation is implicitly done in NCHW format. The conversion automatically adds an extra reshape layer to the model.

Hi, sorry for reactivating the thread. In our use case, we have to access the output of an internal layer; specifically, we extract the “conv5_block3_out/add” layer in ResNet50. However, I still observe a non-negligible inconsistency between the two versions, even in FP32 mode.

After inspecting the layers close to the input layer, I found that the error accumulates. I guess the inconsistency is caused by error propagation. Is there any method to prove this assumption, or to prevent the phenomenon?
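One way to test the error-propagation hypothesis is to expose several intermediate tensors as extra outputs in both frameworks, dump them, and compute a relative error per layer in input-to-output order. A gradual increase supports the accumulation explanation; a sudden jump at one layer points instead to a single layer that is converted or computed differently. The sketch assumes the dumps already exist as flat lists in matching element order (TensorFlow dumps in NHWC would need transposing to NCHW first); all function names are hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def relative_l2_error(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """||ref - cand|| / ||ref||, with 0 for two all-zero tensors."""
    diff = math.sqrt(sum((r - c) ** 2 for r, c in zip(reference, candidate)))
    norm = math.sqrt(sum(r * r for r in reference))
    if norm == 0.0:
        return 0.0 if diff == 0.0 else float("inf")
    return diff / norm


def error_trend(
    layers: Sequence[Tuple[str, Sequence[float], Sequence[float]]]
) -> List[Tuple[str, float]]:
    """`layers` is ordered from input to output as (name, reference, candidate)."""
    return [(name, relative_l2_error(ref, cand)) for name, ref, cand in layers]


def is_non_decreasing(errors: Sequence[float], tol: float = 0.0) -> bool:
    """True if every error is at least the previous one minus `tol`."""
    return all(b >= a - tol for a, b in zip(errors, errors[1:]))
```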


Could you share the value difference between the frameworks with us?

We didn’t find any difference that causes an accuracy issue.
We are not sure whether this is model-related.


@AastaLLL Would you mind giving me your email? Because of privacy concerns, I would rather send our code and model to you directly.


Would you mind sharing it via private message?

@posutsai Hi, have you solved the problem? I have the same problem: when I convert from NHWC in Keras to NCHW in TensorRT, the results are different. Any suggestions? @AastaLLL

Unfortunately, I haven’t solved the issue directly. I found that the distribution of our output tensor is similar to the correct answer, except that it is linearly scaled. Hence, I manually divide by the average, and the output is acceptable for me. However, I really think the issue should be handled in another way. I suggest you send your model to @AastaLLL via private message and see if he has any comments.
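The divide-by-the-average workaround can be checked a bit more carefully by fitting a linear map between the two outputs: if the residual after the fit is small, the outputs really differ only by scale and offset. A minimal sketch, assuming the average is taken over the output tensor itself; `fit_linear` and `divide_by_mean` are hypothetical helpers, and this is a diagnostic and stopgap, not a fix for whatever causes the scaling:

```python
import math
from typing import List, Sequence, Tuple


def fit_linear(candidate: Sequence[float], reference: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of reference ~= a * candidate + b. Returns (a, b, rms_residual)."""
    n = len(candidate)
    if n != len(reference) or n < 2:
        raise ValueError("need two equally sized outputs with at least 2 elements")
    mean_x = sum(candidate) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in candidate)
    if sxx == 0.0:
        raise ValueError("candidate output is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(candidate, reference))
    a = sxy / sxx
    b = mean_y - a * mean_x
    rms = math.sqrt(sum((a * x + b - y) ** 2 for x, y in zip(candidate, reference)) / n)
    return a, b, rms


def divide_by_mean(values: Sequence[float]) -> List[float]:
    """Normalize an output tensor by its own mean (the workaround described above)."""
    mean = sum(values) / len(values)
    if mean == 0.0:
        raise ValueError("mean is zero")
    return [v / mean for v in values]
```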

In conclusion, although my project is already closed, this experience has really put me off adopting TensorRT as an acceleration method again. In my opinion, I would consider speeding up my model with either TVM https://tvm.apache.org/ or Glow https://engineering.fb.com/ml-applications/glow-a-community-driven-approach-to-ai-infrastructure/. I have tried TVM recently; if you need any further information, I am willing to discuss it with you, so please send me an email.