The result of inference with a TensorRT engine has a huge difference

Hello, I converted my trained ckpt model to a frozen file, myModel.pb, and then ran inference with it; the result is consistent with using the original ckpt. But after I convert this pb file to a uff file, create an engine, and run inference with it, the result is hugely different. What causes this problem? Can you give me some advice?
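(For context, the pb → uff → engine path with the TensorRT 5.x Python API typically looks roughly like the sketch below; the input/output node names and the input shape are placeholders, not the actual values from the model in question.)

    import uff
    import tensorrt as trt

    # Convert the frozen graph to UFF (output node name is a placeholder)
    uff.from_tensorflow_frozen_model(
        "myModel.pb", ["output_node"], output_filename="myModel.uff")

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network() as network, \
            trt.UffParser() as parser:
        # Register I/O in CHW order; names and shape are placeholders
        parser.register_input("input_node", (3, 800, 1376))
        parser.register_output("output_node")
        parser.parse("myModel.uff", network)
        builder.max_workspace_size = 1 << 30
        engine = builder.build_cuda_engine(network)
        with open("model.engine", "wb") as f:
            f.write(engine.serialize())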

Wait, I will check it again.

Hello,

Depending on what you see, please share a repro containing your .pb file, the code converting it to a uff file/engine, and an inference example that demonstrates the inconsistencies you are seeing.

Regards,
NVIDIA Enterprise Support

@NVES Thanks for your prompt reply.

My code is at https://github.com/IvyGongoogle/tensorrt-east

Please read the README.md to see how to use it.

Notice:
1. For 1376_800.engine, the output node is model_7/feature_fusion/Conv_7/Sigmoid, and I finally want to get this output node's mat, whose height and width are both 1/4 of the input node mat's (a small reshape sketch follows this list).
2. result.jpg is the output node mat result that I want.
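(A minimal sketch of that reshape, assuming a hypothetical flat engine output buffer for an 800x1376 input; the variable names are illustrative only.)

    import numpy as np

    # Hypothetical flat output buffer returned by the engine for an 800x1376 input;
    # the Sigmoid output node yields a single channel at 1/4 of the input resolution.
    h, w = 800, 1376
    output = np.zeros((h // 4) * (w // 4), dtype=np.float32)  # placeholder data

    # Reshape into the (H/4, W/4) score map described in point 1
    score_map = output.reshape(h // 4, w // 4)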

Linux distro and version:

LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.4.1708 (Core)
Release:	7.4.1708
Codename:	Core

Other environment details:

GPU type: Tesla V100
NVIDIA driver version: 396.44
CUDA version: 9.0
cuDNN version: 7.3.0
Python version [if using python]: 2.7
TensorRT version: 5.0.2.6
gcc: >5.3 (lib64)

Looking forward to your reply…

Hello,

If I understand correctly, the repro you uploaded only contains the converted TRT engine and the inference code. To see the difference, can you also share the following?

  1. Your trained ckpt model, and the code to freeze it to myModel.pb and then run inference with it (showing the result consistent with using the original ckpt); a rough sketch of a typical freeze step follows this list.
  2. The code that converts the above pb file to a uff file and then creates an engine.
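(For reference, the freeze step in item 1 typically looks roughly like this minimal TensorFlow 1.x sketch; the checkpoint paths and output node name are placeholders.)

    import tensorflow as tf

    with tf.Session() as sess:
        # Restore the trained variables from the checkpoint (paths are placeholders)
        saver = tf.train.import_meta_graph("model.ckpt.meta")
        saver.restore(sess, "model.ckpt")

        # Fold the variables into constants so the graph can be written as one .pb
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph_def, ["output_node"])

    with tf.gfile.GFile("myModel.pb", "wb") as f:
        f.write(frozen.SerializeToString())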

@NVES I just added the code; please check again. I would be very grateful if you could debug my code. Thanks a million.

Hello,

I think the uploaded files are not complete. After git cloning your repro, I get the following error when running python freezeGraph.py:

2018-11-26 21:56:01.809587: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Data loss: Checksum does not match: stored 1214729159 vs. calculated on the restored bytes 4272926173
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: Checksum does not match: stored 1214729159 vs. calculated on the restored bytes 4272926173
         [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

@NVES I found the cause of the problem. Actually, the results of the C++ and Python code are both correct.

I was misled by the io.imsave call (https://github.com/IvyGongoogle/tensorrt-east/blob/master/tensorrt-infer.py#L161). It saves a mat of 0-1 float values as a binary image, which shows some white regions, so I kept treating the image saved by io.imsave as the correct result.
In fact, the image of 0-1 float values saved by cv::imwrite (https://github.com/IvyGongoogle/tensorrt-east/blob/master/tensorrtNet.cpp#L136) is also correct; it just looks like an all-black image.
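(For anyone hitting the same confusion: one way to avoid it is to scale the score map to 0-255 explicitly before saving. A minimal sketch; score_map here is a hypothetical float array in [0, 1], not the repo's actual variable.)

    import numpy as np
    import cv2

    # score_map: hypothetical float32 array in [0, 1], e.g. a Sigmoid output
    score_map = np.random.rand(200, 344).astype(np.float32)  # placeholder data

    # Scale to 0-255 and convert to uint8 so the saved image is viewable;
    # writing the raw 0-1 floats directly gets truncated to near-black pixels.
    cv2.imwrite("result.jpg", (score_map * 255.0).astype(np.uint8))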