Hello, I converted my trained ckpt model to a frozen file myModel.pb and then ran inference on it; the result is consistent with using the original ckpt. But after I convert that pb file to a uff file, create an engine, and run inference, the result is hugely different. What causes this problem? Can you give some advice?
Wait, I will check it again.
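Before digging into the engine itself, one quick check is to compare the pb output and the engine output numerically instead of by saved images. A sketch, assuming both outputs are available as NumPy arrays (the arrays below are synthetic placeholders, not real model outputs):

```python
import numpy as np

# Placeholders for the real outputs: tf_output would come from running the
# frozen myModel.pb in TensorFlow, trt_output from the TensorRT engine.
tf_output = np.random.default_rng(0).random((1, 200, 344, 1)).astype(np.float32)
trt_output = tf_output + np.float32(1e-5)  # simulated small FP32 drift

abs_diff = np.abs(tf_output - trt_output)
print("max abs diff: ", abs_diff.max())
print("mean abs diff:", abs_diff.mean())

# For an FP32 engine, a max difference around 1e-5..1e-4 is expected noise;
# a "huge" difference usually means the inputs differ (BGR/RGB order, mean
# subtraction, NCHW vs NHWC layout) rather than the conversion being broken.
consistent = np.allclose(tf_output, trt_output, atol=1e-4)
print("consistent:", consistent)
```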
Hello,
Depending on what you see, please share a repro containing your .pb, the code converting it to uff/engine, and an inference example that demonstrates the inconsistencies you are seeing.
regards,
NVIDIA Enterprise Support
@NVES thanks for your prompt reply.
My code is at https://github.com/IvyGongoogle/tensorrt-east
Please read the README.md to see how to use it.
Notice:
1. For 1376_800.engine, the output node is model_7/feature_fusion/Conv_7/Sigmoid, and I ultimately want to get this output node's mat, whose height and width are both 1/4 of the input node's mat.
2. result.jpg is the output-node mat result that I want.
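The 1/4 relation in point 1 above is just integer arithmetic on the engine's input size; a tiny sketch (assuming the 1376_800 name means a 1376×800 input):

```python
# For the 1376x800 engine input, the Sigmoid score map should be 1/4 of
# the input in each spatial dimension.
input_w, input_h = 1376, 800
score_w, score_h = input_w // 4, input_h // 4
print(score_w, score_h)  # 344 200
```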
Linux distro and version:
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core
Other environment info:
GPU type: Tesla V100
nvidia driver version: NVIDIA-SMI 396.44
CUDA version: 9.0
CUDNN version: 7.3.0
Python version [if using python]: python2.7
TensorRT version: 5.0.2.6
gcc>5.3/lib64
Looking forward to your reply…
Hello,
If I understand correctly, the repro you uploaded only contains the converted TRT engine and inference code. To see the difference, can you also share the following?
- your trained ckpt model, and the code that freezes it into myModel.pb and then infers with it (showing a consistent result with the original ckpt).
- the code that converts the above pb file to a uff file and then creates an engine.
@NVES I just added the code; please check again. I would be very grateful if you could debug my code. Thanks a million.
Hello,
I think the uploaded files are not complete. I am getting the following error when running python freezeGraph.py after git cloning your repo.
2018-11-26 21:56:01.809587: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Data loss: Checksum does not match: stored 1214729159 vs. calculated on the restored bytes 4272926173
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: Checksum does not match: stored 1214729159 vs. calculated on the restored bytes 4272926173
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
@NVES I found the cause of the problem. Actually, the results of the C++ and Python code are both correct.
I was misled by the io.imsave function [url]https://github.com/IvyGongoogle/tensorrt-east/blob/master/tensorrt-infer.py#L161[/url]. It saves a mat with 0-1 float values as a binary image, which appears to have some white regions, so I kept regarding the image saved by io.imsave as the correct result.
In fact, the image with 0-1 float values saved by cv::imwrite [url]https://github.com/IvyGongoogle/tensorrt-east/blob/master/tensorrtNet.cpp#L136[/url] is correct; it just looks like a nearly black image.