trt.float32 and trt.float16 give the same inference speed

Hello, sorry to bother you again.

First question:
When I use trt.float32 to create an engine and run inference on a single 1376x800 image, the pure inference time is about 25 ms. When I change trt.float32 at https://github.com/IvyGongoogle/tensorrt-east/blob/master/tensorrt-infer.py#L71 to trt.float16, the pure inference time is still about 25 ms. Why isn't FP16 faster than FP32?
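For reference, below is a simplified sketch of how I understand the engine build to work in the TensorRT 5 Python API (the input/output names and the shape are placeholders, not my exact code from tensorrt-infer.py). Is changing the dtype at that line enough, or does FP16 also require setting fp16_mode on the builder, as in this sketch?

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(uff_path, use_fp16=False):
    # Simplified sketch: input/output names and the input shape are placeholders.
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.UffParser() as parser:
        parser.register_input("input_images", (3, 800, 1376))
        parser.register_output("output_node")  # placeholder output name
        parser.parse(uff_path, network)

        builder.max_batch_size = 1
        builder.max_workspace_size = 1 << 30
        if use_fp16 and builder.platform_has_fast_fp16:
            # As I understand it, the builder only considers FP16 kernels
            # when fp16_mode is enabled, independent of the weight dtype.
            builder.fp16_mode = True
        return builder.build_cuda_engine(network)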

Second question:
Also, when I change trt.float32 at the same line (https://github.com/IvyGongoogle/tensorrt-east/blob/master/tensorrt-infer.py#L71) to trt.int8, it fails with this error:

[TensorRT] ERROR: runtime.cpp (30) - Cuda Error in free: 77
terminate called after throwing an instance of 'nvinfer1::CudaError'
  what():  std::exception
Aborted

I do not understand what causes this error.
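For what it is worth, I only swap the dtype constant; I do not set int8_mode or provide a calibrator. If INT8 needs something like the sketch below (MyEntropyCalibrator is a hypothetical class I would still have to write on top of trt.IInt8EntropyCalibrator2), that might explain the crash:

# Sketch of my understanding of the TensorRT 5 Python API for INT8.
# MyEntropyCalibrator is hypothetical: it would subclass trt.IInt8EntropyCalibrator2
# and feed real calibration batches, plus read/write a calibration cache.
builder.int8_mode = True
builder.int8_calibrator = MyEntropyCalibrator(calibration_images, cache_file="calib.cache")
engine = builder.build_cuda_engine(network)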

My code is at https://github.com/IvyGongoogle/tensorrt-east

Linux distro and version:

LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.4.1708 (Core)
Release:	7.4.1708
Codename:	Core

Other environment details:

GPU type: Tesla V100
NVIDIA driver version: 396.44
CUDA version: 9.0
cuDNN version: 7.3.0
Python version: 2.7
TensorRT version: 5.0.2.6
tensorflow-gpu: 1.4.1
gcc>5.3/lib64

AndrewGong,

I'm getting a checksum mismatch on the checkpoint you uploaded. Can you confirm that it's a good checkpoint?

tensorflow.python.framework.errors_impl.DataLossError: Checksum does not match: stored 1214729159 vs. calculated on the restored bytes 4272926173
[[node save/RestoreV2 (defined at freezeGraph.py:8) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, …, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

@kschlichter thanks for your reply.
Sorry, I uploaded an invalid ckpt file. You can now get the correct one from my latest repo.

I've still got the same checksum mismatch, with the same stored (1214729159) and calculated (4272926173) values. Here's a longer snippet:

2018-12-05 17:26:36.716913: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Data loss: Checksum does not match: stored 1214729159 vs. calculated on the restored bytes 4272926173
Traceback (most recent call last):
  File "freezeGraph.py", line 9, in <module>
    saver.restore(sess, tf.train.latest_checkpoint("./ckpt/"))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1546, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Checksum does not match: stored 1214729159 vs. calculated on the restored bytes 4272926173
[[node save/RestoreV2 (defined at freezeGraph.py:8) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, …, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op u'save/RestoreV2', defined at:
  File "freezeGraph.py", line 8, in <module>
    saver = tf.train.import_meta_graph(meta_path)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1674, in import_meta_graph
    meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1696, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
    _ProcessNewOps(graph)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

DataLossError (see above for traceback): Checksum does not match: stored 1214729159 vs. calculated on the restored bytes 4272926173
[[node save/RestoreV2 (defined at freezeGraph.py:8) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, …, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
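For reference, the restore code I'm running is essentially the following, reconstructed from the traceback above (the exact .meta filename is an assumption; I point it at the ckpt directory from your repo):

import tensorflow as tf

meta_path = "./ckpt/model.ckpt.meta"  # assumption: the .meta file shipped in the repo's ckpt directory

with tf.Session() as sess:
    # freezeGraph.py line 8: rebuild the graph and a Saver from the meta graph
    saver = tf.train.import_meta_graph(meta_path)
    # freezeGraph.py line 9: restore the weights -- this is where the DataLossError is raised
    saver.restore(sess, tf.train.latest_checkpoint("./ckpt/"))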

@KevinSchlichter thanks for your reply. Sorry, I forgot to tell you that I use tensorflow-gpu 1.4.1, and the checkpoint restores fine in my environment. I have updated the issue with my environment details. Which version of TensorFlow do you use?

I’m using 1.12.0 from the tensorrt:18.08-py2 container. Are you using a container from ngc.nvidia.com?

@KevinSchlichter thanks for your reply. I am not using a container from ngc.nvidia.com; I installed TensorRT from https://developer.nvidia.com/nvidia-tensorrt-5x-download

I've installed the tensorflow-18.01-py2 container, which uses TensorFlow 1.4.0 (not 1.4.1), and installed TensorRT 5.0.2.6, but I'm still seeing the same checksum error. For the sake of troubleshooting, could you try the TensorRT-18.09-py2 container from ngc.nvidia.com and generate a checkpoint whose checksum validates for both of us? That should let us get back to your original question about trt.float32 and trt.float16.