Provide details on the platforms you are using:
Linux distro and version: Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-133-generic x86_64)
GPU type: Tesla P4
nvidia driver version: 384.81
CUDA version: 9.0
CUDNN version: 7.4
Python version [if using python]: python 3.5
Tensorflow version: 1.10
TensorRT version: TensorRT 5.0.0 RC / Container image 18.10-py3
If Jetson, OS, hw versions: n/a
Describe the problem
I used the TF-TRT integration sample script tftrt_sample.py and adapted it to optimize inference for my custom model (based on resnet_v2_50). Optimized inference works in the native, FP32, and FP16 precision modes, but fails in INT8. See below:
native:
images/s : 38.7 +/- 0.6, s/batch: 0.02582 +/- 0.00042
RES, Native, 1, 38.73, 0.63, 0.02582, 0.00042
fp32:
images/s : 122.6 +/- 2.5, s/batch: 0.00816 +/- 0.00017
RES, TRT-FP32, 1, 122.59, 2.50, 0.00816, 0.00017
fp16: (performance is below FP32 because the P4 does not support fast FP16)
images/s : 107.9 +/- 10.1, s/batch: 0.00927 +/- 0.00086
RES, TRT-FP16, 1, 107.89, 10.10, 0.00927, 0.00086
int8:
Running calibration: ok
Creating inference graph:
terminate called after throwing an instance of 'std::runtime_error'
DefaultLogger Tensor resnet_model/Relu_47 is uniformly zero; network calibration failed.
what(): Could not find tensor InputPH_0 in tensorScales
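For context on the first error: INT8 calibration derives a per-tensor scale factor from the activation ranges observed during the calibration runs, so a tensor that stays uniformly zero has no range to quantize. A minimal sketch of that idea (my own illustration of max-abs calibration, not TensorRT's actual implementation):

```python
def int8_scale(activations):
    """Max-abs calibration: map the observed range onto [-127, 127]."""
    max_abs = max(abs(v) for v in activations)
    if max_abs == 0.0:
        # A uniformly-zero tensor gives no usable range -- analogous to
        # the condition TensorRT reports for resnet_model/Relu_47.
        raise ValueError("tensor is uniformly zero; calibration failed")
    return max_abs / 127.0

def quantize(activations, scale):
    """Quantize floats to int8 values using the calibrated scale."""
    return [max(-127, min(127, round(v / scale))) for v in activations]

# A healthy activation tensor calibrates fine: the largest magnitude
# maps to +/-127, everything else scales proportionally.
scale = int8_scale([0.0, 0.5, 1.27, -0.9])
print(quantize([1.27, -0.9], scale))
```

This suggests the calibration images may never activate that ReLU, or the graph feeding it is broken after conversion; the follow-on "InputPH_0 not in tensorScales" abort looks like a consequence of the failed calibration rather than a separate bug.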
Files
Include any logs, source, models (uff, pb) that would be helpful to diagnose the problem.
INFO:tensorflow:Running against TensorRT version 5.0.0
2018-11-14 00:15:55.819363: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 6
2018-11-14 00:15:57.247483: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:756] MULTIPLE tensorrt candidate conversion: 2
2018-11-14 00:15:57.256576: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2936] Segment @scope 'resnet_model/', converted to graph
2018-11-14 00:15:57.296673: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2936] Segment @scope 'resnet_model/dense/', converted to graph
2018-11-14 00:15:57.313203: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:724] Can't determine the device, constructing an allocator at device 0
2018-11-14 00:17:09.410657: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:724] Can't determine the device, constructing an allocator at device 0
Running Calibration
INFO:tensorflow:Starting execution
2018-11-14 00:17:14.533960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2, 3, 4, 5
2018-11-14 00:17:14.536051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-14 00:17:14.536208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 1 2 3 4 5
2018-11-14 00:17:14.536271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N Y Y Y Y Y
2018-11-14 00:17:14.536325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: Y N Y Y Y Y
2018-11-14 00:17:14.536377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: Y Y N Y Y Y
2018-11-14 00:17:14.536431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: Y Y Y N Y Y
2018-11-14 00:17:14.536484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 4: Y Y Y Y N Y
2018-11-14 00:17:14.536525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 5: Y Y Y Y Y N
2018-11-14 00:17:14.547597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3803 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:21:00.0, compute capability: 6.1)
2018-11-14 00:17:14.549030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3803 MB memory) -> physical GPU (device: 1, name: Tesla P4, pci bus id: 0000:41:00.0, compute capability: 6.1)
2018-11-14 00:17:14.550100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 3803 MB memory) -> physical GPU (device: 2, name: Tesla P4, pci bus id: 0000:61:00.0, compute capability: 6.1)
2018-11-14 00:17:14.551248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 3803 MB memory) -> physical GPU (device: 3, name: Tesla P4, pci bus id: 0000:81:00.0, compute capability: 6.1)
2018-11-14 00:17:14.552287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 3803 MB memory) -> physical GPU (device: 4, name: Tesla P4, pci bus id: 0000:a1:00.0, compute capability: 6.1)
2018-11-14 00:17:14.553390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 3803 MB memory) -> physical GPU (device: 5, name: Tesla P4, pci bus id: 0000:c1:00.0, compute capability: 6.1)
INFO:tensorflow:Starting Warmup cycle
2018-11-14 00:17:35.825911: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:567] Starting calibration thread on device 0, Calibration Resource @ 0x7efc44001110
2018-11-14 00:17:41.620515: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:567] Starting calibration thread on device 0, Calibration Resource @ 0x7efc1400e120
INFO:tensorflow:Warmup done. Starting real timing
iter 0 7.145742897987366
Comparison= False
INFO:tensorflow:Timing loop done!
Creating inference graph
2018-11-14 00:25:57.053860: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:153] Starting Calib Conversion
2018-11-14 00:25:57.306475: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:159] Construction of static int8 engine is not implemented yet!. Dynamic engine will be constructed
2018-11-14 00:27:35.888619: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Tensor resnet_model/Relu_47 is uniformly zero; network calibration failed.
terminate called after throwing an instance of 'std::runtime_error'
what(): Could not find tensor InputPH_0 in tensorScales.
/home/tftrt_sample_custom.py: line 149: 606 Aborted (core dumped)
Command line to reproduce the test case:
python3 tftrt_sample_custom.py --native --FP32 --FP16 --INT8 --num_loops 10 --topN 5 --batch_size 1 --log_file log.txt --network frozen_graph_1541777429.pb --input_node input_tensor --output_nodes my_sigmoid_tensor --img_size 224 --img_file image.jpg --labellist labellist_custom.json
Any recommendations?