inference.py sample not working with vgg_16 and vgg_19 models

Provide details on the platforms you are using:
Linux distro and version: Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-133-generic x86_64)
GPU type: Tesla P4
nvidia driver version: 384.81
CUDA version: 9.0
CUDNN version: 7.4
Python version [if using python]: python 3.5
Tensorflow version: 1.10
TensorRT version: TensorRT 5.0.0 RC / Container image 18.10-py3
If Jetson, OS, hw versions: n/a

Describe the problem:
I am using the sample script inference.py to run TF-TRT5 inference with different models. All models work except 'vgg_16' and 'vgg_19', which throw out-of-memory errors and fail when building the TensorRT INT8 engine; see below:

VGG_16

2018-11-14 23:22:01.413486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:3b:00.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2018-11-14 23:22:02.162996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties:
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:5e:00.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2018-11-14 23:22:02.890455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 2 with properties:
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:d8:00.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2018-11-14 23:22:02.897646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2
2018-11-14 23:22:04.324507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-14 23:22:04.324564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 1 2
2018-11-14 23:22:04.324571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N Y Y
2018-11-14 23:22:04.324593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   Y N Y
2018-11-14 23:22:04.324598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2:   Y Y N
2018-11-14 23:22:04.325264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7029 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:3b:00.0, compute capability: 6.1)
2018-11-14 23:22:04.326184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7029 MB memory) -> physical GPU (device: 1, name: Tesla P4, pci bus id: 0000:5e:00.0, compute capability: 6.1)
2018-11-14 23:22:04.326857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 7029 MB memory) -> physical GPU (device: 2, name: Tesla P4, pci bus id: 0000:d8:00.0, compute capability: 6.1)
Using checkpoint found at: /home/dell/inference_trt5/pretrained_models/vgg_16/vgg_16.ckpt
2018-11-14 23:22:09.804605: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 3
2018-11-14 23:22:15.170533: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2936] Segment @scope 'vgg_16/', converted to graph
2018-11-14 23:22:15.354693: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:724] Can't determine the device, constructing an allocator at device 0
Cuda error in file src/implicit_gemm.cu at line 585: out of memory
2018-11-14 23:22:39.054257: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger cuda/cudaFusedConvActLayer.cpp (277) - Cuda Error in executeFused: 2
2018-11-14 23:22:39.074433: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger cuda/cudaFusedConvActLayer.cpp (277) - Cuda Error in executeFused: 2
2018-11-14 23:22:39.102904: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:857] Engine creation for segment 0, composed of 89 nodes failed: Internal: Failed to build TensorRT engine. Skipping...
Calibrating INT8...

VGG_19:

2018-11-14 23:23:58.162531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:3b:00.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2018-11-14 23:23:58.859691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties:
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:5e:00.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2018-11-14 23:23:59.612628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 2 with properties:
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:d8:00.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2018-11-14 23:23:59.614979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2
2018-11-14 23:24:00.973007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-14 23:24:00.973068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 1 2
2018-11-14 23:24:00.973076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N Y Y
2018-11-14 23:24:00.973098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   Y N Y
2018-11-14 23:24:00.973104: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2:   Y Y N
2018-11-14 23:24:00.973720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7029 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:3b:00.0, compute capability: 6.1)
2018-11-14 23:24:00.974696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7029 MB memory) -> physical GPU (device: 1, name: Tesla P4, pci bus id: 0000:5e:00.0, compute capability: 6.1)
2018-11-14 23:24:00.976583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 7029 MB memory) -> physical GPU (device: 2, name: Tesla P4, pci bus id: 0000:d8:00.0, compute capability: 6.1)
Using checkpoint found at: /home/dell/inference_trt5/pretrained_models/vgg_19/vgg_19.ckpt
2018-11-14 23:24:06.581198: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 3
2018-11-14 23:24:12.381253: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2936] Segment @scope 'vgg_19/', converted to graph
2018-11-14 23:24:12.658102: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:724] Can't determine the device, constructing an allocator at device 0
Cuda error in file src/implicit_gemm.cu at line 585: out of memory
2018-11-14 23:24:38.190669: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger cuda/cudaFusedConvActLayer.cpp (277) - Cuda Error in executeFused: 2
2018-11-14 23:24:38.207207: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger cuda/cudaFusedConvActLayer.cpp (277) - Cuda Error in executeFused: 2
2018-11-14 23:24:38.231940: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:857] Engine creation for segment 0, composed of 104 nodes failed: Internal: Failed to build TensorRT engine. Skipping...
Calibrating INT8...

Command lines to reproduce the test case:
vgg_16

python3 inference.py --model vgg_16 --precision int8 --use_trt --cache --batch_size 1

vgg_19

python3 inference.py --model vgg_19 --precision int8 --use_trt --cache --batch_size 1

Any tips to make it work?
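One thing I have been experimenting with (a sketch only, assuming the OOM happens because TensorFlow pre-allocates nearly all GPU memory before the TensorRT engine build starts): capping TensorFlow's memory fraction and lowering the TensorRT workspace size so the engine builder has free device memory to work with. The `session_config` / `max_workspace_size_bytes` values below are illustrative, not taken from the shipped inference.py:

```python
# Sketch: leave headroom for the TensorRT INT8 engine build on a ~7.4 GiB P4.
# Assumes TF 1.10 with tensorflow.contrib.tensorrt (as in the 18.10-py3 container).
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Let TF claim only part of the GPU; TensorRT allocates its build
# workspace outside of TF's pool, so it needs the remainder free.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
session_config = tf.ConfigProto(gpu_options=gpu_options)

# frozen_graph_def and output_node_names come from the checkpoint export
# step in inference.py (names here are placeholders).
trt_graph_def = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=output_node_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,  # 1 GiB instead of several GiB
    precision_mode="INT8")
```

If this is the cause, restricting the script to a single GPU (e.g. `CUDA_VISIBLE_DEVICES=0`) might also help, since the logs show TF creating allocators on all three P4s.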

Hello,
can you describe which inference.py you are using? Also, can you describe what inference_original.py is?

Hi NVES, please disregard the file inference_original.py.

I am using the inference.py version that comes with the container image nvcr.io/nvidia/tensorflow:18.10-py3:

nvidia-docker run -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v /tmp/:/tmp/ nvcr.io/nvidia/tensorflow:18.10-py3

located at:

cd /workspace/nvidia-examples/inference/image-classification/scripts
ll
-rw-rw-r-- 1 root root  2538 Oct 21 15:50 README.md
-rw-rw-r-- 1 root root  1491 Oct 21 15:50 check_accuracy.py
-rw-rw-r-- 1 root root  9926 Oct 21 15:50 classification.py
-rw-rw-r-- 1 root root 13069 Oct 21 15:50 inference.py