TensorRT 4: subgraph conversion error for subgraph_index:1 due to: "Unimplemented: Not supported constant type...

I am running the example tensorrt.py in a container (image nvcr.io/nvidia/tensorflow:18.07-py3) with my own frozen graph file, but it throws the error “subgraph conversion error for subgraph_index:1” when running both the FP32 and the FP16 graph. The test works fine when running the native graph. Here is the partial output showing the errors:
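
For context, the conversion step in the example script boils down to a call like the one below. This is only a minimal sketch of how I understand it; the output node name and workspace size here are placeholders, not the exact values from my run:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Load the frozen graph produced from my model.
frozen_graph = tf.GraphDef()
with tf.gfile.GFile("frozen_graph.pb", "rb") as f:
    frozen_graph.ParseFromString(f.read())

# Ask TF-TRT to replace supported subgraphs with TensorRT engines.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["logits"],               # hypothetical output node name
    max_batch_size=8,
    max_workspace_size_bytes=1 << 25,  # placeholder value
    precision_mode="FP16")             # or "FP32"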

Running FP32 graph
2018-08-17 16:13:00.195862: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 4
2018-08-17 16:13:00.322100: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:419] MULTIPLE tensorrt candidate conversion: 2
2018-08-17 16:13:00.328255: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3161] Max batch size= 8 max workspace size= 19046418
2018-08-17 16:13:00.328273: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3167] starting build engine
2018-08-17 16:13:00.403277: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3172] Built network
2018-08-17 16:13:00.404848: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3177] Serialized engine
2018-08-17 16:13:00.405600: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3185] finished engine dense/my_trt_op0 containing 4 nodes
2018-08-17 16:13:00.405641: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3192] Finished op preparation
2018-08-17 16:13:00.405692: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3200] OK finished op building
2018-08-17 16:13:00.412386: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:454] subgraph conversion error for subgraph_index:1 due to: “Unimplemented: Not supported constant type, at batch_normalization_41/Const_1” SKIPPING…( 447 nodes)
INFO:tensorflow:Starting execution
2018-08-17 16:13:01.412522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1, 2, 3
2018-08-17 16:13:01.412630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-17 16:13:01.412646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 1 2 3
2018-08-17 16:13:01.412670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N Y Y Y
2018-08-17 16:13:01.412679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1: Y N Y Y
2018-08-17 16:13:01.412689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2: Y Y N Y
2018-08-17 16:13:01.412700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3: Y Y Y N
2018-08-17 16:13:01.413558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8080 MB memory) → physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1a:00.0, compute capability: 7.0)
2018-08-17 16:13:01.413728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8080 MB memory) → physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1c:00.0, compute capability: 7.0)
2018-08-17 16:13:01.413868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 8080 MB memory) → physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1d:00.0, compute capability: 7.0)
2018-08-17 16:13:01.413974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 8080 MB memory) → physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1e:00.0, compute capability: 7.0)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Starting timing.
INFO:tensorflow:Timing loop done!
Running FP16 graph
2018-08-17 16:13:04.973382: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 4
2018-08-17 16:13:05.092119: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:419] MULTIPLE tensorrt candidate conversion: 2
2018-08-17 16:13:05.102842: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3161] Max batch size= 8 max workspace size= 19046418
2018-08-17 16:13:05.102904: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3165] Using FP16 precision mode
2018-08-17 16:13:05.102925: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3167] starting build engine
2018-08-17 16:13:05.344161: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3172] Built network
2018-08-17 16:13:05.345806: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3177] Serialized engine
2018-08-17 16:13:05.345936: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3185] finished engine dense/my_trt_op2 containing 4 nodes
2018-08-17 16:13:05.345968: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3192] Finished op preparation
2018-08-17 16:13:05.346002: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3200] OK finished op building
2018-08-17 16:13:05.352164: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:454] subgraph conversion error for subgraph_index:1 due to: “Unimplemented: Not supported constant type, at batch_normalization_52/Const_1” SKIPPING…( 447 nodes)
INFO:tensorflow:Starting execution
2018-08-17 16:13:06.391039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1, 2, 3
2018-08-17 16:13:06.391138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-17 16:13:06.391153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 1 2 3
2018-08-17 16:13:06.391166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N Y Y Y
2018-08-17 16:13:06.391177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1: Y N Y Y
2018-08-17 16:13:06.391188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2: Y Y N Y
2018-08-17 16:13:06.391199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3: Y Y Y N
2018-08-17 16:13:06.392053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8080 MB memory) → physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1a:00.0, compute capability: 7.0)
2018-08-17 16:13:06.392213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8080 MB memory) → physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1c:00.0, compute capability: 7.0)
2018-08-17 16:13:06.392389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 8080 MB memory) → physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1d:00.0, compute capability: 7.0)
2018-08-17 16:13:06.392503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 8080 MB memory) → physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1e:00.0, compute capability: 7.0)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Starting timing.
INFO:tensorflow:Timing loop done!

Any tips on how to solve this?

Issue solved: training and exporting the saved model with TensorFlow instead of Keras.
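
For anyone hitting the same warning: what worked for me was building and freezing the graph with plain TensorFlow 1.x rather than exporting it from Keras. A minimal sketch of the freezing step, assuming a standard checkpoint; the checkpoint path and output node name are placeholders for illustration:

import tensorflow as tf

with tf.Session() as sess:
    # Restore the TensorFlow model (instead of exporting from Keras).
    saver = tf.train.import_meta_graph("model.ckpt.meta")  # placeholder path
    saver.restore(sess, "model.ckpt")

    # Fold variables into constants so the frozen graph contains plain Const nodes.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["logits"])  # placeholder name

with tf.gfile.GFile("frozen_graph.pb", "wb") as f:
    f.write(frozen.SerializeToString())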