From my frozen graph I created a TensorRT graph. The frozen graph was working fine on the same system, but when I ran the same code with the TensorRT-converted graph, I got the error described below.

cuda - 9.0
cudnn - 7.0
tensorflow - 1.10
tensorRT - 4

2020-03-13 23:11:29.112484: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:491] FeatureExtractor/InceptionV2/my_trt_op_2 Constructing a new engine with batch size 128
2020-03-13 23:11:29.113348: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
2020-03-13 23:11:29.205325: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger cudnnEngine.cpp (92) - Cuda Error in initializeCommonContext: 1 (Could not initialize cublas, please check cuda installation.)
2020-03-13 23:11:29.205480: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger cudnnEngine.cpp (92) - Cuda Error in initializeCommonContext: 1 (Could not initialize cublas, please check cuda installation.)
2020-03-13 23:11:29.205532: E tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:505] Engine creation for batch size 128 failed Internal: Failed to build TensorRT engine
2020-03-13 23:11:29.205537: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:283] Engine retrieval for batch size 1 failed Running native segment
Traceback (most recent call last):
  File "/home/everestlabs/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/home/everestlabs/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/everestlabs/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Engine creation failed!
[[Node: FeatureExtractor/InceptionV2/my_trt_op_2 = TRTEngineOpInT=[DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], cached_engine_batches=[128], calibration_data="", fixed_input_size=true, input_shapes=[[?,192,?,?]], max_cached_engines_count=10, output_shapes=[[?,64,?,?], [?,64,?,?], [?,64,?,?], [?,32,?,?]], precision_mode="FP16", segment_funcdef_name="FeatureExtractor/InceptionV2/my_trt_op_2_native_segment", serialized_segment="\n^\n\tInp…o\022\203:", static_engine=false, workspace_size_bytes=47093940, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
[[Node: Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Reshape_6/_97 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1094_…/Reshape_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

TF-TRT conversion function:
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=your_outputs,
    max_batch_size=128,
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16",
    minimum_segment_size=10,
    is_dynamic_op=True,
    maximum_cached_engines=10)
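One thing I noticed: the log warns that FP16 was requested "on hardware without native FP16 support", so the precision mode may not be right for this GPU. A small helper like the one below could gate the choice on compute capability (the 5.3 threshold is my assumption from NVIDIA's compute-capability tables, and `choose_precision_mode` is just a hypothetical name, not part of any API):

```python
def choose_precision_mode(compute_capability):
    """Pick a TensorRT precision mode from a (major, minor) compute
    capability tuple, e.g. (6, 1) for a GTX 1080.

    Assumption: native FP16 storage/arithmetic arrived with compute
    capability 5.3; on older parts FP16 is emulated and slow, which is
    what the TRT warning in the log is about.
    """
    if tuple(compute_capability) >= (5, 3):
        return "FP16"  # hardware can run half precision natively
    return "FP32"      # safer fallback on older GPUs


print(choose_precision_mode((6, 1)))  # FP16
print(choose_precision_mode((5, 2)))  # FP32
```

The returned string would then be passed as `precision_mode` to `trt.create_inference_graph` instead of the hard-coded "FP16".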

Note: due to the large codebase, I can't post the full code here. It would be really helpful to get suggestions based on the error alone.

My concern is this: if it were really a CUDA installation problem, the unoptimized graph shouldn't have run either. So I don't understand why I'm getting a CUDA/cuBLAS error only with the TensorRT graph.
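One guess I'd like feedback on: TF 1.x by default pre-allocates nearly all GPU memory, so the TensorRT engine build (which needs its own workspace and cuBLAS handles) may simply be starved of memory rather than hitting a broken CUDA install. A sketch of capping TF's allocation before building/running the TRT graph, using the TF 1.x `ConfigProto` options (the 0.5 fraction is a guess; I haven't verified this fixes my case):

```python
import tensorflow as tf

# Leave GPU memory free for TensorRT's engine construction instead of
# letting TF grab almost everything up front.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # guessed value
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    # ... import the TRT graph def and run inference as before ...
    pass
```

Does this direction make sense, or is the cublas failure unrelated to memory?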
Thank you