Description
I’m using TensorFlow 1.15 on Ubuntu 16.04. At first I installed TensorRT 7, but I got an error about a missing ‘libnvinfer.so.5’; that package only ships ‘libnvinfer.so.7’. I then installed TensorRT 5 and followed the instructions here, and this time I successfully created the optimized graph.
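For reference, a minimal check of which libnvinfer runtime the dynamic loader can actually find (the version strings are the only assumption here):

import ctypes

# Try to dlopen each libnvinfer version; OSError means the loader cannot find it
for ver in ('5', '7'):
    try:
        ctypes.CDLL('libnvinfer.so.' + ver)
        print('libnvinfer.so.%s: found' % ver)
    except OSError:
        print('libnvinfer.so.%s: not found' % ver)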
I use TensorFlow’s profiler to generate a timeline.json. The timeline contains several kernel names I have never seen before, such as ‘volta_scudnn_128x32_relu_small_nn_v1’, so I believe the profiler is tracing the optimized model, not the vanilla one.
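This is roughly how the timeline.json is produced from the collected run metadata (a minimal sketch; run_metadata is the object filled in by the code under Steps To Reproduce):

from tensorflow.python.client import timeline

# Convert the step stats gathered with FULL_TRACE into a Chrome trace file
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())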
However, the timeline.json shows no improvement, even when I change the precision mode or the minimum segment size: the inference times are nearly the same. My network is a pure CNN with a structure similar to U-Net.
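One sanity check (a sketch, not part of my script) that would show whether TF-TRT actually replaced any subgraphs with TensorRT engines is counting the TRTEngineOp nodes in the converted GraphDef (trt_graph is the GraphDef returned by convert() in the code below):

# Count TensorRT engine nodes vs. remaining native TF nodes
trt_engine_ops = [n.name for n in trt_graph.node if n.op == 'TRTEngineOp']
print('TRTEngineOp nodes: %d' % len(trt_engine_ops))
print('Remaining native nodes: %d' % (len(trt_graph.node) - len(trt_engine_ops)))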
Environment
TensorRT Version: 5.0.2.6
GPU Type: RTX 2080ti
Nvidia Driver Version: 460.67
CUDA Version: 10.0
CUDNN Version: 7.6.4
Operating System + Version: Ubuntu 16.04
Python Version: 3.6.13
TensorFlow Version: 1.15.0
Steps To Reproduce
How I installed TensorRT: I downloaded the tar file here, extracted it, and installed the three .whl packages shipped inside it. Then I added its lib directory to the system’s LD_LIBRARY_PATH.
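To confirm which TensorRT Python bindings the wheels installed, a quick check (assuming the tensorrt wheel from the tar was one of the three):

import tensorrt
print(tensorrt.__version__)  # expected to print 5.0.2.6 for this install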
My code:
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Disable grappler optimizations and allow GPU memory growth
config = tf.ConfigProto(
    allow_soft_placement=True,
    graph_options=tf.GraphOptions(
        optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
config.gpu_options.allow_growth = True

# Load the frozen saved model
with tf.gfile.GFile(SAVE_PATH + '_classroom/model/best_model.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Create the TF-TRT optimized graph
trt_graph = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    session_config=config,
    nodes_blacklist=return_elements_list,
    is_dynamic_op=True,
    precision_mode=precision,
    minimum_segment_size=segment).convert()

# Import the optimized graph and prepare a full-trace profiling run
sess = tf.Session(config=config)
tf.import_graph_def(trt_graph, {'source': model.source},
                    return_elements=return_elements_list)
run_metadata = tf.RunMetadata()
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
sess.run(tf.global_variables_initializer())

while True:
    try:
        # Fetch the next batch and zero-pad it to the network's input height
        src_hdr_in, tgt_hdr_in = sess.run(next_element_large,
                                          feed_dict={handle_large: test_handle})
        src_hdr = np.zeros((test_batch_size, PADDING_HEIGHT, IMAGE_WIDTH, INPUT_CHANNEL))
        tgt_hdr = np.zeros((test_batch_size, PADDING_HEIGHT, IMAGE_WIDTH, TARGET_CHANNEL))
        src_hdr[:, 0:IMAGE_HEIGHT, :, :] = src_hdr_in
        tgt_hdr[:, 0:IMAGE_HEIGHT, :, :] = tgt_hdr_in

        # Run inference on the imported (optimized) graph with tracing enabled
        feed_dict = {model.source: src_hdr}
        output_tensor = sess.graph.get_tensor_by_name(output_tensor_name)
        denoised_1_bd = sess.run(output_tensor, feed_dict,
                                 options=run_options, run_metadata=run_metadata)
        # ...
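For the timing comparison mentioned in the description, this is the kind of measurement loop I have in mind (a sketch only; the warm-up and run counts are arbitrary, and running the same loop against the vanilla graph gives the baseline):

import time

# Warm up, then average wall-clock latency over repeated runs of one feed
for _ in range(10):
    sess.run(output_tensor, feed_dict)
n_runs = 50
start = time.time()
for _ in range(n_runs):
    sess.run(output_tensor, feed_dict)
print('mean latency: %.2f ms' % ((time.time() - start) / n_runs * 1000.0))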