No improvement seen using TFTRT

Description

I’m using TensorFlow 1.15 on Ubuntu 16.04. At first I installed TensorRT 7, but I got an error about a missing ‘libnvinfer.so.5’; the installation only provides ‘libnvinfer.so.7’. I then installed TensorRT 5 and followed the instructions here. This time the optimized graph was created successfully.

I use TensorFlow’s profiler to generate a timeline.json. The timeline table contains several kernel names I’ve never seen before, such as ‘volta_scudnn_128x32_relu_small_nn_v1’, so I think the profiler is describing the optimized model, not the vanilla one.
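Kernel names alone may not distinguish the two graphs, since cuDNN kernels can show up in either. A more direct check is to count the nodes of op type TRTEngineOp in the converted GraphDef, which is the op TF-TRT substitutes for each converted segment. The sketch below is only an illustration; the helper names count_op_types and summarize_conversion are made up for this example and are not part of any library.

```python
from collections import Counter


def count_op_types(graph_def):
    """Return a Counter mapping op type -> number of nodes.

    `graph_def` is anything with a `.node` sequence whose items have an
    `.op` string, which is the shape of a TensorFlow GraphDef.
    """
    return Counter(node.op for node in graph_def.node)


def summarize_conversion(graph_def):
    """Report how many nodes are TensorRT engine ops (hypothetical helper)."""
    counts = count_op_types(graph_def)
    return {
        "total_nodes": sum(counts.values()),
        # TF-TRT replaces each converted segment with one TRTEngineOp node.
        "trt_engine_ops": counts.get("TRTEngineOp", 0),
    }
```

Calling summarize_conversion(trt_graph) on the converter output and seeing zero engine ops would suggest that nothing was actually converted, for example because every candidate segment was smaller than minimum_segment_size.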

However, the timeline.json shows no improvement, even when I change the precision mode or the minimum segment size. The inference times are nearly the same. My network is a pure CNN with a structure similar to U-Net.

Environment

TensorRT Version: 5.0.2.6
GPU Type: RTX 2080ti
Nvidia Driver Version: 460.67
CUDA Version: 10.0
CUDNN Version: 7.6.4
Operating System + Version: Ubuntu 16.04
Python Version: 3.6.13
TensorFlow Version: 1.15.0

Steps To Reproduce

How I installed TensorRT: I downloaded the tar file here, extracted it and installed three whls under its path. Then I added its path to the system’s LD_LIBRARY_PATH.
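To confirm that the LD_LIBRARY_PATH change is visible to the Python process, one option is to ask the dynamic loader directly. This is a minimal sketch using only the standard library; can_load is a hypothetical helper name, and the variable has to be exported before Python starts for the check to be meaningful.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) is not found.
        return False


# Example usage:
#   for name in ("libnvinfer.so.5", "libnvinfer.so.7"):
#       print(name, "loadable" if can_load(name) else "not found")
```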

My code:

    import numpy as np
    import tensorflow as tf
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # session config: opt_level L0 turns off graph optimizations; GPU memory grows on demand
    config = tf.ConfigProto(allow_soft_placement=True, graph_options=tf.GraphOptions(
        optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
    config.gpu_options.allow_growth = True

    # load saved model
    with tf.gfile.GFile(SAVE_PATH+'_classroom/model/best_model.pb', 'rb') as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())

    # create optimized graph
    trt_graph = trt.TrtGraphConverter(
        input_graph_def=frozen_graph, session_config=config,
        nodes_blacklist=return_elements_list, is_dynamic_op=True,
        precision_mode=precision, minimum_segment_size=segment).convert()

    sess = tf.Session(config=config)
    tf.import_graph_def(trt_graph,{'source':model.source},return_elements=return_elements_list)
    run_metadata = tf.RunMetadata()
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    sess.run(tf.global_variables_initializer())
    while True:
        try:
            src_hdr_in, tgt_hdr_in = sess.run(next_element_large,
                feed_dict={handle_large: test_handle})
            src_hdr = np.zeros((test_batch_size, PADDING_HEIGHT, IMAGE_WIDTH, INPUT_CHANNEL))
            tgt_hdr = np.zeros((test_batch_size, PADDING_HEIGHT, IMAGE_WIDTH, TARGET_CHANNEL))            
            src_hdr[:,0:IMAGE_HEIGHT,:,:] = src_hdr_in
            tgt_hdr[:,0:IMAGE_HEIGHT,:,:] = tgt_hdr_in
            feed_dict = {model.source: src_hdr}
            output_tensor = sess.graph.get_tensor_by_name(output_tensor_name)
            denoised_1_bd = sess.run(output_tensor, feed_dict, options=run_options, run_metadata=run_metadata)
    # ...

Hi,
Could you please share the model, script, profiler output, and performance output, if you have not already, so that we can help you better?
Alternatively, you can try running your model with the trtexec command.

When measuring model performance, make sure you consider only the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
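As a rough illustration of that advice, the sketch below times only a callable that performs one inference, after a few untimed warm-up runs. With is_dynamic_op=True the TensorRT engines are built at run time, so the first calls can be much slower than steady state. The function name benchmark and its default counts are assumptions for this example, not part of any NVIDIA or TensorFlow API.

```python
import statistics
import time


def benchmark(run_once, warmup=10, iters=50):
    """Time `run_once()` after `warmup` untimed calls; return stats in ms."""
    for _ in range(warmup):
        run_once()  # untimed: lets engines build and caches fill

    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        # Calls per second; multiply by the batch size for images per second.
        "calls_per_s": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }
```

With the script above, this could be called as benchmark(lambda: sess.run(output_tensor, feed_dict)), with the input batch prepared beforehand and without the FULL_TRACE run options, since tracing adds overhead of its own.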

Thanks!

Hi. Thanks for replying.

I now convert the pb model to UFF to avoid using TF-TRT, and run inference by building a TensorRT engine, following the official Python samples. However, I have run into a new problem: it seems that TensorRT doesn’t support tf.nn.space_to_depth and tf.nn.depth_to_space. Would using an ONNX model solve this problem?

I noticed that other people who want to use tf.nn.space_to_depth have been advised to implement it with tf.reshape and tf.transpose. Does this mean that TensorRT doesn’t support these operations, so I have to implement them on my own? However, the sample code uses NumPy functions. I’m wondering whether the results would be the same if I replaced the NumPy functions with their TensorFlow counterparts.
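For reference, here is a minimal NumPy sketch of the reshape-and-transpose formulation, assuming NHWC layout and the channel ordering that tf.nn.space_to_depth documents. The function name space_to_depth_nhwc is only illustrative. Since tf.reshape and tf.transpose use the same row-major semantics as their NumPy counterparts, the same three steps should carry over, but that is worth verifying on real data.

```python
import numpy as np


def space_to_depth_nhwc(x, block_size):
    """Move block_size x block_size spatial blocks into the channel axis (NHWC)."""
    n, h, w, c = x.shape
    b = block_size
    if h % b or w % b:
        raise ValueError("height and width must be divisible by block_size")
    # Split H into (H/b, b) and W into (W/b, b): axes (n, h, bh, w, bw, c).
    x = x.reshape(n, h // b, b, w // b, b, c)
    # Bring the two in-block offsets next to the channels: (n, h, w, bh, bw, c).
    x = x.transpose(0, 1, 3, 2, 4, 5)
    # Merge (bh, bw, c) into a single channel axis of size b*b*c.
    return x.reshape(n, h // b, w // b, b * b * c)
```

For a 1x4x4x1 input holding the values 0 to 15 and block_size=2, the top-left output pixel gathers the values 0, 1, 4, 5 of the top-left 2x2 block.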

Also, I still don’t know how to implement tf.nn.depth_to_space. Could you show me some code? Thanks!
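One possible formulation of the inverse operation, again as a NumPy sketch under the assumption of NHWC layout and the ordering documented for tf.nn.depth_to_space: split the channel axis into two block offsets and the remaining channels, then interleave the offsets with height and width. The name depth_to_space_nhwc is hypothetical.

```python
import numpy as np


def depth_to_space_nhwc(x, block_size):
    """Spread channel groups into block_size x block_size spatial blocks (NHWC)."""
    n, h, w, c = x.shape
    b = block_size
    if c % (b * b):
        raise ValueError("channels must be divisible by block_size ** 2")
    c_out = c // (b * b)
    # Split channels into (bh, bw, c_out): axes (n, h, w, bh, bw, c_out).
    x = x.reshape(n, h, w, b, b, c_out)
    # Put each block offset right after its spatial axis: (n, h, bh, w, bw, c_out).
    x = x.transpose(0, 1, 3, 2, 4, 5)
    # Merge (h, bh) into the new height and (w, bw) into the new width.
    return x.reshape(n, h * b, w * b, c_out)
```

For an input of shape 1x1x1x4 with values 1, 2, 3, 4 and block_size=2, this yields a 1x2x2x1 output with rows [1, 2] and [3, 4].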