No improvement seen using TFTRT

lain_iw · May 2, 2021, 1:41am

Description

I’m using TensorFlow 1.15 on Ubuntu 16.04. At first I installed TensorRT 7, but I got an error looking for ‘libnvinfer.so.5’. That installation has no ‘libnvinfer.so.5’, only a ‘libnvinfer.so.7’. I then installed TensorRT 5 and followed the instructions here. This time I successfully created the optimized graph.

I use TensorFlow’s profiler to generate a timeline.json. The timeline table contains several names I’ve never seen before, such as ‘volta_scudnn_128x32_relu_small_nn_v1’, so I think the profiler is describing the optimized model, not the vanilla one.

However, the timeline.json shows no improvement, even when I change the precision mode or the minimum segment size. The inference times are nearly the same. My network is a pure CNN with a structure similar to U-Net.

Environment

TensorRT Version: 5.0.2.6
GPU Type: RTX 2080ti
Nvidia Driver Version: 460.67
CUDA Version: 10.0
CUDNN Version: 7.6.4
Operating System + Version: Ubuntu 16.04
Python Version: 3.6.13
TensorFlow Version: 1.15.0

Steps To Reproduce

How I installed TensorRT: I downloaded the tar file here, extracted it, and installed the three .whl files under its path. Then I added its path to the system’s LD_LIBRARY_PATH.

My code:

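    import numpy as np
    import tensorflow as tf
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # disable graph-level optimizations and let GPU memory grow on demand
    config = tf.ConfigProto(allow_soft_placement=True, graph_options=tf.GraphOptions(
        optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))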
    config.gpu_options.allow_growth = True
    
    # load saved model
    with tf.gfile.GFile(SAVE_PATH+'_classroom/model/best_model.pb', 'rb') as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())

    # create optimized graph
    trt_graph = trt.TrtGraphConverter(
        input_graph_def=frozen_graph,
        session_config=config,
        nodes_blacklist=return_elements_list,
        is_dynamic_op=True,
        precision_mode=precision,
        minimum_segment_size=segment,
    ).convert()

    sess = tf.Session(config=config)
    tf.import_graph_def(trt_graph,{'source':model.source},return_elements=return_elements_list)
    run_metadata = tf.RunMetadata()
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    sess.run(tf.global_variables_initializer())
    while True:
        try:
            src_hdr_in, tgt_hdr_in = sess.run(next_element_large,
                feed_dict={handle_large: test_handle})
            src_hdr = np.zeros((test_batch_size, PADDING_HEIGHT, IMAGE_WIDTH, INPUT_CHANNEL))
            tgt_hdr = np.zeros((test_batch_size, PADDING_HEIGHT, IMAGE_WIDTH, TARGET_CHANNEL))            
            src_hdr[:,0:IMAGE_HEIGHT,:,:] = src_hdr_in
            tgt_hdr[:,0:IMAGE_HEIGHT,:,:] = tgt_hdr_in
            feed_dict = {model.source: src_hdr}
            output_tensor = sess.graph.get_tensor_by_name(output_tensor_name)
            denoised_1_bd = sess.run(output_tensor, feed_dict, options=run_options, run_metadata=run_metadata)
    # ...
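
One quick check on a converted graph like the one above is to count how many of its nodes are TensorRT engine nodes. The sketch below is a minimal, hedged illustration: it assumes the engines appear as nodes whose op type is `TRTEngineOp` (the name TF-TRT uses), and the helper names `count_ops` and `trt_coverage` are hypothetical. It uses stand-in objects so that it runs without TensorFlow. With a real graph, `trt_graph.node` would be passed instead.

```python
from collections import Counter
from types import SimpleNamespace


def count_ops(nodes):
    """Return a Counter mapping op type to the number of nodes of that type."""
    return Counter(node.op for node in nodes)


def trt_coverage(nodes, engine_op="TRTEngineOp"):
    """Return (number of TensorRT engine nodes, total number of nodes)."""
    ops = count_ops(nodes)
    return ops[engine_op], sum(ops.values())


# Stand-in nodes so the sketch runs without TensorFlow; with a real
# converted GraphDef, pass trt_graph.node instead of demo_nodes.
demo_nodes = [SimpleNamespace(op=name) for name in
              ["Placeholder", "TRTEngineOp", "Conv2D", "Relu", "TRTEngineOp"]]
engines, total = trt_coverage(demo_nodes)
print("TRT engine nodes:", engines, "of", total, "top-level nodes")
```

If the engine count is zero, or many convolution ops remain outside the engines, only a small part of the network is likely running through TensorRT, which would be consistent with unchanged inference times.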

NVES · May 3, 2021, 7:22am

Hi,
Could you please share the model, script, profiler output, and performance output, if you have not already, so that we can help you better?
Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

When measuring model performance, make sure you consider the latency and throughput of the network inference only, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
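
As a rough illustration of the advice above, the following sketch times only the inference call, after a few untimed warm-up runs. It is a minimal example under stated assumptions, not the method from the linked guide: the helper name `benchmark` and the stand-in workload are hypothetical. In the original script the callable would wrap the `sess.run` call on the output tensor, with data loading and padding done beforehand and tracing options switched off.

```python
import statistics
import time


def benchmark(run_once, warmup=10, iters=50):
    """Time run_once() after warm-up; return (mean_ms, median_ms) over iters calls."""
    for _ in range(warmup):
        run_once()  # not timed: absorbs one-off costs such as engine building
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples_ms), statistics.median(samples_ms)


# Hypothetical stand-in workload; with the thread's script this could be
# lambda: sess.run(output_tensor, feed_dict)
mean_ms, median_ms = benchmark(lambda: sum(range(10000)), warmup=2, iters=5)
print(f"mean {mean_ms:.3f} ms, median {median_ms:.3f} ms")
```

Warm-up matters particularly with `is_dynamic_op=True`, because TF-TRT then builds its engines at run time, so the first calls include build time rather than steady-state inference time.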

Thanks!

lain_iw · May 6, 2021, 4:03am

Hi. Thanks for replying.

I now convert the pb model to UFF to avoid using TFTRT, and then run inference by creating a TensorRT engine, following the official Python samples. However, I have run into a new problem: it seems that TensorRT doesn’t support tf.nn.space_to_depth and tf.nn.depth_to_space. Would using an ONNX model solve this problem?

I notice that other people who wanted to use tf.nn.space_to_depth were advised to implement it with tf.reshape and tf.transpose. Does this mean that TensorRT doesn’t support these operations, so I have to implement them on my own? However, the sample code uses NumPy functions. I’m wondering whether the results would be the same if I replaced the NumPy functions with their TensorFlow counterparts.

Also, I still don’t know how to implement tf.nn.depth_to_space. Could you show me some code? Thanks!
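
For reference, the reshape-and-transpose formulation mentioned above can be sketched in NumPy as follows. This is a hedged illustration rather than an official implementation: it assumes NHWC layout and the channel ordering documented for tf.nn.space_to_depth (the Y, X position within each block forms the high-order part of the output channel index), and the function names are simply chosen to mirror the TensorFlow ones.

```python
import numpy as np


def space_to_depth(x, block):
    """NHWC space-to-depth: (N, H, W, C) -> (N, H/block, W/block, block*block*C)."""
    n, h, w, c = x.shape
    assert h % block == 0 and w % block == 0
    # Split H and W into (blocks, offset within block).
    x = x.reshape(n, h // block, block, w // block, block, c)
    # Move both in-block offsets next to the channel axis: (n, h, w, dy, dx, c).
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(n, h // block, w // block, block * block * c)


def depth_to_space(x, block):
    """NHWC depth-to-space: (N, H, W, block*block*C) -> (N, H*block, W*block, C)."""
    n, h, w, c = x.shape
    assert c % (block * block) == 0
    c_out = c // (block * block)
    # Split the channel axis into (dy, dx, c).
    x = x.reshape(n, h, w, block, block, c_out)
    # Interleave the offsets with the spatial axes: (n, h, dy, w, dx, c).
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(n, h * block, w * block, c_out)


x = np.arange(2 * 4 * 6 * 3, dtype=np.float32).reshape(2, 4, 6, 3)
y = space_to_depth(x, 2)
print(y.shape)                                   # (2, 2, 3, 12)
print(np.array_equal(depth_to_space(y, 2), x))   # True
```

Because np.reshape and tf.reshape both use row-major ordering, and tf.transpose takes the same axis permutation through its `perm` argument, the same sequence of steps written with tf.reshape and tf.transpose should give matching results. It is still worth comparing the outputs numerically on a small tensor before relying on it.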

spolisetty · May 18, 2021, 3:29pm

Hi @lain_iw,

We recommend that you use the latest TensorRT version. Regarding the installation issue, it looks like you were hitting a dependency-related problem. Please follow the installation doc for the correct steps.

Alternatively, you can use the TensorRT NGC container to avoid system dependency issues.

The UFF and Caffe parsers have been deprecated from TensorRT 7 onwards, so we suggest you try the ONNX parser. Please check the link below. GitHub - onnx/onnx-tensorrt: ONNX-TensorRT: TensorRT backend for ONNX

Thank you.
