Description
I am converting my TensorFlow SavedModel to TF-TRT using the following code and saving the converted model in SavedModel format.
if precision == "FP32":
    precision = trt_convert.TrtPrecisionMode.FP32
elif precision == "FP16":
    precision = trt_convert.TrtPrecisionMode.FP16
else:
    raise ValueError("Invalid Precision Mode")

conversion_params = trt_convert.TrtConversionParams(precision_mode=precision)
converter = trt_convert.TrtGraphConverterV2(input_saved_model_dir=model_path,
                                            conversion_params=conversion_params)
converter.convert()
converted_model_path = os.path.join(model_path, "converted_model")
converter.save(converted_model_path)
print("*****The converted TF-TRT model has been saved at ", converted_model_path)
The issue is that the load time is very high (around 1 minute) when I load the converted model from the SavedModel format. My system specs: Intel Xeon Silver (32 cores), 32 GB RAM, RTX 3080 Ti.
Environment
TensorRT Version: 7.2.1.6
GPU Type: RTX 3080ti
Nvidia Driver Version: 470.103.01
CUDA Version: 11.4
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 2.3.1
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): http://nvcr.io/nvidia/tensorflow:20.11-tf2-py3
Hi,
Could you please share the model, script, profiler, and performance output (if not shared already) so that we can help you better?
Alternatively, you can try running your model with the trtexec command.
While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the below links for more details:
Thanks!
@AakankshaS I don't have any issues with the throughput of the TF-TRT model. The only problem I face is that the load time of the serialized TF-TRT-converted SavedModel is very high.
Hi,
We recommend that you try the latest TensorRT version, 8.6.1.
If you still face this issue, please share a minimal repro model, scripts, and steps with us.
Thank you.
import statistics
import time

import tensorflow as tf


def performance_testing(model_path):
    # Load the serialized TF-TRT SavedModel and grab its inference signature
    root = tf.saved_model.load(model_path)
    infer = root.signatures['serving_default']
    output_tensorname = list(infer.structured_outputs.keys())[0]

    INFERENCE_STEPS = 2000
    WARMUP_STEPS = 300
    features = tf.random.uniform((1, 320, 544, 3))
    print(f"**********{features.dtype}************")

    try:
        step_times = list()
        for step in range(1, INFERENCE_STEPS + 1):
            if step % 100 == 0:
                print("Processing step: %04d ..." % step)
            start_t = time.perf_counter()
            mask = infer(features)[output_tensorname].numpy()
            step_time = time.perf_counter() - start_t
            if step >= WARMUP_STEPS:
                step_times.append(step_time)
    except tf.errors.OutOfRangeError:
        pass

    avg_step_time = statistics.mean(step_times)
    print("\nAverage step time: %.1f msec" % (avg_step_time * 1e3))
    print("Average throughput: %d samples/sec" % (1 / avg_step_time))
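For reference, a hypothetical driver for the function above (the path is an assumption; point it at the directory produced by converter.save()):

if __name__ == "__main__":
    performance_testing("converted_model")  # assumed path to the TF-TRT SavedModel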
Here's the code to replicate my issue. It loads the serialized TF-TRT SavedModel and then runs forward passes. It takes a minimum of 40 seconds to load the model. I have attached a converted SavedModel that can be loaded and run with this code.
converted_model.zip (75.8 MB)
Hi,
Could you please share the output of the above script on your system?
Thank you.
Please find the attached output file.
I added just one extra print line that prints the time taken from the start of the script up until the first forward pass, just to give you more context.
import statistics
import time

import tensorflow as tf


def performance_testing(model_path):
    model_load_start = time.perf_counter()

    # Load the serialized TF-TRT SavedModel and grab its inference signature
    root = tf.saved_model.load(model_path)
    infer = root.signatures['serving_default']
    output_tensorname = list(infer.structured_outputs.keys())[0]

    INFERENCE_STEPS = 2000
    WARMUP_STEPS = 300
    features = tf.random.uniform((1, 320, 544, 3))
    print(f"**********{features.dtype}************")

    try:
        step_times = list()
        for step in range(1, INFERENCE_STEPS + 1):
            if step % 100 == 0:
                print("Processing step: %04d ..." % step)
            start_t = time.perf_counter()
            # predict_and_average_masks(infer, features)
            mask = infer(features)[output_tensorname].numpy()
            # mask = tf.nn.softmax(tf.squeeze(mask))
            # mask = (tf.math.argmax(mask, axis=2)[:, :, tf.newaxis]).numpy()
            step_time = time.perf_counter() - start_t
            if step == 1:
                print(f"Time taken for model loading and the first inference: "
                      f"{time.perf_counter() - model_load_start}")
            if step >= WARMUP_STEPS:
                step_times.append(step_time)
    except tf.errors.OutOfRangeError:
        pass

    avg_step_time = statistics.mean(step_times)
    print("\nAverage step time: %.1f msec" % (avg_step_time * 1e3))
    print("Average throughput: %d samples/sec" % (1 / avg_step_time))
output.txt (698 Bytes)
Hi,
If possible, could you please share with us the original model (not converted to TF-TRT) and its timing capture for better debugging?
Thank you.
Please find attached the time_capture log and the unconverted TF SavedModel zip folder.
output.txt (699 Bytes)
dummynet.zip (38.0 MB)
Thank you very much for your time and effort in helping with this issue!
Hi,
We noticed similar behavior.
Please let us know the steps you took to create the TRT engine and save it.
If the TRT engines are not built and saved ahead of time, loading the converted model can be slow because the engines are built at that point.
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#quickstart-guide
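For example, a minimal sketch of pre-building the TRT engines with converter.build() before converter.save(). The input shape (1, 320, 544, 3) is assumed from the benchmark script above, and the paths are placeholders; use shapes and paths representative of your setup:

import os

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert

model_path = "dummynet"  # assumed path to the original (unconverted) SavedModel
converted_model_path = os.path.join(model_path, "converted_model")

conversion_params = trt_convert.TrtConversionParams(
    precision_mode=trt_convert.TrtPrecisionMode.FP16)
converter = trt_convert.TrtGraphConverterV2(
    input_saved_model_dir=model_path,
    conversion_params=conversion_params)
converter.convert()

def input_fn():
    # Yield inputs with the shapes the model will see at inference time
    yield (np.random.uniform(size=(1, 320, 544, 3)).astype(np.float32),)

converter.build(input_fn=input_fn)    # build the TRT engines ahead of time
converter.save(converted_model_path)  # the built engines are serialized with the SavedModel

With build() called before save(), the engines can be serialized into the SavedModel, so they do not have to be rebuilt at load/first-inference time.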
Thank you.
This is how I converted and saved the TF-TRT model:
import os

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.python.compiler.tensorrt import trt_convert


def convert_tf_model(model_path, is_h5=False, precision="FP16"):
    """
    Converts a TF SavedModel to a TensorRT-optimized SavedModel.
    If the model is an H5 file (Keras save format), it is first converted
    to a TensorFlow SavedModel and then optimized with TensorRT.
    """
    K.clear_session()
    if is_h5:
        model = tf.keras.models.load_model(model_path, compile=False)
        model_name = model_path.split(".")[0] + "_SavedModel"  # get the model name from the path of the H5 file
        model_path = os.path.join("../models", model_name)
        model.save(model_path)  # converter.convert() needs a SavedModel
        print("****The SavedModel for the given H5 file has been saved at****", model_path)
    else:
        model = tf.saved_model.load(model_path)

    if precision == "FP32":
        precision = trt_convert.TrtPrecisionMode.FP32
    elif precision == "FP16":
        precision = trt_convert.TrtPrecisionMode.FP16
    else:
        raise ValueError('Invalid Precision Mode')

    conversion_params = trt_convert.TrtConversionParams(precision_mode=precision)
    converter = trt_convert.TrtGraphConverterV2(input_saved_model_dir=model_path,
                                                conversion_params=conversion_params)
    converter.convert()
    converted_model_path = os.path.join(model_path, "converted_model")
    converter.save(converted_model_path)
    print("*****The converted TF-TRT model has been saved at ", converted_model_path)
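A hypothetical invocation of the function above, using the unconverted dummynet SavedModel attached earlier (the path is an assumption; adjust it to wherever the archive was extracted):

convert_tf_model("dummynet", is_h5=False, precision="FP16")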