Description
I am converting my TensorFlow SavedModel to TF-TRT using the following code and saving the converted model in SavedModel format.
if precision == "FP32":
    precision = trt_convert.TrtPrecisionMode.FP32
elif precision == "FP16":
    precision = trt_convert.TrtPrecisionMode.FP16
else:
    raise ValueError("Invalid Precision Mode")

conversion_params = trt_convert.TrtConversionParams(precision_mode=precision)
converter = trt_convert.TrtGraphConverterV2(input_saved_model_dir=model_path,
                                            conversion_params=conversion_params)
converter.convert()
converted_model_path = os.path.join(model_path, "converted_model")
converter.save(converted_model_path)
print("*****The converted TF-TRT model has been saved at ", converted_model_path)
The issue is that the load time is very high (around 1 minute) when I load the converted model from the SavedModel format. My system specs: Intel Xeon Silver (32 cores), 32 GB RAM, RTX 3080 Ti.
Environment
TensorRT Version: 7.2.1.6
GPU Type: RTX 3080ti
Nvidia Driver Version: 470.103.01
CUDA Version: 11.4
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 2.3.1
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): http://nvcr.io/nvidia/tensorflow:20.11-tf2-py3
Hi,
Could you please share the model, script, profiler, and performance output (if not shared already) so that we can help you better?
Alternatively, you can try running your model with the trtexec command.
While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the below links for more details:
Thanks!
@AakankshaS I don't have any issues with the throughput of the TF-TRT model. The only problem I face is that the load time of the serialized TF-TRT-converted SavedModel is very high.
Hi,
We recommend that you try the latest TensorRT version, 8.6.1.
If you still face this issue, please share a minimal repro model, scripts, and steps with us.
Thank you.
import statistics
import time

import tensorflow as tf


def performance_testing(model_path):
    # Load the serialized TF-TRT SavedModel and grab its inference signature
    root = tf.saved_model.load(model_path)
    infer = root.signatures['serving_default']
    output_tensorname = list(infer.structured_outputs.keys())[0]

    INFERENCE_STEPS = 2000
    WARMUP_STEPS = 300
    features = tf.random.uniform((1, 320, 544, 3))
    print(f"**********{features.dtype}************")

    try:
        step_times = list()
        for step in range(1, INFERENCE_STEPS + 1):
            if step % 100 == 0:
                print("Processing step: %04d ..." % step)
            start_t = time.perf_counter()
            mask = infer(features)[output_tensorname].numpy()
            step_time = time.perf_counter() - start_t
            if step >= WARMUP_STEPS:
                step_times.append(step_time)
    except tf.errors.OutOfRangeError:
        pass

    avg_step_time = statistics.mean(step_times)
    print("\nAverage step time: %.1f msec" % (avg_step_time * 1e3))
    print("Average throughput: %d samples/sec" % (1 / avg_step_time))
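For reference, a hypothetical driver for the function above (the path is an assumption; point it at the directory produced by converter.save()):

if __name__ == "__main__":
    performance_testing("converted_model")  # assumed path to the TF-TRT SavedModel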
Here's the code to replicate my issue. It loads the serialized TF-TRT SavedModel and then runs forward passes. It takes a minimum of 40 seconds to load the model. I have attached a converted SavedModel that can be loaded and run with this code.
converted_model.zip (75.8 MB)
Hi,
Could you please share the output of the above script on your system?
Thank you.
Please find the attached output file.
I added just one extra print line that prints the time taken from the start of the script up until the first forward pass, just to give you more context.
import statistics
import time

import tensorflow as tf


def performance_testing(model_path):
    model_load_start = time.perf_counter()

    # Load the serialized TF-TRT SavedModel and grab its inference signature
    root = tf.saved_model.load(model_path)
    infer = root.signatures['serving_default']
    output_tensorname = list(infer.structured_outputs.keys())[0]

    INFERENCE_STEPS = 2000
    WARMUP_STEPS = 300
    features = tf.random.uniform((1, 320, 544, 3))
    print(f"**********{features.dtype}************")

    try:
        step_times = list()
        for step in range(1, INFERENCE_STEPS + 1):
            if step % 100 == 0:
                print("Processing step: %04d ..." % step)
            start_t = time.perf_counter()
            # predict_and_average_masks(infer, features)
            mask = infer(features)[output_tensorname].numpy()
            # mask = tf.nn.softmax(tf.squeeze(mask))
            # mask = (tf.math.argmax(mask, axis=2)[:, :, tf.newaxis]).numpy()
            step_time = time.perf_counter() - start_t
            if step == 1:
                print(f"Time taken for model loading and the first inference: "
                      f"{time.perf_counter() - model_load_start}")
            if step >= WARMUP_STEPS:
                step_times.append(step_time)
    except tf.errors.OutOfRangeError:
        pass

    avg_step_time = statistics.mean(step_times)
    print("\nAverage step time: %.1f msec" % (avg_step_time * 1e3))
    print("Average throughput: %d samples/sec" % (1 / avg_step_time))
output.txt (698 Bytes)
Hi,
If possible, could you please share with us the original model (not converted to TF-TRT) and its timing capture for better debugging?
Thank you.
Please find attached the time_capture log and the unconverted TF SavedModel zip folder.
output.txt (699 Bytes)
dummynet.zip (38.0 MB)
Thank you very much for your time and effort in helping with this issue!
Hi,
We noticed similar behavior.
Please let us know the steps you took to create the TRT engine and save it.
If the TRT engines are not built and saved ahead of time, loading the converted model can be slow because the engines are built at that point.
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#quickstart-guide
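For example, a minimal sketch of pre-building the TRT engines with converter.build() before converter.save(). The input shape (1, 320, 544, 3) is assumed from the benchmark script above, and the paths are placeholders; use shapes and paths representative of your setup:

import os

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert

model_path = "dummynet"  # assumed path to the original (unconverted) SavedModel
converted_model_path = os.path.join(model_path, "converted_model")

conversion_params = trt_convert.TrtConversionParams(
    precision_mode=trt_convert.TrtPrecisionMode.FP16)
converter = trt_convert.TrtGraphConverterV2(
    input_saved_model_dir=model_path,
    conversion_params=conversion_params)
converter.convert()

def input_fn():
    # Yield inputs with the shapes the model will see at inference time
    yield (np.random.uniform(size=(1, 320, 544, 3)).astype(np.float32),)

converter.build(input_fn=input_fn)    # build the TRT engines ahead of time
converter.save(converted_model_path)  # the built engines are serialized with the SavedModel

With build() called before save(), the engines can be serialized into the SavedModel, so they do not have to be rebuilt at load/first-inference time.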
Thank you.
This is how I converted and saved the TF-TRT model:
import os

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.python.compiler.tensorrt import trt_convert


def convert_tf_model(model_path, is_h5=False, precision="FP16"):
    """
    Converts a TF SavedModel to a TensorRT-optimized SavedModel.
    If the model is an H5 file (Keras save format), it is first converted
    to a TensorFlow SavedModel and then optimized with TensorRT.
    """
    K.clear_session()
    if is_h5:
        model = tf.keras.models.load_model(model_path, compile=False)
        model_name = model_path.split(".")[0] + "_SavedModel"  # get the model name from the path of the H5 file
        model_path = os.path.join("../models", model_name)
        model.save(model_path)  # converter.convert() needs a SavedModel
        print("****The SavedModel for the given H5 file has been saved at****", model_path)
    else:
        model = tf.saved_model.load(model_path)

    if precision == "FP32":
        precision = trt_convert.TrtPrecisionMode.FP32
    elif precision == "FP16":
        precision = trt_convert.TrtPrecisionMode.FP16
    else:
        raise ValueError('Invalid Precision Mode')

    conversion_params = trt_convert.TrtConversionParams(precision_mode=precision)
    converter = trt_convert.TrtGraphConverterV2(input_saved_model_dir=model_path,
                                                conversion_params=conversion_params)
    converter.convert()
    converted_model_path = os.path.join(model_path, "converted_model")
    converter.save(converted_model_path)
    print("*****The converted TF-TRT model has been saved at ", converted_model_path)
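A hypothetical invocation of the function above, using the unconverted dummynet SavedModel attached earlier (the path is an assumption; adjust it to wherever the archive was extracted):

convert_tf_model("dummynet", is_h5=False, precision="FP16")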