Slow inference UNet Industrial TF-TRT

Description

Hi,
I am new to NVIDIA tools. I am running the notebook example for training, testing, and exporting UNet Industrial to TF-TRT on the DAGM dataset, provided at UNet Industrial Inference Demo with TF-TRT.

I was able to run the whole pipeline and export the checkpoint for inference with TF-TRT. I am running inference on a single image with the following code:

# inference with the saved TF-TRT model
import horovod.tensorflow as hvd
import tensorflow as tf
from time import time

hvd.init()
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# SAVED_MODEL_DIR and img are defined in earlier notebook cells
start = time()
with tf.Session(graph=tf.Graph(), config=config) as sess:
    # load the exported TF-TRT SavedModel into the fresh graph
    tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], SAVED_MODEL_DIR)
    nodes = [n.name for n in tf.get_default_graph().as_graph_def().node]
    # print(nodes)
    # run the sigmoid output for a single input image
    output = sess.run(["UNet_v1/sigmoid:0"], feed_dict={"input:0": img})
print(f'Time spent: {time() - start}')

I expected this to be very fast (that is the premise of doing inference with the NGC containers), but it takes 60–70 seconds to run the inference.

What can I do to speed this up?
Is there another way to load my .pb model and run prediction?
The model is attached as a zip file.
TR-TRT-model-FP32.zip (6.5 MB)

Environment

GPU Type: NVIDIA RTX A6000
Container: built with
```
docker build . --rm -t unet_industrial:latest
```

Hi,

Please share the model, script, profiler, and performance output, if not shared already, so that we can help you better.

Alternatively, you can try running your model with the trtexec command.
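For reference, trtexec consumes an ONNX model (or a serialized TensorRT engine) rather than a TF SavedModel, so the model would first need to be converted. A minimal sketch, assuming the SavedModel can be exported with tf2onnx and using placeholder file names:

```
# Convert the SavedModel to ONNX, then benchmark the network with trtexec
python -m tf2onnx.convert --saved-model <SAVED_MODEL_DIR> --output unet_industrial.onnx
trtexec --onnx=unet_industrial.onnx --saveEngine=unet_industrial.plan --fp16
```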

While measuring the model's performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
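For example, a minimal timing sketch along these lines (assuming the `sess`, `img`, and tensor names from your snippet above, run inside the `with tf.Session(...)` block; the first run is treated as a warm-up because, in dynamic mode, TF-TRT builds its TensorRT engines during the first execution):

```
from time import time

# Warm-up: in TF-TRT dynamic mode the first run also builds the TensorRT
# engines, so it should not count towards steady-state latency.
sess.run(["UNet_v1/sigmoid:0"], feed_dict={"input:0": img})

# Time only the network inference (no pre-/post-processing inside the loop).
n_runs = 100
start = time()
for _ in range(n_runs):
    sess.run(["UNet_v1/sigmoid:0"], feed_dict={"input:0": img})
elapsed = time() - start
print(f"Average latency: {elapsed / n_runs * 1000:.2f} ms, "
      f"throughput: {n_runs / elapsed:.1f} images/s")
```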
Please refer to the links below for more details:

Thanks!