Slow inference UNet Industrial TF-TRT


I am new to NVIDIA tools. I am running the notebook example for training, testing, and exporting UNet Industrial to TF-TRT on the DAGM dataset, provided at UNet Industrial Inference Demo with TF-TRT.

I was able to run the whole pipeline and export the checkpoint for inference with TF-TRT. I am running inference on one image with the following code:

```python
import tensorflow as tf
from time import time

# inference with saved TF-TRT model
config = tf.ConfigProto()

start = time()
with tf.Session(graph=tf.Graph(), config=config) as sess:
    # load the exported SavedModel into this session
    tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], SAVED_MODEL_DIR)
    nodes = [ for n in tf.get_default_graph().as_graph_def().node]
    output =["UNet_v1/sigmoid:0"], feed_dict={"input:0": img})
print(f'Time spent: {time() - start}')
```
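One likely cause of the long first run is that TF-TRT builds its TensorRT engines lazily, on the first `` call, so a single timed inference mostly measures engine construction. A common pattern is to run a few warm-up inferences before timing; the sketch below uses a generic `infer` callable as a placeholder for the `` call above (names are illustrative, not from the notebook):

```python
from time import perf_counter

def benchmark(infer, img, warmup=5, iters=20):
    """Time an inference callable, discarding warm-up runs.

    `infer` stands in for something like
        lambda x:["UNet_v1/sigmoid:0"],
                          feed_dict={"input:0": x})
    The first call triggers TF-TRT engine building, so warm-up
    runs must be excluded from the measurement.
    """
    for _ in range(warmup):           # engine build / autotuning happen here
        infer(img)
    start = perf_counter()
    for _ in range(iters):
        infer(img)
    return (perf_counter() - start) / iters  # mean latency in seconds
```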

It was supposed to be very fast (that is the premise of running inference from NGC containers), but it takes 60-70 seconds per inference.

What can I do to speed this up?
Is there another way to load my .pb model and run prediction?
The model is attached as a zip file (6.5 MB).


GPU Type: NVIDIA RTX A6000
Container built with:
```shell
docker build . --rm -t unet_industrial:latest
```


Please share the model, script, profiler, and performance output if not already shared, so that we can help you better.

Alternatively, you can try running your model with the trtexec command.
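Note that trtexec consumes ONNX models or serialized engines rather than TensorFlow .pb files, so a typical path (file names below are hypothetical) is to convert the SavedModel with the tf2onnx package first:

```shell
# convert the SavedModel directory to ONNX (tf2onnx is a separate pip package)
python -m tf2onnx.convert --saved-model saved_model_dir --output model.onnx

# build a TensorRT engine and report latency/throughput statistics
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```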

While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre and post-processing overhead.
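As a sketch of that separation (the `infer` name and batch format are placeholders, not part of the UNet notebook), preprocess once outside the timed region and time only the forward pass:

```python
from time import perf_counter

def measure(infer, batch, iters=100):
    """Report latency and throughput for the inference call only.

    `batch` is assumed to be already preprocessed; resizing,
    normalization, and postprocessing stay outside the timer.
    """
    latencies = []
    for _ in range(iters):
        t0 = perf_counter()
        infer(batch)                  # only the network forward pass is timed
        latencies.append(perf_counter() - t0)
    latencies.sort()
    mean = sum(latencies) / iters
    return {
        "mean_latency_ms": mean * 1e3,
        "p99_latency_ms": latencies[int(0.99 * iters) - 1] * 1e3,
        "throughput_ips": len(batch) / mean,  # images per second
    }
```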
Please refer to the below links for more details: