Hello everyone,
I’m having a runtime (speed) problem with my U-Net model because the input image is very large.
I need help reducing this time.
1- The model was saved with Keras in Python (TensorFlow 2.10 was used).
python: model.save(path) → The resulting SavedModel consists of 2 folders (“assets”, “variables”) and 2 files (“keras_metadata.pb”, “saved_model.pb”).
2- The model is loaded in C++ with cppflow (c++: model = new cppflow::model(path)) and the inference is performed there (“libtensorflow-gpu-windows-x86_64-2.10.0” is used instead of installing TensorFlow).
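For reference, the load step on the C++ side looks roughly like this (a minimal sketch; the path is a placeholder for my SavedModel folder):

#include "cppflow/cppflow.h"

// Load the SavedModel directory produced by model.save(path) in Keras
// (the folder containing saved_model.pb and the variables/ subfolder).
cppflow::model* model = new cppflow::model("path/to/unet_saved_model");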
3- The aim is to crop a large image to obtain a batch of 400 images, each of size 256x256x3, and run prediction with the model above.
4- The time taken just to crop and convert the cv::Mat to a cppflow::tensor is about 800 milliseconds.
c++: src.convertTo(src, CV_32F);  // assuming the model expects float32 input
cv::Mat flat = src.reshape(1, src.total() * src.channels());
std::vector<float> img_data = flat;  // cv::Mat → std::vector copy
const cppflow::tensor input(img_data, {400, 256, 256, 3});
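For context, this is roughly how I build the batch (a minimal sketch, assuming the large image is split into a 20x20 grid of 256x256 tiles; the function name and tile layout are only illustrative):

#include <opencv2/opencv.hpp>
#include "cppflow/cppflow.h"
#include <vector>

// Split a large BGR image into 400 tiles of 256x256x3 and pack them into
// one float buffer for a {400, 256, 256, 3} tensor.
// Assumes the source is at least 20*256 x 20*256 pixels; adapt as needed.
cppflow::tensor make_batch(const cv::Mat& src_bgr)
{
    const int tile = 256, grid = 20;  // 20 x 20 = 400 tiles
    std::vector<float> img_data;
    img_data.reserve(static_cast<size_t>(grid) * grid * tile * tile * 3);

    for (int ty = 0; ty < grid; ++ty) {
        for (int tx = 0; tx < grid; ++tx) {
            cv::Mat crop = src_bgr(cv::Rect(tx * tile, ty * tile, tile, tile));
            cv::Mat crop_f;
            crop.convertTo(crop_f, CV_32F);  // uint8 -> float32, makes a continuous copy
            cv::Mat flat = crop_f.reshape(1, crop_f.total() * crop_f.channels());
            img_data.insert(img_data.end(), flat.begin<float>(), flat.end<float>());
        }
    }
    return cppflow::tensor(img_data, {grid * grid, tile, tile, 3});
}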
5- The inference time is around 1300 milliseconds.
c++: auto output = (*model)(input);
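In case it matters, the explicit form of the call looks roughly like this (the operation names below are the usual defaults for a Keras SavedModel, but they are an assumption on my part; they can be checked with saved_model_cli show --dir <path> --all):

// Same call with explicit input/output operation names (names assumed,
// verify with saved_model_cli) and reading the result back to the host.
auto outputs = (*model)({{"serving_default_input_1:0", input}},
                        {"StatefulPartitionedCall:0"});
std::vector<float> result = outputs[0].get_data<float>();  // 400*256*256*C floats (C = output channels)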
I have the impression that the images in the batch are not really processed in parallel, because the inference time varies considerably with the batch size.
I use CUDA 11.2 and cuDNN 8.101.