I am experimenting with object recognition. I have been using an existing project that builds a network with the TensorRT APIs and then uses it to run inference; the code that does the inference is here.
I have had success using the sample code with BATCH_SIZE set to the default (1). On an RTX 3070 (compute capability 8.6, 5888 CUDA cores) I have consistently been getting 17 ms inference time, and on a Quadro K620 (compute capability 5.0, 384 CUDA cores) I am getting around 199 ms. I am happy with this performance.
I did some reading about TensorRT, and according to the documentation “A batch of inputs identical in shape and size can be computed on different layers of the neural network in parallel”, which suggests that a batch might take about the same time as a single inference if the GPU processes the batch in parallel.
I tested this by setting BATCH_SIZE = 8, regenerating the engine file, and running again. In this case inference took 100 ms on the RTX 3070 and 1447 ms on the K620 (the code outputs inference time in ms). I had Task Manager open, and for the selected GPU it showed no more than about 6% utilization during the actual inference. I was expecting to see significant GPU usage if inference is being parallelized.
Is this result expected? The K620 takes nearly as long to do one BATCH_SIZE=8 run as it takes to do 8 separate BATCH_SIZE=1 runs, while the RTX 3070 did only slightly better, taking about 6 times as long as a single inference.
Is there anything I could do to improve this? The project maintainer indicated that his results were similar when changing the batch size.
TensorRT Version: 18.104.22.168
GPU Type: RTX 3070 and Quadro K620
Nvidia Driver Version: 22.214.171.12411
CUDA Version: cuda_11.1.1_456.81
CUDNN Version: cudnn-11.2-windows-x64-v126.96.36.199
Operating System + Version: Windows 10 Pro build 21H1
Environment: Visual Studio 2017, C++ project
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
- Exact steps/commands to build your repro
Here is the guide I followed to build the project in Windows.
- Exact steps/commands to run your repro
First, run with BATCH_SIZE=1 to build the engine and get a baseline for inference time (using the command-line parameters shown below).
Set BATCH_SIZE=8, recompile, and run with command-line parameters "-s test1.wts test1.engine l" to build the engine file.
Then run again to do inference with command-line parameters "-d test1.engine ./SAMPLES", where SAMPLES is a folder of jpg images to run inference on.
I can provide VS project if required.
- Full traceback of errors encountered