Description
Hi,
I’m having trouble running inference with batch size > 1.
I’m building the network from a ResNet-50 ONNX model and loading it into my C++ project. Inference with batch_size=1 works fine, but with batch_size > 1 I get an empty output buffer for inference indices 1, 2, etc., although the output for index 0 is correct.
I’ve built the network with a maximum batch size of batch_size=5:
builder->setMaxBatchSize(batch_size);
I’ve assigned input / output buffers for batch_size images:
for (size_t i = 0; i < engine->getNbBindings(); ++i)
{
    auto binding_size = getSizeByDim(engine->getBindingDimensions(i)) * batch_size * sizeof(float);
    cudaMalloc(&buffers[i], binding_size);
    if (engine->bindingIsInput(i))
    {
        input_dims.emplace_back(engine->getBindingDimensions(i));
    }
    else
    {
        output_dims.emplace_back(engine->getBindingDimensions(i));
    }
}
I’ve called the enqueue API with a batch_size of 5:
context->enqueue(batch_size, buffers.data(), localStream, nullptr);
I’m reading the full batch of output results:
std::vector<float> cpu_output(getSizeByDim(dims) * batch_size);
cudaMemcpy(cpu_output.data(), gpu_output, cpu_output.size() * sizeof(float), cudaMemcpyDeviceToHost);
I’ve read a few posts on the topic of running inference on several images at a time, but I couldn’t locate the issue in my code yet. Assistance would be appreciated.
Attachments: imagenet_classes.txt (21.2 KB), SampleFlow.cpp (17.0 KB)
Environment
TensorRT Version: 7.2.1.6.Windows10.x86_64.cuda-10.2.cudnn8.0
GPU Type: Quadro M2000M
Nvidia Driver Version: 26.21.14.4122
CUDA Version: 10.2
CUDNN Version: cudnn-10.2-windows10-x64-v8.0.5.39
Operating System + Version: Windows 10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):