Hi,
I am trying to run inference on multiple images at once using the TensorRT API. A pseudo-code snippet of my application is:
context.enqueue(batchSize, buffers, stream, nullptr);
Here,
buffers[0] holds batchSize * INPUT_C * INPUT_H * INPUT_W input elements
buffers[1] holds batchSize * outputSize output elements
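For completeness, this is roughly how I pack the batch on the host before copying it to buffers[0]. A minimal sketch: hostInput and the imageData(i) helper are placeholder names, and each image is assumed to already be preprocessed into INPUT_C * INPUT_H * INPUT_W floats in CHW order.

#include <cstring>   // std::memcpy
#include <vector>

// Pack batchSize images contiguously: image i starts at offset i * volume.
const size_t volume = INPUT_C * INPUT_H * INPUT_W;
std::vector<float> hostInput(batchSize * volume);
for (int i = 0; i < batchSize; i++)
{
    // imageData(i): placeholder returning a pointer to the i-th preprocessed image
    std::memcpy(hostInput.data() + i * volume, imageData(i), volume * sizeof(float));
}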
If I run with batchSize=1, I get the correct output, but with batchSize > 1 the detections for every image other than the first are wrong.
Further, with batchSize=1 the inference time is 7 ms, while for batchSize=3 it is around 16 ms, so getting batching to work would give a big boost to my application (and to others in general).
Can someone please suggest what I can try to solve this issue?
I am allocating device memory for "buffers" this way:
for (int b = 0; b < engine.getNbBindings(); b++)
{
    // Binding dimensions are per image (CHW); multiply by batchSize for the whole batch.
    DimsCHW dims = static_cast<DimsCHW&&>(engine.getBindingDimensions(b));
    size_t size = batchSize * dims.c() * dims.h() * dims.w() * sizeof(float);
    std::cout << "size of buff = " << size << std::endl;
    CudaCHECK(cudaMalloc(&buffers[b], size));
}
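And this is roughly how the whole transfer/inference flow looks around the enqueue call above. A sketch with error handling trimmed: hostInput is the packed batch from earlier, and hostOutput is a placeholder name for the result buffer.

void* buffers[2];   // flat array: one device pointer per binding
// ... the cudaMalloc loop above fills buffers[0] (input) and buffers[1] (output) ...

cudaStream_t stream;
CudaCHECK(cudaStreamCreate(&stream));

const size_t inputBytes  = batchSize * INPUT_C * INPUT_H * INPUT_W * sizeof(float);
const size_t outputBytes = batchSize * outputSize * sizeof(float);
std::vector<float> hostOutput(batchSize * outputSize);

// Copy the packed batch in, run inference on the stream, copy all batchSize results back.
CudaCHECK(cudaMemcpyAsync(buffers[0], hostInput.data(), inputBytes, cudaMemcpyHostToDevice, stream));
context.enqueue(batchSize, buffers, stream, nullptr);
CudaCHECK(cudaMemcpyAsync(hostOutput.data(), buffers[1], outputBytes, cudaMemcpyDeviceToHost, stream));
CudaCHECK(cudaStreamSynchronize(stream));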
Also, is "buffers" supposed to be a 1D array (one device pointer per binding, as above) or a 2D array (a separate pointer for each image in the batch)?
Thanks!