Description
I have working TensorRT code that runs inference on a network with two inputs.
I'm trying to modify it so that the network reads its inputs from two buffers that already exist on the GPU.
The inference now returns all zeros. This is what I'm doing:
- I get pointers to the two input buffers like so (despite the hostDataBuffer names, getDeviceBuffer() returns device-side pointers):
void* hostDataBuffer0 = this->mpBuffers->getDeviceBuffer(this->pmParams->inputTensorNames[0]);
void* hostDataBuffer1 = this->mpBuffers->getDeviceBuffer(this->pmParams->inputTensorNames[1]);
- I copy my data, which is already on the GPU, into them with device-to-device copies:
cudaMemcpy(hostDataBuffer0, mem_ptr0, data_size0, cudaMemcpyDeviceToDevice);
cudaMemcpy(hostDataBuffer1, mem_ptr1, data_size1, cudaMemcpyDeviceToDevice);
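As a first diagnostic, it may be worth checking the return value of every CUDA call: a silently failing cudaMemcpy would leave the input buffers untouched and could explain all-zero output. A minimal sketch of such a check, modeling cudaError_t as an int (cudaSuccess is 0) so the helper is self-contained; checkCuda is a hypothetical name, not part of the original code:

```cpp
#include <cstdio>

// Report a failed CUDA call. In real code, include <cuda_runtime.h> and
// pass each call's cudaError_t return value; cudaSuccess compares equal to 0.
static bool checkCuda(int err, const char* what) {
    if (err != 0) {
        std::fprintf(stderr, "CUDA error %d in %s\n", err, what);
        return false;
    }
    return true;
}
```

In the real code this would wrap each copy, e.g. `checkCuda(cudaMemcpy(hostDataBuffer0, mem_ptr0, data_size0, cudaMemcpyDeviceToDevice), "input0 copy")`.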
- I no longer run:
this->mpBuffers->copyInputToDevice();
Instead I immediately run:
bool status = mContext->executeV2(this->mpBuffers->getDeviceBindings().data());
- The result of the inference is all 0s. What am I doing wrong?
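One common cause of all-zero output is a size mismatch: the number of bytes copied doesn't match what the engine expects for that binding. It may help to compare data_size0/data_size1 against the byte size implied by the engine's binding dimensions (queryable in TensorRT 8.x via mEngine->getBindingDimensions(i)). A self-contained sketch of the size computation, assuming a dense tensor; volumeBytes is a hypothetical helper:

```cpp
#include <cstddef>
#include <initializer_list>

// Byte size of a dense tensor given its dimensions and element size.
// In real code the dims would come from
// mEngine->getBindingDimensions(bindingIndex).d / .nbDims.
static std::size_t volumeBytes(std::initializer_list<int> dims,
                               std::size_t elemSize) {
    std::size_t n = 1;
    for (int d : dims) n *= static_cast<std::size_t>(d);
    return n * elemSize;
}
```

For example, a 1x3x224x224 float input would need data_size0 == volumeBytes({1, 3, 224, 224}, sizeof(float)), i.e. 602112 bytes.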
I also tried passing the two pointers to executeV2() manually, but it crashes (Aborted, core dumped):
std::vector<void*> tmp;
tmp.push_back(hostDataBuffer0);
tmp.push_back(hostDataBuffer1);
bool execstatus = mContext->executeV2(tmp.data());
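A likely reason for this crash: executeV2() expects one pointer for every binding of the engine, inputs and outputs alike, ordered by binding index. A two-element vector is too short whenever the engine also has output bindings, so TensorRT reads past the end of the array. A sketch of assembling the full array, with a hypothetical makeBindings helper (nbBindings would come from mEngine->getNbBindings()):

```cpp
#include <vector>
#include <utility>
#include <stdexcept>

// Build the pointer array executeV2() expects: one non-null device pointer
// per binding slot (inputs AND outputs), ordered by binding index.
std::vector<void*> makeBindings(int nbBindings,
                                const std::vector<std::pair<int, void*>>& slots) {
    std::vector<void*> bindings(nbBindings, nullptr);
    for (const auto& s : slots) {
        if (s.first < 0 || s.first >= nbBindings)
            throw std::out_of_range("binding index out of range");
        bindings[s.first] = s.second;
    }
    for (void* p : bindings)
        if (!p) throw std::runtime_error("unfilled binding slot");
    return bindings;
}
```

The original getDeviceBindings().data() call works precisely because the buffer manager keeps a pointer for every binding slot, output buffers included.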
This also crashes (Aborted, core dumped):
int inputIndex0 = this->mEngine->getBindingIndex("input1");
int inputIndex1 = this->mEngine->getBindingIndex("input2");
void* buf[2];
buf[inputIndex0] = hostDataBuffer0;
buf[inputIndex1] = hostDataBuffer1;
bool status = this->mContext->executeV2(buf);
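Besides the missing output slots (buf[2] has no room for the engine's output bindings), this variant has a second hazard: getBindingIndex() returns -1 when the name isn't found, and writing to buf[-1] is undefined behavior that can also abort. If "input1"/"input2" aren't the exact tensor names used when the network was defined, the indices should be validated before use. A sketch with a hypothetical requireBindingIndex guard:

```cpp
#include <stdexcept>
#include <string>

// getBindingIndex() returns -1 for an unknown tensor name; fail fast
// instead of indexing an array with a negative value.
static int requireBindingIndex(int idx, const std::string& name) {
    if (idx < 0)
        throw std::runtime_error("binding not found: " + name);
    return idx;
}
```

In the real code this would wrap each lookup, e.g. `int inputIndex0 = requireBindingIndex(this->mEngine->getBindingIndex("input1"), "input1");`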
Environment
TensorRT Version: TensorRT-8.6.1.6
GPU Type: RTX 4080
CUDA Version: 12.3
Operating System + Version: Ubuntu 22.04