Inference on buffers already existing on GPU


I have working TensorRT code that runs inference on a network with two inputs.

I’m trying to modify it so that the network reads its inputs from two buffers that already exist on the GPU.

The inference output is all 0s. This is what I’m doing:

  1. I reserve two buffers like so:
void* hostDataBuffer0 = this->mpBuffers->getDeviceBuffer(this->pmParams->inputTensorNames[0]);
void* hostDataBuffer1 = this->mpBuffers->getDeviceBuffer(this->pmParams->inputTensorNames[1]);
  1. I manually copy my data into them:
cudaMemcpy(hostDataBuffer0, mem_ptr0, data_size0, cudaMemcpyDeviceToDevice);
cudaMemcpy(hostDataBuffer1, mem_ptr1, data_size1, cudaMemcpyDeviceToDevice);
  1. I no longer run:

Instead I immediately run:

bool status = mContext->executeV2(this->mpBuffers->getDeviceBindings().data());
  1. The result of the inference is all 0s. What am I doing wrong?

I tried passing the buffer pointers to executeV2() manually, but it crashes (Aborted, core dumped):

std::vector<void*> tmp;
bool execstatus = mContext->executeV2(;

This also crashes (Aborted, core dumped):

int inputIndex0 = this->mEngine->getBindingIndex("input1");
int inputIndex1 = this->mEngine->getBindingIndex("input2");

void* buf[2];
buf[inputIndex0] = hostDataBuffer0;
buf[inputIndex1] = hostDataBuffer1;
bool status = this->mContext->executeV2(buf);
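
One thing that may be worth checking in the last snippet: executeV2() takes an array with one pointer per engine binding, outputs included, so with two inputs and at least one output a two-element array looks too short. Below is a minimal sketch of building a full-size array. `makeBindings` is a hypothetical helper, not part of TensorRT, and it assumes the binding count and indices are queried from the engine beforehand.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a TensorRT API): builds the pointer array that
// would be handed to executeV2(). nbBindings is assumed to be the engine's
// total binding count (inputs plus outputs); each entry of `assigned` is a
// (binding index, device pointer) pair.
std::vector<void*> makeBindings(int nbBindings,
                                const std::vector<std::pair<int, void*>>& assigned)
{
    if (nbBindings < 0) {
        throw std::out_of_range("negative binding count");
    }
    std::vector<void*> bindings(static_cast<std::size_t>(nbBindings), nullptr);

    for (const auto& entry : assigned) {
        const int index = entry.first;
        // A lookup by name typically reports failure as -1, so reject
        // anything outside [0, nbBindings).
        if (index < 0 || index >= nbBindings) {
            throw std::out_of_range("binding index " + std::to_string(index) +
                                    " is outside the engine's binding range");
        }
        bindings[static_cast<std::size_t>(index)] = entry.second;
    }

    // Every slot must be filled, including the output slots.
    for (std::size_t i = 0; i < bindings.size(); ++i) {
        if (bindings[i] == nullptr) {
            throw std::runtime_error("binding " + std::to_string(i) +
                                     " has no buffer assigned");
        }
    }
    return bindings;
}
```

In TensorRT versions that still expose getBindingIndex(), the count would presumably come from mEngine->getNbBindings(), the output pointer from the buffer manager, and the result would be passed as executeV2(bindings.data()).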


TensorRT Version: TensorRT-
GPU Type: RTX 4080
CUDA Version: 12.3
Operating System + Version: Ubuntu 22.04

It turned out the above works fine. My ONNX model is what’s faulty. Thanks.

