Inference on buffers already existing on GPU


I have a working TensorRT code to do inference on a network that accepts two inputs.

I’m trying to modify it so that the network operates on two buffers that already exist on the GPU.

The inference returns all 0s. This is what I’m doing:

  1. I reserve two device-side buffers like so (despite the `host` in the names, `getDeviceBuffer()` returns device pointers):

```cpp
void* hostDataBuffer0 = this->mpBuffers->getDeviceBuffer(this->pmParams->inputTensorNames[0]);
void* hostDataBuffer1 = this->mpBuffers->getDeviceBuffer(this->pmParams->inputTensorNames[1]);
```
  2. I manually copy my data into them (device to device):

```cpp
cudaMemcpy(hostDataBuffer0, mem_ptr0, data_size0, cudaMemcpyDeviceToDevice);
cudaMemcpy(hostDataBuffer1, mem_ptr1, data_size1, cudaMemcpyDeviceToDevice);
```
  3. I no longer run:

Instead I immediately run:

```cpp
bool status = mContext->executeV2(this->mpBuffers->getDeviceBindings().data());
```

  4. The result of the inference is all 0s. What am I doing wrong?

I tried passing the buffer pointers to executeV2() manually, but it crashes (Aborted, core dumped):

```cpp
std::vector<void*> tmp = { hostDataBuffer0, hostDataBuffer1 };
bool execstatus = mContext->executeV2(tmp.data());
```

This also crashes (Aborted, core dumped):

```cpp
int inputIndex0 = this->mEngine->getBindingIndex("input1");
int inputIndex1 = this->mEngine->getBindingIndex("input2");

void* buf[2];
buf[inputIndex0] = hostDataBuffer0;
buf[inputIndex1] = hostDataBuffer1;
bool status = this->mContext->executeV2(buf);
```


TensorRT Version: TensorRT-
GPU Type: RTX 4080
CUDA Version: 12.3
Operating System + Version: Ubuntu 22.04

It turned out the above works fine; my ONNX model was what was faulty. Thanks.
