@Beerend Thanks for your detailed reply. I have been experimenting with the API using the simple MNIST ONNX example to better understand the basics of the inference engine. This example runs a synchronous inference, which is slightly different from your application.
The original program works as follows. An ONNX model is imported from a file, and an engine is built from it:
mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(
builder->buildEngineWithConfig(*network, *config), samplesCommon::InferDeleter());
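For completeness, the network itself is populated from the ONNX file by the parser before the engine is built. Roughly, the sample does the following (paraphrased from memory, so the exact arguments may differ in your version):
// Parse the ONNX model file into the TensorRT network definition
auto parser = SampleUniquePtr<nvonnxparser::IParser>(
    nvonnxparser::createParser(*network, sample::gLogger.getTRTLogger()));
if (!parser->parseFromFile(locateFile(mParams.onnxFileName, mParams.dataDirs).c_str(),
        static_cast<int>(sample::gLogger.getReportableSeverity())))
{
    return false;
}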
From this, a buffer manager object and an execution context are created:
// Create RAII buffer manager object
samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);
auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(mEngine->createExecutionContext());
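For reference, the inputH and inputW used below come from the network's input tensor dimensions; in the sample this is done roughly like this (for this MNIST ONNX model the input is 1x1x28x28):
// Input dimensions are taken from the first network input (NCHW)
mInputDims = network->getInput(0)->getDimensions();
const int inputH = mInputDims.d[2];
const int inputW = mInputDims.d[3];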
From here, an input image is read from a PGM file and written into the host buffer (on the CPU side, as I understand it):
readPGMFile(locateFile(std::to_string(mNumber) + ".pgm", mParams.dataDirs), fileData.data(), inputH, inputW);
float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
for (int i = 0; i < inputH * inputW; i++)
{
hostDataBuffer[i] = 1.0 - float(fileData[i] / 255.0);
}
Then the input data is copied to the GPU, inference is run, and the output data is copied from the GPU back to the CPU:
buffers.copyInputToDevice();
bool status = context->executeV2(buffers.getDeviceBindings().data());
if (!status)
{
return false;
}
buffers.copyOutputToHost();
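After the copy back, the sample reads the classification result from the host-side output buffer. Simplified (the real sample also applies a softmax, which I omit here):
// Read the 10 class scores from the host output buffer and take the argmax
const int outputSize = 10;
float* output = static_cast<float*>(buffers.getHostBuffer(mParams.outputTensorNames[0]));
int maxIdx = 0;
for (int i = 1; i < outputSize; i++)
{
    if (output[i] > output[maxIdx])
    {
        maxIdx = i;
    }
}
// maxIdx is the predicted digit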
In the above, "status" returns true. So with this as a baseline, I have been doing the following experiments.
I have already loaded the same data into the GPU with pointer name "gpuTensorInput" and have set up a desired output with pointer name "tensorOutput".
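For context, on my side these are plain CUDA device allocations, created roughly like this (the sizes here are just the 1x1x28x28 input and 1x10 output of this model, for illustration):
// My own device-side buffers (simplified illustration)
float* gpuTensorInput = nullptr;
float* tensorOutput = nullptr;
cudaMalloc(reinterpret_cast<void**>(&gpuTensorInput), 1 * 1 * 28 * 28 * sizeof(float));
cudaMalloc(reinterpret_cast<void**>(&tensorOutput), 1 * 10 * sizeof(float));
// The preprocessed image data is then copied into gpuTensorInput with cudaMemcpy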
I have first looked at the buffer object:
"buffers" is a BufferManager object whose mDeviceBindings vector holds two device pointers:
- 0x0000000c04800e00 : location of input
- 0x0000000c04801c00 : location of output
As a test, I ran the following lines
auto a = buffers.getDeviceBindings().data();
auto b = buffers.getDeviceBindings();
I saw the following:
a is a void** pointing at the first element of the bindings array, i.e. at the slot holding the tensor input address 0x0000000c04800e00
b is a std::vector<void*> that holds the device addresses of the tensor input (0x0000000c04800e00) and output (0x0000000c04801c00).
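To double-check this, I printed the bindings with a small debugging snippet of my own (requires <iostream>):
// Print each device binding pointer together with its engine binding name
const std::vector<void*>& bindings = buffers.getDeviceBindings();
for (int i = 0; i < static_cast<int>(bindings.size()); i++)
{
    std::cout << "binding " << i << " (" << mEngine->getBindingName(i) << "): "
              << bindings[i] << std::endl;
}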
From here, I tried implementing various approaches. First, I used this link as a guide: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
int inputIndex = mEngine->getBindingIndex("Input3");
int outputIndex = mEngine->getBindingIndex("Plus214_Output_0");
void* buf[2];
buf[0] = &gpuTensorInput;
buf[1] = &tensorOutput;
bool status = context->executeV2(buf);
This did not work, and "status" returned false.
I also tried experimenting with the following:
auto c = buffers.getDeviceBuffer(mParams.inputTensorNames[0]);
c = &gpuTensorInput;
bool status = context->executeV2(&c);
This also did not work and returned false for "status".
I also tried:
buffers.getDeviceBindings()[0] = &gpuTensorInput;
buffers.getDeviceBindings()[1] = &tensorOutput;
bool status = context->executeV2(buffers.getDeviceBindings().data());
But I got the error: [TRT] C:\source\rtExt\engine.cpp (902) - Cuda Error in nvinfer1::rt::ExecutionContext::executeInternal: 700 (an illegal memory access was encountered)
I believe I basically have to manipulate the pointer values in "buffers", but I am unsure how to do so correctly.
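My current best guess, which I have not verified, is that executeV2 expects the raw device pointers themselves (placed at their binding indices) rather than the addresses of my host-side pointer variables, so something like this might be the right direction (assuming gpuTensorInput and tensorOutput are the device pointers from cudaMalloc above):
// Untested sketch: pass the device pointers themselves, at their binding indices
int inputIndex = mEngine->getBindingIndex("Input3");
int outputIndex = mEngine->getBindingIndex("Plus214_Output_0");
void* bindings[2];
bindings[inputIndex] = gpuTensorInput;  // the device pointer itself, not &gpuTensorInput
bindings[outputIndex] = tensorOutput;   // the device pointer itself, not &tensorOutput
bool status = context->executeV2(bindings);
Is that the right way to substitute my own device buffers here, or do I need to go through the BufferManager?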