Description
I am trying to enqueue an inference task on the ExecutionContext but receive the following error:
safeContext.cpp (184) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
This happens with both enqueue and enqueueV2.
Synchronous execution (execute()) works without producing an error.
As recommended in the best practices guide, I deserialize the engine from file; the network is a modified YOLOv3 loaded from ONNX.
I only have one engine and one ExecutionContext at the moment, but they don't run on the application's main thread.
Could anyone point me in the right direction?
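For context, the setup roughly looks like this (simplified sketch; the engine path "yolov3.engine", the Logger, and the buffer handling are placeholders, and error handling is omitted):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <thread>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
} gLogger;

int main() {
    // Read the serialized engine that was built offline ("yolov3.engine" is a placeholder).
    std::ifstream file("yolov3.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    auto* runtime = nvinfer1::createInferRuntime(gLogger);
    auto* engine  = runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    auto* context = engine->createExecutionContext();

    // Inference runs on a worker thread, not on the application's main thread.
    std::thread worker([&]() {
        cudaStream_t stream;
        cudaStreamCreate(&stream);
        // ... allocate host/device buffers and run the inference code shown below ...
        cudaStreamDestroy(stream);
    });
    worker.join();

    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}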
Environment
TensorRT Version: 7.2.1
GPU Type: GTX 1070
Nvidia Driver Version: 455.38
CUDA Version: 11.1
CUDNN Version: Unsure (using the official container image)
Operating System + Version: Ubuntu 18.04
Container: nvcr.io/nvidia/tensorrt:20.10-py3
Working code
// Copy the input to the device, run synchronous inference, then copy the
// outputs back to the host on the same stream.
cudaMemcpyAsync(deviceBuffer[0], p_data,
                context.binding.deviceBuffer[0].getSize(),
                cudaMemcpyHostToDevice, context.stream);

if (context.context->execute(p_batchSize, &deviceBuffer[0]) != true) {
    LOG(ERROR) << "SyncInference failed!";
}

for (auto i = 1; i < deviceBuffer.size(); ++i) {
    cudaMemcpyAsync(context.binding.hostBuffer[i].get(), deviceBuffer[i],
                    context.binding.hostBuffer[i].getSize(),
                    cudaMemcpyDeviceToHost, context.stream);
}
Non-working code
// Same host-to-device copy, but the inference is enqueued asynchronously
// on the stream instead of being executed synchronously.
cudaMemcpyAsync(deviceBuffer[0], p_data,
                context.binding.deviceBuffer[0].getSize(),
                cudaMemcpyHostToDevice, context.stream);

// Fails with CUDNN_STATUS_MAPPING_ERROR.
context.context->enqueue(p_batchSize, &deviceBuffer[0], context.stream, nullptr);

for (auto i = 1; i < deviceBuffer.size(); ++i) {
    cudaMemcpyAsync(context.binding.hostBuffer[i].get(), deviceBuffer[i],
                    context.binding.hostBuffer[i].getSize(),
                    cudaMemcpyDeviceToHost, context.stream);
}