Description
Hello,
I am trying to run a TensorRT engine on a video on the Jetson AGX platform. I used one of your sample codes to build the engine and run inference on a single image, and that works fine. But when I called the infer method repeatedly, the overall time spent in the code was huge. The reason was that I was creating the execution context every time I ran the engine:
infer()
{
    …
    samplesCommon::BufferManager buffers(this->mEngine, mParams.batchSize);
    auto context = std::shared_ptr<nvinfer1::IExecutionContext>(
        this->mEngine->createExecutionContext(), samplesCommon::InferDeleter());
    buffers.copyInputToDevice();
    bool status = context->executeV2(buffers.getDeviceBindings().data());
    buffers.copyOutputToHost();
    …
}
and so on.
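To confirm where the time was going, I timed just the context creation inside infer() with std::chrono (the timing wrapper is my own addition, not part of the sample):

// Inside infer(); needs <chrono> and <iostream>.
auto t0 = std::chrono::steady_clock::now();
auto context = std::shared_ptr<nvinfer1::IExecutionContext>(
    this->mEngine->createExecutionContext(), samplesCommon::InferDeleter());
auto t1 = std::chrono::steady_clock::now();
std::cout << "createExecutionContext took "
          << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
          << " ms" << std::endl;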
The per-image infer() above works, but createExecutionContext() alone takes 20-30 milliseconds, so I tried to create the context outside the infer method. The cascade of failures led me to a test code that does this:
samplesCommon::BufferManager buffers(this->mEngine, mParams.batchSize);
auto context = std::shared_ptr<nvinfer1::IExecutionContext>(
    this->mEngine->createExecutionContext(), samplesCommon::InferDeleter());
buffers.copyInputToDevice(); // I ALSO TRIED THIS INSIDE THE LOOP, RESULT=SAME
while (true)
{
    std::cout << "beginning" << std::endl;
    bool status = context->executeV2(buffers.getDeviceBindings().data());
    buffers.copyOutputToHost();
    std::cout << "ending" << std::endl;
}
And then all hell broke loose after the first iteration. The first execution passes and the beginning/ending prints come through; the errors start flowing on the second execution of the context:
[01/04/2022-08:52:53] [E] [TRT] …/rtExt/cuda/pointwiseV2Helpers.h (538) - Cuda Error in launchPwgenKernel: 716 (misaligned address)
[01/04/2022-08:52:53] [E] [TRT] FAILED_EXECUTION: std::exception
[01/04/2022-08:52:53] [E] [TRT] engine.cpp (179) - Cuda Error in ~ExecutionContext: 716 (misaligned address)
[01/04/2022-08:52:53] [E] [TRT] INTERNAL_ERROR: std::exception
[01/04/2022-08:52:53] [E] [TRT] Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::155, condition: cudnnDestroy(context.cudnn) failure.
[01/04/2022-08:52:53] [E] [TRT] Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::165, condition: cudaEventDestroy(context.start) failure.
[01/04/2022-08:52:53] [E] [TRT] Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::170, condition: cudaEventDestroy(context.stop) failure.
[01/04/2022-08:52:53] [E] [TRT] …/rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 716 (misaligned address)
Following a suggestion on a similar topic, I tried adding cudaDeviceSynchronize(); it changed nothing. I also tried both unique_ptr and shared_ptr for the context, which didn't change a thing (see the sketch after the questions for how I am structuring this). So my questions are:
- Is it not possible to use a context more than once?
- What is the solution to this situation? The example Python codes seem to pass the context from method to method over and over, and they seem to work fine. So what makes the C++ code any different?
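For completeness, this is roughly how I am trying to structure the code now; the class and member names (VideoInfer, mBuffers, mContext) and the placement of cudaDeviceSynchronize() are my own, not taken from the sample:

#include <memory>
#include <cuda_runtime_api.h>
#include "NvInfer.h"
#include "buffers.h"   // samplesCommon::BufferManager
#include "common.h"    // samplesCommon::InferDeleter

class VideoInfer
{
public:
    VideoInfer(std::shared_ptr<nvinfer1::ICudaEngine> engine, int batchSize)
        : mEngine(engine)
        , mBuffers(engine, batchSize)
        , mContext(engine->createExecutionContext(), samplesCommon::InferDeleter())
    {
    }

    // Called once per video frame; the context and buffers are created only once.
    bool infer()
    {
        mBuffers.copyInputToDevice();
        bool status = mContext->executeV2(mBuffers.getDeviceBindings().data());
        cudaDeviceSynchronize(); // the suggested synchronization; made no difference for me
        mBuffers.copyOutputToHost();
        return status;
    }

private:
    std::shared_ptr<nvinfer1::ICudaEngine> mEngine;
    samplesCommon::BufferManager mBuffers;
    std::shared_ptr<nvinfer1::IExecutionContext> mContext;
};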
Please do not copy-paste links to educational sources; they don't solve my problems, ever.
Thanks in advance,
Cem
Environment
TensorRT Version: 7.1.3
GPU Type: Jetson AGX
CUDA Version: 10.2
Operating System + Version: Ubuntu 18.04