Misaligned Address on repetitive run of IExecutionContext

cmtrhnn · January 4, 2022, 6:28am

Description

Hello,

I am trying to run a TensorRT engine on a video on the Jetson AGX platform. I used one of your sample codes to build the engine and run inference on a single image, and that works fine. When I started calling the infer method repeatedly, I saw that the overall time spent in the code was huge. The reason was that I was creating the execution context each time I ran the engine.

infer()
{
…
samplesCommon::BufferManager buffers(this->mEngine, mParams.batchSize);
auto context = std::shared_ptr<nvinfer1::IExecutionContext>(this->mEngine->createExecutionContext(), samplesCommon::InferDeleter());
buffers.copyInputToDevice();
bool status = context->executeV2(buffers.getDeviceBindings().data());
buffers.copyOutputToHost();
…
}

and so on.

This works for a single image. But createExecutionContext() on the engine takes 20-30 milliseconds, so I tried to create the context outside the infer method. A cascade of failures led me to test code like this:

samplesCommon::BufferManager buffers(this->mEngine, mParams.batchSize);
auto context = std::shared_ptr<nvinfer1::IExecutionContext>(this->mEngine->createExecutionContext(), samplesCommon::InferDeleter());
buffers.copyInputToDevice(); //I ALSO TRIED THIS INSIDE THE LOOP, RESULT=SAME
while(1)
{
std::cout << "beginning" << std::endl;
bool status = context->executeV2(buffers.getDeviceBindings().data());
buffers.copyOutputToHost();
std::cout << "ending" << std::endl;
}

And then all hell broke loose after the first loop. The first execution passes and both the beginning and ending prints come through; the errors start on the second execution of the context.

[01/04/2022-08:52:53] [E] [TRT] …/rtExt/cuda/pointwiseV2Helpers.h (538) - Cuda Error in launchPwgenKernel: 716 (misaligned address)
[01/04/2022-08:52:53] [E] [TRT] FAILED_EXECUTION: std::exception
[01/04/2022-08:52:53] [E] [TRT] engine.cpp (179) - Cuda Error in ~ExecutionContext: 716 (misaligned address)
[01/04/2022-08:52:53] [E] [TRT] INTERNAL_ERROR: std::exception
[01/04/2022-08:52:53] [E] [TRT] Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::155, condition: cudnnDestroy(context.cudnn) failure.
[01/04/2022-08:52:53] [E] [TRT] Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::165, condition: cudaEventDestroy(context.start) failure.
[01/04/2022-08:52:53] [E] [TRT] Parameter check failed at: …/rtSafe/safeContext.cpp::terminateCommonContext::170, condition: cudaEventDestroy(context.stop) failure.
[01/04/2022-08:52:53] [E] [TRT] …/rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 716 (misaligned address)

I tried using cudaDeviceSynchronize(); from another suggestion on a similar topic, it changed nothing. Tried unique_ptr and shared_ptr on context, didn’t change a thing. So my questions are:

Is it not possible to use a context more than once?
What is the solution to this situation? The example Python codes seem to pass the context from method to method over and over, and they seem to work fine. So what makes the C++ code any different?

Please do not copy paste me links of educational sources, they don’t solve my problems, ever.

Thanks in advance,
Cem

Environment

TensorRT Version: 7.1.3
GPU Type: Jetson AGX
CUDA Version: 10.2
Operating System + Version: Ubuntu 18.04

NVES · January 4, 2022, 6:38am

Hi,
Please refer to the below link for Sample guide.

Refer to the installation steps from the link in case you are missing anything.

However, the suggested approach is to use TRT NGC containers to avoid any system-dependency issues.

To run the Python sample, make sure the TRT Python packages are installed while using the NGC container.
/opt/tensorrt/python/python_setup.sh

If you are trying to run a custom model, please share your model and script with us so that we can assist you better.
Thanks!

cmtrhnn · January 4, 2022, 7:27am

Hello @NVES

Is there an official customer support where I can file cases for my problems? This forum is going to give me cancer.

The question is really simple: why does it fail when I run IExecutionContext->executeV2 two times?

copyDataToDevice
executeV2
copyDataFromDevice

copySameDataToDevice
executeV2
→ Fails here
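
For anyone reproducing the sequence above: the test loop in the first post stores the bool returned by executeV2() but never checks it, so after the first failure it keeps calling into a broken context and the log fills with follow-up errors. A minimal sketch of stopping at the first failed execution is below. The name runUntilFailure and the step callback are hypothetical and only for illustration; in the real code the callback would wrap the copyInputToDevice / executeV2 / copyOutputToHost calls.

```cpp
#include <functional>
#include <iostream>

// Hypothetical helper: calls `step` up to maxIterations times and stops at
// the first call that returns false. Returns the number of successful calls.
// In the TensorRT code from the post, `step` would do one
// copy-in / executeV2 / copy-out round and return the executeV2 status.
int runUntilFailure(const std::function<bool()>& step, int maxIterations)
{
    int succeeded = 0;
    while (succeeded < maxIterations)
    {
        if (!step())
        {
            std::cerr << "execution " << (succeeded + 1)
                      << " failed, stopping the loop" << std::endl;
            break;
        }
        ++succeeded;
    }
    return succeeded;
}
```

This does not fix the misaligned address error; it only makes the first failing call easy to spot instead of burying it under the errors from later iterations.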

cmtrhnn · January 4, 2022, 12:55pm

executeV2() has some problems to it. Devs need to make the code less susceptible to parameter errors.

execute() was giving an error about batchSize == 0 || batchSize <= Engine->getMaxBatchSize.

Removed builder->setMaxBatchSize() flag from builder. Built the engine again.

Now execute() works as well as executeV2().

You can remove that from example code in this line yolov4_deepstream/SampleYolo.cpp at master · NVIDIA-AI-IOT/yolov4_deepstream · GitHub

or make batchNumber 1 instead of 0 in this line yolov4_deepstream/main.cpp at master · NVIDIA-AI-IOT/yolov4_deepstream · GitHub
But that makes this line problematic, because Common/BufferManager will fail at this assertion: assert(engine->hasImplicitBatchDimension() || mBatchSize == 0); yolov4_deepstream/SampleYolo.cpp at master · NVIDIA-AI-IOT/yolov4_deepstream · GitHub
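
The batch size constraints above can be written down as a small check. This is only a sketch under stated assumptions, not TensorRT code: EngineInfo, bufferManagerAccepts and chooseExecuteCall are hypothetical names, and the rule "explicit-batch engine uses batch size 0 with executeV2(), implicit-batch engine uses a batch size between 1 and the max batch size with execute()" is an assumed convention. Only the BufferManager condition is copied from the assertion quoted above.

```cpp
#include <string>

// Hypothetical stand-in for the two engine properties discussed above; in
// real code they would come from engine->hasImplicitBatchDimension() and
// engine->getMaxBatchSize().
struct EngineInfo
{
    bool implicitBatch;
    int maxBatchSize;
};

// Mirrors the BufferManager assertion quoted above:
// assert(engine->hasImplicitBatchDimension() || mBatchSize == 0);
bool bufferManagerAccepts(const EngineInfo& e, int batchSize)
{
    return e.implicitBatch || batchSize == 0;
}

// Assumed convention (not taken from the TensorRT sources):
//  - explicit-batch engine: batchSize 0, run with executeV2()
//  - implicit-batch engine: 1 <= batchSize <= maxBatchSize, run with execute()
// Returns the name of the call to use, or an empty string when the
// combination is rejected.
std::string chooseExecuteCall(const EngineInfo& e, int batchSize)
{
    if (!bufferManagerAccepts(e, batchSize))
        return "";
    if (!e.implicitBatch)
        return "executeV2";
    if (batchSize >= 1 && batchSize <= e.maxBatchSize)
        return "execute";
    return "";
}
```

With this check, an explicit-batch engine combined with a batch size of 1 is rejected, which is the BufferManager assertion failure described above.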

If this repo does not belong to Nvidia, please file a complaint, since it carries your logo and address. If it does belong to Nvidia, please moderate it.

You are welcome,
Cem
