Using TRT with NvEnc fails with CUDNN_STATUS_MAPPING_ERROR

Hello,

I am trying to build a pipeline like this: Camera → getImage → TRTDetectionYolo → NVEnc-Encode to file.

When I launch this pipeline, it fails with CUDNN_STATUS_MAPPING_ERROR.

I am using the latest TRT, the latest CUDA and cuDNN, and the latest NVIDIA Video Codec SDK.

The odd part is that each half works on its own. Camera → getImage → TRTDetectionYolo works,
and Camera → getImage → NVEnc-Encode to file works too.

So for some reason the two cannot run together. For TRTDetectionYolo I am using cudaHostRegister; I also tried cudaHostAlloc and cudaMallocManaged, but got the same error.

The error is thrown by the TRT->executeV2 call.

NvEnc uses NVEncoderCuda provided by Nvidia.

Can you see a reason why running TRT and NvEnc together results in Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)? Each encoder has its own CUDA context. TRT has its own CUDA engine. Everything runs on the same graphics card.

[2020-08-21 10:51:28.872][encoder][error][thread 2784]: Encode frame fail error GetEncodedPacket : m_nvenc.nvEncLockBitstream(m_hEncoder, &lockBitstreamData) returned error 8 at ../../video-library/VideoLib/encoder/NvEncoder.cpp:493
.
[2020-08-21 10:51:28.872][trt][error][thread 2785]: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
[2020-08-21 10:51:28.873][trt][error][thread 2785]: FAILED_EXECUTION: std::exception

Edit: It fails even when TRT is on GPU 0 and NvEnc is on GPU 1.

Edit 2: Initializing the encoder is enough to cause this problem.

I figured it out. The encoder was creating its own CUDA context by calling cuCtxCreate, but a context had already been created by the TRT engine. So the mapped memory was lost (I think because the context was overwritten on the stack).

I changed the context creation to cuDevicePrimaryCtxRetain, so the encoder receives the already existing context, or a new one is created if none exists yet. The pipeline started working =).

If I have misunderstood something, please do explain.

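The primary context is a single context per device. A context created by cuCtxCreate is a separate, additional context that becomes current on the calling thread. So creating a new context over an existing one with cuCtxCreate does not destroy the previous one, but it supplants it as the thread's current context. Work issued afterwards runs in the new context, where the memory mappings made in the previous context are not valid, and that is most likely why I got CUDNN_STATUS_MAPPING_ERROR.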
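
To make the difference between the two calls concrete, here is a small toy model in plain C++. It is only a sketch and does not call CUDA at all: `Context`, `ToyDriver`, and `encoderSharesContext` are hypothetical names invented for illustration, and the model covers a single thread only. It imitates what the driver API documentation describes: cuCtxCreate always makes a new context and makes it current on the calling thread, while cuDevicePrimaryCtxRetain hands back the one primary context of the device. It also assumes that TRT, which goes through the CUDA runtime API, ends up in the primary context.

```cpp
#include <map>
#include <memory>
#include <vector>

// Toy model only. Context, ToyDriver and their methods are hypothetical
// stand-ins that imitate documented driver-API behaviour; none of this is CUDA.
struct Context {
    int device;    // device ordinal the context belongs to
    bool primary;  // true for the per-device primary context
};

class ToyDriver {
public:
    // Models cuCtxCreate: always builds a NEW context and makes it current
    // by pushing it on the (single, modelled) thread's context stack.
    Context* createContext(int device) {
        owned_.push_back(std::make_unique<Context>(Context{device, false}));
        stack_.push_back(owned_.back().get());
        return stack_.back();
    }

    // Models cuDevicePrimaryCtxRetain: hands back the one primary context of
    // the device, creating it only on first use. It does not touch the stack.
    Context* retainPrimary(int device) {
        auto it = primary_.find(device);
        if (it == primary_.end()) {
            owned_.push_back(std::make_unique<Context>(Context{device, true}));
            it = primary_.emplace(device, owned_.back().get()).first;
        }
        return it->second;
    }

    // Model cuCtxPushCurrent and cuCtxGetCurrent.
    void pushCurrent(Context* ctx) { stack_.push_back(ctx); }
    Context* current() const { return stack_.empty() ? nullptr : stack_.back(); }

private:
    std::vector<std::unique_ptr<Context>> owned_;  // keeps contexts alive
    std::map<int, Context*> primary_;              // device -> primary context
    std::vector<Context*> stack_;                  // current-context stack
};

// Returns true when the encoder ends up in the same context as TRT.
bool encoderSharesContext(bool usePrimary) {
    ToyDriver drv;
    // Assumption: TRT (via the runtime API) works in the primary context.
    Context* trt = drv.retainPrimary(0);
    drv.pushCurrent(trt);
    // The encoder either retains the primary context or creates its own.
    Context* enc = usePrimary ? drv.retainPrimary(0) : drv.createContext(0);
    return enc == trt && drv.current() == trt;
}
```

In this model `encoderSharesContext(false)` is false, because the freshly created context sits on top of the stack and differs from the one TRT used, while `encoderSharesContext(true)` is true. In real code a context obtained with cuDevicePrimaryCtxRetain should be paired with cuDevicePrimaryCtxRelease on the same device once the encoder is destroyed.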