Multiple decoder instances cause a CUDA API error.

Board: GTX 1060 6 GiB
Driver: 398.36
OS: Windows 10 Pro x64, April 2018 Update

Repeatedly creating and releasing multiple HEVC decoder instances can cause subsequent CUDA API calls to terminate abnormally.
There was no problem with driver 391.35 and earlier.
Please tell me how to handle multiple decoder instances properly with the latest driver.

Hi Masa-Tam,

Could you provide us a sample code reproducing this issue?

You can send the sample code to this email address:

Ryan Park

Hi Masa-Tam,

We received the sample code. Thanks for sending it to us.

It seems that, for some reason, you are running out of video memory.

Could you provide us some additional info:

  1. Would it be possible to explain your use case? We would be interested to know why you need to create and destroy multiple decoder instances.
  2. Do you mean that exactly the same application, which creates and destroys multiple decoder instances, did not fail on 391.35 but fails on 398.36? Please confirm this, as it may be a valuable data point to aid our investigation.
  3. Please ensure that the decoder instances being created are released when they are no longer needed.
  4. Please refer to section 4.8 of “..\Video_Codec_SDK_8.2.15\doc\NVDEC_VideoDecoder_API_ProgGuide” in the SDK package. We would like to draw your attention to the following tips listed there, which can be used to reduce the memory footprint when dealing with multiple-instance decoding.

While decoding multiple streams it is recommended to create a single CUDA context and share it across multiple decode sessions. This saves the memory overhead associated with the CUDA context creation.

In the use cases where there is frequent change of decode resolution and/or post processing parameters, it is recommended to use cuvidReconfigureDecoder() instead of destroying the existing decoder instance and recreating a new one.

Ryan Park

Hi Ryan san,

  1. Since my application is video editing software, it needs to obtain video from multiple decoder instances decoding multiple HEVC streams.
  2. My application generates errors with nothing changed but the driver: 391.35 works normally, 398.36 fails.
  3. Each decoder instance is managed by an implementation of an IUnknown-derived interface, with its lifetime managed by CComPtr. Release() is called at the end of decoding, and I confirmed in the debugger that the destructor runs and the instance is actually released.
  4. Before feeding the decoder, the elementary stream is parsed so that input is submitted in access-unit (AU) units, and the decoded video data is transferred to system memory as soon as possible. Is anything more necessary than that?

In my implementation, I create a thread dedicated to running the CUDA API / NVDEC calls. A CUDA context is bound to this thread, and other threads request execution on the dedicated thread through lambda expressions.

Sincerely yours,

Just wanted to say I am experiencing problems very similar to this, unique to HEVC.

In stress testing, the same application can stably decode 100+ (low-resolution) H.264 videos simultaneously, and can dynamically create and destroy videos for many days at a time (and presumably longer). However, when I point it at HEVC videos, the same app will generally crash after destroying 1-2 videos, though it can happily play many HEVC streams simultaneously (and create them dynamically), also for days at a time.

I’ve spent a good bit of time attempting architectural changes to work around this problem, presuming it was my bug, but I’m increasingly suspicious that it’s not. My architecture is similar to how Masa-Tam describes theirs, minus the COM. Specifically, there is a thread per video for FFmpeg demuxing and mapping/unmapping frames. All other interactions with CUDA and CUVID are funneled through a single thread and CUDA context, and all of those interactions are protected with a CUvideoctxlock.

Incidentally, I get the same behavior even if I disable all frame mapping and CUDA interactions: just allocating and deallocating the HEVC videos triggers the crash. Most commonly the crash occurs in a call to cuEventQuery(), resulting in CUDA_ERROR_UNKNOWN. I’d be grateful for any insights, as I’m basically out of things to try now.


This problem was fixed in driver 399.07.

Many thanks,

I can confirm the same - 399.07 or later appears to fix this problem for me too.

Is this fix only for Windows?
For Ubuntu, the latest driver is 396.54, not 399.07!