We have a program with encoding and decoding capabilities. When we first implemented it, we looked at the ffmpeg implementation and took the same approach: a separate CUDA context for each decoding and encoding session. This works fine for ffmpeg, because users usually run several ffmpeg processes for different streams, and the NVIDIA driver seems to cope well with multiple contexts created from different processes. In our case there is only one process, and we found that creating multiple contexts is slow: we could not create more than 20 contexts in our application, even on a powerful GPU like the Tesla P100.
I looked at other applications and Stack Overflow posts and found that some developers use a single context for both encoding and decoding. I tried this approach and it works fine for me:
- create the context with cuCtxCreate
- detach the context from the current thread with cuCtxPopCurrent
- make the context current in every thread before using it, via cuCtxSetCurrent
- use the context in all encoding and decoding threads
- use cuvidCreateDecoder to create a decoder, or
- set NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS.device = the_context and call nvEncOpenEncodeSessionEx to create an encoder.
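The steps above can be sketched roughly as follows. This is a minimal illustration, not our production code. The helper names `create_shared_context` and `count_threads_bound` are made up for this post, and device ordinal 0 is an assumption. By default the sketch compiles against small stand-ins that only model the per-thread "current context" state, so it builds on a machine without the CUDA toolkit. Defining the hypothetical macro `USE_REAL_CUDA` and linking against the driver library switches to the real `cuda.h`, assuming a toolkit where `cuCtxCreate` still takes three arguments.

```cpp
#include <cassert>
#include <thread>
#include <vector>

#ifdef USE_REAL_CUDA
#include <cuda.h>   // real driver API; assumes the three-argument cuCtxCreate
#else
// Stand-ins (NOT the real driver): they only track which context is
// current on the calling thread, which is per-thread state in CUDA too.
typedef int CUresult;
typedef int CUdevice;
typedef struct CUctx_st { int id; } *CUcontext;
const CUresult CUDA_SUCCESS = 0;
static thread_local CUcontext g_current = nullptr;
static CUctx_st g_ctx_storage = {1};
CUresult cuInit(unsigned) { return CUDA_SUCCESS; }
CUresult cuDeviceGet(CUdevice* d, int ordinal) { *d = ordinal; return CUDA_SUCCESS; }
CUresult cuCtxCreate(CUcontext* c, unsigned, CUdevice) {
    *c = &g_ctx_storage;
    g_current = *c;               // a new context becomes current on the creating thread
    return CUDA_SUCCESS;
}
CUresult cuCtxPopCurrent(CUcontext* c) { *c = g_current; g_current = nullptr; return CUDA_SUCCESS; }
CUresult cuCtxSetCurrent(CUcontext c) { g_current = c; return CUDA_SUCCESS; }
CUresult cuCtxGetCurrent(CUcontext* c) { *c = g_current; return CUDA_SUCCESS; }
#endif

// Creates one context on device 0 and detaches it from the creating thread.
// Returns nullptr if any step fails.
CUcontext create_shared_context() {
    CUdevice dev;
    CUcontext ctx = nullptr;
    CUcontext popped = nullptr;
    if (cuInit(0) != CUDA_SUCCESS) return nullptr;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return nullptr;
    if (cuCtxCreate(&ctx, 0, dev) != CUDA_SUCCESS) return nullptr;
    if (cuCtxPopCurrent(&popped) != CUDA_SUCCESS) return nullptr;  // detach
    return ctx;
}

// Starts n worker threads; each binds the shared context before doing any work.
// Returns how many threads saw the shared context as current after binding.
int count_threads_bound(CUcontext shared, int n) {
    std::vector<int> ok(n, 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&ok, shared, i] {
            CUcontext cur = nullptr;
            cuCtxSetCurrent(shared);      // bind the context to this thread
            cuCtxGetCurrent(&cur);
            ok[i] = (cur == shared) ? 1 : 0;
            // cuvidCreateDecoder(...) or nvEncOpenEncodeSessionEx(...) with
            // params.device = shared would be called here.
        });
    }
    for (auto& t : workers) t.join();
    int bound = 0;
    for (int v : ok) bound += v;
    return bound;
}
```

A caller would run `create_shared_context()` once at startup and hand the returned handle to every encoding and decoding thread. Whether binding one context this way is officially supported for NVENC and NVDEC is exactly what this post is asking.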
Everything works fine and this approach looks just brilliant. I just need confirmation from NVIDIA or other experts that this is a legitimate approach and not a side effect. Please note that without the cuCtxSetCurrent call, the encoder fails on the second encoder session. So the call is required even though the context is already passed in the “device” field of NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS when the session is created with nvEncOpenEncodeSessionEx. My guess is that cuCtxSetCurrent effectively marks the context as shareable, because with one context per encoding session the call is not required. But I cannot find any confirmation in the docs.
The SDK also has the cuvidCtxLockCreate, cuvidCtxLock and cuvidCtxUnlock functions, described as “Context-locking: to facilitate multi-threaded implementations”. These functions are not deprecated, but I don’t use them in my current implementation and everything works. Should I use them, or is cuCtxSetCurrent enough for both encoding and decoding threads?
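For reference, this is roughly what using the lock functions would look like if they turn out to be needed. It is only a sketch: the `ScopedVideoCtxLock` class is a hypothetical helper, not part of the SDK, and by default the code builds against stand-ins in which a plain mutex plays the role of the context lock. Defining the hypothetical macro `USE_REAL_NVCUVID` switches to the real `nvcuvid.h`. The stand-ins say nothing about what the real lock does internally.

```cpp
#include <cassert>
#include <mutex>

#ifdef USE_REAL_NVCUVID
#include <nvcuvid.h>   // real Video Codec SDK header
#else
// Stand-ins (NOT the real SDK): a mutex plus a counter so the effect is observable.
typedef int CUresult;
const CUresult CUDA_SUCCESS = 0;
typedef struct CUctx_st* CUcontext;
struct VideoCtxLockStandIn { std::mutex m; int depth = 0; };
typedef VideoCtxLockStandIn* CUvideoctxlock;
CUresult cuvidCtxLockCreate(CUvideoctxlock* lock, CUcontext) {
    *lock = new VideoCtxLockStandIn;
    return CUDA_SUCCESS;
}
CUresult cuvidCtxLockDestroy(CUvideoctxlock lock) { delete lock; return CUDA_SUCCESS; }
CUresult cuvidCtxLock(CUvideoctxlock lock, unsigned) {
    lock->m.lock();
    ++lock->depth;
    return CUDA_SUCCESS;
}
CUresult cuvidCtxUnlock(CUvideoctxlock lock, unsigned) {
    --lock->depth;
    lock->m.unlock();
    return CUDA_SUCCESS;
}
#endif

// Hypothetical RAII helper: takes the context lock on construction and
// releases it on destruction, so every exit path from a scope unlocks.
class ScopedVideoCtxLock {
public:
    explicit ScopedVideoCtxLock(CUvideoctxlock lock) : lock_(lock) {
        cuvidCtxLock(lock_, 0);    // second argument is reserved, pass 0
    }
    ~ScopedVideoCtxLock() { cuvidCtxUnlock(lock_, 0); }
    ScopedVideoCtxLock(const ScopedVideoCtxLock&) = delete;
    ScopedVideoCtxLock& operator=(const ScopedVideoCtxLock&) = delete;
private:
    CUvideoctxlock lock_;
};
```

The lock would be created once with `cuvidCtxLockCreate(&lock, the_context)` and a `ScopedVideoCtxLock guard(lock);` placed around each block of decoder calls that touches the shared context. Whether this is required on top of cuCtxSetCurrent is the open question.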
Thanks a lot!