My understanding from the runtime/driver documentation is that CUDA contexts can exist on multiple threads and can safely be pushed and popped without any synchronization/locks. However, the documentation inside nvcuvid.h appears to disagree:
//! Context-locking: to facilitate multi-threaded implementations, the following 4 functions
//! provide a simple mutex-style host synchronization. If a non-NULL context is specified
//! in CUVIDDECODECREATEINFO, the codec library will acquire the mutex associated with the given
//! context before making any cuda calls.
//! A multi-threaded application could create a lock associated with a context handle so that
//! multiple threads can safely share the same cuda context:
//! - use cuCtxPopCurrent immediately after context creation in order to create a 'floating' context
//! that can be passed to cuvidCtxLockCreate.
//! - When using a floating context, all cuda calls should only be made within a cuvidCtxLock/cuvidCtxUnlock section.
//!
//! NOTE: This is a safer alternative to cuCtxPushCurrent and cuCtxPopCurrent, and is not related to video
//! decoder in any way (implemented as a critical section associated with cuCtx{Push|Pop}Current calls).
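To check that I am reading that comment correctly, here is a minimal sketch of the pattern it describes. This is my interpretation rather than code from the SDK: the `check` helper, the device index 0, and the dummy allocation are my own placeholders, and I am assuming the classic three-argument `cuCtxCreate(&ctx, flags, dev)` signature from CUDA 12.x and earlier.

```cpp
// Sketch only: needs the CUDA driver API and the Video Codec SDK headers.
// Build (assumed): g++ sketch.cpp -lcuda -lnvcuvid
#include <cuda.h>
#include <nvcuvid.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not part of any SDK): abort on a driver API error.
static void check(CUresult r, const char* what) {
    if (r != CUDA_SUCCESS) {
        std::fprintf(stderr, "%s failed with CUresult %d\n", what, static_cast<int>(r));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    check(cuInit(0), "cuInit");

    CUdevice dev;
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");  // device 0 is an assumption

    // Create the context, then pop it right away so it is 'floating'
    // (not current to any thread), as the header comment suggests.
    // Assumes the (ctx, flags, dev) form of cuCtxCreate.
    CUcontext ctx;
    check(cuCtxCreate(&ctx, 0, dev), "cuCtxCreate");
    CUcontext popped;
    check(cuCtxPopCurrent(&popped), "cuCtxPopCurrent");

    // Associate a lock with the floating context.
    CUvideoctxlock lock;
    check(cuvidCtxLockCreate(&lock, ctx), "cuvidCtxLockCreate");

    // Any thread that wants to use the context brackets its CUDA calls
    // with cuvidCtxLock / cuvidCtxUnlock (second argument is reserved, 0).
    check(cuvidCtxLock(lock, 0), "cuvidCtxLock");
    CUdeviceptr buf;
    check(cuMemAlloc(&buf, 1024), "cuMemAlloc");  // placeholder CUDA work
    check(cuMemFree(buf), "cuMemFree");
    check(cuvidCtxUnlock(lock, 0), "cuvidCtxUnlock");

    check(cuvidCtxLockDestroy(lock), "cuvidCtxLockDestroy");
    check(cuCtxDestroy(ctx), "cuCtxDestroy");
    return 0;
}
```

If the header comment is accurate, the lock/unlock pair amounts to a mutex wrapped around cuCtxPushCurrent/cuCtxPopCurrent, which is why the allocation inside the locked section has a current context to run against.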
It sounds like this was a way to get around the single-context-per-thread limitation (pre-CUDA 4.0, I think) but should be unnecessary now. Is that correct?
If so, would it also be unnecessary to use this with CUVIDDECODECREATEINFO::vidLock?
CUvideoctxlock vidLock; /**< IN: If non-NULL, context lock used for synchronizing ownership of
the cuda context. Needed for cudaVideoCreate_PreferCUDA decode */
I am asking because this lock is still used inside the samples (NvDecoder.cpp:279).