How to decode multiple videos concurrently with NVDEC?

I want to decode multiple videos concurrently with NVDEC within the same process. I googled a lot but couldn't find answers to my questions.

  1. Do I need to create a separate decoder and parser for each video, meaning do I need to call cuvidCreateDecoder and cuvidCreateVideoParser for each video?
  2. If the answer to (1) is "Yes", do I need to create a separate context for each decoder, or will all decoder instances run in the same context?
  3. If the answer to (1) is "No", can you explain how to achieve concurrent decoding of multiple videos? Is cudaStream_t used somehow to parallelize multiple decoders? I couldn't find where the decoder uses cudaStream_t.

My experience is that you have to call cuvidCreateDecoder once for each stream and then manage each session separately (e.g. in its own thread). However, I have also found that the number of sessions is limited. On my laptop with a Quadro K1100M I can only create 4. After that, CreateDecoder throws an exception. Like you, I have googled (with no luck) trying to determine how many sessions are supported by various hardware platforms…
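
To make the "one session per stream, each in its own thread" idea concrete, here is a rough sketch of how I would structure it so that the session limit does not cause a failure. It does not call NVDEC at all: the sleep is a stand-in for the real work, and SessionLimiter, DecodeStats and decode_all are names I made up for illustration. The comments mark where I assume the cuvidCreateVideoParser / cuvidCreateDecoder calls and the matching destroy calls would go. The cap is passed in as a parameter because, as far as I can tell, it depends on the hardware (4 on my laptop).

```cpp
#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: caps how many decode sessions exist at once.
// Threads that arrive when the cap is reached wait instead of failing.
class SessionLimiter {
public:
    explicit SessionLimiter(int max_sessions)
        : free_(max_sessions), max_(max_sessions) {}

    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return free_ > 0; });
        --free_;
        peak_ = std::max(peak_, max_ - free_);  // track highest concurrency seen
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++free_;
        }
        cv_.notify_one();
    }

    int peak() {
        std::lock_guard<std::mutex> lock(m_);
        return peak_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int free_;
    int max_;
    int peak_ = 0;
};

struct DecodeStats {
    int decoded;        // streams that ran to completion
    int peak_sessions;  // most sessions alive at the same time
};

// One thread per stream; each thread owns exactly one (simulated) session.
// Assumes max_sessions >= 1, otherwise every thread would wait forever.
DecodeStats decode_all(int n_streams, int max_sessions) {
    SessionLimiter limiter(max_sessions);
    std::mutex stats_mutex;
    int decoded = 0;

    std::vector<std::thread> workers;
    for (int i = 0; i < n_streams; ++i) {
        workers.emplace_back([&limiter, &stats_mutex, &decoded] {
            limiter.acquire();
            // Assumed place for cuvidCreateVideoParser + cuvidCreateDecoder.
            std::this_thread::sleep_for(std::chrono::milliseconds(5));  // stand-in for decoding
            // Assumed place for cuvidDestroyDecoder + cuvidDestroyVideoParser.
            limiter.release();

            std::lock_guard<std::mutex> lock(stats_mutex);
            ++decoded;
        });
    }
    for (auto& w : workers) w.join();

    return {decoded, limiter.peak()};
}
```

Calling decode_all(6, 4) would run six streams while never holding more than four sessions open at once; the extra two simply wait for a slot. Whether queuing like this is acceptable depends on whether your streams are live or file-based, so treat it as one possible design rather than the recommended one.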