Threaded CUDA video decoding

Hi, I’m working in CUDA 3.2, with OpenGL, to decode multiple videos at the same time. I am doing this across threads and I am running into some serious problems, e.g. BSODs. I think the problems stem from my not really understanding the CUDA threading model.

What is the purpose of pushing the CUDA context? Do I need to pop the context after its creation? What is the difference between pushing the CUDA context and locking the CUDA context? Which CUDA calls need to be locked while being accessed, specifically in the decoding code, source loading, and CUDA initialization?

I am currently creating a thread for each video to load, initialize, and decode it, and then have the pixel buffer object copied to the texture and rendered in the main thread. Is this a good idea? Should I instead copy the pixel buffer into the texture in the video thread and only render the texture in the main thread?

Would it be a better plan to have one thread for all the video loading and initialization, another thread for all the decoding and Pixel Buffer copying, and do all the rendering in the main thread?

If it also helps, I am using a Quadro 6000, and I can get a maximum of 6 videos at once in my application and 7 standalone. I used to be able to get 7 and 8 respectively, but the new NVIDIA drivers (275.65) have dropped those numbers.
Also, if I go over my maximum number of videos, my system blue-screens. No matter what I do, I can’t seem to catch any errors or get any error codes before it crashes.

I’m also working with video decoding on CUDA (albeit in a single-threaded environment). As far as I know, the context is analogous to a CPU process (but on the GPU), and device pointer mappings are associated with their given context (i.e. if you pass a CUdeviceptr to a different context, it won’t be valid). The idea behind pushing and popping contexts is to allow using a different context on a given host thread. Namely, each host thread has a stack of current contexts, so by pushing and popping the host thread’s context, you can change the current one. When you pop a context, it becomes a floating context, and can be pushed as the current context for any host thread.
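That push/pop model can be sketched with the driver API roughly like this (a sketch only; error checks omitted, and device 0 plus the scheduling flag are just placeholders):

```cpp
#include <cuda.h>
#include <cstddef>

void example()
{
    // Create a context, then pop it so it is "floating" and not
    // bound to the creating thread.
    CUcontext ctx;
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, CU_CTX_SCHED_AUTO, dev); // ctx is now current here
    cuCtxPopCurrent(NULL);                     // ctx is now floating

    // Later, in any worker thread that wants to use the GPU:
    cuCtxPushCurrent(ctx); // ctx becomes current on this thread
    // ... launch kernels, copy memory, decode ...
    cuCtxPopCurrent(NULL); // ctx floats again; another thread may push it
}
```

Note that (at least in CUDA 3.2) a context can be current on only one host thread at a time, so two threads cannot both have it pushed simultaneously; you have to serialize access yourself.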

Hope that was useful, as for the other questions, I’d be interested to know more about those too…

Most of your questions are probably not that important; what matters is:

(create context)

push context

… do your stuff with the APIs and whatnot …

pop context

(destroy context)

Think of “push context” as entering the GPU.

Think of “pop context” as leaving the GPU.

A GPU can also have multiple contexts, so you can enter multiple and leave multiple… but for each push there should also be a pop.
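One common way to guarantee that every push gets its matching pop, even on early returns, is a scope guard. This `ScopedCudaContext` class is a hypothetical helper, not part of the CUDA API (error checks omitted):

```cpp
#include <cuda.h>
#include <cstddef>

// Hypothetical RAII guard: pushes a context on construction and
// pops it on destruction, so push/pop always stay paired.
class ScopedCudaContext {
public:
    explicit ScopedCudaContext(CUcontext ctx) { cuCtxPushCurrent(ctx); }
    ~ScopedCudaContext() { cuCtxPopCurrent(NULL); }
private:
    // Non-copyable: copying would double the pop.
    ScopedCudaContext(const ScopedCudaContext&);
    ScopedCudaContext& operator=(const ScopedCudaContext&);
};

void decodeFrame(CUcontext ctx)
{
    ScopedCudaContext guard(ctx); // "enter the gpu"
    // ... driver API calls here run against ctx ...
}                                 // "leave the gpu" automatically
```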

You can also use NVIDIA’s CUDA-accelerated video decoding/encoding libraries, available as part of the CUDA distribution.

Whoops… I just noticed something; the correct pseudo code is:

(create context, pop previous)

push context

… do your stuff with the APIs and whatnot …

pop context

(destroy context)

Are you referring to the cuvid API (the one used in the decode/encode SDK examples), or are there other ones?

Hi jbluepolarbear, have you solved your problem yet? Have you got all the answers to your questions? I am having the same problems; could anyone please tell me the answers?

"

  1. why do we need to create a CUvideoctxlock? can I remove it? What areas of the Cuda calls needs to be locked while being accessed?
  2. If I call cuvidCtxLock(), what will happen? will all the other operations in the same context be blocked?
    "