CUDA 2.0 multi threading


I have a multithreaded application that uses the CUDA H.264 decoder libraries. The decoder works fine when the NV12-to-YUV conversion is done on the CPU. I wanted to add the NV12-to-RGB conversion (NV12ToARGB_drvapi) from the SDK samples to speed up the whole decoding process. With that change, I get error code 999 and then error code 700 (CUDA_ERROR_LAUNCH_FAILED) on the cuvidMapVideoFrame call. I suspect a conflict between the multithreaded use of the H.264 library and loading and launching a CUDA module. I’m using a GTX 280 with the latest version of CUDA 2.0 and the latest driver. I saw that Eric Young posted a solution, but I can’t download it from the forum site; the server returns a download error.


OS: Windows XP, 32-bit

Hi Ollie,

You should take a look at the following thread discussion:

Since the device is a slave to the CPU, it would be theoretically possible for the CPU to launch several threads on it.
Unfortunately, the GPU can only run one kernel at a time, so my guess is that it queues all the launched kernels and executes them serially. I’m not sure about this, so confirmation from an Nvidia expert would be greatly appreciated.

To adapt your multithreaded program to the GPU, you can either map each CPU thread to a different GPU (multithreading across multiple GPUs), or launch a kernel asynchronously and use the CPU to run another decoding thread in the meantime.
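To make the second option more concrete, here is a minimal sketch of overlapping an asynchronous kernel with CPU work. It uses the runtime API rather than the driver API from the SDK sample, and the names halveBytes, cpuWork and overlapDemo are made up for illustration: the kernel is only a stand-in for the real NV12-to-ARGB conversion, and the CPU loop is a stand-in for decoding another frame. I have not tried it against the cuvid decoder, so take it as an outline of the idea and not as a fix for the 700 error.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bail out of the demo on the first CUDA error (a sketch: no cleanup on failure).
#define CHECK(call) do { if ((call) != cudaSuccess) return false; } while (0)

// Hypothetical stand-in for the colour conversion kernel: halves every byte.
__global__ void halveBytes(const unsigned char* in, unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] / 2;
}

// Hypothetical stand-in for CPU work done while the GPU is busy.
static long cpuWork(int n)
{
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += i % 7;
    return sum;
}

// Queues copy + kernel + copy in a stream, works on the CPU, then waits.
// Returns true if the GPU output matches the expected values.
bool overlapDemo()
{
    const int n = 1 << 20;
    unsigned char *hIn = 0, *hOut = 0, *dIn = 0, *dOut = 0;
    cudaStream_t stream;

    // Pinned host memory is required for copies that are truly asynchronous.
    CHECK(cudaMallocHost((void**)&hIn, n));
    CHECK(cudaMallocHost((void**)&hOut, n));
    CHECK(cudaMalloc((void**)&dIn, n));
    CHECK(cudaMalloc((void**)&dOut, n));
    CHECK(cudaStreamCreate(&stream));

    for (int i = 0; i < n; ++i) hIn[i] = (unsigned char)(i & 0xFF);

    // These three calls return to the CPU before the GPU has finished.
    CHECK(cudaMemcpyAsync(dIn, hIn, n, cudaMemcpyHostToDevice, stream));
    halveBytes<<<(n + 255) / 256, 256, 0, stream>>>(dIn, dOut, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(hOut, dOut, n, cudaMemcpyDeviceToHost, stream));

    // The CPU is free here, for example to decode the next frame.
    long sum = cpuWork(n);

    // Block until everything queued in the stream has completed.
    CHECK(cudaStreamSynchronize(stream));

    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (hOut[i] != hIn[i] / 2) ok = false;

    printf("CPU side sum: %ld, GPU result %s\n", sum, ok ? "ok" : "wrong");

    cudaStreamDestroy(stream);
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFreeHost(hIn);
    cudaFreeHost(hOut);
    return ok;
}

int main()
{
    return overlapDemo() ? 0 : 1;
}
```

The point of the design is that nothing on the CPU waits for the GPU until cudaStreamSynchronize, so the decoding thread only stalls when it actually needs the converted frame.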

Hope this helps!