CUDA/OpenCV: how to convert the ARGB frame in a CUdeviceptr into a cv::Mat RGB matrix?

Hi community,
I am using NVIDIA_CUDA-5.5_Samples/3_Imaging/cudaDecodeGL/videoDecodeGL.cpp to decode an MPEG-2 video. Decoding works well, but I want to do some OpenCV processing on the frames produced by the CUDA video decoder.
I have identified the function videoDecodeGL.cpp:copyDecodedFrameToTexture(…), where an ARGB frame is produced in a variable of type CUdeviceptr.

  1. How can I inspect the contents of that variable? (I tried cuda-gdb, without success.)
  2. How can I convert its contents into a cv::Mat RGB image?

Thank you for your help.
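For a quick look at the data without cuda-gdb, one option is to copy the frame to host memory (for example with the driver API call cuMemcpyDtoH(hostPtr, devPtr, bytes)) and print a few pixels. The sketch below works on that host copy. The 0xAARRGGBB word layout, the row pitch in bytes, and the helper names Argb, pixelAt and dumpRow are assumptions for illustration, not taken from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical helper type: the four 8-bit channels of one pixel.
struct Argb {
    uint8_t a, r, g, b;
};

// Read pixel (x, y) from a host copy of the frame.
// Assumption: each pixel is a 32-bit word packed as 0xAARRGGBB,
// and consecutive rows start pitchBytes apart.
Argb pixelAt(const uint8_t* frame, size_t pitchBytes, int x, int y) {
    assert(x >= 0 && y >= 0);
    uint32_t word;
    // memcpy avoids alignment and aliasing problems when reading the word.
    std::memcpy(&word,
                frame + static_cast<size_t>(y) * pitchBytes
                      + static_cast<size_t>(x) * sizeof(uint32_t),
                sizeof word);
    return Argb{static_cast<uint8_t>(word >> 24), static_cast<uint8_t>(word >> 16),
                static_cast<uint8_t>(word >> 8), static_cast<uint8_t>(word)};
}

// Print the first `count` pixels of row y as a quick sanity check.
void dumpRow(const uint8_t* frame, size_t pitchBytes, int y, int count) {
    for (int x = 0; x < count; ++x) {
        Argb p = pixelAt(frame, pitchBytes, x, y);
        std::printf("(%d,%d) A=%u R=%u G=%u B=%u\n", x, y,
                    static_cast<unsigned>(p.a), static_cast<unsigned>(p.r),
                    static_cast<unsigned>(p.g), static_cast<unsigned>(p.b));
    }
}
```

If the printed values look like plausible colors with A=255, the layout assumption is probably right; if the channels look shuffled, the packing differs and the shifts need adjusting.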


You might be interested in this answer that I hacked together on SO:

In it, and in the linked pastebin, I have shown how to modify the cudaDecodeGL sample so that a particular video frame gets saved to a BMP file. That should demonstrate how to capture a video frame in a familiar format. From there, you just need to figure out how to convert that format to your cv::Mat RGB format.
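Once a frame has been copied to the host, the conversion itself is a per-pixel repacking. A minimal sketch, assuming each pixel is a 32-bit word packed as 0xAARRGGBB and that source rows are srcPitchBytes apart; argbToPacked24 is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Convert a host copy of a pitched ARGB frame (assumed 0xAARRGGBB words)
// into a tightly packed 3-bytes-per-pixel buffer, dropping alpha.
// bgr == false gives R,G,B order; bgr == true gives B,G,R, the order
// most OpenCV functions assume for CV_8UC3 images.
std::vector<uint8_t> argbToPacked24(const uint8_t* src, size_t srcPitchBytes,
                                    int width, int height, bool bgr) {
    assert(width > 0 && height > 0);
    assert(srcPitchBytes >= static_cast<size_t>(width) * 4);
    std::vector<uint8_t> dst(static_cast<size_t>(width) * height * 3);
    for (int y = 0; y < height; ++y) {
        const uint8_t* row = src + static_cast<size_t>(y) * srcPitchBytes;
        uint8_t* out = dst.data() + static_cast<size_t>(y) * width * 3;
        for (int x = 0; x < width; ++x) {
            uint32_t w;
            std::memcpy(&w, row + static_cast<size_t>(x) * 4, 4);
            uint8_t r = static_cast<uint8_t>(w >> 16);
            uint8_t g = static_cast<uint8_t>(w >> 8);
            uint8_t b = static_cast<uint8_t>(w);
            out[3 * x + 0] = bgr ? b : r;
            out[3 * x + 1] = g;
            out[3 * x + 2] = bgr ? r : b;
        }
    }
    return dst;
}
```

The returned buffer could then be wrapped without another copy as cv::Mat img(height, width, CV_8UC3, buf.data()), provided the vector outlives img. Alternatively, on a little-endian host the 0xAARRGGBB words already sit in memory as B, G, R, A bytes, so the pitched host copy could be wrapped directly as a CV_8UC4 matrix with the cv::Mat(rows, cols, type, data, step) constructor and converted with cv::cvtColor(..., cv::COLOR_BGRA2RGB) (CV_BGRA2RGB in OpenCV 2.4).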

Hi Bob,
This solution worked perfectly for me. However, the copy function cuMemcpyDtoHost() is so costly that my processing rate has dropped from 360 fps to 10 fps.
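Some rough arithmetic helps judge whether the copy itself should be that expensive. This sketch assumes 4 bytes per ARGB pixel, no row padding, and the frame sizes mentioned in this thread (1280x720 decoded, 480x270 target); the function names are illustrative only.

```cpp
#include <cstdint>

// Bytes in one 32-bit-per-pixel frame, assuming no row padding.
constexpr uint64_t frameBytes(uint64_t width, uint64_t height) {
    return width * height * 4;
}

// Bytes per second a device-to-host copy must move at a given frame rate.
constexpr uint64_t copyBytesPerSecond(uint64_t width, uint64_t height, uint64_t fps) {
    return frameBytes(width, height) * fps;
}

static_assert(frameBytes(1280, 720) == 3686400ull, "1280x720 ARGB frame");
static_assert(frameBytes(480, 270) == 518400ull, "480x270 ARGB frame");
```

At 360 fps a full 1280x720 frame needs about 1.33 GB/s, and at 480x270 about 0.19 GB/s, while 10 fps corresponds to only about 37 MB/s. Since a PCIe 2.0 x16 link is rated at 8 GB/s per direction, a drop to 10 fps suggests that something other than raw transfer bandwidth, such as synchronization, is the main cost.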

  1. Is there any way around that?

  2. My decoded frames are 1280x720, and I intend to downscale them to 480x270, which may reduce the frame copy cost. How can I do that while the frame is still in device memory? I tried setting these values early on, before the cuvidCreateDecoder() call:

    // Scale the decoder output to 480 x 270
    oVideoDecodeCreateInfo_.ulTargetWidth = 480;  // was: oVideoDecodeCreateInfo_.ulWidth;
    oVideoDecodeCreateInfo_.ulTargetHeight = 270; // was: oVideoDecodeCreateInfo_.ulHeight;
    // create the decoder
    CUresult oResult = cuvidCreateDecoder(&oDecoder_, &oVideoDecodeCreateInfo_);
    assert(CUDA_SUCCESS == oResult);

But clearly this was not enough.
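One alternative, not taken from the sample, is to downscale the ARGB buffer on the GPU with a small kernel before the copy, so that only the 480x270 frame crosses the bus. The per-pixel logic is sketched here as plain C++ so it can be checked on the host; in a CUDA kernel, each (dx, dy) iteration would be one thread, and sampleNearest would be marked __device__. Nearest-neighbour sampling and the helper names are assumptions chosen for brevity; at a 2.67x reduction it aliases noticeably, so a box or bilinear filter may be preferable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fetch the source pixel nearest to the centre of destination pixel (dx, dy).
// Assumes 32-bit pixels with source rows srcPitchBytes apart.
inline uint32_t sampleNearest(const uint8_t* src, size_t srcPitchBytes,
                              int srcW, int srcH, int dstW, int dstH,
                              int dx, int dy) {
    int sx = static_cast<int>((dx + 0.5) * srcW / dstW);
    int sy = static_cast<int>((dy + 0.5) * srcH / dstH);
    if (sx >= srcW) sx = srcW - 1;
    if (sy >= srcH) sy = srcH - 1;
    uint32_t w;
    // memcpy keeps the host-side read portable; device code would
    // typically just load a uint32_t directly.
    std::memcpy(&w, src + static_cast<size_t>(sy) * srcPitchBytes
                        + static_cast<size_t>(sx) * 4, 4);
    return w;
}

// Host reference version: the loop body is the work of one CUDA thread.
std::vector<uint32_t> downscaleNearest(const uint8_t* src, size_t srcPitchBytes,
                                       int srcW, int srcH, int dstW, int dstH) {
    assert(dstW > 0 && dstH > 0 && dstW <= srcW && dstH <= srcH);
    std::vector<uint32_t> dst(static_cast<size_t>(dstW) * dstH);
    for (int dy = 0; dy < dstH; ++dy)
        for (int dx = 0; dx < dstW; ++dx)
            dst[static_cast<size_t>(dy) * dstW + dx] =
                sampleNearest(src, srcPitchBytes, srcW, srcH, dstW, dstH, dx, dy);
    return dst;
}
```

The output is tightly packed (pitch dstW * 4), so the subsequent device-to-host copy would move 518,400 bytes per frame instead of 3,686,400.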
Thanks a lot


I meant cuMemcpyDtoH(). It seems cuMemcpyDtoHAsync() solves the lost-throughput problem.
Downscaling the frames remains an open question.

I am not very knowledgeable about OpenCV.

If you want to put each captured frame into a cv::Mat RGB image (your words), I don't see a way around the frame-by-frame D-to-H memcpy. If, on the other hand, you wanted the frames in a cv::gpu::GpuMat container, then perhaps the D-to-H memcpy can be avoided. I can't tell you how to take raw device data and put it into a cv::gpu::GpuMat container; it may be possible, but I don't know enough about OpenCV. Also note that part of the issue here may be unwanted synchronization. If the streams are managed carefully, it should be possible to perform the D-to-H memcpy asynchronously.
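A rough model of why stream management matters, assuming the per-frame cost is dominated by decode and post-processing time t_dec, copy time t_copy, and any time t_sync the host spends blocked waiting on the GPU (these symbols are introduced here for illustration only). With everything serialized, the costs add; with double buffering, where frame n+1 is decoded while frame n is copied on another stream, the slower stage sets the pace:

```latex
\text{fps}_{\text{serial}} \approx \frac{1}{t_{\text{dec}} + t_{\text{copy}} + t_{\text{sync}}},
\qquad
\text{fps}_{\text{overlapped}} \approx \frac{1}{\max\left(t_{\text{dec}},\, t_{\text{copy}}\right)}
```

Note that cuMemcpyDtoHAsync() only overlaps with other work when the destination host buffer is page-locked, for example allocated with cuMemAllocHost(); with ordinary pageable memory the call returns only once the copy has completed.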

Hi iod,

  1. How to inspect the content of that variable? (I tried cuda-gdb unsuccessfully)

Can you post the steps you took to try to inspect this in cuda-gdb to show exactly what failed? Also, which GPU were you running on?