Hi community,
I am using NVIDIA_CUDA-5.5_Samples/3_Imaging/cudaDecodeGL/videoDecodeGL.cpp to decode an MPEG-2 video. It all works well, but I want to do some OpenCV processing on the frames extracted by the CUDA video decoder.
I have identified the function copyDecodedFrameToTexture(…) in videoDecodeGL.cpp, where an ARGB frame is produced in a variable of type CUdeviceptr.
How can I inspect the contents of that variable? (I tried cuda-gdb without success.)
How can I convert its contents into a cv::Mat RGB image?
Thank you for helping
–
Iod
In it, and in the pastebin link, I have shown how to modify the cudaDecodeGL sample so that a particular video frame gets saved to a .bmp file. That should demonstrate how to capture a video frame in a familiar format. From there you just need to figure out how to convert that format to your cv::Mat RGB format.
Hi Bob,
This solution worked perfectly for me. However, the copy function cuMemcpyDtoHost() is costly, to the point where my processing rate has dropped from 360 fps to 10 fps.
Is there any way around that?
My decoded frames are 1280x720, and I intend to downscale them to 480x270; that may reduce the frame-copy cost. How can I do that while the frame is still in device memory? I have tried assigning these values, early on, before the cuvidCreateDecoder() call:
// Scaling should be: 480 X 270
oVideoDecodeCreateInfo_.ulTargetWidth = 480; //oVideoDecodeCreateInfo_.ulWidth;
oVideoDecodeCreateInfo_.ulTargetHeight = 270; //oVideoDecodeCreateInfo_.ulHeight;
// create the decoder
CUresult oResult = cuvidCreateDecoder(&oDecoder_, &oVideoDecodeCreateInfo_);
assert(CUDA_SUCCESS == oResult);
I had meant cuMemcpyDtoH(). It seems cuMemcpyDtoHAsync() is the solution to the lost-efficiency problem.
The frame downscaling question remains open.
Thanks
If you want to stick each captured frame into a cv::Mat RGB image (your words), I don’t see a way around the frame-by-frame device-to-host memcpy. If, on the other hand, you wanted them in a cv::gpu::GpuMat container, then perhaps the device-to-host memcpy could be avoided. I can’t tell you how to take raw device data and wrap it in a cv::gpu::GpuMat container; it may be possible, but I don’t know enough about OpenCV. Also note that part of the issue here may be unwanted synchronization: if the streams are managed carefully, it should be possible to do the device-to-host memcpy asynchronously.