Hi Folks,
I want to run some basic operations on the GPU. I followed a code snippet and got the operation working on one-dimensional float arrays, but I would now like to perform a similar operation on an image captured from the camera. I am currently reading frames from the camera with the following code:
UniqueObj<Frame> frame(iFrameConsumer->acquireFrame());
IFrame *iFrame = interface_cast<IFrame>(frame);
if (!iFrame)
break;
// Get the Frame's Image.
Image *image = iFrame->getImage();
EGLStream::NV::IImageNativeBuffer *iImageNativeBuffer
= interface_cast<EGLStream::NV::IImageNativeBuffer>(image);
TEST_ERROR_RETURN(!iImageNativeBuffer, "Failed to create an IImageNativeBuffer");
int fd = iImageNativeBuffer->createNvBuffer(Argus::Size {m_framesize.width, m_framesize.height},
NvBufferColorFormat_YUV420, NvBufferLayout_Pitch, &status);
TEST_ERROR_RETURN(status != STATUS_OK, "Failed to create a native buffer");
#if 1
NvBufferParams params;
NvBufferGetParams(fd, &params);
char *data_mem = NULL;
int size = m_framesize.width * m_framesize.height;
int fsize = params.pitch[0] * m_framesize.height;
data_mem = (char*)mmap(NULL, fsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, params.offset[0]);
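One detail about reading the mapped plane: the buffer is requested with NvBufferLayout_Pitch, so I am assuming each row of the Y plane occupies params.pitch[0] bytes, which can be larger than the image width. Below is a minimal sketch of walking the rows under that assumption. packPlane is a hypothetical helper written only for illustration; it is not part of Argus or the NvBuffer API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of Argus or the NvBuffer API): copies a
// pitch-linear 8-bit plane into a tightly packed width * height buffer.
// Assumption: every row in `src` is padded to `pitch` bytes.
std::vector<std::uint8_t> packPlane(const std::uint8_t *src,
                                    std::size_t width,
                                    std::size_t height,
                                    std::size_t pitch)
{
    assert(src != nullptr);
    assert(pitch >= width);  // padding can only make a row longer

    std::vector<std::uint8_t> packed(width * height);
    for (std::size_t row = 0; row < height; ++row) {
        // Row `row` starts at row * pitch in the source, not row * width.
        std::memcpy(packed.data() + row * width, src + row * pitch, width);
    }
    return packed;
}
```

With the variables above, the call would look something like packPlane(reinterpret_cast<const std::uint8_t *>(data_mem), m_framesize.width, m_framesize.height, params.pitch[0]). This does make a CPU-side copy, so it is only meant to show the row addressing, not the zero-copy path I am asking about.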
I have successfully extracted the Y-channel image, which I intend to use for my operation. However, I would like to use the “Zero Copy” capability of the TX1 and map the CPU memory pointer to a GPU memory pointer, which I am finding hard to accomplish with cudaHostAlloc.
Is there an easy way to map the CPU memory pointer (data_mem) to a GPU memory pointer that CUDA code can use to process the frame? Should I first copy the frame buffer from data_mem with cudaMemcpy and then perform the operations? Or is there another way to operate on the frame without any copy, so as to minimize run time?
Thanks.