Performance drawback: Copying data from PBO too slow

On a TX2 4GB, we want to stream data from OpenGL to the application, as in the code below. It turns out that memcpy() becomes very slow, around 30 ms, to copy 1920 x 1080 x 4 bytes out of the mapped PBO, though it only costs about 2 ms if we do the same memcpy() on ordinary CPU memory. Could you please help check whether anything is wrong in our code?
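To put the two timings above on a common scale, here is a minimal sketch that converts them into effective copy throughput and times a baseline memcpy() between two ordinary heap buffers. The names `kFrameBytes`, `throughputMBps`, and `timeHeapMemcpyMs` are hypothetical helpers introduced only for illustration; they are not part of the posted code.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Size of one 1920 x 1080 frame at 4 bytes per pixel, as in the post.
constexpr std::size_t kFrameBytes = 1920u * 1080u * 4u;  // 8,294,400 bytes

// Effective throughput in MB/s (1 MB = 1e6 bytes) for a copy of `bytes`
// that took `milliseconds`.
double throughputMBps(std::size_t bytes, double milliseconds) {
    return (static_cast<double>(bytes) / 1e6) / (milliseconds / 1e3);
}

// Hypothetical baseline: time one memcpy() between two ordinary heap buffers.
// This measures plain CPU memory only, not a mapped PBO.
double timeHeapMemcpyMs(std::size_t bytes) {
    std::vector<unsigned char> src(bytes, 0x7f);
    std::vector<unsigned char> dst(bytes);
    const auto t0 = std::chrono::steady_clock::now();
    std::memcpy(dst.data(), src.data(), bytes);
    const auto t1 = std::chrono::steady_clock::now();
    // Read one byte back so the copy cannot be optimized away.
    volatile unsigned char sink = dst[bytes - 1];
    (void)sink;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

With the numbers from the post, `throughputMBps(kFrameBytes, 30.0)` is roughly 276 MB/s for the mapped PBO, versus roughly 4147 MB/s for `throughputMBps(kFrameBytes, 2.0)` on ordinary memory, so the gap is about 15x.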

void PBOReceiver::getDataPBO(int startX,int startY,int w,int h, GLuint idx)
static char dst[1920*1080*4];
int index = 0, nextIndex = 0; // pbo index used for next frame
idx = idx %PBOChannelCount;
GLenum error= 0;
// "index" is used to copy pixels from a PBO to a texture object
// "nextIndex" is used to update pixels in a PBO
// In dual PBO mode, increment current index first then get the next index
index = pIndex[idx] = (pIndex[idx]+PBOChannelCount) % PBOBufferCount;
nextIndex = (index + PBOChannelCount) % PBOBufferCount;

//read data from FBO to PBO
//glReadPixels() will return immediately
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[index]);
// copy pixels from PBO to texture object
// Use offset instead of pointer.
// bind PBO to update pixel values
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[nextIndex]);
// map the buffer object into client's memory
pPixelBuffer[nextIndex] = (GLubyte*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
memcpy(dst, pPixelBuffer[nextIndex], 1920*1080*4); // slow, 30 ms
// let OpenGL release it
glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB); // release pointer to mapping buffer
cout<<"PBOSender Err!"<<endl;
// copy pixels from PBO to texture object
// Use offset instead of pointer.
// it is good idea to release PBOs with ID 0 after use.
// Once bound with 0, all pixel operations behave normal ways.
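The index rotation used in getDataPBO() can be checked in isolation, without any GL context. Below is a minimal sketch of that arithmetic; the `PboRing` type is hypothetical, and it assumes that each channel's entry in `pIndex` starts at the channel number and that `PBOBufferCount` is twice `PBOChannelCount`, neither of which is shown in the post. As the posted code appears to use them, `index` is the PBO bound for the asynchronous read of the current frame and `nextIndex` is the PBO that gets mapped and copied.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the pIndex / PBOChannelCount / PBOBufferCount
// bookkeeping in the posted getDataPBO().
struct PboRing {
    std::size_t channelCount;         // plays the role of PBOChannelCount
    std::size_t bufferCount;          // plays the role of PBOBufferCount
    std::vector<std::size_t> pIndex;  // current PBO slot per channel

    PboRing(std::size_t channels, std::size_t buffers)
        : channelCount(channels), bufferCount(buffers), pIndex(channels) {
        // Assumption: channel i starts on slot i.
        for (std::size_t i = 0; i < channels; ++i) pIndex[i] = i;
    }

    // Same arithmetic as the posted code: returns {index, nextIndex}.
    std::pair<std::size_t, std::size_t> advance(std::size_t idx) {
        idx %= channelCount;
        const std::size_t index =
            pIndex[idx] = (pIndex[idx] + channelCount) % bufferCount;
        const std::size_t nextIndex = (index + channelCount) % bufferCount;
        return {index, nextIndex};
    }
};
```

With two channels and four buffers, channel 1 alternates between slots 3 and 1 on successive calls, so the PBO being mapped is always the one that was the read target on the previous call for that channel.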


-------------------- By the way, below is the initialization code --------------------
bool PBOReceiver::Init()
// create 2 pixel buffer objects, you need to delete them when program exits.
// glBufferDataARB with NULL pointer reserves only memory space.
glGenBuffersARB(PBOBufferCount, pboIds);
for(int i = 0; i < PBOBufferCount; i++){
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[i]);
return true;

You may try to run with OpenGL + NvBuffer. Here is a sample:

An NvBuffer is an EGLImage and you should be able to copy the data through GL function calls.
We have samples of using NvBuffer in


and a render implementation in


Thanks, I'll give it a try.

Hi, DaneLLL,

Unfortunately, I couldn't get the demo code compiled from main_1.c to run correctly. On execution, it reported the following errors:
NvEGLImageFromFd: Failed to create EGLImage from dma-buf fd (1828717595)
NvMapMemCacheMaint Bad parameter.

If I execute it with sudo, i.e. sudo ./out,
it reports:
No protocol specified
nvbuf_utils: Could not get EGL display connection
No protocol specified
No protocol specified
NvEGLImageFromFd: No EGLDisplay to create EGLImage
NvMapMemCacheMaint Bad parameter

Could you help check whether anything is wrong?

I found the solution to the NvEGLImageFromFd() failure: the ARGB32 format used in the dma-buf creation should be switched to NV12 or another format.
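One consequence of that format switch worth keeping in mind when copying the data out: an NV12 frame is not 4 bytes per pixel. A minimal sketch of the tightly packed sizes follows; the helper names are hypothetical, and the figures ignore any pitch or alignment padding the buffer allocator may add.

```cpp
#include <cstddef>

// Tightly packed ARGB32 frame: 4 bytes per pixel.
constexpr std::size_t argb32Bytes(std::size_t w, std::size_t h) {
    return w * h * 4;
}

// Tightly packed NV12 frame: a full-resolution luma plane followed by an
// interleaved chroma plane subsampled by 2 in both directions.
constexpr std::size_t nv12Bytes(std::size_t w, std::size_t h) {
    return w * h + 2 * ((w + 1) / 2) * ((h + 1) / 2);
}

static_assert(argb32Bytes(1920, 1080) == 8294400, "ARGB32 1080p size");
static_assert(nv12Bytes(1920, 1080) == 3110400, "NV12 1080p size");
```

So a copy sized as 1920*1080*4 would overrun a packed 1920 x 1080 NV12 frame, which holds 3,110,400 bytes.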