The best way to copy OpenGL texture to CUDA

I have written a method that takes an OpenGL texture ID as input, reads the contents of the texture, stores it in CUDA's memory space, and outputs a CUdeviceptr. I have studied the postProcessGL example from the SDK; however, I am using the driver API.

In short, I do as follows:

  • take an OpenGL texture ID as input
  • allocate memory for a pixel buffer object, pbo
  • bind the pbo (as GL_PIXEL_PACK_BUFFER)
  • use glGetTexImage to read data from the texture (which I have bound using the provided ID) into the pbo
  • bind a 0 buffer and a 0 texture
  • register the pbo with CUDA using cuGLRegisterBufferObject()
  • map the pbo to a CUdeviceptr (devPtrTemp) using cuGLMapBufferObject()
  • copy from devPtrTemp to another CUdeviceptr (devPtr) using cuMemcpyDtoD()
  • unmap the pbo using cuGLUnmapBufferObject()
  • unregister the pbo using cuGLUnregisterBufferObject()
  • set the output to devPtr
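The steps above can be sketched roughly as follows with the (now legacy) driver-API GL interop calls. This is a sketch under assumptions, not the poster's exact code: `texId`, `width`, and `height` are placeholders, an RGBA32F texture is assumed, a GL context plus a CUDA context created for GL interop are assumed to exist, and all error checking is omitted.

```c
/* Assumes <cuda.h>, <cudaGL.h>, and an OpenGL header are included,
   and that a GL context and a CUDA GL-interop context already exist. */
GLuint pbo;
size_t bufSize = width * height * 4 * sizeof(float);  /* assuming RGBA32F */

/* allocate a pixel buffer object */
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, bufSize, NULL, GL_DYNAMIC_READ);

/* read the texture contents into the PBO (usually a device-side copy) */
glBindTexture(GL_TEXTURE_2D, texId);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, (void*)0); /* offset 0 into PBO */
glBindTexture(GL_TEXTURE_2D, 0);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

/* hand the PBO to CUDA */
cuGLRegisterBufferObject(pbo);

CUdeviceptr devPtrTemp;
size_t mappedSize;   /* older toolkits declare this as unsigned int */
cuGLMapBufferObject(&devPtrTemp, &mappedSize, pbo);

/* copy out so the PBO can be unmapped and unregistered */
CUdeviceptr devPtr;
cuMemAlloc(&devPtr, bufSize);
cuMemcpyDtoD(devPtr, devPtrTemp, bufSize);

cuGLUnmapBufferObject(pbo);
cuGLUnregisterBufferObject(pbo);
/* devPtr now holds the texture data */
```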

It seems to work fine. However, is this really the most efficient way to do it? For example, I would like to avoid copying from one CUdeviceptr to another if possible. Since I want to unmap and unregister the buffer it seems to me like I have to do this copy though, or am I wrong?

Also, does anyone know whether using glGetTexImage together with pixel buffer objects actually copies data, and whether that copy happens in GPU memory or not?

Thanks in advance!

If you want to use the PBO data after unmapping and unregistering the PBO, then yes, you should do a memcopy, since OpenGL allows PBOs to be moved in memory (even to CPU memory).

Do you do this operation once, or for each frame? Note that if a PBO is being used as a texture source, you can register it once and not unregister it until you do the cleanup.
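One possible way to structure that register-once pattern is sketched below. The function names are hypothetical (they stand for whatever init/per-frame/cleanup hooks the application already has); only the interop calls are from the driver API.

```c
/* Assumes <cuda.h>, <cudaGL.h> and a live GL + CUDA interop context;
   'pbo' is a PBO that has already been created with glBufferData(). */
void app_init(GLuint pbo)        /* hypothetical init hook: register once */
{
    cuGLRegisterBufferObject(pbo);
}

void app_per_frame(GLuint pbo)   /* hypothetical per-frame hook */
{
    CUdeviceptr dptr;
    size_t size;
    cuGLMapBufferObject(&dptr, &size, pbo);  /* map is cheap vs. register */
    /* ... use dptr in CUDA ... */
    cuGLUnmapBufferObject(pbo);
}

void app_cleanup(GLuint pbo)     /* hypothetical cleanup hook: unregister once */
{
    cuGLUnregisterBufferObject(pbo);
}
```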

There is an implicit memcopy when you call glTex[sub]Image with a PBO. This is due to OpenGL and is independent of CUDA. The vast majority of the time this will be a copy of memory on the GPU but, as I mentioned earlier, there is a possibility that a PBO has been moved to CPU memory (in which case you have a copy over PCIe). I believe the latter case can occur when your PBOs and textures exceed GPU memory resources.



Could you also explain, or point me toward related documents/code/etc…, about copying data from a texture to a CUDA array? My purpose is to do some general-purpose computations that were implemented in Cg by my ex-colleague. The code snippet is like below:

void CColorImage::init_Cuda()
{
    glGenTextures(1, &g_texture_tmp1);            // generate a texture ID
    glEnable(GL_TEXTURE_2D);                      // enable 2D texturing
    glBindTexture(GL_TEXTURE_2D, g_texture_tmp1); // bind the texture

    // allocate texture storage on the GPU (no data uploaded yet)
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, window_width, window_height,
                 0, GL_LUMINANCE, GL_FLOAT, NULL);

    cg_fprofile = cgGLGetLatestProfile(CG_GL_FRAGMENT); // get the latest fragment profile available
    cgGLSetOptimalOptions(cg_fprofile);                 // for optimal compilation

    // compile and load the Cg fragment program
    fragment_flow = cgCreateProgramFromFile(cg_context, CG_SOURCE, FragmentProgramFlow,
                                            cg_fprofile, NULL, NULL);
    cgGLLoadProgram(fragment_flow);

    // get the handles of the input variables in the Cg fragment program
    cg_input2 = cgGetNamedParameter(fragment_flow, "input2");
    cg_input3 = cgGetNamedParameter(fragment_flow, "input3");
}


void CColorImage::OnPaint()
{
    // enable the Cg program
    // ...

    // enable the input texture for the Cg program
    cgGLSetTextureParameter(cg_input2, g_texture[2]);

    // draw a quadrangle (glBegin/glEnd and one texcoord restored from context)
    glBegin(GL_QUADS);
        glTexCoord2f(0, 1); glVertex2f(-1,  1);
        glTexCoord2f(1, 1); glVertex2f( 1,  1);
        glTexCoord2f(1, 0); glVertex2f( 1, -1);
        glTexCoord2f(0, 0); glVertex2f(-1, -1);
    glEnd();

    // copy the result of the first-pass rendering into the tmp texture
    glBindTexture(GL_TEXTURE_2D, g_texture_tmp1);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, window_width, window_height);
    glBindTexture(GL_TEXTURE_2D, 0);

    // disable the input texture for the Cg program
    // ...

    // disable the Cg program
    // ...
}



In Cg, the texture ID is mapped directly to its cg_input2, but in CUDA it seems that I need to map it to a CUDA array… I would like to know how to do that mapping.

Thank you a lot!!

What do you want to do with the texture in your CUDA code:

  1. are you going to fetch from it, like you would in a Cg program? If so, do you need texture functionality, or would coalesced accesses to memory with the data suffice?

  2. Is the texture in question used by OpenGL at all? Meaning, is it used only by the CUDA kernel, or by both CUDA and graphics rendering.

As far as mapping a graphics texture to a CUDA array is concerned, register the PBO, map it to get a device pointer, and then do a memcopy from that pointer to a CUDA array. Then bind the array to a CUDA texture and fire off your kernel. However, depending on your answers to the questions above, all these steps may not be necessary.
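A minimal sketch of that device-pointer-to-array copy with the driver API might look like the following. Assumptions: an RGBA32F image, `devPtrTemp` is the pointer returned by cuGLMapBufferObject, `module` is a CUmodule you have already loaded, and `"tex"` is the name of a texture reference declared in that module's .cu source; error checking is omitted.

```c
/* Assumes <cuda.h> and <string.h> are included and a CUDA context exists. */
CUarray cuArr;
CUDA_ARRAY_DESCRIPTOR desc;
desc.Width       = width;
desc.Height      = height;
desc.Format      = CU_AD_FORMAT_FLOAT;
desc.NumChannels = 4;                        /* RGBA32F */
cuArrayCreate(&cuArr, &desc);

/* copy from the mapped device pointer into the CUDA array */
CUDA_MEMCPY2D cpy;
memset(&cpy, 0, sizeof(cpy));
cpy.srcMemoryType = CU_MEMORYTYPE_DEVICE;
cpy.srcDevice     = devPtrTemp;              /* from cuGLMapBufferObject */
cpy.srcPitch      = width * 4 * sizeof(float);
cpy.dstMemoryType = CU_MEMORYTYPE_ARRAY;
cpy.dstArray      = cuArr;
cpy.WidthInBytes  = width * 4 * sizeof(float);
cpy.Height        = height;
cuMemcpy2D(&cpy);

/* bind the array to the module's texture reference */
CUtexref texref;
cuModuleGetTexRef(&texref, module, "tex");
cuTexRefSetArray(texref, cuArr, CU_TRSA_OVERRIDE_FORMAT);
cuTexRefSetFormat(texref, CU_AD_FORMAT_FLOAT, 4);
/* ... then launch the kernel, which fetches through "tex" */
```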


Hi Paulius:

  1. My goal is to use the data stored in a texture (g_texture_tmp1, for example). I think I need to map that data to a CUDA array so I can use it for further computation…

  2. First I want to do some computation with a CUDA kernel, then I need to show the result in a window. My implementation combines CUDA and MFC.

If possible, could you explain the relationship between a CUDA texture reference and an OpenGL texture ID? I am new to CUDA/OpenGL and still confused about that.

Thanks for your reply.

Hi Paulius,

I haven’t gotten your reply; maybe I didn’t describe my question clearly…

I have read the sample code “boxFilter” and “postProcessGL,” but I still can’t get the point… I explain my question below, and hope you can give me some directions.

In the original code, I saw that my ex-colleague generated, bound, and allocated some temporary textures in the init function, like below:

glGenTextures(1, &g_texture_tmp1);            // generate a texture ID
glEnable(GL_TEXTURE_2D);                      // enable 2D texturing
glBindTexture(GL_TEXTURE_2D, g_texture_tmp1); // bind the texture

// allocate texture storage on the GPU (NULL data: nothing uploaded yet)
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, window_width, window_height,
             0, GL_LUMINANCE, GL_FLOAT, NULL);



Then, in the paint function, he does similar steps with other textures, but loads them with the input data we get from our system:

glEnable(GL_TEXTURE_2D);                    // enable 2D texturing
glBindTexture(GL_TEXTURE_2D, g_texture[0]); // bind the texture
// upload the image to the texture memory of the GPU
// ...



Then he bound the texture to the Cg parameter:

cgGLSetTextureParameter(cg_input4, g_texture_tmp1);

and so on…

As I understand it, cg_input4 is used like the texture reference defined in the CUDA Programming Guide. Note that at this point the data stored in this texture is not displayed; I mean, it has no relationship with the PBO we defined earlier. I don’t know how to map the data in this texture to a CUDA array… In the Programming Guide I didn’t find functions for this purpose. Even with your suggestion to map the PBO to a device pointer, I don’t know which function I should call. Maybe I misunderstood something; can you explain further? If needed, I can mail the .cpp file (combined with Cg) to you. Thanks in advance for your reply.
(upload://zud1TuLNptls3dnfHR0nhlJMz6c.cpp) (44.3 KB)

My point is that if the texture you are referring to is never used for rendering, just for computing, don’t even create an OpenGL texture. Just load the data directly with CUDA.

For rendering output from CUDA, create a PBO, register it with CUDA, map it to get the pointer, then have the CUDA kernel write its output to the memory addressed by that pointer. Then unmap the PBO and display it with either glDrawPixels or via an OpenGL texture.
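That output path could be sketched like this (again with the legacy driver-API interop calls, an assumed 8-bit RGBA output, placeholder names, and no error checking; the kernel launch itself is elided):

```c
/* Assumes <cuda.h>, <cudaGL.h>, an OpenGL header, and live GL/CUDA contexts. */
GLuint outPbo;
glGenBuffers(1, &outPbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, outPbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height * 4, NULL, GL_DYNAMIC_DRAW);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

cuGLRegisterBufferObject(outPbo);           /* once, at init */

/* per frame: */
CUdeviceptr dptr;
size_t size;
cuGLMapBufferObject(&dptr, &size, outPbo);
/* ... launch the kernel that writes its output to dptr ... */
cuGLUnmapBufferObject(outPbo);

/* display: either glDrawPixels straight from the PBO... */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, outPbo);
glDrawPixels(width, height, GL_RGBA, GL_UNSIGNED_BYTE, (void*)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
/* ...or glTexSubImage2D from the PBO into a texture and draw a textured quad. */
```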