bind texture reference to raw linear data or array

I am not a computer graphics person, but I have been working on using CUDA to solve some scientific computing problems. So I have a bunch of questions regarding some CG terminology:

There are functions in the CUDA API to “bind” a texture reference to raw linear global memory or to a CUDA array (which also resides in global memory). What does this “bind” do?

I suppose OpenGL texture memory is a chunk of memory that can be used for texture mapping on the graphics card. If I “bind” a texture reference to a CUDA array, does that mean the graphics card changes the location of the texture to the address of the CUDA array? So that in the future, whatever I put into the array will be used for texture mapping, e.g. via glTexImage2D?

The reason I am asking is that I want to generate some results in a CUDA kernel and display them directly from the graphics card, without shipping them back to host memory and using OpenGL to display them. I have been stuck here for a couple of days and still couldn’t figure it out.

The demo code in the imdenoise sample is rather confusing to me. In that example, a PBO buffer (I suppose it resides in global memory) is created to store the processing result, and the demo uses the following code to display the results:

glTexSubImage2D( GL_TEXTURE_2D, 0, 0, 0, imageW, imageH, GL_RGBA, GL_UNSIGNED_BYTE, BUFFER_DATA(0));

And BUFFER_DATA(0) is indeed a NULL pointer! So how can this display the result that is physically stored in d_dst, the global address of the PBO buffer?

I would appreciate it very much if some expert could shed some light on my question!


A texture object is actually an opaque definition of how the image data should be handled by the GL. Binding to a texture means the GL needs to provide storage for the data. For normal GL users, acquiring the storage is transparent: glTexImage2D just happens to manage it for you. When you connect a CUDA buffer to a texture, you have to do the storage handling yourself:

  1. You need the texture info: glGenTextures()
  2. You need storage in GL memory space: the PBO
  3. You need to tell CUDA that it should use the PBO as a buffer: cudaGLRegisterBufferObject
  4. Construct a CUDA pointer into this buffer: cudaGLMapBufferObject
  5. Do the CUDA work
  6. Stop writing to the buffer: cudaGLUnmapBufferObject
  7. Update the texture info with the buffer content: glTexSubImage2D()
  8. Draw the texture as normal
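The steps above can be sketched roughly as follows. This is a minimal, hedged outline using the (now legacy) cudaGL* interop API mentioned in this thread; the kernel name, grid/block configuration, and imageW/imageH are placeholders, error checking is omitted, and it assumes an RGBA8 image:

```cpp
GLuint tex, pbo;

// 1. Texture info
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, imageW, imageH, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);       // allocate storage only

// 2. Storage in GL memory space: the PBO
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, imageW * imageH * 4,
             NULL, GL_DYNAMIC_DRAW);

// 3. Tell CUDA to use the PBO as a buffer
cudaGLRegisterBufferObject(pbo);

// Per frame:
// 4. Construct a CUDA pointer into the buffer
uchar4 *d_dst;
cudaGLMapBufferObject((void**)&d_dst, pbo);

// 5. CUDA work writes into d_dst (myKernel is a placeholder)
myKernel<<<grid, block>>>(d_dst, imageW, imageH);

// 6. Stop writing to the buffer
cudaGLUnmapBufferObject(pbo);

// 7. Update the texture from the PBO; with the PBO bound,
//    NULL means "offset 0 into the bound buffer", not a host pointer
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, imageW, imageH,
                GL_RGBA, GL_UNSIGNED_BYTE, NULL);

// 8. Draw the texture as normal
```

This cannot run without a GPU and a live GL context, so treat it as an outline of the call sequence rather than drop-in code.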

Providing GL calls like glTexSubImage2D with a NULL data pointer is a common idiom: when a PBO is bound to GL_PIXEL_UNPACK_BUFFER, the pointer argument is interpreted as a byte offset into that buffer, so NULL means “start of the bound PBO” and the copy happens on the GPU. Currently, the driver actually does perform a copy. Some NVIDIA guys have already said they are working on getting rid of the copy, so glTexSubImage2D will hopefully get very fast.



Thank you very much for the prompt help! I will follow your advice.

Could you clarify the common usage of glTexSubImage2D a bit further:

Say I put the CUDA result in the PBO, and I use a NULL pointer in glTexSubImage2D to copy the PBO into texture memory. But how can I tell glTexSubImage2D where to find the data stored in the PBO (since I use a NULL pointer)? Does glTexSubImage2D automatically copy the data from the PBO? If so, what happens if I have more than one PBO?


Thanks for Peter’s help. I just made it work!

But there is still an issue:

The function cudaGLUnmapBufferObject takes an unusually long time (about 30 ms). Since I only use the PBO for CUDA to modify, I only map and unmap it once for the whole application, so this is fine for my application.

Could anybody explain why cudaGLUnmapBufferObject takes so long?


See the man page of glTexSubImage2D - you can specify the rectangle to transfer.
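On the earlier question about multiple PBOs: glTexSubImage2D reads from whichever buffer is currently bound to GL_PIXEL_UNPACK_BUFFER, and the “pointer” argument is a byte offset into that buffer. A hedged sketch (pboA, pboB, and the rectangle coordinates are placeholder names):

```cpp
// Select the source PBO by binding it; the last argument is an offset, not a pointer.
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboA);        // pboA is now the data source
glTexSubImage2D(GL_TEXTURE_2D, 0,
                x0, y0, subW, subH,                // update only this rectangle
                GL_RGBA, GL_UNSIGNED_BYTE,
                (const GLvoid*)0);                 // offset 0 into pboA

glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboB);        // now pboB is the data source
glTexSubImage2D(GL_TEXTURE_2D, 0, x0, y0, subW, subH,
                GL_RGBA, GL_UNSIGNED_BYTE, (const GLvoid*)0);
```

So with several PBOs you choose the source by binding before the call; the NULL/offset stays the same.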

There is currently an issue with CUDA keeping a shadow copy of the PBO somewhere. Simon explained this in some other thread when talking about the postProcessGL SDK example. Search the board for it. The delay should hopefully go away completely.

If you don’t modify the PBO using GL (you just read from it by means of a texture) and you use the same GPU for CUDA and rendering, there is no need to unmap it from CUDA, as you have already observed. If you are running two or more cards, the texture would never get updated, so in that case cudaGLUnmapBufferObject is necessary and will need to transfer data between the cards.
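The map-once pattern discussed above might look like the sketch below. Note this is an assumption-laden sketch for the single-GPU case only (the interop spec generally expects map/unmap around each CUDA access); myKernel and the launch configuration are placeholders:

```cpp
// Single-GPU pattern: register and map once, render many frames.
cudaGLRegisterBufferObject(pbo);
uchar4 *d_dst;
cudaGLMapBufferObject((void**)&d_dst, pbo);   // mapped for the app's lifetime

while (rendering) {
    myKernel<<<grid, block>>>(d_dst, imageW, imageH);
    cudaThreadSynchronize();                  // finish writes before GL reads
    // glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo); glTexSubImage2D(...); draw ...
}

cudaGLUnmapBufferObject(pbo);                 // pay the ~30 ms unmap cost once
cudaGLUnregisterBufferObject(pbo);
```

This matches the observation in the thread that unmapping once at shutdown avoids the per-frame delay, at the cost of relying on same-GPU behavior.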