Displaying a buffer: OpenGL/CUDA question


I think this question is more about OpenGL, but it is still CUDA-related.

I’d like to share memory between OpenGL and CUDA.

With the commands

glBindBufferARB(GL_ARRAY_BUFFER_ARB, pbuffer);

glVertexPointer(2, GL_FLOAT, 0, NULL);

glDrawArrays(GL_POINTS, 0, DS);

it is possible to draw a shared buffer (pbuffer here). The buffer simply contains points (particles), which are then drawn at the right location.

My buffer contains not points but scalar values. They represent a color (let’s assume only gray for the moment). Is it possible to draw such an array with OpenGL?

What if I had a cudaArray with such values? Would it also be possible to pass that directly to OpenGL for drawing? Or do I need to draw it point by point?

Since I’m new to OpenGL and CUDA, I’m glad for any hint :)



Currently one has to use a PBO (pixel buffer object).
In your case, you could convert the floats to RGBA colors in CUDA, copy the values to a texture (as mentioned in another thread), and draw the texture in GL.
Just remember not to use glDrawPixels. It’s slower than drawing a texture.
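The float-to-RGBA conversion suggested above could look roughly like the kernel below. This is only a sketch: the names grayToRgba and launchGrayToRgba are made up for illustration, and it assumes the gray values are floats in the range [0, 1] stored row by row.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: turns one float gray value (assumed to lie in [0, 1])
// into an opaque RGBA8 pixel with R = G = B.
__global__ void grayToRgba(const float* gray, uchar4* rgba, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // threads outside the image do nothing

    int i = y * width + x;
    // Clamp to [0, 1], then scale to 0..255 with rounding.
    float g = fminf(fmaxf(gray[i], 0.0f), 1.0f);
    unsigned char v = (unsigned char)(g * 255.0f + 0.5f);
    rgba[i] = make_uchar4(v, v, v, 255);
}

// Hypothetical launch helper; d_gray and d_rgba must be device pointers
// holding width * height elements each.
void launchGrayToRgba(const float* d_gray, uchar4* d_rgba, int width, int height)
{
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    grayToRgba<<<grid, block>>>(d_gray, d_rgba, width, height);
}
```

The uchar4 output matches a GL_RGBA / GL_UNSIGNED_BYTE texture upload, so d_rgba can be the mapped pointer of a PBO.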

Severin, you cannot share buffers between CUDA and OpenGL directly, as they live in different contexts. So you need to copy the data between the contexts. This is relatively fast because the data stays in GPU memory. As asadafag already said, you can use a PBO or VBO for that. See the various image processing examples in the SDK. An important optimization is that you do not need to unbind the buffer from CUDA (which writes to it) if the OpenGL copy-to-texture only reads from it. Just make sure that the CUDA kernel has finished before doing the texture copy.


Thanks for your answers.

I took a closer look at the boxFilter example.

What is not yet clear to me is how all these different variables are related to each other.

So far I understand that

‘h_img’ is the original image on the host (used for loading it onto the device)
‘d_img’ is the picture in device memory, and it is modified by CUDA
Then there are d_array and tex: how are these related to each other and to d_img?


Is there no way to call glDrawPixels on a buffer that is in CUDA memory?

Currently I allocate an output buffer as follows to store the results of the CUDA kernel.

CUDA_SAFE_CALL( cudaMalloc( (void**) &DevOutBuf, OutBufSize));

Inside the CUDA kernel, the results are stored in DevOutBuf and are of type short.

The result is a grayscale image.

What is the easiest way to display the resulting image?

Can you point me to a specific image processing example? I am trying to display a grayscale image where pixels are of type short. I can display the image by copying it back into the host memory and using glDrawPixels, but this results in only 10 fps.

Check out the postProcessGL example.

You can map an OpenGL PBO to CUDA memory, then have the CUDA kernel write to it, then unmap it and use glDrawPixels on it.
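A rough sketch of that map / write / unmap / draw sequence for a 16-bit grayscale image follows. It is written under several assumptions: it uses the cudaGraphics* interop calls from the CUDA runtime (older SDK samples use cudaGLRegisterBufferObject, cudaGLMapBufferObject and cudaGLUnmapBufferObject instead), it assumes GLEW has been initialized and a GL context is current, and fillGray, createPbo and renderFrame are placeholder names, with fillGray standing in for the real computation.

```cuda
#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Placeholder kernel: writes a horizontal gray ramp of non-negative shorts.
__global__ void fillGray(short* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = (short)((32767 * x) / width);
}

static GLuint pbo = 0;
static cudaGraphicsResource* pboResource = nullptr;

// Call once, after the GL context exists.
void createPbo(int width, int height)
{
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    // NULL data pointer: allocate storage only, nothing is uploaded from the host.
    glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height * sizeof(short),
                 NULL, GL_STREAM_DRAW);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    cudaGraphicsGLRegisterBuffer(&pboResource, pbo,
                                 cudaGraphicsRegisterFlagsWriteDiscard);
}

// Call once per frame.
void renderFrame(int width, int height)
{
    // Map the PBO so that CUDA sees it as ordinary device memory.
    short* d_out = nullptr;
    size_t numBytes = 0;
    cudaGraphicsMapResources(1, &pboResource, 0);
    cudaGraphicsResourceGetMappedPointer((void**)&d_out, &numBytes, pboResource);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    fillGray<<<grid, block>>>(d_out, width, height);

    // Unmapping hands the buffer back to OpenGL.
    cudaGraphicsUnmapResources(1, &pboResource, 0);

    // With a PBO bound as the unpack buffer, the last argument of glDrawPixels
    // is a byte offset into that buffer, not a host pointer.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 2);  // rows of shorts are 2-byte aligned
    glDrawPixels(width, height, GL_LUMINANCE, GL_SHORT, 0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
```

Error checking is left out to keep the sketch short; in real code each CUDA call would be wrapped in something like the CUDA_SAFE_CALL macro used elsewhere in this thread.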

Hi folks,

my question goes kind of in the same direction. So far my app does the following:

  1. cudaMalloc two linear buffers A and B on the device side

  2. cudaMemcpy an image from host to device memory buffer A

  3. execute a kernel which loads parts of A into shared memory, does some transformation and stores the result values in B. After this, buffer B contains an image with 16-bit RGB elements.

Now my question is:

How do I efficiently display the image in B with openGL without doing any additional host-device data transfers?

I understand I have to copy contents of buffer B somehow from the CUDA context into openGL context, but I have no clue as to how exactly I can do that.

I had a close look at the SDK projects that use PBOs for displaying images already, but they confuse me even more.

imageDenoising, for instance:

//in the main function you can find these three lines of code:

CUDA_SAFE_CALL( cudaMemcpyToArray(a_Src, 0, 0, h_Src, imageW * imageH * sizeof(uchar4), cudaMemcpyHostToDevice) );


glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, imageW, imageH, 0, GL_RGBA, GL_UNSIGNED_BYTE, h_Src);


glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, imageW * imageH * 4, h_Src, GL_STREAM_COPY);

The same with boxFilter:

//the initCuda() function contains this line of code:

CUDA_SAFE_CALL( cudaMemcpyToArray( d_array, 0, 0, h_img, size, cudaMemcpyHostToDevice));


//additionally the initOpenGl() function copies the same h_img again:

glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, width*height*sizeof(float), h_img, GL_STREAM_DRAW_ARB);

The obvious question is: don’t they copy the image from host to device three and two times, respectively!?

I would appreciate any help.

ashcor: Check the nBody simulation that comes with the SDK. Earlier I pointed you to the postProcessGL example - my bad. I got confused with the names of the project.
What you have posted in the very first post is very similar to the way the nbody project renders its particles.

hope it helps.

After you have executed your kernel you can copy the VBO to a texture and output that texture on a simple polygon. This assumes that you have already initialized a texture of appropriate size.

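	// Copy the VBO (bound here as a pixel unpack buffer) to a texture
	glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, vbo);
	glBindTexture(GL_TEXTURE_2D, texName);
	// Format and type must match the buffer contents; GL_RGBA / GL_UNSIGNED_BYTE
	// is an assumption here. The final NULL is an offset into the bound buffer,
	// not a host pointer.
	glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0,
	                width, height,
	                GL_RGBA, GL_UNSIGNED_BYTE, NULL);

	// Draw a single textured quad
	glBegin(GL_QUADS);
	glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f);
	glTexCoord2f(1.0f, 0.0f); glVertex2f(-1.0f,  1.0f);
	glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f,  1.0f);
	glTexCoord2f(0.0f, 1.0f); glVertex2f( 1.0f, -1.0f);
	glEnd();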




This is a device to device transfer.

Thanks, the trick is to initialize both the PBO and the texture with NULL to avoid unnecessary host-to-device transfers. Simply put, I have to copy each image I process with CUDA

  1. from my CUDA device buffer to the mapped OpenGL PBO

  2. from the OpenGL PBO to an OpenGL texture buffer

  3. and finally from the texture buffer to the frame buffer
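Put together, the NULL initialization and the three copies could be sketched as follows. This is not taken from any SDK sample: initDisplay and displayFrame are made-up names, the image is assumed to be RGBA8 (uchar4) rather than 16-bit RGB, GLEW and a current GL context are assumed, and the cudaGraphics* interop calls are used for mapping.

```cuda
#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

static GLuint pbo = 0, tex = 0;
static cudaGraphicsResource* pboRes = nullptr;

// One-time setup: both allocations pass NULL, so no host data is uploaded.
void initDisplay(int width, int height)
{
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height * sizeof(uchar4),
                 NULL, GL_STREAM_DRAW);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);  // unbind so NULL below means "no data"

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    cudaGraphicsGLRegisterBuffer(&pboRes, pbo,
                                 cudaGraphicsRegisterFlagsWriteDiscard);
}

// Per frame: d_B is the CUDA result buffer (device memory, RGBA8 assumed).
void displayFrame(const uchar4* d_B, int width, int height)
{
    // Step 1: CUDA device buffer -> mapped PBO (device-to-device copy).
    void* d_pbo = nullptr;
    size_t numBytes = 0;
    cudaGraphicsMapResources(1, &pboRes, 0);
    cudaGraphicsResourceGetMappedPointer(&d_pbo, &numBytes, pboRes);
    cudaMemcpy(d_pbo, d_B, width * height * sizeof(uchar4),
               cudaMemcpyDeviceToDevice);
    cudaGraphicsUnmapResources(1, &pboRes, 0);

    // Step 2: PBO -> texture; the final 0 is an offset into the bound PBO.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, 0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    // Step 3: texture -> frame buffer, by drawing one textured quad.
    glEnable(GL_TEXTURE_2D);
    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f);
    glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f, -1.0f);
    glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f,  1.0f);
    glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f,  1.0f);
    glEnd();
    glDisable(GL_TEXTURE_2D);
}
```

If the kernel can write its result straight into the mapped PBO pointer instead of a separate buffer B, step 1 disappears entirely.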

So, not so difficult after all.