OpenGL & CUDA

Hello,

First of all, thank you very much for everything I’ve learnt in these forums.
I want to draw an image using OpenGL and CUDA. My image is computed by a CUDA kernel and stored in a 1D float array, where each element holds a value that determines the colour of the corresponding pixel. Once the image is computed, I need to draw it on the screen, using the GPU as well. Do you know the best way to get it onto the screen? I don’t know whether I need texture memory, vertex buffer objects, something else, or just the OpenGL primitives like glVertex, etc., so I’m confused about it and I hope you can help me. If you need more details, just ask. Thank you very much.

Greetings

The simpleGL example in the SDK does what you want. The function you’re probably most interested in is this:

void runCuda(GLuint vbo)
{
	// map OpenGL buffer object for writing from CUDA
	float4 *dptr;
	CUDA_SAFE_CALL(cudaGLMapBufferObject((void**)&dptr, vbo));

	// execute the kernel
	dim3 block(8, 8, 1);
	dim3 grid(mesh_width / block.x, mesh_height / block.y, 1);
	kernel<<< grid, block>>>(dptr, mesh_width, mesh_height, anim);

	// unmap buffer object
	CUDA_SAFE_CALL(cudaGLUnmapBufferObject(vbo));
}

But go through the whole code to get the hang of it.

As you can see, we map the VBO to a float4 pointer, and in the kernel we can then treat the data behind this pointer as an ordinary 1D array of float4. Each element can hold a vertex position (XYZW) or any other data such a buffer object represents (for example, pixel colours in RGBA).

The display() function shows how to draw such a VBO afterwards.

Be advised, though, that the kernel supplied there has uncoalesced writes!

//naive addressing in the pos array
pos[y*width+x] = make_float4(u, w, v, 1.0f);

You’ll probably want to find a smarter way to address writes to memory if you’re concerned about performance and optimization.
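A host-side C sketch of why the block shape matters here: CUDA numbers the threads of a block x-first (tid = threadIdx.x + threadIdx.y * blockDim.x), so with dim3 block(8, 8, 1) the 16 threads of a half-warp span two rows and write two separate 8-element runs of pos[]. With a block width of 16 or more, each half-warp writes 16 consecutive elements, which is a necessary (though not sufficient, since alignment also matters) condition for coalescing on compute capability 1.x hardware. The helper names are hypothetical and the check only models simpleGL's pos[y*width+x] indexing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: index into pos[] written by the thread with in-block
   id `tid`, using simpleGL-style addressing pos[y*width + x]. CUDA orders
   the threads of a block x-first: tid = threadIdx.x + threadIdx.y * blockDim.x. */
static int element_index(int tid, int bdx, int bdy, int bx, int by, int width)
{
    int tx = tid % bdx;          /* threadIdx.x */
    int ty = tid / bdx;          /* threadIdx.y */
    int x = bx * bdx + tx;       /* blockIdx.x * blockDim.x + threadIdx.x */
    int y = by * bdy + ty;       /* blockIdx.y * blockDim.y + threadIdx.y */
    return y * width + x;
}

/* Hypothetical helper: true if every half-warp (16 consecutive thread ids)
   of block (bx, by) writes 16 consecutive elements of pos[]. */
static bool half_warps_contiguous(int bdx, int bdy, int bx, int by, int width)
{
    assert(bdx > 0 && bdy > 0 && width > 0);
    int threads = bdx * bdy;
    for (int base = 0; base + 16 <= threads; base += 16) {
        int first = element_index(base, bdx, bdy, bx, by, width);
        for (int k = 1; k < 16; ++k) {
            if (element_index(base + k, bdx, bdy, bx, by, width) != first + k)
                return false;
        }
    }
    return true;
}
```

So changing dim3 block(8, 8, 1) to dim3 block(16, 16, 1) in runCuda() would make each half-warp's writes contiguous, provided mesh_width and mesh_height are multiples of 16.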

Also, from what I’ve read here, OpenGL interoperability is currently quite slow in CUDA. It works, but it’s slower than with DirectX.

You’ll most likely want to stay away from immediate-mode primitives (glVertex and friends) if you’re aiming for performance. They are simple to use but inefficient. If you wanted to use them, you could in principle copy the data back to the CPU, extract the primitives’ coordinates from it and call glRectf or similar in a loop, but that’s about the slowest way to do it :)

You don’t have to use texture memory. Texture memory has nothing to do directly with displaying things, it’s only a way of addressing read-only arrays with some neat features like free interpolation and a bit of caching etc. Of course there’s nothing stopping you from reading data from textures and using it to write into a VBO.
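To illustrate the "free interpolation" point, here is a host-side C sketch of what linear filtering computes for a 1D texture read with unnormalized coordinates and clamp addressing: texel i is centred at i + 0.5, so a read at i + 0.5 returns element i and coordinates in between blend the two neighbours. The function names are hypothetical; real hardware computes the blend weight with reduced (8-bit fractional) precision, so results can differ slightly, and no filtering is applied to tex1Dfetch reads from 1D linear memory.

```c
#include <assert.h>

/* floor() for floats without needing libm: (int) truncates toward zero. */
static int floor_to_int(float v)
{
    int i = (int)v;
    return (v < (float)i) ? i - 1 : i;
}

/* Clamp addressing mode: out-of-range texel indices use the edge texel. */
static int clamp_index(int i, int n)
{
    if (i < 0) return 0;
    if (i > n - 1) return n - 1;
    return i;
}

/* Point filtering at unnormalized coordinate x: returns texel floor(x). */
static float sample_point(const float *t, int n, float x)
{
    assert(n > 0);
    return t[clamp_index(floor_to_int(x), n)];
}

/* Linear filtering: texel i is centred at i + 0.5, so shift by 0.5 and
   blend the two neighbouring texels by the fractional part. */
static float sample_linear(const float *t, int n, float x)
{
    assert(n > 0);
    float xb = x - 0.5f;
    int i = floor_to_int(xb);
    float alpha = xb - (float)i;
    return (1.0f - alpha) * t[clamp_index(i, n)]
         + alpha * t[clamp_index(i + 1, n)];
}
```

For data {0, 10, 20, 30}, a linearly filtered read at 2.0 gives 15 (halfway between elements 1 and 2), while a read at 1.5 gives exactly element 1.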

Hi again!

Thanks a lot for your response, Big Mac.
In fact, I think it’s just what I want. But one more question: once I’ve computed the image and unmapped the VBO, can I access the VBO as an array? If I want to know the value of each pixel, I need to read it, for instance with a for loop, and then set the colour appropriately at each position of the grid… Or is there a function that, given two arrays (one with the vertices of the image (the grid) and another with the colour of each pixel), draws it on the screen? Sorry if my question isn’t clear.

Thanks a lot!
Greetings

AFAIK you can use any type of buffer object with CUDA. It could be a Pixel Buffer Object instead of a Vertex Buffer Object. I don’t know much about OpenGL, but if I understand the concept of PBOs correctly, they store pixel data as it appears on screen (or as it enters the framebuffer), so in a sense, modifying pixel buffers would let you directly “paint” pixels on screen. I might have this wrong, so you’d better read about PBOs to confirm.

Ok, thank you very much. I’m going to read these papers and choose the best option…

Bye!

Hi again. Now I’m thinking of using texture memory for reading the data I use to create the image, to try to increase performance. But I have a problem. Here are some parts of my code:

[codebox]
// Global texture reference for reading a 1D float array
texture<float, 1> textdata;

// In main(), I bind the texture to data previously allocated with cudaMalloc
cudaBindTexture(0, textdata, data, sizedata * sizeof(float));

// In the kernel, I access the data with
tex1D(textdata, index);
[/codebox]

But I only get valid data when index = 0. The rest of the data returns 0 instead of the real value. Do you know what the problem is?

Thank you.

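Yeah! I found the problem! The function to use is tex1Dfetch(textdata, index) instead of tex1D(textdata, index): tex1Dfetch takes an integer index and is the fetch function for textures bound to linear memory with cudaBindTexture, while tex1D takes a floating-point coordinate and is meant for textures bound to CUDA arrays.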
Thanks anyway!

Oops! I forgot to ask one last question… do you know how big the texture memory is?
My card has 512MB of memory, but I’m not sure whether this is global memory or whether it is shared between different memory spaces…

thanks!

Texture memory and global memory are physically the same thing and both sit in GPU’s VRAM. Think of it this way: you can allocate an ordinary array in the VRAM and this becomes global memory. You can then tell the GPU to treat this array as a texture and from now on it will use a different method (and hardware unit) to access it. Keep in mind, optimized data access is very important for performance so this can make a huge difference.

Hi,

Equally in awe of some of the comments on these forums. A great resource.

Did you work out how to set the color for individual pixels in a VBO/PBO?

Thanks,

John

Hi, I’m not using PBOs yet, but I will soon. To get access to the buffer object from the CPU, you can use something like this:

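// map the PBO so the CPU can write pixel data into it
// (writing needs an unpack buffer mapped write-only, not a read-only pack buffer)
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo_id);
GLubyte* ptr = (GLubyte*) glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
if(ptr) {
	// You can set a color for each position of the buffer
	processPixels(ptr, ...);
	glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB);
}

// while the PBO is still bound, the last argument of glDrawPixels is an offset into it
// (width and height are the image size, named here only for illustration)
glDrawPixels(width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

// back to conventional pixel operation
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);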

Anyway, if somebody knows more about this, I’d be very glad to hear from them.

Greetings!
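For the "set a color for each position" step, a hypothetical processPixels could look like the C sketch below: it turns one float per pixel (like the image array from the first post) into 4 bytes of RGBA, matching a GL_RGBA / GL_UNSIGNED_BYTE pixel transfer. The grey ramp, the [vmin, vmax] range and the function name are assumptions for illustration; the same per-pixel arithmetic could run in a CUDA kernel (one thread per pixel) writing into a mapped PBO, which avoids the CPU round trip entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for processPixels(): convert one float per pixel
   into 4 bytes (R, G, B, A) of grey, scaling [vmin, vmax] to [0, 255]. */
static void fill_rgba_from_values(unsigned char *rgba, const float *values,
                                  size_t count, float vmin, float vmax)
{
    assert(vmax > vmin);
    for (size_t i = 0; i < count; ++i) {
        float s = (values[i] - vmin) / (vmax - vmin);
        if (s < 0.0f) s = 0.0f;              /* clamp below the range */
        if (s > 1.0f) s = 1.0f;              /* clamp above the range */
        unsigned char g = (unsigned char)(s * 255.0f + 0.5f);  /* round */
        rgba[4 * i + 0] = g;
        rgba[4 * i + 1] = g;
        rgba[4 * i + 2] = g;
        rgba[4 * i + 3] = 255;               /* opaque alpha */
    }
}
```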

Actually, I’m still on VBOs too. I guess it really is an OpenGL question rather than a CUDA one.

My code is based on the simpleGL example using a VBO with four elements x, y, z, color.

I can also access the vertices like this:

[codebox]glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexPointer(3, GL_FLOAT, 4*sizeof(float), 0); //extract the x, y, z coords
glEnableClientState(GL_VERTEX_ARRAY);
glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height);
glDisableClientState(GL_VERTEX_ARRAY);[/codebox]

But then I don’t know how to use the fourth value as color (or colored texture).

I’ve tried to extend the VBO to six elements per vertex (after defining a float6 struct with xyz and rgb members):

[codebox]glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexPointer(3, GL_FLOAT, 6*sizeof(float), 0);
glColorPointer(3, GL_FLOAT, 6*sizeof(float), (GLvoid*)(3*sizeof(float)));
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_COLOR_ARRAY);[/codebox]

but it crashes with a memory overrun halfway through the first pass over the data.

Any ideas? I’m really hating OpenGL.
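One likely cause of the overrun, offered only as a guess: if the VBO was still created with glBufferData for four floats per vertex (as in simpleGL), a six-float stride makes glDrawArrays read up to 50% past the end of the buffer. The layout arithmetic is easier to get right with a struct and offsetof, as in this C sketch; VertexXYZRGB, vbo_bytes and the k* constants are hypothetical names standing in for the "float6" from the post, and the CUDA kernel would have to write the same struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: position followed by an RGB colour
   (the role played by "float6" in the post). */
typedef struct {
    float x, y, z;   /* position */
    float r, g, b;   /* colour   */
} VertexXYZRGB;

/* Byte size the VBO must be created with so that drawing `count`
   vertices never reads past the end of the buffer. */
static size_t vbo_bytes(size_t count)
{
    assert(count > 0);
    return count * sizeof(VertexXYZRGB);
}

/* Stride and offsets to pass to glVertexPointer / glColorPointer. */
static const size_t kStride = sizeof(VertexXYZRGB);
static const size_t kPositionOffset = offsetof(VertexXYZRGB, x);
static const size_t kColorOffset = offsetof(VertexXYZRGB, r);
```

The pointer setup would then be glVertexPointer(3, GL_FLOAT, (GLsizei)kStride, (const GLvoid*)kPositionOffset) and glColorPointer(3, GL_FLOAT, (GLsizei)kStride, (const GLvoid*)kColorOffset), with the buffer created as glBufferData(GL_ARRAY_BUFFER, vbo_bytes(mesh_width * mesh_height), 0, GL_DYNAMIC_DRAW).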

I’ve got it doing something close to what I’d like to see by following this tutorial:
http://www.evl.uic.edu/aej/525/lecture06.html
However, it has doubled the size of my VBO. I’m far more concerned with performance than looks.

Is there a more efficient way?

Thanks,
John
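On the size question: the fourth float in the original x, y, z, color layout can hold four bytes of RGBA instead, since glColorPointer accepts 4 components of type GL_UNSIGNED_BYTE. That keeps each vertex at 16 bytes, the same as the original float4 layout and two thirds of the six-float version. A C sketch with hypothetical names; on the CUDA side the kernel would write the colour as four unsigned chars (e.g. a uchar4) in the same position. For a plain 2D image, drawing from a PBO (the earlier suggestion in this thread) avoids storing per-pixel positions at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packed vertex: the slot that held one float of "color"
   now holds four bytes of RGBA, so each vertex stays 16 bytes. */
typedef struct {
    float x, y, z;          /* read with glVertexPointer(3, GL_FLOAT, ...) */
    unsigned char rgba[4];  /* read with glColorPointer(4, GL_UNSIGNED_BYTE, ...) */
} PackedVertex;

/* Fill one vertex with a position and an opaque colour. */
static void set_vertex(PackedVertex *v, float x, float y, float z,
                       unsigned char r, unsigned char g, unsigned char b)
{
    assert(v != NULL);
    v->x = x; v->y = y; v->z = z;
    v->rgba[0] = r; v->rgba[1] = g; v->rgba[2] = b;
    v->rgba[3] = 255;  /* opaque alpha */
}
```

The matching pointer setup would be glVertexPointer(3, GL_FLOAT, sizeof(PackedVertex), (const GLvoid*)offsetof(PackedVertex, x)) and glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(PackedVertex), (const GLvoid*)offsetof(PackedVertex, rgba)).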