Hi everybody!!

I am trying to implement reflections on the GPU without raytracing. The algorithm is based on the paper “GPU-driven interactive reflections on curved objects” by Estalella, Martin, Drettakis and Tost. In a first render pass I have to render the positions and normals of the reflector into a texture. I did this with GLSL, where the fragment shader outputs the positions/normals instead of colors. After that I load every vertex of my objects into CUDA to calculate the reflected positions into one vertex buffer object.

The “postProcessGL” example renders the current frame into a PBO, but the positions are interpreted as colors, so x, y and z are limited to a maximum value of one, while my coordinates are often bigger than one.

Other examples that work with textures copy image data from the host into a texture, which doesn’t help me either.

So here is my question: how can I render the current frame, copy it to a texture and access that texture from CUDA, like a uniform texture in GLSL? Or how is it possible to get values bigger than one (1.0) out of the PBO?

If somebody can help me or has another idea, that would be very nice!!!
If somebody doesn’t understand my problem, I can post some sample code!! :">


Greetings thopil :wave:

BTW, did you notice that in postProcessGL the processed texture is copied to the CPU, then back to the GPU, and only then rendered? Using this approach for real-world applications could be a little slow.

Yes, of course I noticed that this solution could be a little slow. But I only need the texture in the kernel; it doesn’t need to be rendered.

I am searching for a solution that renders my current frame a second time, copies it to the kernel and then throws the texture/PBO away.

Maybe I can render the frame to a texture and bind that texture to CUDA somehow, but in the examples the data always starts on the host side…

Hi, me again!!! :angel:

I will try to restate my question:

I am rendering an image with OpenGL into a texture. How can I copy or upload this texture, or maybe a framebuffer object, to CUDA?

Thanks and greetings,

thopil :w00twave:

The postProcessGL sample does not read the image back to the CPU, it reads it to a PBO (pixel buffer object), which will normally reside in GPU memory. Hence, the glReadPixels is actually doing a copy in GPU memory, which should be very fast.

To answer your question, to transfer image data from OpenGL to CUDA you need to read it into a PBO (using glGetTexImage or glReadPixels), and then map the PBO in CUDA. The postProcessGL sample should show you how to do this.
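A minimal sketch of that readback path, assuming a GL context already shared with CUDA and known width/height; the names use the CUDA graphics-interop API (cudaGraphicsGLRegisterBuffer and friends), while older SDKs, including the postProcessGL sample, use cudaGLRegisterBufferObject/cudaGLMapBufferObject instead:

```cpp
// Create a PBO large enough for one RGBA float frame.
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4 * sizeof(float),
             NULL, GL_DYNAMIC_READ);

// Copy the framebuffer into the PBO; with a bound PACK buffer this
// stays entirely in GPU memory.
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, 0);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

// Register the PBO with CUDA once, then map it each frame.
cudaGraphicsResource* res = NULL;
cudaGraphicsGLRegisterBuffer(&res, pbo, cudaGraphicsRegisterFlagsReadOnly);

float4* devPtr = NULL;
size_t  numBytes = 0;
cudaGraphicsMapResources(1, &res, 0);
cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &numBytes, res);

// myKernel is a placeholder for your own reflection kernel.
myKernel<<<grid, block>>>(devPtr, width, height);

cudaGraphicsUnmapResources(1, &res, 0);
```

While the resource is mapped in CUDA, OpenGL must not touch the PBO; unmap it before the next GL call that uses the buffer.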

If you don’t want the values to be clamped to [0…1], you just need to use a floating point image format.
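For example, the render target and the readback both have to use floats; a sketch, assuming the ARB_texture_float extension is available:

```cpp
// Allocate a floating-point texture as the render target so position
// values are stored unclamped.
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, width, height, 0,
             GL_RGBA, GL_FLOAT, NULL);

// When reading back into the PBO, request floats as well, otherwise
// the values get converted (and clamped) to 8-bit on the way out.
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, 0);
```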

I am actually experiencing problems with the speed of postProcessGL and OpenGL interoperability in general. On my laptop everything is fast and beautiful, but as soon as I plug in an external display in Dualview display mode everything slows down. This also happens on my workstations when going from single-view to dual-view display mode. In fact, merely adding a second graphics card to the computer slows down the interoperability. See my previous post here:

OpenGL and CUDA Interoperability

Any insights into this?


Yes, this is a known bug which we’ll try to get fixed in the next driver release.

I have implemented the postProcessGL example almost one-to-one, but when I test the values of the PBO in CUDA I still get normalized or clamped values. Do I have to tell OpenGL that I want to use floating point values?

The values have to be in world coordinates so that I can work with them.

Thanks, thopil :icecream:

If you render to a floating point texture, you also need to call glClampColorARB to get unclamped values.
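A short sketch of those calls, assuming the ARB_color_buffer_float extension is available:

```cpp
// Disable color clamping so values outside [0,1] survive both
// rasterization and readback.
glClampColorARB(GL_CLAMP_VERTEX_COLOR_ARB,   GL_FALSE);
glClampColorARB(GL_CLAMP_FRAGMENT_COLOR_ARB, GL_FALSE);
glClampColorARB(GL_CLAMP_READ_COLOR_ARB,     GL_FALSE);  // affects glReadPixels
```

The read-clamp state is the one that matters for the glReadPixels-into-PBO path described above.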