I am trying to implement reflections on the GPU without ray tracing. The algorithm is based on the paper “GPU-driven interactive reflections on curved objects” by Estalella, Martin, Drettakis and Tost. In a first render pass I render the positions and normals of the reflector into a texture. I did this with GLSL, where the output of the fragment shader is the positions/normals rather than colors. After that I load every vertex of my objects into CUDA to calculate the reflected positions into one vertex buffer object.
The “PostProcessGL” example reads the current frame into a PBO, but the values are interpreted as colors, so x, y and z are clamped to a maximum of one, while my coordinates are often larger than one.
Other examples that work with textures copy the data of an image into a texture, which doesn’t help me either.
So here is my question: how can I render the current frame, copy it to a texture and pass this texture to CUDA like a uniform texture in GLSL? Or how is it possible to interpret the PBO with values larger than one (1.0)?
If somebody can help me or has another idea, that would be very nice!
If my problem is still unclear, I can post some sample code! :">
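In the meantime, here is a simplified sketch of what I mean by the first pass (illustrative only, not my actual code): a fragment shader that writes the world-space position instead of a color, kept as a C string for glShaderSource.

// Illustrative only: minimal first-pass fragment shader that outputs the
// world-space position instead of a color; "worldPos" is assumed to be
// passed in by a matching vertex shader.
const char* positionFS =
    "varying vec3 worldPos;                  \n"
    "void main() {                           \n"
    "    gl_FragColor = vec4(worldPos, 1.0); \n"  // a position, not a color
    "}                                       \n";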
BTW, did you notice that in postProcessGL the processed texture is copied to the CPU, then back to the GPU, and only then rendered? Using this approach for real-world applications could be a little slow.
The postProcessGL sample does not read the image back to the CPU; it reads it into a PBO (pixel buffer object), which will normally reside in GPU memory. Hence the glReadPixels is actually doing a copy within GPU memory, which should be very fast.
To answer your question: to transfer image data from OpenGL to CUDA, you need to read it into a PBO (using glGetTexImage or glReadPixels) and then map the PBO in CUDA. The postProcessGL sample should show you how to do this.
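In condensed form, that path looks something like this (a sketch; the identifiers are my own, but the GL and CUDA interop calls are the ones the sample relies on):

#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

GLuint pbo;  // pixel buffer object shared between OpenGL and CUDA

void initPbo(int width, int height)
{
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, pbo);
    // four floats per pixel, so full-range positions fit without clamping
    glBufferData(GL_PIXEL_PACK_BUFFER_ARB,
                 width * height * 4 * sizeof(float), 0, GL_STREAM_READ);
    glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, 0);
    cudaGLRegisterBufferObject(pbo);  // older interop API, as in the SDK samples
}

void frameToCuda(int width, int height)
{
    // with a PACK buffer bound, glReadPixels copies GPU -> GPU, not to the CPU
    glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, pbo);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, 0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, 0);

    float4* d_frame = 0;
    cudaGLMapBufferObject((void**)&d_frame, pbo);
    // ... launch the CUDA kernel on d_frame here ...
    cudaGLUnmapBufferObject(pbo);
}

Note that the values you read back will still be clamped unless the framebuffer you read from has a floating-point format (see the next point).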
If you don’t want the values to be clamped to the [0…1] range, you just need to use a floating-point image format.
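Something along these lines (a sketch assuming GL_ARB_texture_float and EXT_framebuffer_object are available; the names are illustrative):

#include <GL/glew.h>

GLuint fbo, posTex;

void initFloatTarget(int width, int height)
{
    glGenTextures(1, &posTex);
    glBindTexture(GL_TEXTURE_2D, posTex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    // GL_RGBA32F_ARB is the important part: a 32-bit float internal
    // format whose values are not clamped to [0..1]
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, width, height, 0,
                 GL_RGBA, GL_FLOAT, 0);

    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, posTex, 0);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
}

Render the first pass into this FBO and do the glReadPixels while it is still bound; if you read from the default framebuffer instead, you may also need glClampColorARB(GL_CLAMP_READ_COLOR_ARB, GL_FALSE).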
I am actually experiencing problems with the speed of postProcessGL and OpenGL interoperability in general. On my laptop everything is fast and beautiful, but as soon as I plug in an external display in Dualview display mode, everything slows down. This also happens on my workstations when going from single-view to dual-view display mode. In fact, merely adding a second graphics card to the computer slows down the interoperability. See my previous post here:
I have implemented the postProcessGL example almost one-to-one, but when I test the values of the PBO in CUDA I still get normalized/clamped values. Do I have to tell OpenGL that I want to use floating-point values?
The values have to be in world coordinates so that I can work with them.
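For reference, this is roughly how I inspect the mapped values (simplified; pbo is the buffer I registered for the readback):

#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <cstdio>

void dumpFirstPixels(GLuint pbo)
{
    float4* d_positions = 0;
    cudaGLMapBufferObject((void**)&d_positions, pbo);

    float4 h[4];
    cudaMemcpy(h, d_positions, sizeof(h), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 4; ++i)
        printf("pixel %d: %f %f %f\n", i, h[i].x, h[i].y, h[i].z);
    // every component stays inside [0,1], even for fragments that are
    // clearly further away in world space

    cudaGLUnmapBufferObject(pbo);
}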