CUDA Rendering: Porting a GL/GLSL app to CUDA

I have an app which uses GLSL for some physically based lighting techniques. I am working on enhancing the shaders, but I also want to see if I can get a speedup by using CUDA instead of GL and GLSL. My app loads a few objects and I move them about, rotate them, etc. via glTranslate, glRotate, and so on.

My biggest stumbling block so far is representing the geometry in CUDA. I assume that once I have the model data in a structure of some sort, it should be simple enough to apply transformations to translate/rotate/etc.

Let’s say my scene consists of a single cube loaded from an .obj file. Once I read the file, I have all the vertex, material, etc. information. In my OpenGL app I stuff it in a VBO and let GL do its thing, i.e. GL does all the transformations, rasterization, etc. and I end up with the final image of the rendered scene. If I want to make my app CUDA only, am I going to have to recreate the GL pipeline in CUDA?
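For reference, here is roughly what I imagine the transform stage would look like — a minimal host-side sketch in plain C++ (the types and names are made up for illustration); a CUDA kernel would presumably run the same per-vertex math with one thread per vertex instead of the loop:

```cpp
#include <array>
#include <vector>

// Minimal vertex layout (positions only; a real .obj loader would also
// carry normals, texcoords, and material indices).
struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<float, 16>; // row-major 4x4 matrix

// Apply a 4x4 model matrix to one vertex. In CUDA, this body would sit
// in a kernel, one thread per vertex.
Vec4 transform(const Mat4& m, const Vec4& v) {
    return {
        m[0]  * v.x + m[1]  * v.y + m[2]  * v.z + m[3]  * v.w,
        m[4]  * v.x + m[5]  * v.y + m[6]  * v.z + m[7]  * v.w,
        m[8]  * v.x + m[9]  * v.y + m[10] * v.z + m[11] * v.w,
        m[12] * v.x + m[13] * v.y + m[14] * v.z + m[15] * v.w,
    };
}

// Transform the whole mesh in place (the loop a kernel launch replaces).
void transformAll(const Mat4& m, std::vector<Vec4>& verts) {
    for (auto& v : verts) v = transform(m, v);
}
```

With a row-major translation matrix (tx, ty, tz in the fourth column), the origin maps to (tx, ty, tz) as you would expect from glTranslate.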

I suppose the real question here might be “Why not just use GL?”. The answer is multifaceted:

  1. GLSL doesn’t support some mathematical operations I need.

  2. My algorithms have outgrown GLSL. They aren’t necessarily traditional rendering algorithms. My application is a mix of traditional rendering and GPGPU.

  3. Speed. I need more speed.

  4. Ease of development.

Any suggestions on:

Papers to read?
Keywords to google?
Anything else?


I’ve tried doing “software rendering” in CUDA; in particular, I was doing 2D polygon rasterization in a CUDA kernel. It is a very basic rasterizer, with no texturing or lighting applied. I used a tiling scheme, much like the old PowerVR chips. I ended up being only a third as fast as pure OpenGL rendering - and that is after spending a lot of time on code optimization. CUDA was faster only in some extreme scenarios where OpenGL became very fill rate limited.

I was able to perform some metric (e.g. mean square error) computations during the CUDA rendering phase, which would have required a second pass if I had used OpenGL to render the image. I am not sure if this tiny advantage outweighs the speed disadvantage. I haven’t finished this project yet, but you’ll find a Win32 binary of my renderer in the “Genetic image compression using transparent polygons” thread.

That name of yours rings a bell. You’re the guy who prefers to use a pseudonym on the OpenSceneGraph mailing list, heh. I really wonder what you’re up to. Being very secretive. ;)

This is similar to what I’ve implemented as well. I got the 2D bounding box of the triangle, split it into tiles (say 16x16), and passed the tiles and the barycentric coordinates of the triangle to the kernel. It works, but is not as fast as GL rasterization.
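For anyone curious, the per-pixel coverage test in this kind of tiled rasterizer boils down to the standard edge-function/barycentric check. A sketch in plain C++ (illustrative names, not my actual kernel code); in CUDA, one thread would evaluate this per pixel of a tile:

```cpp
// 2D point (a pixel center or a triangle vertex in screen space).
struct Pt { float x, y; };

// Edge function: twice the signed area of triangle (a, b, c).
// Its sign tells which side of edge a->b the point c lies on.
float edge(const Pt& a, const Pt& b, const Pt& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// True if p lies inside (or on the boundary of) triangle abc,
// assuming counter-clockwise winding. The w0/w1/w2 values, divided
// by the full area, are the barycentric coordinates of p.
bool inside(const Pt& a, const Pt& b, const Pt& c, const Pt& p) {
    float area = edge(a, b, c);
    if (area == 0.0f) return false;   // degenerate triangle
    float w0 = edge(a, b, p);
    float w1 = edge(b, c, p);
    float w2 = edge(c, a, p);
    return w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f;
}
```

The same weights can later interpolate vertex attributes, which is also where a per-tile metric computation (like the MSE mentioned above) could be folded in.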

Wouldn’t table lookups (texture reads) or other approximations of these mathematical operations (e.g. Taylor or exponential series) be an option in your GLSL code?
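For example, a truncated Taylor series for exp(x) — sketched here in plain C++ with a made-up name; a GLSL shader could unroll a few of these terms the same way, or bake the function into a 1D lookup texture. Accuracy depends heavily on the input range, so this is only viable if your inputs are bounded:

```cpp
// Truncated Taylor series for exp(x) about 0: sum of x^k / k! for
// k = 0 .. terms-1. Each term is built incrementally from the last,
// avoiding separate pow() and factorial computations.
float expTaylor(float x, int terms) {
    float sum = 1.0f;   // k = 0 term
    float term = 1.0f;
    for (int k = 1; k < terms; ++k) {
        term *= x / static_cast<float>(k);
        sum += term;
    }
    return sum;
}
```

With 8 terms the error at x = 0.5 is on the order of 1e-7; for larger |x| you would need more terms or range reduction.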

It might still be possible to combine CUDA and traditional GLSL rendering. OpenSceneGraph recently got “osgCuda” (developed at the University of Siegen), which may be of some use to you.

Looks like rendering in CUDA may end up being slower.

I can’t imagine it would be easier than GLSL. CUDA has some nasty performance bottlenecks if you don’t watch out for a coalesced memory access pattern and make good use of shared and constant memory. One more thing to consider: GLSL is not vendor specific; CUDA currently is.
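To illustrate the coalescing point: with an array-of-structures vertex layout, thread i reading `pos[i].x` touches memory with a large stride, while a structure-of-arrays layout lets consecutive threads read consecutive floats, which the hardware can combine into one transaction. A host-side sketch of the conversion (illustrative types, not from any particular codebase):

```cpp
#include <vector>

// Array-of-structures: natural for loading, poor for coalescing,
// since thread i reading x strides over y and z of every vertex.
struct VertexAoS { float x, y, z; };

// Structure-of-arrays: one contiguous array per component, so
// threads 0..N reading x[0..N] access adjacent addresses.
struct VerticesSoA {
    std::vector<float> x, y, z;
};

VerticesSoA toSoA(const std::vector<VertexAoS>& in) {
    VerticesSoA out;
    out.x.reserve(in.size());
    out.y.reserve(in.size());
    out.z.reserve(in.size());
    for (const auto& v : in) {
        out.x.push_back(v.x);
        out.y.push_back(v.y);
        out.z.push_back(v.z);
    }
    return out;
}
```

The same reordering pays off for any per-vertex kernel, including the transform stage discussed earlier in the thread.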