CUDA rasterizer


I’m trying to develop a CUDA triangle rasterizer and I was wondering…

  1. Do NVIDIA’s modern cards have dedicated silicon for this, or is it all done in CUDA/PTX? Did you implement rasterization using CUDA in your drivers?

  2. How do you set up the triangles ( e.g., tile assignment )? Do you use sweep rasterization, edge walking, block scanning, etc.?

  3. Do you have any in-depth technical articles about how this could be done using CUDA? So far I’ve seen Michael Abrash’s tile rasterization in LRB and Devmaster’s rasterization series… but I’m not sure whether those approaches can be applied efficiently in CUDA, since those articles are really written with CPUs in mind.
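For what it’s worth, the tile-assignment step those articles describe boils down to binning each triangle into every screen tile its bounding box overlaps. Here is a minimal CPU sketch of that idea (the tile size, names, and conservative bbox-only test are my assumptions; a real rasterizer would refine the binning with per-tile edge tests):

```cpp
#include <vector>
#include <algorithm>

struct Tri { float x[3], y[3]; };

static const int TILE = 16;  // tile edge length in pixels (an assumption)

// Conservative tile binning: append each triangle's index to every
// TILE x TILE screen tile that its bounding box overlaps.
std::vector<std::vector<int>> binTriangles(const std::vector<Tri> &tris,
                                           int width, int height) {
    int tw = (width + TILE - 1) / TILE;   // tiles per row
    int th = (height + TILE - 1) / TILE;  // tiles per column
    std::vector<std::vector<int>> bins(tw * th);
    for (int i = 0; i < (int)tris.size(); ++i) {
        const Tri &t = tris[i];
        // Bounding box of the triangle, clamped to the tile grid.
        float minx = std::min({t.x[0], t.x[1], t.x[2]});
        float maxx = std::max({t.x[0], t.x[1], t.x[2]});
        float miny = std::min({t.y[0], t.y[1], t.y[2]});
        float maxy = std::max({t.y[0], t.y[1], t.y[2]});
        int tx0 = std::max(0, (int)minx / TILE);
        int tx1 = std::min(tw - 1, (int)maxx / TILE);
        int ty0 = std::max(0, (int)miny / TILE);
        int ty1 = std::min(th - 1, (int)maxy / TILE);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tw + tx].push_back(i);
    }
    return bins;
}
```

On the GPU the per-tile lists would typically be built with a prefix sum over per-triangle tile counts rather than dynamic vectors, but the overlap logic is the same.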

Btw… I would buy an NVIDIA book ( like the GPU Gems ones or the latest D. Kirk one ) about this theme! :teehee: It’s just an idea for future books :santa:


I think the best way is to use the OpenGL interop.

You and I have done a lot of CUDA raytracing, but rasterization is indeed also interesting.

At SIGGRAPH 09, there was a short talk about using CUDA to rasterize polygons. I think it was this paper. I watched their presentation and remember being skeptical that they’d get anywhere near hardware-Z rasterization speed, but it was really surprising just how well they did. Their rasterizer was of course specialized for the multiple-layer case, but the general concepts were there for generalized rasterization too.

I also think they did a poster or sketch at SIGGRAPH ASIA 09.

Also of great interest is the REYES implementation by Kun Zhou. This is rasterization as well, though the micropolygon generation is the interesting bit, and the rasterization is a special case for small tiles.

Yep, that’s the easy way… but, for this case, I want to go the hard way :pirate:

On the other hand, I want to write a custom rasterizer to get better control over the AA ( Mitchell, sinc, etc. )… because I’m not sure the samples can be controlled the way I need in OpenGL 3.2.
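For reference, that kind of custom resolve is just a weighted sum of samples, with the weights coming from a reconstruction kernel. A sketch of the 1-D Mitchell–Netravali kernel (with the usual B = C = 1/3; the function name and defaults are mine) that such a resolve pass could evaluate per sample:

```cpp
#include <cmath>

// 1-D Mitchell–Netravali reconstruction kernel.
// B = C = 1/3 gives the classic "Mitchell" filter; B = 0, C = 0.5 is
// Catmull-Rom. Support is |x| < 2. A 2-D resolve would weight each
// sample at offset (dx, dy) from the pixel center by
// mitchell(dx) * mitchell(dy), then normalize by the weight sum.
static double mitchell(double x, double B = 1.0 / 3.0, double C = 1.0 / 3.0) {
    x = std::fabs(x);
    if (x < 1.0)
        return ((12 - 9 * B - 6 * C) * x * x * x +
                (-18 + 12 * B + 6 * C) * x * x +
                (6 - 2 * B)) / 6.0;
    if (x < 2.0)
        return ((-B - 6 * C) * x * x * x +
                (6 * B + 30 * C) * x * x +
                (-12 * B - 48 * C) * x +
                (8 * B + 24 * C)) / 6.0;
    return 0.0;  // outside the filter's support
}
```

With programmable sample positions and weights like this, the resolve is entirely under your control, which is exactly what fixed-function MSAA resolves don’t give you.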

@SPWorley: Oh, I forgot RenderAnts, yep!

The depth peeling/A-buffer/multilayer technique is very interesting indeed!

John Owens’ PhD thesis might be of interest. It’s called “Computer Graphics on a Stream Processor.” Based on my trivial skimming of it, it seems like he describes a rasterization pipeline on an architecture similar in spirit to CUDA.

Well, I wrote a triangle rasterizer in CUDA just like the one you describe, and I discovered that if you are performance-oriented on a traditional scene, the only viable way to do it is to assign one triangle per thread and then just forward-rasterize it, enumerating its pixels.
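As a rough illustration, the per-triangle work one such thread does can be sketched as a plain CPU function (a sketch only; the integer coordinates, counter-clockwise winding, and coverage grid are my assumptions, and a GPU version would also interpolate depth and attributes):

```cpp
#include <algorithm>

// Edge function: positive when (cx,cy) lies to the left of edge a->b,
// i.e. inside that half-space for a counter-clockwise triangle.
static int edge(int ax, int ay, int bx, int by, int cx, int cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}

// Forward-rasterize one triangle: walk its clamped bounding box and
// test each pixel against the three edge half-spaces. This is the work
// a single "one triangle per thread" CUDA thread would do.
// Returns the number of covered pixels; marks them '#' in grid.
static int rasterize(int w, int h, const int tri[3][2], char *grid) {
    int minx = std::max(0, std::min({tri[0][0], tri[1][0], tri[2][0]}));
    int maxx = std::min(w - 1, std::max({tri[0][0], tri[1][0], tri[2][0]}));
    int miny = std::max(0, std::min({tri[0][1], tri[1][1], tri[2][1]}));
    int maxy = std::min(h - 1, std::max({tri[0][1], tri[1][1], tri[2][1]}));
    int covered = 0;
    for (int y = miny; y <= maxy; ++y)
        for (int x = minx; x <= maxx; ++x) {
            int e0 = edge(tri[0][0], tri[0][1], tri[1][0], tri[1][1], x, y);
            int e1 = edge(tri[1][0], tri[1][1], tri[2][0], tri[2][1], x, y);
            int e2 = edge(tri[2][0], tri[2][1], tri[0][0], tri[0][1], x, y);
            if (e0 >= 0 && e1 >= 0 && e2 >= 0) {  // inside all three edges
                grid[y * w + x] = '#';
                ++covered;
            }
        }
    return covered;
}
```

The divergence problem is visible right here: each thread’s loop trip count depends on its triangle’s screen area, so threads in a warp that got triangles of very different sizes all wait for the largest one.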

While this is really slow (5 milliseconds for 70,000 polys on an 8600GT) and leads to tons of divergence, the other approaches have:
- many more memory accesses (they are evil);
- many times the number of threads, for example if you want one thread per pixel that then enumerates the triangles covering that pixel;
- heavy serialization due to atomics, if you want to use something similar to REYES: there the bottleneck is the scan + micropolygon generation phase, where you have to check the area of each poly to assign threads to its fragments.

And they generally turn out to be even slower. Anyway, I expect that to change with Fermi…
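The serialization cost of atomics shows up even in the depth test itself: when many fragments hit the same pixel, every update funnels through one memory location. A CPU sketch of that pattern, with std::atomic and a compare-and-swap loop standing in for CUDA’s atomicMin on a 32-bit depth buffer (names and encoding are my assumptions):

```cpp
#include <atomic>
#include <cstdint>

// Atomic depth test: keep the smallest depth value seen at a pixel.
// Every fragment contending for the same pixel serializes on this
// one location, which is the kind of bottleneck described above.
// Returns true if this fragment won (its depth became the new minimum).
bool depthTest(std::atomic<uint32_t> &pixelDepth, uint32_t depth) {
    uint32_t cur = pixelDepth.load();
    while (depth < cur) {
        if (pixelDepth.compare_exchange_weak(cur, depth))
            return true;  // we installed the new minimum
        // cur now holds the latest value; loop retries if still closer
    }
    return false;  // an equal or closer fragment already owns the pixel
}
```

On the GPU this would just be `atomicMin(&depthBuf[pix], depth)`, but the hardware still serializes the colliding updates, so a depth-complexity-heavy scene pays for every overlapping fragment.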