Hello fellow raytracers! Just kidding…
Anyway, I’m another guy currently into raytracing, but for my Master’s degree (!). My experience so far, as related in the recent “GLSL vs CUDA” thread, is that GLSL appears to be faster with my current implementation.
I have ported a fragment shader that does the entire raytracing (primary rays only) from GLSL to CUDA. It uses a uniform grid as the acceleration structure, since that is reasonably simple to build and traverse. Depending on the scene, the GLSL version gets about 8M rays/sec. The CUDA version at best matches that performance.
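For anyone who hasn't walked a uniform grid before, here is a rough sketch of the usual 3D DDA stepping, in plain C++ so it can be tried on the CPU first. It is not my shader code. The names (`traverseGrid`, `Cell`, `gridMin`, `cellSize`, `res`) are made up for the example, and it assumes a cubic grid, a ray origin that is already inside the grid, and a non-zero direction.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Cell { int x, y, z; };

// Returns the cells a ray passes through, in order (3D DDA stepping).
// Assumptions: cubic grid with `res` cells per axis, origin inside the grid.
std::vector<Cell> traverseGrid(const float origin[3], const float dir[3],
                               const float gridMin[3], float cellSize, int res) {
    const float inf = std::numeric_limits<float>::infinity();
    int cell[3], step[3];
    float tMax[3], tDelta[3];

    for (int a = 0; a < 3; ++a) {
        cell[a] = static_cast<int>(std::floor((origin[a] - gridMin[a]) / cellSize));
        if (dir[a] > 0.0f) {
            step[a] = 1;
            float next = gridMin[a] + (cell[a] + 1) * cellSize;  // next boundary
            tMax[a] = (next - origin[a]) / dir[a];
            tDelta[a] = cellSize / dir[a];
        } else if (dir[a] < 0.0f) {
            step[a] = -1;
            float next = gridMin[a] + cell[a] * cellSize;
            tMax[a] = (next - origin[a]) / dir[a];
            tDelta[a] = -cellSize / dir[a];
        } else {
            step[a] = 0;      // ray is parallel to this axis' planes
            tMax[a] = inf;
            tDelta[a] = inf;
        }
    }

    std::vector<Cell> visited;
    while (cell[0] >= 0 && cell[0] < res &&
           cell[1] >= 0 && cell[1] < res &&
           cell[2] >= 0 && cell[2] < res) {
        visited.push_back({cell[0], cell[1], cell[2]});
        // A real tracer would intersect this cell's triangles here and
        // stop as soon as it finds a hit that lies inside the cell.

        // Step along the axis whose next boundary is closest.
        int a = 0;
        if (tMax[1] < tMax[a]) a = 1;
        if (tMax[2] < tMax[a]) a = 2;
        if (step[a] == 0) break;  // zero direction vector, nothing to walk
        cell[a] += step[a];
        tMax[a] += tDelta[a];
    }
    return visited;
}
```

The same loop maps more or less directly to a fragment shader or a CUDA thread, with the grid stored in textures or global memory instead of being walked into a `std::vector`.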
I haven’t found any use for SM4.0 beyond minor things like integer texture addressing. No geometry shaders, transform feedback and stuff like that :).
Also, I have some experience with kd-trees in the CPU world. And I must say they’re a pain in the ***. Especially the SAH building. Too many corner cases and precision problems… But they’re at least 2x faster for raytracing than the uniform grid, for example. I’ve never tried a BVH, but that latest paper using one on the GPU was rather interesting. Take a look at the kd-tree and BVH construction and traversal algorithms and choose the one that looks simpler :).
IMO, the easiest option is to build the acceleration structure on the CPU and just use the GPU to perform the actual raytracing. I don’t know which API I would recommend: GLSL or CUDA. For me, a naive mapping to either one is similar in difficulty. That is, each fragment/thread traces a single ray. You can start from there and maybe experiment with packets/frustums later on.
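To make the "one ray per fragment/thread" idea concrete, here is a small CPU-side sketch in C++. The loop index stands in for the fragment or thread index, so the loop body is roughly what a shader invocation or CUDA thread would do. The pinhole camera in `primaryRay` and the `traceRay` placeholder are assumptions for the example, not anything from my implementation.

```cpp
#include <cmath>
#include <vector>

struct Ray { float ox, oy, oz, dx, dy, dz; };

// Hypothetical pinhole camera at the origin looking down -z.
// fovY is the vertical field of view in radians.
Ray primaryRay(int px, int py, int width, int height, float fovY) {
    float aspect = static_cast<float>(width) / static_cast<float>(height);
    float halfH = std::tan(0.5f * fovY);
    float u = (2.0f * (px + 0.5f) / width - 1.0f) * halfH * aspect;
    float v = (1.0f - 2.0f * (py + 0.5f) / height) * halfH;
    float len = std::sqrt(u * u + v * v + 1.0f);
    return {0.0f, 0.0f, 0.0f, u / len, v / len, -1.0f / len};
}

// Placeholder for the real work (grid or kd-tree traversal plus shading).
// Here it just returns a gradient based on the ray direction.
float traceRay(const Ray& r) {
    return 0.5f * (r.dy + 1.0f);
}

// Each iteration is independent of the others, which is why it maps
// directly to one fragment or one thread per pixel.
void renderNaive(std::vector<float>& image, int width, int height, float fovY) {
    image.resize(static_cast<std::size_t>(width) * height);
    for (int idx = 0; idx < width * height; ++idx) {
        int px = idx % width;
        int py = idx / width;
        image[idx] = traceRay(primaryRay(px, py, width, height, fovY));
    }
}
```

Packets or frustums would change this structure, since neighbouring rays would then share traversal work instead of each one walking the structure alone.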
Sorry I haven’t voted, I just have no experience with any option :(.
DenisR, you should check Vlastimil Havran’s and Ingo Wald’s theses. They’re the best references I’ve found on using kd-trees. Havran mainly discusses the SAH principle, while Wald is more about implementing fast construction and ray traversal (but watch out for some “hidden” complications in his work). There is also a paper by Wald and Havran called “On building fast kd-trees for ray tracing, and on doing that in O(N log N)”. It’s really good for getting a grip on SAH construction, which seems to be used for BVHs as well.
Oh, in the CPU world kd-tree internal nodes hold:
. a split plane (axis + position)
. pointer to left child (right = left + 1)
. flag to indicate leaf node
While leaves hold:
. pointer to triangle references
. triangle count
Some algorithms, like MLRTA (good read!), require additional data such as leaf bounding boxes (AABB).
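One way to pack the fields listed above into 8 bytes, sketched in C++. The exact bit layout here is my own assumption for illustration (axis and leaf flag in the low two bits, an index in the remaining 30), and it uses array indices instead of raw pointers so the same nodes could be copied to the GPU unchanged.

```cpp
#include <cstdint>

struct KdNode {
    // Bits 0-1: split axis (0, 1, 2) or 3 to flag a leaf.
    // Bits 2-31: index of the left child (inner node) or of the
    //            first triangle reference (leaf).
    std::uint32_t flagsAndIndex;
    union {
        float splitPos;          // inner node: split plane position
        std::uint32_t triCount;  // leaf: number of triangle references
    };

    static KdNode makeInner(int axis, float pos, std::uint32_t leftChild) {
        KdNode n;
        n.flagsAndIndex = (leftChild << 2) | static_cast<std::uint32_t>(axis);
        n.splitPos = pos;
        return n;
    }

    static KdNode makeLeaf(std::uint32_t firstRef, std::uint32_t count) {
        KdNode n;
        n.flagsAndIndex = (firstRef << 2) | 3u;
        n.triCount = count;
        return n;
    }

    bool isLeaf() const { return (flagsAndIndex & 3u) == 3u; }
    int axis() const { return static_cast<int>(flagsAndIndex & 3u); }
    std::uint32_t leftChild() const { return flagsAndIndex >> 2; }
    std::uint32_t rightChild() const { return leftChild() + 1; }  // right = left + 1
    std::uint32_t firstRef() const { return flagsAndIndex >> 2; }
};

static_assert(sizeof(KdNode) == 8, "node should pack into 8 bytes");
```

Storing the two children next to each other is what lets a single index serve both. Extra per-leaf data such as an AABB would not fit in this layout and would have to live in a separate array indexed by leaf.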