If not, is it due to a hardware or CUDA limitation?

I was hoping to implement a ray tracer and attempt to parallelize it across an SLI system. The algorithm becomes recursive when its time to perform reflection, refraction, etc. Lacking recursion would be a major problem, to my knowledge.

No, recursion is not supported and it is mainly a hardware limitation. Implementing a stack, that could get very deep, would have to exist in local memory, which is very slow, rather than registers. CUDA gives the programmer full flexibility to control the use of shared memory. Perhaps, you could implement a recursive stack on your own, utilizing the shared memory of the multiprocessor. Or you could call the same kernel with different data repeatedly on the host side.

Just a dummy guess, provided the compiler does not have the notion of function so that everything is inlined, it seems hard to have recursion ? (there is indeed no stack as we are used to)

Perhaps this changed since cuda 1.x, please correct me if i’m wrong …

But in theory, can’t every recursive function be rewritten into a non recursive one ? (that’s really an algorithmic question External Image

This is certainly true, though some recursive algorithms will need a stack-like data structure in the non-recursive form. For example, a parser which recognizes strings described by a context-free grammar requires a stack.

You might be thinking about tail-recursive algorithms. These sorts of algorithms can be converted directly to a loop without any stack.

This is the particular ray tracing algorithm at its core. It’s pretty easy to follow, but I’ve never tried to go from recursion to a stack.

color spawnRay(ray, depth) {
// Code to determine if ray hit an object
// ...
// If ray didn't intersect anything, just return background color
if (!intersection) {
return background_color;
// There has been an intersection
} else {
// MAX_DEPTH sets a limit to the recursion depth (10 is fine)
if (depth < MAX_DEPTH) {
//Spawn reflection ray if intersected object hit is reflective
if (kr > 0) {
retcolor += spawnRay(reflect_ray, depth+1);
}
//Spawn transmission ray if intersected object is transparent
if (kt > 0) {
retcolor += spawnRay(trans_ray, depth+1);
}
}
}
return retcolor;
}

from the programming theory any recursion can be replaced with a tree-structured data processing. Imagine that rays from a tree. So go by iterations, cast all known rays. Deterimine the new rays that are to be computed, by putting them into the nodes of the tree. Cast again. Repeat until you’re done. After your tree is ready, process all nodes to integrate information and form final screen pixel values.

I implemented a Chess engine, which normally uses a lot of recursion, google for alpha-beta algorithm. I simulate recursion with implementing my own stack in shared memory and a statemachine. GpuChess

i just finished a cuda raytracer for a master’s project, and doing it iteratively wasn’t that bad. i just maintain two arrays: a ‘current’ array and ‘new’ array. i loop through the current array, throwing any reflective or refractive rays into the ‘new’ array. once the loop is done i copy the ‘new’ to ‘current’ and start over. i’ll be putting my code and report online in about a week.

Very nice, I think a lot of people can base their work on this. I would like to have it when starting my raytrace-work ;)

Just a remark, if you are only using triangles (which it looked like), you can use a faster intersection algorithm than counting the number of times you cross a line of a polygon. It uses barycentric coordinates.

I’m confused - can someone help to explain cronus98’s code and how it works? I’ve tried adapting his code to work within my raytracer but I’m having trouble - It works for reflection and everything apart from refraction. I think this is because I need to restructure the colouring of transparent objects and mine colouring code looks different from his but I am not too familiar with raytracing and how to fix it, can anyone help by looking at my code below?