Recursive non-kernel functions

I was trying to do some tree processing yesterday but I didn’t succeed. This is what I got:

[codebox]Error: Not-inlined non-intrinsic function calls not supported yet : __dot_4f[/codebox]

I suspect it might have something to do with recursive functions not being supported on CUDA OpenCL yet. So my question is simple. When?

Or have I hit something totally different?

Running Win7 with 196.21.

Thanks.

I am sorry to report that the ‘yet’ in that message looks like ‘never’. My last printed copy of the spec is getting old, but section 6.8 (Restrictions), point ‘i’, states that recursion is not supported. Check the latest version online to confirm this is still the case.

Further research revealed that this message is not related to recursion.

But essentially you’re right. I had the (wrong) impression that the recursion restriction applies only to kernels.

Damn!

Thanks for the hint.

I get this when a function calls another function. I don’t know why there is this crazy restriction.

It is actually possible to pile up function calls, as long as you’re not attempting recursion. Just make sure your code is an absolute syntactic/semantic marvel (easier said than done) and the compiler will stop nagging.

Give us some tools, nVidia!

I’m curious as to how nested function calling works. Does the GPU have a stack? IIRC it doesn’t, though I think I remember Gregory Diamos (or was it someone else?) once explaining how one could be implemented; it used local memory, if I’m not mistaken. Without a stack, arguments to device functions must be passed through registers, which are finite in number, so you’re likely to run out quickly with nested calls. And this means no arbitrary-depth recursion, except maybe for tail recursion if the compiler is smart enough to convert it to a loop.
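To make the tail recursion point concrete, here is a minimal sketch in plain C (not taken from anyone's actual code in this thread). `gcd_recursive` and `gcd_loop` are made-up names for illustration. The first form is what an OpenCL C compiler would refuse; the second is the loop that a tail-call-aware compiler could in principle derive from it, and that you can always write by hand.

```c
/* Tail-recursive form: the recursive call is the very last operation,
 * so no state has to survive across the call.
 * Fine in host-side C, but rejected in OpenCL C device code. */
unsigned gcd_recursive(unsigned a, unsigned b)
{
    return (b == 0u) ? a : gcd_recursive(b, a % b);
}

/* Hand-converted loop: the "call" becomes an update of the arguments
 * followed by a jump back to the top. Needs no stack at all. */
unsigned gcd_loop(unsigned a, unsigned b)
{
    while (b != 0u) {
        unsigned r = a % b;  /* what would have been the new 'b' */
        a = b;               /* what would have been the new 'a' */
        b = r;
    }
    return a;
}
```

Only functions whose recursive call is in tail position can be rewritten this way without extra storage; a tree walk that has to come back and visit a second child does not qualify.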

Doesn’t sound very plausible. If this were the case you might hit the bottom even with a single subcall with lots of arguments.

Wouldn’t it be easier to just tell the compiler to inline/unroll all the functions? I mean, if the hw architecture is hairy, this looks like the path of least resistance.

Something like this could also explain why recursive functions are not available.

Of course, fake recursion with home-baked stacks is possible (finished one just this morning), though you’ll have a heck of a time debugging it.
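For anyone wondering what such a home-baked stack might look like, here is a rough sketch in plain C of a tree walk without recursion. This is not the code mentioned above; the `Node` layout, `tree_sum`, `STACK_CAPACITY` and `NO_CHILD` are all assumptions made up for the example. The tree is stored as a flat array with child indices rather than pointers, since that is the shape it would most likely have in a device buffer.

```c
#define STACK_CAPACITY 32   /* hypothetical fixed limit on pending nodes */
#define NO_CHILD (-1)

/* Hypothetical node layout: children are indices into the node array. */
typedef struct {
    int value;
    int left;    /* index of left child, or NO_CHILD */
    int right;   /* index of right child, or NO_CHILD */
} Node;

/* Sums every value in the tree rooted at 'root' using an explicit stack.
 * Returns 0 on success, -1 if the fixed-size stack would overflow. */
int tree_sum(const Node *nodes, int root, long *out_sum)
{
    int stack[STACK_CAPACITY];  /* node indices still waiting to be visited */
    int top = 0;                /* number of entries currently on the stack */
    long sum = 0;

    if (root != NO_CHILD)
        stack[top++] = root;

    while (top > 0) {
        const Node *n = &nodes[stack[--top]];  /* pop */
        sum += n->value;

        /* Push right first so that the left subtree is visited first. */
        if (n->right != NO_CHILD) {
            if (top == STACK_CAPACITY) return -1;
            stack[top++] = n->right;
        }
        if (n->left != NO_CHILD) {
            if (top == STACK_CAPACITY) return -1;
            stack[top++] = n->left;
        }
    }

    *out_sum = sum;
    return 0;
}
```

The fixed capacity is the price of having no dynamic allocation: the walk reports failure instead of overflowing, and the caller has to pick a capacity that covers the deepest tree it expects.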

The compiler inlines functions by default, yes.

When your main function uses loads of variables, registers spill to local memory; this might be what happens with nested function calls as well. If you implemented a stack over local memory you could, in principle, do arbitrary-depth recursion (well, until local memory overflows :) ). This would require dynamic allocation (or at least pseudo-dynamic, stack-like allocation with a fixed-length indexed local array). The compiler isn’t smart enough to do this (yet?), and without tail recursion optimization it could cause a nasty performance hit.