Recursion in CUDA 3.1

I understand that recursion is only supported on devices with compute capability 2.0. However, I was wondering how exactly it is implemented, since GPUs are supposedly “stackless”, so to speak. Are there any performance hits that come with using a recursive function?


See this thread for some speculation as to how it may be implemented and what the overheads could be. I don’t think that anyone has benchmarked this in detail yet.
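
If you want to try measuring it yourself, something like the following could serve as a starting point. This is only a minimal sketch, not a statement about how the compiler implements recursion: the names `factorial` and `factorialKernel` are made up for illustration, and it assumes a device and toolkit that support recursive `__device__` functions (compute capability 2.0 or later). Keep in mind that the compiler may turn a recursion this simple into a loop, so a real benchmark would likely need a less trivial function.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example: a recursive factorial evaluated on the device.
__device__ unsigned int factorial(unsigned int n)
{
    return (n <= 1u) ? 1u : n * factorial(n - 1u);
}

// Each thread writes factorial(n) into its own output slot.
__global__ void factorialKernel(unsigned int *out, unsigned int n)
{
    out[threadIdx.x] = factorial(n);
}

int main()
{
    unsigned int *dOut = NULL;
    unsigned int hOut = 0;

    if (cudaMalloc(&dOut, sizeof(unsigned int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // One block with one thread is enough to check that recursion works.
    factorialKernel<<<1, 1>>>(dOut, 5u);

    // The copy waits for the kernel to finish and reports any launch error.
    cudaError_t err = cudaMemcpy(&hOut, dOut, sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("%u\n", hOut);  // 5! = 120
    return 0;
}
```

To get at the overhead, you could time this kernel against an iterative version of the same function using CUDA events, with enough threads and a deep enough recursion for the difference to be measurable.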