I understand that recursion is only supported on devices with compute capability 2.0 or higher. However, I was wondering how exactly it is implemented. Aren’t GPUs “stackless”, so to speak? Are there any performance hits that come with using a recursive function?
Thanks!