Compilation of device lambdas is still disabled by default in CUDA 9.2. But how production-worthy are they now? Are there any specific issues to be aware of that mean they shouldn't be used extensively in production code yet?
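For reference, here is a minimal sketch of using an extended __device__ lambda. It assumes CUDA 9.2's nvcc invoked with the --expt-extended-lambda flag. The names transform_kernel and scale_on_device are hypothetical and only for illustration. The __CUDACC_EXTENDED_LAMBDA__ check relies on nvcc defining that macro only when the flag is given, so a missing flag produces an explicit error.

```cuda
// Assumed build command: nvcc --expt-extended-lambda -o lambda_demo lambda_demo.cu
#include <cuda_runtime.h>
#include <vector>

// nvcc defines this macro only when extended lambdas are enabled.
#ifndef __CUDACC_EXTENDED_LAMBDA__
#error "Extended lambdas are disabled: compile with --expt-extended-lambda"
#endif

// Generic kernel (hypothetical): applies any device-callable functor f to each element.
template <typename F>
__global__ void transform_kernel(float* data, int n, F f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = f(data[i]);
}

// Hypothetical helper: scales v in place on the GPU using a __device__ lambda
// written in host code. Returns false if any CUDA call fails.
bool scale_on_device(std::vector<float>& v, float factor) {
    const int n = static_cast<int>(v.size());
    if (n == 0) return true;
    const size_t bytes = n * sizeof(float);

    float* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Extended __device__ lambda: defined in host code, callable only on the GPU.
        // 'factor' is captured by value, so the closure is passed to the kernel as an argument.
        auto scale = [=] __device__ (float x) { return x * factor; };
        const int threads = 256;
        transform_kernel<<<(n + threads - 1) / threads, threads>>>(d, n, scale);
        // The device-to-host copy waits for the kernel, so kernel failures surface here too.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d);
    return ok;
}
```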
OK, I found more details on the nature of the device lambda implementation here:
"As described earlier, the CUDA compiler replaces an extended lambda expression defined in host code with an instance of a named placeholder type. The placeholder type for an extended __host__ __device__ lambda invokes the original lambda’s operator() with an indirect function call.
The presence of the indirect function call may cause an extended __host__ __device__ lambda to be less optimized by the host compiler than lambdas that are implicitly or explicitly __host__ only. In the latter case, the host compiler can easily inline the body of the lambda into the calling context. But in case of an extended __host__ __device__ lambda, the host compiler encounters the indirect function call and may not be able to easily inline the original __host__ __device__ lambda body."
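To make the quoted passage concrete, here is a sketch of a non-capturing extended __host__ __device__ lambda that is called directly from host code and inside a kernel. It makes the same assumption: the code is built with --expt-extended-lambda. The names apply_once and host_and_device_agree are hypothetical. The static_assert uses the __nv_is_extended_host_device_lambda_closure_type trait from the CUDA Programming Guide to confirm that the compiler treats the lambda as extended. Per the passage above, the host-side call is the one routed through the placeholder's indirect call. Whether the host compiler inlines it is not guaranteed by anything shown here.

```cuda
// Assumed build command: nvcc --expt-extended-lambda -o hd_lambda hd_lambda.cu
#include <cuda_runtime.h>

// Hypothetical kernel: evaluates f once on the device and stores the result.
template <typename F>
__global__ void apply_once(int* out, int in, F f) {
    *out = f(in);
}

// Hypothetical helper: evaluates the same __host__ __device__ lambda on the host
// and on the device. Returns false on a CUDA error or if the two results differ.
bool host_and_device_agree(int in) {
    // Non-capturing extended __host__ __device__ lambda. Per the quoted passage,
    // nvcc replaces it in host code with a placeholder type whose operator()
    // reaches the original lambda body through an indirect function call.
    auto square_plus_one = [] __host__ __device__ (int x) { return x * x + 1; };
    static_assert(__nv_is_extended_host_device_lambda_closure_type(decltype(square_plus_one)),
                  "expected an extended __host__ __device__ lambda");

    // Host-side call: the call the host compiler may not be able to easily inline.
    const int on_host = square_plus_one(in);

    int* d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return false;
    apply_once<<<1, 1>>>(d_out, in, square_plus_one);

    int on_device = 0;
    const bool ok = cudaGetLastError() == cudaSuccess &&
                    cudaMemcpy(&on_device, d_out, sizeof(int),
                               cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);
    return ok && on_host == on_device;
}
```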
But I am curious: in the case of a non-capturing lambda in CUDA 9.2, is it a safe bet to assume it will now be inlined completely, with no performance loss?