I am trying to turn the multi-block reduce function that uses cooperative groups, included in the CUDA 11 SDK samples, into a template. The sample only performs the reduction on float arrays, and I want to generalize it with a template so that it also works for double. My code compiles, but when I run it I get this error: Kernel execution failed : (801) operation not supported.
I also noticed that kernels that synchronize across the whole grid with cooperative groups must be launched with cudaLaunchCooperativeKernel. Is it possible to pass a template kernel to that function, given that it receives a pointer to the kernel?
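
For illustration, here is a minimal sketch of the kind of templated cooperative launch in question. It is not the SDK sample: `sumKernel`, `cooperativeSum`, and the `CHECK` macro are hypothetical names, the kernel is a deliberately naive sum, and the code assumes CUDA 11 and a GPU that reports cooperative launch support (compute capability 6.0 or newer, compiled with a matching `-arch` flag). The idea shown is that naming a specialization such as `sumKernel<T>` yields an ordinary function pointer, which can then be cast to `const void *` for `cudaLaunchCooperativeKernel`.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical kernel, not the SDK sample: every thread accumulates a
// grid-stride partial sum, then thread 0 adds the partials after a grid sync.
template <typename T>
__global__ void sumKernel(const T *in, T *scratch, T *out, size_t n)
{
    cg::grid_group grid = cg::this_grid();
    const size_t tid = grid.thread_rank();
    const size_t stride = grid.size();

    T local = T(0);
    for (size_t i = tid; i < n; i += stride) local += in[i];
    scratch[tid] = local;

    grid.sync();  // grid-wide barrier; requires a cooperative launch

    if (tid == 0) {
        T total = T(0);
        for (size_t t = 0; t < stride; ++t) total += scratch[t];
        *out = total;
    }
}

// Hypothetical helper: return the first CUDA error to the caller.
// Error paths skip cleanup to keep the sketch short.
#define CHECK(call) \
    do { cudaError_t e_ = (call); if (e_ != cudaSuccess) return e_; } while (0)

template <typename T>
cudaError_t cooperativeSum(const std::vector<T> &host, T *result)
{
    int dev = 0, supported = 0, smCount = 0, blocksPerSm = 0;
    const int threads = 128;

    CHECK(cudaGetDevice(&dev));
    CHECK(cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, dev));
    if (!supported) return cudaErrorNotSupported;

    // Naming the specialization gives a plain function pointer.
    void (*kernel)(const T *, T *, T *, size_t) = sumKernel<T>;

    // All blocks must be resident at once, so size the grid from occupancy.
    CHECK(cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, dev));
    CHECK(cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, kernel,
                                                        threads, 0));
    const int blocks = smCount * blocksPerSm;

    size_t n = host.size();
    T *dIn = nullptr, *dScratch = nullptr, *dOut = nullptr;
    CHECK(cudaMalloc(&dIn, n * sizeof(T)));
    CHECK(cudaMalloc(&dScratch, size_t(blocks) * threads * sizeof(T)));
    CHECK(cudaMalloc(&dOut, sizeof(T)));
    CHECK(cudaMemcpy(dIn, host.data(), n * sizeof(T), cudaMemcpyHostToDevice));

    // Kernel arguments are passed as an array of pointers to each argument.
    void *args[] = { (void *)&dIn, (void *)&dScratch, (void *)&dOut, (void *)&n };
    CHECK(cudaLaunchCooperativeKernel((const void *)kernel, dim3(blocks),
                                      dim3(threads), args, 0, 0));

    // The blocking copy also surfaces errors raised while the kernel ran.
    CHECK(cudaMemcpy(result, dOut, sizeof(T), cudaMemcpyDeviceToHost));

    cudaFree(dIn);
    cudaFree(dScratch);
    cudaFree(dOut);
    return cudaSuccess;
}
```

Calling `cooperativeSum` once with a `std::vector<float>` and once with a `std::vector<double>` exercises both instantiations through the same launch path. In this sketch, a device without cooperative launch support is reported as `cudaErrorNotSupported` before any launch is attempted.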