I have a reduce kernel that uses cooperative groups (it contains cg::sync(grid), cg::sync(cta), grid.thread_rank(), etc.). When the reduce kernel is launched from the host, it works correctly. When the same kernel is launched as a child kernel from another kernel (the parent kernel), I get an "unspecified launch failure" error. If I remove cg::sync(grid) from the child kernel, the error goes away.
So my question is: can I use cooperative_groups::sync(grid) in a child kernel (CUDA dynamic parallelism)?
No, that’s not possible.
Grid synchronization requires the kernel to be launched with cudaLaunchCooperativeKernel, but cudaLaunchCooperativeKernel cannot be used in device code.
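For reference, here is a minimal sketch of the launch path that does support cg::sync(grid): a cooperative launch issued from the host. The kernel reduceKernel, its parameters, and the buffer names are hypothetical stand-ins for the reduce kernel in the question, and the block size of 256 is an assumption (the shared-memory tree reduction needs a power of two). Depending on the toolkit version, you may also need to compile with -rdc=true.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical reduce kernel (not the asker's code): each block writes a
// partial sum, then after a grid-wide sync one thread adds up the partials.
__global__ void reduceKernel(const float* in, float* partial, float* out, int n)
{
    cg::grid_group grid = cg::this_grid();
    cg::thread_block cta = cg::this_thread_block();
    extern __shared__ float smem[];

    // Grid-stride accumulation into a per-thread sum.
    float sum = 0.0f;
    for (unsigned long long i = grid.thread_rank();
         i < static_cast<unsigned long long>(n); i += grid.size())
        sum += in[i];
    smem[cta.thread_rank()] = sum;
    cg::sync(cta);

    // Shared-memory tree reduction; assumes blockDim.x is a power of two.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (cta.thread_rank() < s)
            smem[cta.thread_rank()] += smem[cta.thread_rank() + s];
        cg::sync(cta);
    }
    if (cta.thread_rank() == 0) partial[blockIdx.x] = smem[0];

    // Only valid when the kernel was started by a cooperative launch.
    cg::sync(grid);

    if (grid.thread_rank() == 0) {
        float total = 0.0f;
        for (unsigned b = 0; b < gridDim.x; ++b) total += partial[b];
        *out = total;
    }
}

int main()
{
    int dev = 0;
    cudaSetDevice(dev);

    // Cooperative launch is a device capability, so check for it first.
    int supportsCoop = 0;
    cudaDeviceGetAttribute(&supportsCoop, cudaDevAttrCooperativeLaunch, dev);
    if (!supportsCoop) { printf("Cooperative launch not supported\n"); return 0; }

    int n = 1 << 20;
    const int threads = 256;                      // assumed block size
    const size_t smemBytes = threads * sizeof(float);

    // All blocks must be co-resident, so size the grid from occupancy.
    int blocksPerSm = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, reduceKernel,
                                                  threads, smemBytes);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    const int blocks = blocksPerSm * prop.multiProcessorCount;

    float *in = nullptr, *partial = nullptr, *out = nullptr;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&partial, blocks * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    // Kernel arguments are passed as an array of pointers to each argument.
    void* args[] = { &in, &partial, &out, &n };
    cudaError_t err = cudaLaunchCooperativeKernel((void*)reduceKernel,
                                                  dim3(blocks), dim3(threads),
                                                  args, smemBytes, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("sum = %f (n = %d)\n", *out, n);
    cudaFree(in); cudaFree(partial); cudaFree(out);
    return 0;
}
```

The grid is sized from the occupancy query because a cooperative launch requires every block to be resident on the device at the same time. A child kernel launched with <<<...>>> from a parent kernel has no such guarantee, which is consistent with cg::sync(grid) failing there.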