Accessing the cudaLaunchCooperativeKernel API from Python (PyCUDA, CuPy, etc.?)

To date I’ve run most of my CUDA kernels from PyCUDA, but I now need to use cooperative groups to synchronize across the grid, which requires launching through the cudaLaunchCooperativeKernel API.

Unfortunately, PyCUDA does not appear to support this, and support does not seem to be planned.

Is there another route I can take on this, and keep my host code in python?

Kernels launched into the same CUDA stream are serialized: the second cannot start until the first has finished. You could therefore split your kernel into two kernels at the sync point and launch them one after another.

i.e. instead of

__global__ void kernel(){
   //part A
   cooperative_groups::this_grid().sync();
   //part B
}
...
cudaLaunchCooperativeKernel(kernel, ...)

you can do

__global__ void kernelA(){
   //part A
}

__global__ void kernelB(){
   //part B
}
...
cudaLaunchKernel(kernelA)
cudaLaunchKernel(kernelB)
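Since the host code should stay in Python, the same two-kernel workaround can be sketched in PyCUDA. This is a minimal sketch with placeholder kernel bodies; the import is done lazily so the file can be loaded on a machine without a GPU.

```python
# Split-kernel workaround: two ordinary kernels launched into the same
# (default) stream are serialized, so no grid-wide sync is needed.
# Kernel bodies and names here are illustrative placeholders.
KERNELS = r'''
__global__ void kernelA(float* data) { /* part A */ }
__global__ void kernelB(float* data) { /* part B */ }
'''

def run_split(data_gpu, grid, block):
    # Imported lazily so this module can be inspected without a GPU present.
    import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
    from pycuda.compiler import SourceModule

    mod = SourceModule(KERNELS)
    kernel_a = mod.get_function("kernelA")
    kernel_b = mod.get_function("kernelB")

    # Same stream: kernelB will not start until kernelA has finished,
    # which gives the same ordering guarantee as a grid sync between
    # part A and part B.
    kernel_a(data_gpu, grid=grid, block=block)
    kernel_b(data_gpu, grid=grid, block=block)
```

Note this only works when part B needs no data held in registers or shared memory from part A; anything carried across the split has to go through global memory.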

You can do pretty much any combination of python and CUDA using python ctypes. There are various examples, including showing kernel launches, although probably not any that show a cooperative kernel launch.
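Along those lines, here is a hedged sketch of what a ctypes call into cudaLaunchCooperativeKernel could look like. It assumes the CUDA runtime is available as libcudart.so and that `func` is a valid host pointer to a `__global__` function (e.g. exported from a shared library built with nvcc); the helper and struct names are illustrative.

```python
import ctypes

class Dim3(ctypes.Structure):
    # Mirrors CUDA's dim3 struct: three unsigned ints.
    _fields_ = [("x", ctypes.c_uint), ("y", ctypes.c_uint), ("z", ctypes.c_uint)]

def launch_cooperative(cudart, func, grid, block, kernel_args,
                       shared_mem=0, stream=None):
    """Calls cudaLaunchCooperativeKernel(func, gridDim, blockDim, args,
    sharedMem, stream) and returns the cudaError_t code as an int."""
    cudart.cudaLaunchCooperativeKernel.restype = ctypes.c_int
    cudart.cudaLaunchCooperativeKernel.argtypes = [
        ctypes.c_void_p, Dim3, Dim3,
        ctypes.POINTER(ctypes.c_void_p), ctypes.c_size_t, ctypes.c_void_p,
    ]
    # Build the void** args array: one pointer per kernel parameter.
    argv = (ctypes.c_void_p * len(kernel_args))(
        *[ctypes.cast(ctypes.pointer(a), ctypes.c_void_p)
          for a in kernel_args])
    return cudart.cudaLaunchCooperativeKernel(
        func, grid, block, argv, ctypes.c_size_t(shared_mem), stream)

# Usage (requires a CUDA install and a valid kernel pointer):
#   cudart = ctypes.CDLL("libcudart.so")
#   err = launch_cooperative(cudart, func, Dim3(32, 1, 1), Dim3(256, 1, 1),
#                            [ctypes.c_void_p(dev_ptr)])
```

The kernel itself must still be compiled with cooperative-groups support, and the grid size must not exceed what cudaOccupancyMaxActiveBlocksPerMultiprocessor reports, or the launch will fail.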

Thank you Both!