I want to launch another kernel after a kernel has finished executing. I used cudaStreamAddCallback, but I can’t call any CUDA API from inside the callback function. How can I get the “kernel execution end” event promptly?
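If you just need to “call another kernel after a kernel has finished executing”, launch the kernels one after another in the same stream. Kernels issued to the same stream execute in issue order, so each one starts only after the previous one has finished: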
Kernel_1 <<< Num_Blocks, Num_Threads >>> ();
Kernel_2 <<< Num_Blocks, Num_Threads >>> ();
...
Kernel_N <<< Num_Blocks, Num_Threads >>> ();
cudaDeviceSynchronize();  // host waits here until Kernel_N (and everything before it) has finished
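To make the ordering concrete, here is a minimal sketch of two dependent kernels launched back to back in the default stream. The kernel names (add_one, times_two), the helper run_chain, and the buffer size are made up for illustration and are not from the original post. The second kernel reads what the first one wrote, and no callback or explicit synchronization is needed between the two launches.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical kernel: doubles every element. It depends on add_one's output.
__global__ void times_two(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Launches both kernels in the default stream and returns true
// if every element ends up as (0 + 1) * 2 = 2.
bool run_chain(int n) {
    int *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemset(d_data, 0, n * sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Same stream, so times_two does not start until add_one has finished.
    add_one<<<blocks, threads>>>(d_data, n);
    times_two<<<blocks, threads>>>(d_data, n);

    // Launch errors are reported by cudaGetLastError, execution errors by the sync.
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t sync_err = cudaDeviceSynchronize();

    std::vector<int> h_data(n);
    cudaError_t copy_err = cudaMemcpy(h_data.data(), d_data, n * sizeof(int),
                                      cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    if (launch_err != cudaSuccess || sync_err != cudaSuccess ||
        copy_err != cudaSuccess) {
        return false;
    }
    for (int v : h_data) {
        if (v != 2) return false;
    }
    return true;
}

int main() {
    printf("%s\n", run_chain(1 << 20) ? "ok" : "failed");
    return 0;
}
```

The host only needs to wait once, at the end, before it reads the result. If the kernels had to run in different streams, the dependency could instead be expressed with cudaEventRecord and cudaStreamWaitEvent, which also avoids calling CUDA APIs from a host callback.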