Let's assume that I am executing a GEMM kernel on stream1 and, at the same time, I want to execute another kernel on stream2.
I see that if the second kernel is light, it has a chance to run in parallel. But is it possible to launch the second kernel with “cudaLaunchCooperativeKernel” and still have the two run in parallel? Is that allowed?
You can launch kernels in whatever order you would like. Whether they run “in parallel” is a function of whether you satisfy the requirements for concurrent kernels (this topic is widely covered, and there is a CUDA sample code for it) and of what else is going on on the device. Whether they run in parallel or not is not a decision you make.
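
To make that concrete, here is a minimal sketch of the launch pattern being asked about: an ordinary kernel on one stream and a cooperative launch on another. The kernel names (`busyKernel`, `coopKernel`), the grid sizes, and the busy loop standing in for the GEMM are all made up for illustration; a real GEMM would typically come from cuBLAS. The code only shows that the two launches are legal. It does not force or prove overlap, which you would have to check with a profiler such as Nsight Systems.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                   \
                    cudaGetErrorString(err_), __FILE__, __LINE__);        \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical stand-in for the GEMM on stream1: just burns some time.
__global__ void busyKernel(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        for (int k = 0; k < 100000; ++k) v = v * 1.0001f + 0.5f;
        x[i] = v;
    }
}

// Hypothetical "light" kernel that needs a grid-wide barrier,
// which is why it must be launched cooperatively.
__global__ void coopKernel(int *counter) {
    cg::grid_group grid = cg::this_grid();
    if (threadIdx.x == 0) atomicAdd(counter, 1);  // one increment per block
    grid.sync();                                  // all blocks must be co-resident
}

int main() {
    int dev = 0, coopSupported = 0, concurrent = 0, numSMs = 0;
    CHECK(cudaSetDevice(dev));
    CHECK(cudaDeviceGetAttribute(&coopSupported, cudaDevAttrCooperativeLaunch, dev));
    CHECK(cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentKernels, dev));
    CHECK(cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, dev));
    if (!coopSupported) { printf("Cooperative launch not supported\n"); return 0; }
    printf("Concurrent kernels supported: %d\n", concurrent);

    const int threads = 128;
    // Assumption: keep the stand-in kernel small so some SMs stay free.
    const int busyBlocks = numSMs > 1 ? numSMs / 2 : 1;
    const int n = busyBlocks * threads;

    // A cooperative grid may not exceed what fits on the device at once.
    int blocksPerSm = 0;
    CHECK(cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, coopKernel, threads, 0));
    int coopBlocks = 2;
    if (coopBlocks > blocksPerSm * numSMs) coopBlocks = blocksPerSm * numSMs;

    float *d_x = nullptr;
    int *d_counter = nullptr;
    CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CHECK(cudaMemset(d_x, 0, n * sizeof(float)));
    CHECK(cudaMalloc(&d_counter, sizeof(int)));
    CHECK(cudaMemset(d_counter, 0, sizeof(int)));

    cudaStream_t stream1, stream2;
    CHECK(cudaStreamCreateWithFlags(&stream1, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&stream2, cudaStreamNonBlocking));

    // Ordinary launch on stream1.
    busyKernel<<<busyBlocks, threads, 0, stream1>>>(d_x, n);
    CHECK(cudaGetLastError());

    // Cooperative launch on stream2. Whether it overlaps with the kernel
    // above depends on free resources; the API does not guarantee it.
    void *args[] = { &d_counter };
    CHECK(cudaLaunchCooperativeKernel((void *)coopKernel, dim3(coopBlocks),
                                      dim3(threads), args, 0, stream2));

    CHECK(cudaDeviceSynchronize());

    int h_counter = 0;
    CHECK(cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost));
    printf("coopKernel blocks that ran: %d (launched %d)\n", h_counter, coopBlocks);

    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaStreamDestroy(stream2));
    CHECK(cudaFree(d_x));
    CHECK(cudaFree(d_counter));
    return 0;
}
```

Something like `nvcc -arch=sm_70 coop_streams.cu` should build it on a recent toolkit (the file name is arbitrary, and older toolkits may also need `-rdc=true` for grid synchronization). One thing to keep in mind: the occupancy query assumes an otherwise idle device, so if the kernel on stream1 is using most of the SMs, the cooperative kernel may simply not start until enough resources are free for all of its blocks.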