Running kernels concurrently on parallel streams/same start time

I am trying to run two kernels on separate streams so that they start at the same time.

PT2_x<<<gride, blocke, 0, stream1>>>(kvx, d_groundedratio, d_areas, d_index, d_alpha2, d_vx, d_isice, nbe);
cudaStreamSynchronize(stream1);

PT2_y<<<gride, blocke, 0, stream2>>>(kvy, d_groundedratio, d_areas, d_index, d_alpha2, d_vy, d_isice, nbe);
cudaStreamSynchronize(stream2);

In my Nsight Systems report, I observed that PT2_y starts towards the end of PT2_x. How can I get the two kernels running on parallel streams to start at the same time?

How can I get the two kernels running on parallel streams to start at the same time?

By making sure that it is possible. This comes down to resource utilization, and the question has been asked many times on many forums.

For example, launch both kernels with one block of one thread. Make sure the kernels run for a long time (e.g. 1 millisecond or longer). Then you will see them start at the same time and overlap.
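A minimal sketch of that experiment follows, assuming a hypothetical spin kernel (not from the original post) that busy-waits on clock64() so it runs long enough to show up in a profiler timeline. The cycle count is an assumed value to tune for your GPU. Note that cudaStreamSynchronize blocks the host until the work in that stream has completed, so a synchronize call placed between the two launches delays the second launch until the first kernel has finished; the sketch therefore issues both launches before synchronizing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel: busy-waits for roughly `cycles` GPU clock
// ticks so that it runs long enough to be visible in a profiler timeline.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main()
{
    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    // Assumed duration: a few million cycles is on the order of
    // milliseconds on many GPUs; adjust for your clock rate.
    const long long cycles = 5000000;

    // Launch both kernels back to back, one block of one thread each.
    spin<<<1, 1, 0, stream1>>>(cycles);
    spin<<<1, 1, 0, stream2>>>(cycles);

    // Synchronize only after both launches have been issued.
    cudaStreamSynchronize(stream1);
    cudaStreamSynchronize(stream2);

    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    return 0;
}
```

Profiling this with Nsight Systems should show the two spin kernels overlapping on a GPU that supports concurrent kernel execution. Applying the same launch order to PT2_x and PT2_y removes the host-side serialization, but whether they then overlap still depends on how much of the GPU each one occupies.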

But as you start increasing the resource requirements of each kernel (blocks, threads, registers, shared memory), you will eventually reach the point where they cannot coexist; the GPU does not have infinite capacity. The GPU may then run one kernel, then the other.
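One rough way to gauge whether a single kernel already fills the GPU is to compare its grid size with the number of blocks the device can keep resident at once. The sketch below is an estimate under assumptions, not a statement about how the scheduler will actually behave: the kernel work, the block size, and the problem size are all hypothetical placeholders to be replaced with the real kernel and launch configuration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; substitute the real kernel here.
__global__ void work(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int blockSize = 256;                             // assumed block size
    const int n = 1 << 20;                                 // assumed problem size
    const int gridSize = (n + blockSize - 1) / blockSize;  // blocks requested

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Maximum number of blocks of this kernel that can be resident on one SM
    // at this block size with no dynamic shared memory.
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, work, blockSize, 0);

    // Blocks of this kernel the whole device can hold at the same time.
    const int residentCapacity = blocksPerSM * prop.multiProcessorCount;

    printf("grid blocks: %d, resident capacity: %d\n", gridSize, residentCapacity);
    if (gridSize >= residentCapacity)
        printf("This kernel alone can fill the GPU; a second kernel will "
               "likely wait until blocks start to drain.\n");
    else
        printf("There may be room for another kernel to run concurrently.\n");
    return 0;
}
```

If the grid is much larger than the resident capacity, a second kernel in another stream typically cannot begin until blocks of the first kernel start to retire, which would be consistent with one kernel starting towards the end of the other.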

It is just like what you would do if you were overloaded. You would not mow the lawn and wash the dishes at the same time; you would do one, then the other. You can wash dishes and you can mow the lawn, but you don’t have the resources to do both at the same time.

If you would like to see kernels running at the same time, try running the concurrentKernels CUDA sample code.
