cudaLaunchCooperativeKernel and default stream semantics

On the documentation page for cudaLaunchCooperativeKernel, there is a statement in the Notes section that says:

Does this mean:

A. If the cooperative kernel is launched on the default stream, it follows default stream semantics. Or
B. If the cooperative kernel is launched on any stream, it waits for all previously issued work to complete and blocks any work issued afterward to the same device?

It should behave like any other kernel launch.

If you specify a created stream, then it should obey the ordering expectations for that stream.

If you don’t (i.e. you specify the NULL stream), then it should follow default stream semantics, which are documented on the page you linked. That behavior depends on whether the legacy default stream has been overridden with the per-thread default stream.

So that is your choice A, I believe.
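
To make the two cases concrete, here is a minimal sketch rather than a definitive implementation. The kernel `coopKernel` and the helpers `launchCoop` and `runDemo` are hypothetical names made up for illustration; the launch size (one block of 64 threads) is an assumption chosen so that all blocks are trivially co-resident. The only difference between the two launches is the stream argument.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical kernel: each thread increments its element, then the grid synchronizes.
__global__ void coopKernel(int *data, int n) {
    cg::grid_group grid = cg::this_grid();
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
    grid.sync();  // grid-wide barrier; only valid in a cooperative launch
}

// Hypothetical helper: cooperative launch on the given stream.
// Passing 0 selects the default (NULL) stream.
cudaError_t launchCoop(int *d_data, int n, cudaStream_t stream) {
    void *args[] = { &d_data, &n };
    dim3 grid(1), block(64);  // assumed small enough to be co-resident
    return cudaLaunchCooperativeKernel((void *)coopKernel, grid, block,
                                       args, 0, stream);
}

// Returns 0 on success (or if cooperative launch is unsupported), 1 on error.
int runDemo() {
    int dev = 0, supported = 0;
    if (cudaGetDevice(&dev) != cudaSuccess) return 1;
    if (cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, dev)
            != cudaSuccess) return 1;
    if (!supported) {
        printf("cooperative launch not supported on this device\n");
        return 0;
    }

    const int n = 64;
    int *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMemset(d_data, 0, n * sizeof(int)) != cudaSuccess) return 1;

    cudaStream_t s;
    if (cudaStreamCreate(&s) != cudaSuccess) return 1;

    // Case 1: created stream. The launch is ordered with other work in s.
    if (launchCoop(d_data, n, s) != cudaSuccess) return 1;
    if (cudaStreamSynchronize(s) != cudaSuccess) return 1;

    // Case 2: NULL stream. The launch follows default stream semantics.
    if (launchCoop(d_data, n, 0) != cudaSuccess) return 1;
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    int h_data[n];
    if (cudaMemcpy(h_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost)
            != cudaSuccess) return 1;

    // Each element was incremented once per launch.
    int ok = 1;
    for (int i = 0; i < n; ++i) ok = ok && (h_data[i] == 2);

    cudaStreamDestroy(s);
    cudaFree(d_data);
    return ok ? 0 : 1;
}
```

Call `runDemo()` from your own `main`. I am assuming a device of compute capability 6.0 or newer and a build that targets it (for example `nvcc -arch=sm_70`); older toolkits may also need `-rdc=true` for `grid.sync()`.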
