The driver API spec says “Blocks until the device has completed all preceding requested tasks.” Does that mean “blocks until the device has completed all preceding requested tasks in the streams belonging to the context”? Or does it mean waiting for all tasks of all streams on the device?
It should wait for all tasks in the context, i.e. on streams in the context.
Thanks, epk! My question is: does it wait for tasks in other contexts on the same device? I thought it would not, but a little bird told me that it would, so I want to get clarification here.
does it wait for tasks in other contexts on the same device?
No, it doesn’t. But you could just write a sample program and check.
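For example, such a check could use the raw Driver API directly: create two contexts on the same device, launch a long-running kernel in the first, then call cuCtxSynchronize() while the second (idle) context is current and time how long it blocks. This is an untested sketch, not a complete program; the spin kernel and its cuModuleLoad/cuLaunchKernel plumbing are elided, and the CHECK macro is just a hypothetical helper:

```cuda
#include <cuda.h>
#include <cstdio>
#include <chrono>

// Hypothetical error-checking helper for Driver API calls.
#define CHECK(call) do { \
    CUresult r_ = (call); \
    if (r_ != CUDA_SUCCESS) { \
        const char* msg_; cuGetErrorString(r_, &msg_); \
        std::fprintf(stderr, "%s failed: %s\n", #call, msg_); \
        return 1; \
    } } while (0)

int main() {
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));

    CUcontext ctx1, ctx2;
    CHECK(cuCtxCreate(&ctx1, 0, dev));  // each cuCtxCreate also makes the
    CHECK(cuCtxCreate(&ctx2, 0, dev));  // new context current on this thread

    // Launch a long-running kernel (say, ~1 second of busy-waiting) in ctx1.
    CHECK(cuCtxSetCurrent(ctx1));
    // ... cuModuleLoad / cuModuleGetFunction / cuLaunchKernel elided here ...

    // Now make the idle context current and time cuCtxSynchronize().
    // If it only waits for work in the *current* context, this should
    // return almost immediately; if it waited device-wide, it would
    // block for roughly the kernel's duration.
    CHECK(cuCtxSetCurrent(ctx2));
    auto t0 = std::chrono::steady_clock::now();
    CHECK(cuCtxSynchronize());
    auto t1 = std::chrono::steady_clock::now();
    std::printf("cuCtxSynchronize() on the idle context blocked for %lld ms\n",
        (long long) std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());

    CHECK(cuCtxDestroy(ctx2));
    CHECK(cuCtxDestroy(ctx1));
    return 0;
}
```

Running this on a CUDA-capable machine and comparing the reported blocking time against the spin kernel's duration would answer the question empirically.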
Shameless self-plug: … and your program would be pretty short if you used the upcoming version 0.5 of my Modern-C++ CUDA API wrappers, which supports the combined Driver + Runtime API, with RAII objects and exception-protected calls, e.g.:
auto context_1 = cuda::context::create(device);
auto context_2 = cuda::context::create(device);
auto stream_1 = context_1.create_stream(cuda::stream::nonblocking);
auto stream_2 = context_2.create_stream(cuda::stream::nonblocking);
stream_1.enqueue.kernel_launch(my_kernel, my_launch_config, some_args, would_go_here);
stream_2.enqueue.kernel_launch(my_kernel, my_launch_config, different_args, perhaps);
// no need for destroying any of the contexts or streams here, just exit the scope or function.
etc. That’s not a full program of course, but it’s the gist of what you could write to check.
and the wrappers trigger exceptions on failed API calls, so you don’t need to check status codes by hand.