Streams and Kernel Execution Order

I have a question about the order of kernel calls when using streams. Given the following code:

for (i = 0; i < 10; i++) {
    kernel_a<<<1,1,0,stream[i]>>>(i);
    kernel_b<<<1,1,0,stream[i]>>>(i);
}

will the kernel execution order be guaranteed (i.e. as executed on the GPU) to be

kernel_a<<<1,1,0,stream[0]>>>(0)
kernel_b<<<1,1,0,stream[0]>>>(0)
kernel_a<<<1,1,0,stream[1]>>>(1)
kernel_b<<<1,1,0,stream[1]>>>(1)
kernel_a<<<1,1,0,stream[2]>>>(2)
kernel_b<<<1,1,0,stream[2]>>>(2)
etc.
kernel_a<<<1,1,0,stream[9]>>>(9)
kernel_b<<<1,1,0,stream[9]>>>(9)

So that’s one form of my question. More generally, I’m trying to find out whether the asynchronous API provides any guarantees about kernel execution order on the GPU. In particular: is the order of kernel calls deterministic?

I have code which uses streams, and I have found empirically that it produces the same results on every run. Order does matter in my code, so the kernel order seems deterministic. I’m trying to work out whether that is a fluke of the kernel scheduling or of my code’s behavior, or whether the driver actually enforces a deterministic order of kernel execution.

Thanks for any help.

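Kernel ordering is deterministic only within each stream. In your loop, kernel_a(i) is guaranteed to finish before kernel_b(i) starts, because both are issued to stream[i]. Across streams there is no such guarantee: the driver is free to reorder kernels and/or run them in parallel if they are from separate streams, so the pair issued to stream[3] may run before, after, or at the same time as the pair issued to stream[2]. Getting the same results on every run does not show that the order is enforced.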
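If the order between streams does matter, it has to be requested explicitly. Below is a minimal sketch of one way to do that with events, assuming the strict order a(0), b(0), a(1), b(1), ... is what you want. The kernel bodies, the log/pos buffers and the run_chained helper are made up for illustration and are not from your code; only the launch pattern matches your loop. Each stream waits on an event recorded at the end of the previous stream's work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_STREAMS 10

// Hypothetical stand-ins for kernel_a / kernel_b: each appends an ID to a
// shared log so the actual execution order can be inspected afterwards.
__global__ void kernel_a(int i, int *log, int *pos) {
    log[atomicAdd(pos, 1)] = 2 * i;        // ID of a(i)
}

__global__ void kernel_b(int i, int *log, int *pos) {
    log[atomicAdd(pos, 1)] = 2 * i + 1;    // ID of b(i)
}

// Launches the a/b pairs on separate streams, chaining the streams with
// events. Copies the execution log to host_log; returns 0 on success.
int run_chained(int *host_log) {
    cudaStream_t stream[N_STREAMS];
    cudaEvent_t  done[N_STREAMS];
    int *log = NULL, *pos = NULL;

    cudaMalloc(&log, 2 * N_STREAMS * sizeof(int));
    cudaMalloc(&pos, sizeof(int));
    cudaMemset(pos, 0, sizeof(int));
    cudaDeviceSynchronize();               // make sure pos is zeroed first

    for (int i = 0; i < N_STREAMS; i++) {
        cudaStreamCreate(&stream[i]);
        cudaEventCreateWithFlags(&done[i], cudaEventDisableTiming);
    }

    for (int i = 0; i < N_STREAMS; i++) {
        // Cross-stream dependency: work in stream[i] may not start until
        // everything recorded before done[i-1] in stream[i-1] has finished.
        if (i > 0) cudaStreamWaitEvent(stream[i], done[i - 1], 0);

        // Within one stream, kernels run in issue order: a(i), then b(i).
        kernel_a<<<1, 1, 0, stream[i]>>>(i, log, pos);
        kernel_b<<<1, 1, 0, stream[i]>>>(i, log, pos);

        cudaEventRecord(done[i], stream[i]);
    }

    cudaDeviceSynchronize();
    cudaError_t err = cudaGetLastError();
    cudaMemcpy(host_log, log, 2 * N_STREAMS * sizeof(int),
               cudaMemcpyDeviceToHost);

    for (int i = 0; i < N_STREAMS; i++) {
        cudaEventDestroy(done[i]);
        cudaStreamDestroy(stream[i]);
    }
    cudaFree(log);
    cudaFree(pos);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    int host_log[2 * N_STREAMS];
    if (run_chained(host_log) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    // With the event chain in place the IDs should appear in issue order.
    for (int k = 0; k < 2 * N_STREAMS; k++) printf("%d ", host_log[k]);
    printf("\n");
    return 0;
}
```

Note that chaining every stream like this serializes all the work, so nothing is gained over using a single stream. In real code you would normally add a wait only where a true data dependency exists and let everything else overlap. If you remove the cudaStreamWaitEvent call, the logged order is no longer guaranteed, even if it happens to look the same from run to run.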
