I have a question about the order of kernel calls when using streams. For the following code

for (i = 0; i < 10; i++) {
    kernel_a<<<1,1,0,stream[i]>>>(i);
    kernel_b<<<1,1,0,stream[i]>>>(i);
}

will the kernel execution order be guaranteed (i.e. as executed on the GPU) to be

kernel_a<<<1,1,0,stream[0]>>>(0)
kernel_b<<<1,1,0,stream[0]>>>(0)
kernel_a<<<1,1,0,stream[1]>>>(1)
kernel_b<<<1,1,0,stream[1]>>>(1)
kernel_a<<<1,1,0,stream[2]>>>(2)
kernel_b<<<1,1,0,stream[2]>>>(2)

etc.

kernel_a<<<1,1,0,stream[9]>>>(9)
kernel_b<<<1,1,0,stream[9]>>>(9)

So that’s one form of my question. More generally, I’m trying to find out whether the asynchronous API provides any guarantees about kernel execution order on the GPU. In particular: is the order of kernel execution deterministic?

I have code that uses streams, and empirically it produces the same results on every run. Since the results depend on the order in which the kernels execute, the order appears to be deterministic. I’m trying to work out whether that is a fluke of kernel scheduling or of my particular code, or whether the driver actually enforces a deterministic order of kernel execution.
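A minimal, self-contained sketch of the setup described above, instrumented to record the order in which the kernels actually begin executing. The kernel bodies, the `next_slot` counter, and the `log` buffer are assumptions made up for illustration; they are not the original code, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_STREAMS 10

// Hypothetical instrumentation: each kernel claims the next free slot
// in a global log, so the host can see the order the kernels ran in.
__device__ int next_slot = 0;

__global__ void kernel_a(int i, int *log) {
    int slot = atomicAdd(&next_slot, 1);
    log[slot] = 2 * i;        // even id: kernel_a launched on stream[i]
}

__global__ void kernel_b(int i, int *log) {
    int slot = atomicAdd(&next_slot, 1);
    log[slot] = 2 * i + 1;    // odd id: kernel_b launched on stream[i]
}

int main() {
    cudaStream_t stream[N_STREAMS];
    int h_log[2 * N_STREAMS];
    int *d_log = nullptr;

    cudaMalloc(&d_log, sizeof(h_log));
    for (int i = 0; i < N_STREAMS; i++)
        cudaStreamCreate(&stream[i]);

    // Same launch pattern as in the question.
    for (int i = 0; i < N_STREAMS; i++) {
        kernel_a<<<1, 1, 0, stream[i]>>>(i, d_log);
        kernel_b<<<1, 1, 0, stream[i]>>>(i, d_log);
    }

    // Wait for all streams, then copy the log back to the host.
    cudaDeviceSynchronize();
    cudaMemcpy(h_log, d_log, sizeof(h_log), cudaMemcpyDeviceToHost);

    for (int s = 0; s < 2 * N_STREAMS; s++)
        printf("slot %2d: kernel_%c on stream[%d]\n",
               s, (h_log[s] % 2) ? 'b' : 'a', h_log[s] / 2);

    for (int i = 0; i < N_STREAMS; i++)
        cudaStreamDestroy(stream[i]);
    cudaFree(d_log);
    return 0;
}
```

Comparing the printed log across several runs shows whether the cross-stream interleaving is stable on a given device and driver, though a stable log on one machine is an observation, not a guarantee.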

Thanks for any help.