With the OpenACC async clause, I can execute kernels and update data asynchronously. Now I want to use different streams (i.e., different integer expressions as the async argument).
- I have heard that PGI interprets async(0) as synchronous behavior, and that only integer values > 0 are asynchronous. Is that true?
- With CUDA streams, it is said that a call that blocks also blocks all later calls of the same type (kernel or memcopy), even those in other streams. I assume that is a hardware issue. Is this also true for OpenACC?
#pragma acc kernels async(1)
...
#pragma acc update async(1)
...
#pragma acc update async(2)
When I issue these operations, can the second update only be executed after the first update? My reasoning: stream 1 is executed first, since an operation was issued there first (and not in stream 2). So the kernel execution starts, followed by the first update. Since the second update has to wait until the first update has finished, it is executed last. In this case, everything is actually serialized.
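To make the first sequence concrete, here is a minimal sketch of what I mean. The array names, the size limit `N_MAX`, the helper `run_first_sequence`, and the `host(...)` direction of both updates are my own assumptions for illustration. The sketch only shows the order in which the operations are issued; whether the update in queue 2 really waits for the update in queue 1 is exactly what I am asking.

```c
#include <assert.h>

#define N_MAX 4096

static float a[N_MAX], b[N_MAX];

/* Hypothetical helper: issues kernel + update on queue 1 and an
 * independent update on queue 2, then waits for both queues.
 * Returns the number of elements that do not hold the expected value. */
int run_first_sequence(int n)
{
    assert(n > 0 && n <= N_MAX);

    for (int i = 0; i < n; ++i) {
        a[i] = 0.0f;
        b[i] = (float)i;
    }

    #pragma acc data create(a[0:n]) copyin(b[0:n])
    {
        /* Queue 1: compute a on the device. */
        #pragma acc kernels async(1)
        for (int i = 0; i < n; ++i)
            a[i] = 2.0f * (float)i;

        /* Queue 1: copy a back to the host (ordered after the kernel). */
        #pragma acc update host(a[0:n]) async(1)

        /* Queue 2: copy b back to the host, independent of queue 1. */
        #pragma acc update host(b[0:n]) async(2)

        /* Block the host until all queues have drained. */
        #pragma acc wait
    }

    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (a[i] != 2.0f * (float)i || b[i] != (float)i)
            ++errors;
    return errors;
}
```

Without an OpenACC compiler the pragmas are ignored and the function runs sequentially on the host, so it only checks correctness, not overlap.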
#pragma acc kernels async(1)
...
#pragma acc update async(1)
...
#pragma acc kernels async(2)
...
#pragma acc update async(2)
If I understand correctly, only the kernel execution in stream 2 and the update in stream 1 can really overlap here. Correct?
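Again as a minimal sketch of this second sequence: the arrays `x` and `y`, the limit `N_MAX`, the helper `run_two_queues`, and the `host(...)` update direction are assumptions I made up for illustration. Each queue gets its own kernel followed by its own update, and the host waits for both queues at the end. The code does not show which operations actually overlap; that depends on the implementation and the hardware.

```c
#include <assert.h>

#define N_MAX 4096

static float x[N_MAX], y[N_MAX];

/* Hypothetical helper: kernel + update on queue 1, then kernel + update
 * on queue 2, then a wait on both queues.
 * Returns the number of elements that do not hold the expected value. */
int run_two_queues(int n)
{
    assert(n > 0 && n <= N_MAX);

    #pragma acc data create(x[0:n], y[0:n])
    {
        /* Queue 1: compute x, then copy it back to the host. */
        #pragma acc kernels async(1)
        for (int i = 0; i < n; ++i)
            x[i] = 2.0f * (float)i;

        #pragma acc update host(x[0:n]) async(1)

        /* Queue 2: compute y, then copy it back to the host. */
        #pragma acc kernels async(2)
        for (int i = 0; i < n; ++i)
            y[i] = 3.0f * (float)i;

        #pragma acc update host(y[0:n]) async(2)

        /* Block the host until both queues have drained. */
        #pragma acc wait
    }

    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (x[i] != 2.0f * (float)i || y[i] != 3.0f * (float)i)
            ++errors;
    return errors;
}
```

To observe the actual overlap, I would run this under a profiler and look at the timeline of the two queues.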