Hi,
I want to launch multiple kernels asynchronously from a for loop, each on its own async queue (stream):
for (i = 0; i < streams; i++) {
#pragma acc parallel loop private(i) async(10+i)
...
}
But when I wait for the kernels in a second loop, as follows, I get a segmentation fault (for streams >= 5):
for (i = 0; i < streams; i++) {
#pragma acc wait(10+i)
}
If I do one of the following, it works fine:
- Inserting nanosleep(1) after the wait statement in the loop
- Using only one #pragma acc wait (without arguments) instead of the wait-loop
- Writing out the wait directives by hand with constant arguments:
#pragma acc wait(10)
#pragma acc wait(11)
...
- Manually unrolling the wait loop:
i = 0;
#pragma acc wait(10 + i)
i++;
#pragma acc wait(10 + i)
i++;
...
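For reference, the launch/wait pattern described above can be sketched as a small self-contained function. This is only an illustration under stated assumptions: the kernel body, the names run_streams, data, MAX_STREAMS, and N, and the sizes are placeholders and not the original code. It uses a separate inner index j, so the host loop variable i appears only in the async and wait clauses. Built without OpenACC support, the pragmas are ignored and the loops run serially.

```c
#include <assert.h>

#define MAX_STREAMS 16  /* illustrative upper bound, not from the original post */
#define N 1024          /* illustrative amount of work per queue */

static float data[MAX_STREAMS][N];

/* Launch one kernel per async queue (10, 11, ...), wait on each queue
 * in a loop, then return the sum of all elements written. */
double run_streams(int streams)
{
    assert(streams >= 0 && streams <= MAX_STREAMS);

    for (int i = 0; i < streams; i++) {
        float *chunk = data[i];
        /* The queue id 10 + i is evaluated on the host at launch time;
         * j is the index of the offloaded loop. */
        #pragma acc parallel loop copy(chunk[0:N]) async(10 + i)
        for (int j = 0; j < N; j++) {
            chunk[j] = (float)(i + 1);
        }
    }

    /* The wait loop from the post: one wait directive per queue. */
    for (int i = 0; i < streams; i++) {
        #pragma acc wait(10 + i)
    }

    /* Host-side check that every kernel finished and copied back. */
    double sum = 0.0;
    for (int i = 0; i < streams; i++) {
        for (int j = 0; j < N; j++) {
            sum += data[i][j];
        }
    }
    return sum;
}
```

Calling run_streams(8) exercises the streams >= 5 case. If every kernel completed, the result equals N * (1 + 2 + ... + streams).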
Thanks,
Fabian