acc wait segmentation fault

Hi,

I want to start multiple kernels asynchronously with different streams in a for loop:

for (i = 0; i < streams; i++) {
	#pragma acc parallel loop private(i) async(10+i)
	...
}

But when I wait for the kernels as follows, I get a segmentation fault (for streams >= 5):

for (i = 0; i < streams; i++) {
	#pragma acc wait(10+i)
}

If I do one of the following, it works fine:

  1. Inserting nanosleep(1) after the wait statement in the loop
  2. Using only one #pragma acc wait (without arguments) instead of the wait-loop
  3. Manually calling the wait statements:
#pragma acc wait(10)
#pragma acc wait(11)
...
  4. Manually unrolling the wait-loop:
i = 0;
#pragma acc wait(10 + i)
i++;
#pragma acc wait(10 + i)
i++;
...

Thanks,
Fabian

Hi Fabian,

Thanks for the report. I was able to recreate the problem and have sent a report (TPR#21434) to engineering for further investigation.

The error only occurs when the compiler optimizes the “i” variable. Hence, you should be able to add the “volatile” qualifier to the declaration of “i” to stop the compiler from optimizing it.
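
Here is a minimal sketch of that workaround, not your actual code. The names run_streams, data, N and MAX_STREAMS are made up for illustration, and the kernel body is only a placeholder because the original loop body was not posted. Without OpenACC enabled, the pragmas are ignored and the code runs serially.

```c
#include <assert.h>

#define N 1024          /* elements per queue (illustrative value) */
#define MAX_STREAMS 16  /* upper bound for this sketch (hypothetical) */

static float data[MAX_STREAMS][N];

/* Launches one kernel per async queue (10, 11, ...), waits on each queue
   in a loop, and returns the sum of all written elements. */
double run_streams(int streams)
{
    /* Workaround: "volatile" keeps the compiler from optimizing "i". */
    volatile int i;
    int j, k;
    double sum = 0.0;

    assert(streams >= 0 && streams <= MAX_STREAMS);

    for (i = 0; i < streams; i++) {
        float *row = data[i];
        /* Placeholder kernel: fill one row on queue 10+i. */
        #pragma acc parallel loop copyout(row[0:N]) async(10 + i)
        for (j = 0; j < N; j++)
            row[j] = 1.0f;
    }

    /* The wait-loop from the original post, unchanged apart from "i". */
    for (i = 0; i < streams; i++) {
        #pragma acc wait(10 + i)
    }

    /* All queues are complete here, so the host copies are valid. */
    for (k = 0; k < streams; k++)
        for (j = 0; j < N; j++)
            sum += data[k][j];

    return sum;
}
```

Calling run_streams(8) exercises the case that faulted for you (streams >= 5). The inner kernel loop uses its own index j, so only the outer index is volatile.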

Best Regards,
Mat

Dear Mat,
Thanks for the workaround. We will test it.
Sandra

Hi Mat,

The workaround works.

Thanks,
Fabian

TPR 21434 - UF: Using a loop index variable in a loop of “wait” directives causes a segv at -O2

This problem is fixed in the 15.7 release.

thanks,
dave