To test the gang and worker clauses, I wrote this small program:
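#include <stdio.h>
#include <stdlib.h>
#define N 1000
#define M 1000
int main()
{
    int *A;
    A = (int *)malloc(N * M * sizeof(int));
    for (int i = 0; i < N * M; i++) {
        A[i] = -1;  /* sentinel value, overwritten by the accelerator loop */
    }
    #pragma acc kernels loop gang(100), worker(128)
    for (int i = 0; i < N * M; i++)
    {
        A[i] = i;   /* runs on the device */
    }
    for (int i = 0; i < 10; i++)
        printf("A=%d\n", A[i]);  /* print the first 10 results */
    return 0;
}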
On Linux, the compile output is:
[wcj@localhost example]$ pgcc -acc -Minfo gang.c
NOTE: your trial license will expire in 8 days, 13.7 hours.
main:
12, Memory set idiom, loop replaced by call to __c_mset4
17, Generating present_or_copyout(A[0:1000000])
Generating NVIDIA code
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
Generating compute capability 3.0 binary
18, Loop is parallelizable
Accelerator kernel generated
18, #pragma acc loop gang(100), vector(128) /* blockIdx.x threadIdx.x */
And the execution output:
[wcj@localhost example]$ ./a.out
A=0
A=1
A=2
A=3
A=4
A=5
A=6
A=7
A=8
A=9
Accelerator Kernel Timing data
/home/wcj/Yunio/openacc/example/gang.c
main NVIDIA devicenum=0
time(us): 703
18: kernel launched 1 times
grid: [7813] block: [128]
device time(us): total=80 max=80 min=80 avg=80
elapsed time(us): total=96 max=96 min=96 avg=96
27: data copyout reached 1 times
device time(us): total=623 max=623 min=623 avg=623
My question is: why is the grid size (7813) not equal to the value I set in the gang clause (100)?
I also tested the program on Windows using PGI Workstation. From the execution output there, the grid size is equal to the value set in the gang clause.
Why does the grid size differ between the two systems? Could this be a bug in the Linux version of the PGI compiler?