Hi,
I’ve been experimenting with the trial version of PGI 10.0 for Windows (the Accelerator in particular), and I am getting unexpected results. Here is a small test program:
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>

int main(void){
    printf("trial3dsubsetDiffSizes\n");
    /* 100*99*89 ints (about 3.5 MB) with automatic storage, i.e. on the stack */
    int a[100][99][89];

    /* initialize every element to i+j+k */
    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                a[i][j][k]=i+j+k;

    /* print the initial values */
    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                printf("%d\n",a[i][j][k]);

    /* offload: multiply the sub-block i=50..98, j=3..69, k=5..59 by 5 */
    #pragma acc region
    {
        for (int k=5;k<60;k++)
            for (int j=3;j<70;j++)
                for (int i=50;i<99;i++)
                    a[i][j][k]*=5;
    }

    /* print the values after the region */
    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                printf("%d\n",a[i][j][k]);
    printf("finished\n");
    return 0;
}
It compiles, but running it produces no output at all, not even the first line:
PGI$ pgcc -ta=nvidia,time,keepgpu -Minfo=all,accel trial3DsubsetDiffSizes.c
NOTE: your trial license will expire in 7 days, 13.1 hours.
main:
26, Generating copy(a[50:98][3:69][5:59])
28, Loop is parallelizable
Accelerator kernel generated
28, #pragma acc for parallel, vector(55)
29, Loop is parallelizable
30, Loop is parallelizable
PGI$ trial3DsubsetDiffSizes.exe
PGI$
On the other hand, a similar program compiled with -ta=nvidia,time -Minfo=accel works correctly, although the compiler prints no -Minfo messages and the program prints no timing information.
My actual program follows the same idea as the code above: I need to accelerate a triply nested loop over a 3D array, or over a 1D array using a macro to compute the 3D index. What’s the best way to do this with the Accelerator?
Note: the actual program uses dynamically allocated arrays, not a fixed-size array like the one in this example.
Thanks