PGI 10.0 on Windows XP (Accelerator)

Hi,

I’ve been experimenting with the trial version of PGI 10.0 for Windows (specifically the Accelerator), and I am getting strange results!

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>

int main(){

    printf("trial3dsubsetDiffSizes\n");


    int a[100][99][89];

    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                a[i][j][k]=i+j+k;



    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                printf("%d\n",a[i][j][k]);


    #pragma acc region
    {
        for (int k=5;k<60;k++)
            for (int j=3;j<70;j++)
                for (int i=50;i<99;i++)
                    a[i][j][k]*=5;

    }


    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                printf("%d\n",a[i][j][k]);

    printf("finished\n");

    return 0;
}

It compiles but I get no output at all:

PGI$ pgcc -ta=nvidia,time,keepgpu -Minfo=all,accel trial3DsubsetDiffSizes.c
NOTE: your trial license will expire in 7 days, 13.1 hours.
main:
26, Generating copy(a[50:98][3:69][5:59])
28, Loop is parallelizable
Accelerator kernel generated
28, #pragma acc for parallel, vector(55)
29, Loop is parallelizable
30, Loop is parallelizable
PGI$ trial3DsubsetDiffSizes.exe
PGI$

On the other hand, a similar program compiled with -ta=nvidia,time -Minfo=accel works correctly, but the compiler prints no -Minfo messages and the run prints no timing information.

The main idea of my program is similar to the code above: I need to accelerate a three-level loop nest over a 3D array, or over a 1D array that uses macros to calculate the 3D index. What’s the best way to do this with the Accelerator?

Note: the actual program uses dynamically allocated arrays, not fixed-size arrays like the one in this example.

Thanks

Hi ghandurah,

Since I’m out of the office for the next two weeks, and Windows requires a user to be logged into the console in order to run on a GPU, I’m not able to recreate your error. The code does seem to run correctly on Linux, though.

Does the code print the values if you compile without “-ta”? How about with just “-ta=nvidia”?

As for the dynamic arrays, you may need to use “-Msafeptr” or add the C99 restrict keyword to each of your pointers. Without this, the compiler must presume that your pointers could overlap and cannot generate accelerator code.
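
A minimal sketch of the restrict suggestion, assuming a C99 compiler. The function name scale_into and its parameters are hypothetical and not taken from the original program. The pragma follows the PGI Accelerator syntax used in this thread; a compiler that does not know it should simply ignore the pragma and run the loop on the host.

```c
#include <assert.h>

/* Hypothetical helper: dst[i] = factor * src[i] for i in [0, n).
 * The restrict qualifiers promise the compiler that dst and src never
 * overlap, so it does not have to assume the pointers might alias. */
void scale_into(float *restrict dst, const float *restrict src,
                int n, float factor)
{
    assert(n > 0);
    /* Bounds are written lower:upper, as in the compiler messages above. */
#pragma acc region copyin(src[0:n-1]) copyout(dst[0:n-1])
    {
        for (int i = 0; i < n; i++)
            dst[i] = factor * src[i];
    }
}
```

Compiling with -Msafeptr instead makes the same no-overlap promise for the pointers in the file without changing the source, so it is only safe when that promise really holds everywhere.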

  • Mat

Does the code print the values if you compile without “-ta”? How about with just “-ta=nvidia”?

Nothing at all.

Hi ghandurah,

It looks like your program is seg faulting because ‘a’ is too large to fit on the stack. Try reducing the size of ‘a’ to fewer than 250,000 elements.
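
For reference, 100*99*89 is 881,100 ints, roughly 3.4 MB, which is likely more than the default stack can hold for a local array. One way around it, sketched below, is to put the array on the heap and index it through a helper. The names NI, NJ, NK, idx and make_array are hypothetical; only the dimensions come from the post above.

```c
#include <assert.h>
#include <stdlib.h>

enum { NI = 100, NJ = 99, NK = 89 };   /* dimensions from the post above */

/* Flat offset of element [i][j][k], with k varying fastest. */
static size_t idx(int i, int j, int k)
{
    assert(i >= 0 && i < NI && j >= 0 && j < NJ && k >= 0 && k < NK);
    return ((size_t)i * NJ + (size_t)j) * NK + (size_t)k;
}

/* Allocate the array on the heap and fill it with i+j+k.
 * Returns NULL if malloc fails; the caller must free() the result. */
int *make_array(void)
{
    int *a = malloc((size_t)NI * NJ * NK * sizeof *a);
    if (a == NULL)
        return NULL;
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++)
            for (int k = 0; k < NK; k++)
                a[idx(i, j, k)] = i + j + k;
    return a;
}
```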

  • Mat

Thanks a lot mkcolg, I declared it as a dynamic array and it worked correctly.

Sorry, but I have two more questions.

I need to compute some values and then store them as constants, because they’ll be used as array dimensions in declarations and as array subscripts elsewhere. I declared them before the main function as follows:

const int X=20;
const int Y=10; 

const int I=2.5*X+2*Y;

//J same way
//K same way

#define A(i,j,k) A[(i)*((J+2)*(K+2))+(j)*(K+2)+(k)]
// other arrays same way

It gives me this error:
PGC-S-0074-Non-constant expression in initializer ==> pointing to const int I=2.5*X+2*Y;

So I modified it to:

const int X=20;
const int Y=10; 

int Ii=2.5*X+2*Y;

const int I=Ii;

//J same way
//K same way

#define A(i,j,k) A[(i)*((J+2)*(K+2))+(j)*(K+2)+(k)]
// other arrays same way

Still the same error.

Then I used #define:

#define X 20
#define  Y 10 

#define  I (2.5*X+2*Y) 
//J same way
//K same way	
	 
#define  X1 (X+1)   
#define   X2 (X+2)

#define A(i,j,k) B[(i)*((J+2)*(K+2))+(j)*(K+2)+(k)]
// other arrays same way

I got this:

PGC-W-0046-Non-integral array subscript is cast to int (Acc: 1158)
PGC-W-0046-Non-integral array subscript is cast to int (Acc: 1162)
main:
461, Accelerator region ignored
464, Accelerator restriction: size of the GPU copy of an array depends on values computed in this loop
465, Accelerator restriction: size of the GPU copy of ‘A’ is unknown
Accelerator restriction: size of the GPU copy of ‘B’ is unknown
Accelerator restriction: one or more arrays have unknown size

The rest of the code, after the definitions:

inline void init3Darray (float *arr,int a, int b, int c, float val){
	int index;
	for (int i=1; i<a;i++)
		for (int j=1; j<b;j++)
			for (int k=1; k<c;k++){
				index=i*b*c+j*c+k;
				arr[index]=val;
			}

}



int main(){



float *A=(float *)malloc((I+1)*(J+2)*(K+2)* sizeof (*A) );
init3Darray(A,I+1,J+2,K+2,0.0);

//B same way


#pragma acc region
{
//all those constants (P1, KP, etc) are declared the same way  as I , J , K   
for ( int k=P1; k<=KP;k++) 
	for ( int j=P1; j<=JP;j++)
		for ( int  i=2;i<=I;i++)
            	A(i,j,k)=B(i,j,k)*A(i,j,k); 
}





return 0;
}

Is it better to use “acc region” or “acc for” for loops like the one above? Any recommendations, given that the arrays will actually be much larger in size?

Second question:
I need to use cutil_inline. When I include it like this:
#include <cutil_inline.h>

the compiler doesn’t recognize it. So I copied the cutil_inline.h file to the same directory and used #include "cutil_inline.h". The compiler then recognized it, but didn’t recognize the other libraries referenced in it.

Thanks

Hi ghandurah,

PGC-W-0046-Non-integral array subscript is cast to int (Acc: 1158)
PGC-W-0046-Non-integral array subscript is cast to int (Acc: 1162)

These warnings are most likely for “I”. Since the definition of I uses the constant “2.5”, the expression has a floating-point type (double) rather than int. The warnings are letting you know that the value is being cast to an int in order to be used as a subscript.
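
One possible way to avoid both the warning and the earlier “Non-constant expression in initializer” error is sketched below. In C, a const-qualified int is not an integer constant expression, but an enumeration constant is, and 2.5*X can be written in integer arithmetic as 5*X/2. The values given for J and K are placeholders, since the post does not show them, and flat_index is a hypothetical helper.

```c
#include <assert.h>

/* Enumeration constants are integer constant expressions in C, so they
 * can be built from one another and used as array dimensions. */
enum {
    X = 20,
    Y = 10,
    I = 5 * X / 2 + 2 * Y,   /* integer form of 2.5*X + 2*Y, here 70 */
    J = 8,                   /* placeholder: real value not shown in the post */
    K = 6                    /* placeholder: real value not shown in the post */
};

/* Same layout as the A(i,j,k) macro in the post, minus the array name. */
#define IDX(i, j, k) ((i) * ((J + 2) * (K + 2)) + (j) * (K + 2) + (k))

/* Bounds-checked offset into an (I+1) x (J+2) x (K+2) array. */
int flat_index(int i, int j, int k)
{
    assert(i >= 0 && i <= I && j >= 0 && j <= J + 1 && k >= 0 && k <= K + 1);
    return IDX(i, j, k);
}
```

Because every operand is an int, the subscript is already integral and nothing has to be cast. For positive X, 5*X/2 truncates the same way that casting 2.5*X to int would.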

464, Accelerator restriction: size of the GPU copy of an array depends on values computed in this loop
465, Accelerator restriction: size of the GPU copy of ‘A’ is unknown
Accelerator restriction: size of the GPU copy of ‘B’ is unknown
Accelerator restriction: one or more arrays have unknown size

You need to use the copy clauses to tell the compiler how much of A and B to copy over to the GPU. Something like:

#pragma acc region copy(A[0:AMAX]), copyin(B[0:BMAX])
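
A fuller sketch of how the clauses might sit around a flattened 3D loop. The sizes NI, NJ, NK and the function name multiply_interior are hypothetical. The earlier compiler message, copy(a[50:98][3:69][5:59]), suggests the bounds are lower:upper inclusive, so the sketch uses the last valid index as the upper bound.

```c
#include <assert.h>
#include <stdlib.h>

enum { NI = 8, NJ = 6, NK = 4 };      /* hypothetical small dimensions */
enum { NELEM = NI * NJ * NK };        /* total number of elements */

/* Flat offset of element (i,j,k), with k varying fastest. */
#define IDX(i, j, k) (((i) * NJ + (j)) * NK + (k))

/* a(i,j,k) = b(i,j,k) * a(i,j,k) for all indices >= 1.
 * a is copied to the GPU and back; b is only copied in. */
void multiply_interior(float *restrict a, const float *restrict b)
{
    assert(a != NULL && b != NULL);
#pragma acc region copy(a[0:NELEM-1]) copyin(b[0:NELEM-1])
    {
        for (int i = 1; i < NI; i++)
            for (int j = 1; j < NJ; j++)
                for (int k = 1; k < NK; k++)
                    a[IDX(i, j, k)] = b[IDX(i, j, k)] * a[IDX(i, j, k)];
    }
}
```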



Is it better to use “acc region” or “acc for” for loops like the one above?

You must use an acc region to define the accelerator region. The acc for is optional but can be used to adjust the GPU schedule.
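
As a rough illustration of how the two directives combine, the sketch below wraps a loop nest in an acc region and then annotates the individual loops with acc for. The clause spelling follows the “#pragma acc for parallel, vector(55)” line in the compiler output earlier in the thread. The function name scale_rows and the vector width of 64 are assumptions chosen for the example, not tuned values.

```c
#include <assert.h>

/* Multiply every element of a rows x cols matrix (stored flat) by factor. */
void scale_rows(float *restrict a, int rows, int cols, float factor)
{
    assert(rows > 0 && cols > 0);
#pragma acc region copy(a[0:rows*cols-1])
    {
        /* Optional: request that the outer loop run in parallel. */
#pragma acc for parallel
        for (int r = 0; r < rows; r++) {
            /* Optional: request a vector schedule of width 64 (illustrative). */
#pragma acc for vector(64)
            for (int c = 0; c < cols; c++)
                a[r * cols + c] *= factor;
        }
    }
}
```

Removing both acc for lines leaves a valid region; the compiler then picks the schedule itself and reports it through -Minfo=accel.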

Any recommendations, given that the arrays will actually be much larger in size?

Just make sure that your GPU has enough memory. If not, then you’ll need to add an outer strip mine loop (i.e. send over portions of the array at a time).
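
A strip-mined version might look roughly like the sketch below: an outer host loop walks the array in chunks, and each chunk gets its own accelerator region with copy bounds limited to that chunk. The function name multiply_in_strips and the chunk parameter are hypothetical; a real chunk size would depend on the memory available on the GPU.

```c
#include <assert.h>

/* a[n] = b[n] * a[n] for n in [0, len), sending at most `chunk`
 * elements of each array to the accelerator per region. */
void multiply_in_strips(float *restrict a, const float *restrict b,
                        int len, int chunk)
{
    assert(len >= 0 && chunk > 0);
    for (int start = 0; start < len; start += chunk) {
        int end = start + chunk;          /* one past the last index */
        if (end > len)
            end = len;                    /* final strip may be shorter */
        /* Bounds are lower:upper inclusive, hence end-1. */
#pragma acc region copy(a[start:end-1]) copyin(b[start:end-1])
        {
            for (int n = start; n < end; n++)
                a[n] = b[n] * a[n];
        }
    }
}
```

The trade-off is more host-to-GPU transfers in exchange for a smaller footprint on the device.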

but didn’t recognize the other libraries referenced in it

This isn’t a library we ship, so you’ll need to add “-L/path/to/library” to your link line so the linker can find it.

I should note that we don’t yet support calling CUDA functions from within an accelerator region.

  • Mat

Thank you :)