Hi,
When I compile my program with the PGI 11.9 compiler, I get the following errors:
Compiling /home/rengan/clever/clever_pgi/sources/funcs.c
clever_process:
     94, Memory copy idiom, loop replaced by call to __c_mcopy4
getNeighbors:
    177, Memory copy idiom, loop replaced by call to __c_mcopy4
PGC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unsupported local variable (/home/rengan/clever/clever_pgi/sources/funcs.c: 218)
findRegions:
    221, Loop is parallelizable
         Accelerator scalar kernel generated
         Accelerator kernel generated
        221, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
    225, Loop carried scalar dependence for 'bestRep' at line 233
         Scalar last value needed after loop for 'bestRep' at line 239
         Scalar last value needed after loop for 'bestRep' at line 242
         Loop carried scalar dependence for 'bestDist' at line 233
         Accelerator restriction: scalar variable live-out from loop: bestRep
         Inner sequential loop scheduled on accelerator
PGC/x86-64 Linux 11.9-0: compilation completed with warnings
The code around line 94 is:
    currentReps = (int*)malloc(sizeof(int)*512);
94: for(j=0; j<512; j++)
        currentReps[j] = bestReps[j];
The code around line 177 is:
     if(delRep < tRepSize - 1)
     {
177:     for(j = delRep; j < tRepSize-1; j++)
         {
             nebRepIDs[nebID*512+j] = nebRepIDs[nebID*512+j+1];
         }
     }
These are just ordinary for loops, and no such messages appear when I compile the serial code, which has no acc directives.
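As far as I can tell, the "Memory copy idiom" lines are -Minfo notes saying that the compiler recognized the loop as a plain block copy. Below is a minimal sketch of what I assume that means for the loop at line 94. The names copy_reps_loop, copy_reps_block and REP_COUNT are made up for this illustration, and memcpy stands in for the compiler's internal __c_mcopy4 routine, whose exact behavior I have not verified.

```c
#include <string.h>

#define REP_COUNT 512  /* same trip count as the loop at line 94 */

/* Element-by-element copy, written the same way as the loop at line 94. */
void copy_reps_loop(int *dst, const int *src)
{
    int j;
    for (j = 0; j < REP_COUNT; j++)
        dst[j] = src[j];
}

/* Assumed equivalent: a single block copy of the same 512 ints.
   memcpy is a stand-in here, not the actual __c_mcopy4 implementation. */
void copy_reps_block(int *dst, const int *src)
{
    memcpy(dst, src, sizeof(int) * REP_COUNT);
}
```

Both functions should leave the destination array with identical contents, which is why I would expect the replacement to be harmless.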
The findRegions() function is:
void findRegions(struct dataPt *datasetBegin, int dataSetSize, int* neighboringTable, int* h_solutionSize, int i, clusterID_type* clusterIDs)
{
    int i_dataPoint, i_rep;
    struct dataPt *pt1, *pt2;
218: #pragma acc region copyin(datasetBegin[0:dataSetSize-1], neighboringTable[0:4096*512-1], h_solutionSize)
    {
        #pragma acc for parallel
        for(i_dataPoint = 0; i_dataPoint < dataSetSize; i_dataPoint++)
        {
            int bestRep = -1;
            float bestDist = -1;
            for(i_rep = 0; i_rep<h_solutionSize[i]; i_rep++)
            {
                pt1 = datasetBegin + i_dataPoint;
                pt2 = datasetBegin + neighboringTable[512*i+i_rep];
                float dx = pt1->x - pt2->x;
                float dy = pt1->y - pt2->y;
                float d = (dx*dx)+(dy*dy);
                if(d<bestDist || bestRep==-1)
                {
                    bestRep = i_rep;
                    bestDist = d;
                }
            }
            clusterIDs[i_dataPoint] = bestRep;
            struct dataPt *tmpPt;
            tmpPt = datasetBegin + i_dataPoint;
            tmpPt->clusterID = bestRep;
        }
    }
}
The definitions of struct dataPt and cluster_IDs are as follows:
typedef unsigned char clusterID_type;
struct dataPt {
    float x, y;
    int z;
    clusterID_type clusterID;
};
clusterID_type* cluster_IDs;
Does the "unsupported local variable" refer to the variable "datasetBegin", which is a pointer to struct dataPt?
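To make clear what the accelerated loop is supposed to compute, here is a minimal standalone serial sketch of the same nearest-representative search. The names findRegionsSerial and nearest_rep are made up for this sketch, and it indexes the array directly instead of going through the pt1/pt2/tmpPt pointers. I have not checked whether writing the region this way avoids the PGC-W-0155 message.

```c
typedef unsigned char clusterID_type;

struct dataPt {
    float x, y;
    int z;
    clusterID_type clusterID;
};

/* Hypothetical helper: position (within repIDs) of the representative
   closest to point p, using squared Euclidean distance in x and y.
   Returns -1 if nReps is 0. */
static int nearest_rep(const struct dataPt *data, int p,
                       const int *repIDs, int nReps)
{
    int bestRep = -1;
    float bestDist = -1.0f;
    int r;
    for (r = 0; r < nReps; r++) {
        float dx = data[p].x - data[repIDs[r]].x;
        float dy = data[p].y - data[repIDs[r]].y;
        float d = dx * dx + dy * dy;
        if (bestRep == -1 || d < bestDist) {
            bestRep = r;
            bestDist = d;
        }
    }
    return bestRep;
}

/* Serial version of findRegions(): row i of neighboringTable (512 ints
   per row, as in the original code) lists the representatives to test. */
void findRegionsSerial(struct dataPt *data, int dataSetSize,
                       const int *neighboringTable,
                       const int *h_solutionSize, int i,
                       clusterID_type *clusterIDs)
{
    int p;
    for (p = 0; p < dataSetSize; p++) {
        int bestRep = nearest_rep(data, p, neighboringTable + 512 * i,
                                  h_solutionSize[i]);
        /* As in the original, the int result is narrowed to unsigned char. */
        clusterIDs[p] = (clusterID_type)bestRep;
        data[p].clusterID = (clusterID_type)bestRep;
    }
}
```

Each iteration over p only reads shared data and writes its own clusterIDs[p] and data[p].clusterID, which is why I expected the outer loop to be parallelizable.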
The relevant compilation flags in the Makefile are:
CC = pgcc
CFLAGS = -ta=nvidia,time -Minfo -Msafeptr
OPT = -O3
LD = pgcc
LDFLAGS = -ta=nvidia,time -Minfo -Msafeptr
Thanks for your help.