OpenACC error: independent loop is not parallelized

simoliok · September 13, 2015, 3:56pm

Hi,
I have this code in C++:

int curLevel = 1;
#pragma acc data copyin(dataElement[0:nEdge] , graphIndex[0:noOfNodes]) , create(visited[0:noOfNodes]),create(mask[0:noOfNodes]) ,create(parentDevice[0:noOfNodes]),create(distanceDevice[0:noOfNodes]),create(curIndexDevice[0:noOfNodes]) 
	{
			//start data region
#pragma acc parallel loop independent 
			for (int i = 0; i < noOfNodes; i++)
			{
				visited[i] = false;
				mask[i] = false;
				parentDevice[i] = -2;
				distanceDevice[i] = 0;
				curIndexDevice[i] = 0;
			}
			//start bfs set clock
			visited[source] = true;
			distanceDevice[source] = 0;
			parentDevice[source] = -1;
			//int loopIndex = graphIndex[source + 1] - [graph]
#pragma acc update device(visited[source:1],parentDevice[source:1],distanceDevice[source:1])
			/*find neighbours of source and set to new level work set and set curIndex to 1 show level 1*/
			for (int i = graphIndex[source]; i < graphIndex[source+1]; i++)
			{
				visited[i] = true;
				mask[i] = true;
				distance[i] = 1;
				parent[i] = source;
				curIndexDevice[i]=1;
				//nodesArray[i].curIndex = 1;
			}
			int i, j;			
			do{
				stop = false;
               //#pragma acc update device(curLevel)
			#pragma acc update device(stop)
#pragma acc parallel loop independent  copy(stop , curLevel) //,lastprivate(stop)
				for (i = 0; i < noOfNodes; i++)
				{					
					if (mask[i] && curLevel == curIndexDevice[i]){//if mask node true and cureIndex equal with current level cand be find neighbours
						//#pragma acc parallel loop
						for (j = graphIndex[i]; j < graphIndex[i + 1]; j++){
						//for (j = 0; j < 128; j++){
							//int index = nodesArray[i].neighbours[j];
							if (!visited[dataElement[j]]){//if visited=true then before find neighbours	
								mask[dataElement[j]] = true;
								visited[dataElement[j]] = true;//go to true
								parent[dataElement[j]] = i;
								stop = true;
								curIndexDevice[dataElement[j]] = curLevel + 1;
								distance[dataElement[j]] = distance[i] + 1;
								//curIndexDevice[j] = 2;
							}// end  if 
						}//end for j
					}//end if
				}//end parallel for
				//#pragma acc update device(curLevel)
				curLevel++;
				#pragma acc update device(curLevel)
				#pragma acc update device(stop)
				//#pragma acc update host(stop)

			} while (stop);
		}//end data region
	//end bfs pause time

And I use this command with 64-bit PGI on Windows:
pgcpp -acc -Minfo newbfs.cpp

and at this link is the result I got from this command:

MatColgrove · September 14, 2015, 5:01pm

Hi simoliok,

The compiler feedback messages indicate that the “i” loop is getting parallelized, but when the compiler attempts to auto-parallelize the inner “j” loop it can’t, due to the loop dependencies on the arrays written in the inner loop. You use a look-up array to get the index into those arrays. At compile time, the compiler has no way of knowing whether all the values in the look-up array are different, so it must assume the worst-case scenario that they are all the same, and therefore the loop can’t be parallelized.
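To make that worst case concrete, here is a minimal sketch (not code from this thread) of a run-time check for the property the compiler cannot prove at compile time: whether two iterations of a loop of the form out[idx[k]] = ... would write the same element. The helper name has_colliding_writes and its parameters are hypothetical, chosen only for illustration.

```cpp
#include <vector>

// Hypothetical helper, for illustration only.
// Returns true if a loop such as
//     for (k = 0; k < idx.size(); k++) out[idx[k]] = ...;
// would write the same element of out from two different iterations.
// That is exactly the case the compiler has to assume when it only
// sees the look-up array at compile time.
bool has_colliding_writes(const std::vector<int>& idx, int outSize) {
    std::vector<char> seen(outSize > 0 ? outSize : 0, 0);
    for (int target : idx) {
        if (target < 0 || target >= outSize) {
            continue;  // out-of-range indices are ignored in this sketch
        }
        if (seen[target]) {
            return true;  // two iterations hit the same element
        }
        seen[target] = 1;
    }
    return false;
}
```

In a BFS, several nodes on the same level usually share neighbours, so a check like this would be expected to report collisions for dataElement.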

Note that you can’t use the “update” directive for “curLevel” and “stop”, since their device scope is limited to the “i” loop (they appear only in the “copy” clause on the parallel loop). You would need to put them in the outer data region in order to use them in an “update” directive.
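A rough sketch of that arrangement, assuming a graph stored in CSR form: the scalars are created in the outer data region, so “update” directives on them are valid inside it. The function bfs_levels and the names rs, ad and level are hypothetical and are not taken from the posted code; “stop” keeps the sense used in the thread (true means another level was found). Without an OpenACC compiler the pragmas are ignored and the code runs serially.

```cpp
#include <vector>

// Hypothetical sketch: level-synchronous BFS over a CSR graph.
// rowStart has n+1 entries; adj holds the neighbour lists back to back.
std::vector<int> bfs_levels(const std::vector<int>& rowStart,
                            const std::vector<int>& adj, int source) {
    const int n = static_cast<int>(rowStart.size()) - 1;
    const int m = static_cast<int>(adj.size());
    std::vector<int> levelVec(n, -1);   // -1 marks "not visited yet"
    levelVec[source] = 0;

    const int* rs = rowStart.data();
    const int* ad = adj.data();
    int* level = levelVec.data();
    int curLevel = 0;
    bool stop = false;

    // stop and curLevel live in the data region, so they can be updated.
    #pragma acc data copyin(rs[0:n+1], ad[0:m]) copy(level[0:n]) create(stop, curLevel)
    {
        do {
            stop = false;
            #pragma acc update device(stop, curLevel)

            #pragma acc parallel loop present(rs[0:n+1], ad[0:m], level[0:n], stop, curLevel)
            for (int i = 0; i < n; ++i) {
                if (level[i] == curLevel) {
                    for (int j = rs[i]; j < rs[i + 1]; ++j) {
                        const int v = ad[j];
                        if (level[v] < 0) {
                            // Several threads may store the same value here;
                            // this sketch does not guard those writes.
                            level[v] = curLevel + 1;
                            stop = true;
                        }
                    }
                }
            }

            // Bring the flag back so the host loop condition sees it.
            #pragma acc update host(stop)
            ++curLevel;
        } while (stop);
    }
    return levelVec;
}
```

The key difference from the posted code is that the flag is copied back with “update host” before the while condition is tested, and both scalars are refreshed on the device at the top of each pass.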

The “live-out” messages are correct, but I think they can be ignored here, though you might consider using an “atomic update” directive when assigning “stop”.
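As a small sketch of the atomic suggestion (again, not code from this thread): “atomic update” expects an update-style statement such as x |= 1, so the flag below is an int that is OR-ed rather than a bool that is assigned. For a plain assignment, “atomic write” is the matching clause. The helper any_above is hypothetical, and using int for the flag is an assumption made to keep the atomic operation simple.

```cpp
#include <vector>

// Hypothetical helper: returns true if any element of data exceeds limit.
// The shared flag is set with an atomic update so concurrent iterations
// do not race on it.
bool any_above(const std::vector<int>& data, int limit) {
    const int n = static_cast<int>(data.size());
    const int* d = data.data();
    int stop = 0;  // int instead of bool (assumption, see lead-in)

    #pragma acc parallel loop copyin(d[0:n]) copy(stop)
    for (int i = 0; i < n; ++i) {
        if (d[i] > limit) {
            #pragma acc atomic update
            stop |= 1;
        }
    }
    return stop != 0;
}
```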

Please let me know if you have a specific question you’d like addressed. Also, posting a small reproducing example that I can compile is helpful.

Mat