Update clause doesn't work

Hi everybody.
I want to implement breadth-first search with OpenACC.
Everything looks right to me except these lines:

#pragma acc update host(stop[0])
#pragma acc update device(curLevel[0])

They don’t work. The update clause has no effect!
Can anyone help me?

//Global include and variables
/*Openacc test */
#define _CRT_SECURE_NO_DEPRECATE
#include <ctime>
#include <iostream>
#include <fstream>
#include <stdio.h>
#include <queue>
#include <set>
#include <string>
#include <omp.h>
#include <vector>
#include <map>
using namespace std;

int main()
{
	cout << "\n ****  Start prog Test functionality of OPENACC *** \n" << endl;
//	//define variables
	int * restrict graphIndex;
	int * restrict data;
	int * restrict dist;
	int * restrict parent;
	int * restrict curIndex;
	bool * restrict vis;
	bool * restrict mask;
	int * restrict curLevel;
	bool * restrict stop;
	int sizeOne = 1;

	int noOfNodes, noOfEdges, numOfNeighbour;
//	FILE *in_file = fopen("test__Meta_undirected.txt", "r");
	FILE *in_file = fopen("fMeta.txt", "r");
	fscanf(in_file, "%d %d", &noOfNodes, &noOfEdges);	

	//	//initialize variables 
	data = new int[noOfEdges*2];
	dist = new int[noOfNodes];
	parent = new int[noOfNodes];
	curIndex = new int[noOfNodes];
	vis = new bool[noOfNodes];
	mask = new bool[noOfNodes];
	int src, dest;

	graphIndex = new int[noOfNodes + sizeOne];		
	curLevel = new int[sizeOne];
	stop = new bool[sizeOne];

	graphIndex[0] = 0;	
	curIndex[0] = 0;
	for (int i = 1 ; i <=noOfNodes; i++)
	{
		fscanf(in_file, "%d ", &numOfNeighbour);
		graphIndex[i] = numOfNeighbour + graphIndex[i-1];
		cout << i << " : ";
		curIndex[i] = 0;
		cout << graphIndex[i] << endl;
	}
	fclose(in_file);
	

	//FILE *data_file = fopen("test__data__undirected.txt", "r");	
	FILE *data_file = fopen("fData.txt", "r");

	fscanf(data_file , "%d %d", &noOfNodes, &noOfEdges);
	int index;
	for (int i = 0; i <	noOfEdges; i++)
	{
		
		fscanf(data_file, "%d %d", &src, &dest);
		
		index = graphIndex[src] + curIndex[src];
		cout << endl << i << ": index - " << index << " : " << src << " -> " << dest << endl;

		data[index] = dest;
		curIndex[src]++;
		index = graphIndex[dest] + curIndex[dest];
		cout << endl << i << " index - " << index << " : " << dest << " -> " << src << endl;
		data[index] = src;
		curIndex[dest]++;
		//cout << endl << i << ":" << src << " , "  << dest << endl;		
	}
	
	for (int i = 0; i < noOfNodes; i++){
		dist[i] = 0;
		parent[i] = -1;
		vis[i] = false;
		mask[i] = false;
		curIndex[i] = 0;
	}


	for (int i = 0; i < noOfEdges * 2; i++)
	{
		cout << endl << "index: " << i << "data: " << data[i];

	}
	do{
		cout << endl << "please insert source node :" ;
		cin >> src;
		if (src == -1) break;
		for (int j = graphIndex[src]; j < graphIndex[src + 1]; j++)
		{


			cout << data[j] << " ";
		}
		 

	} while (1);

	cout << endl << "please insert source node :" ;
	cin >> src;
	
	for (int j = graphIndex[src]; j < graphIndex[src + 1]; j++)
	{

		
			mask[data[j]] = true;
			vis[data[j]] = true;//go to true	
			parent[data[j]] = src;
			curIndex[data[j]] =  1;
			dist[data[j]] =  1; 
			cout << endl << "data: " << data[j] ;
		}

	int j;	
	curLevel[0] = 1;
	stop[0] = false;;
//	//openACC Try	
//	//start data region	
	int counter = 0;
#pragma acc data region copyin(data[0:noOfEdges], curLevel[0:1] , stop[0:1] , curIndex[0:noOfNodes] , mask[0:noOfNodes] ,  vis[0:noOfNodes] , graphIndex[0:noOfNodes]) , copy( dist[0:noOfNodes] , parent[0:noOfNodes]  ) 	
	{
//
		do{
			counter++;
			stop[0] = false;
#pragma acc update device(stop[0])
#pragma acc update device(curLevel[0])
#pragma acc kernels 
			{
#pragma acc for independent 
				for (int i = 0; i < noOfNodes; i++)
				{
					
					if (mask[i]==true && curIndex[i] == curLevel[0])
					{
#pragma acc for independent 
						for ( j = graphIndex[i] ; j < graphIndex[i + 1]; j++)
						{
							
							if (vis[data[j]] == false) {
								vis[data[j]] = true;//go to true	
								mask[data[j]] = true;								
								parent[data[j]] = i;
								stop[0] = true;								
								curIndex[data[j]] = curLevel[0] + 1;
								dist[data[j]] = dist[i] + 1;															
							}
						}
					}
				}
			}//end acc kenels
			
			curLevel[0] = curLevel[0] + 1;
			


#if defined(_OPENACC) 
#pragma acc update host(stop[0])
#pragma acc update device(curLevel[0])
#endif 
			cout << endl << stop[0] << "----------------------" << curLevel[0] << "----------------------";
//
		} while (counter==8);
		//while (stop[0]);

	} //end data region

	for (int i = 0; i < noOfNodes; i++)
	{
		cout << endl << i << ": " << dist[i] << " - " << parent[i];		

	}
//
//
	return 0;
}

Hi simoliok,

Can you send your data file to PGI customer service (trs@pgroup.com) and ask them to send them to me?

I’d like to run your program to better understand what’s happening but can’t without the data files.

Thanks,
Mat

Hi Mat
This code does not need any data files:

//Global include and variables
/*Openacc test */
#define _CRT_SECURE_NO_DEPRECATE
#include <ctime>
#include <iostream>
#include <fstream>
#include <stdio.h>
#include <queue>
#include <set>
#include <string>
#include <omp.h>
#include <vector>
#include <map>
using namespace std;

int main()
{
	int * restrict curLevel;
	bool * restrict stop;
	int sizeOne = 1;
	
	curLevel = new int[sizeOne];
	stop = new bool[sizeOne];

	curLevel[0] = 1;
	stop[0] = false;;
//	//openACC Try	
//	//start data region	
	int counter = 0;
#pragma acc data region copy(  curLevel[0:1]  , stop[0:1]) 	   
	{		
//
		do{
			counter++;
			stop[0] = false;

//

#pragma acc update device(stop[0])
#pragma acc update device(curLevel[0])

#pragma acc kernels 
			{
#pragma acc for independent 
				for (int i = 0; i < 128; i++)
				{
					int j;
					j = 0;
					stop[j] = true;
					curLevel[j] = curLevel[j] + 1;
					if (curLevel[j] > 40) stop[j] = false;
				}

				// acc_update_device 
			}//end acc kenels									

#pragma acc update host(  stop[0])
#pragma acc update device(curLevel[0])
			
			////
			cout << endl << "Stop" << stop[0] << " curLevel " << curLevel[0] << " counter" << counter;
			
			//} while (counter<8);
		} while (stop[0]);

	} //end data region

	return 0;
}

The result of this code on my PC is only:

stop 1  curLevel 1 counter 1

Thanks a lot

Hi simoliok,

There are a number of problems here.

#pragma acc update device(stop[0])

When a single value is given in the brackets, it is treated as the length, not the element index. Hence, you’re asking to copy 0 elements. To fix this, change it to either “stop[0:1]” or “stop[1]”.

#pragma acc update device(curLevel[0])

At the end of the compute region, you are accidentally updating the device with the host copy instead of updating the host with the device copy. Change “device” to “host” here.

curLevel[j] = curLevel[j] + 1;

This is a race condition since all threads will be trying to read and write the same variable. The value of curLevel will depend on the order in which the threads access the variable and on when they do so. To fix this, use an atomic region. Since you want to use this thread’s value of curLevel, you’ll want to use an atomic capture.

               stop[j] = true; 
               ...
               if (curLevel[j] > 40) stop[j] = false;

Here’s another race condition. You can use atomics again to make the assignment to “stop[j]” visible to all threads, but you’ll be toggling between “true” and “false” depending upon when the threads are executing. I’d remove the line that sets stop to true and optionally add an atomic write where stop is set to false.

Here are my changes:

% cat test_10_28_15.cpp
//Global include and variables
 /*Openacc test */
 #define _CRT_SECURE_NO_DEPRECATE
 #include <stdio.h>
 #include <ctime>
 #include <iostream>
 #include <fstream>
 #include <stdio.h>
 #include <queue>
 #include <set>
 #include <string>
 #include <omp.h>
 #include <vector>
 #include <map>
 using namespace std;

 int main()
 {
    int * restrict curLevel;
    bool * restrict stop;
    int sizeOne = 1;

    curLevel = new int[sizeOne];
    stop = new bool[sizeOne];

    curLevel[0] = 1;
    stop[0] = false;;
 //   //openACC Try
 //   //start data region
    int counter = 0;
 #pragma acc data region copy( curLevel[0:1]  , stop[0:1])
    {
 //
       do{
          counter++;
          stop[0] = true;
          curLevel[0] = 1;

 //

 #pragma acc update device(stop[0:1])
 #pragma acc update device(curLevel[0:1])

 #pragma acc kernels
          {
 #pragma acc loop independent
             for (int i = 0; i < 128; i++)
             {
                int j;
                int cl;
                j = 0;
//                stop[j] = true;
#pragma acc atomic capture
{
                curLevel[j] = curLevel[j] + 1;
                cl = curLevel[j];
}
                if (cl > 40) {
#pragma acc atomic write
                   stop[j] = false;
                }
             }
             // acc_update_device
          }//end acc kenels

 #pragma acc update host(  stop[0:1])
 #pragma acc update host(curLevel[0:1])
          ////
          cout << endl << "Stop" << stop[0] << " curLevel " << curLevel[0] << " counter" << counter << endl;

          //} while (counter<8);
       } while (stop[0]);

    } //end data region

    return 0;
 }

% pgc++ -acc -Minfo=accel test_10_28_15.cpp -V15.9
main:
     32, Generating copy(curLevel[:1],stop[:1])
     45, Generating update device(stop[:1],curLevel[:1])
     47, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         47, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
     69, Generating update host(stop[:1],curLevel[:1])
% a.out

Stop0 curLevel 129 counter1

Hope this helps,
Mat

Hi Mat
Thanks very much for your answer.
Thank you for teaching me new things :) .
I have a question about this problem. Consider a scenario like this.
We have this code:

#pragma acc for independent 
            for (int i = 0; i < 128; i++) 
            { 
               int j; 
               j = 0; 
               stop[j] = true; 
               curLevel[j] = curLevel[j] + 1; 
               if (curLevel[j] > 40) stop[j] = false; 
            } 

        
#pragma acc update host(  stop[0]:1) 
#pragma acc update device(curLevel[0]:1)

And we have 4 threads on the device; call them p0, p1, p2, p3.
At the start of this code, all of them (p0, p1, p2, p3) access stop[0], and the value of stop[0] is false for all of them. Suppose that after some time they reach this line of code:

 if(condition) stop[j] = true;

Suppose this condition is true for thread p0 and false for threads p1, p2, and p3.
Then at the end of the loop (#pragma acc for independent), execution returns to the host and reaches this line of code:

 #pragma acc update host(  stop[0]:1)

I think, and I expect, that stop[0] on the host becomes true!
****** It’s like a reduction on (OR, stop[0]) ******

Hi Mat
I sent the data files to trs@pgroup.com
But I am copying the data here too; you can save it to files (fData.txt and fMeta.txt).

fData.txt is here:

24 47
0 1
0 2
0 3
0 5
0 6
0 8
1 2
1 7
1 9
2 8
3 5
4 5
4 9
4 10
5 6
5 7
6 7
6 10
7 8
7 10
8 12
9 10
9 13
10 11
10 12
10 13
12 20
13 19
13 21
13 22
14 15
14 16
14 18
15 16
15 18
16 23
17 18
17 21
17 22
18 19
18 21
19 20
19 22
20 22
20 23
21 22
21 23

And fMeta.txt is here:

24 47
6 4 3 2 3 5 4 5 4 4 7 1 3 5 3 3 3 3 5 4 4 5 5 3

In the first version you have, the problem is that you have “stop” set to both true and false in the same loop. The resulting value at the end of the loop will be non-deterministic and dependent upon the order in which the threads are executed.

If you have a single assignment to “stop” within a conditional, then yes, I would agree that it’s like having an OR operation.

Note that on NVIDIA GPUs, you should be thinking in terms of thousands if not millions of threads. Also, the syntax is “stop[0:1]” or “stop[:1]”, where the starting element is assumed to be 0 when omitted.

PGI customer service sent me your data files, though I think it would be beneficial for you to go back to your original code and see if you can correct the issues. Post your update and I can then help with any additional issues.

  • Mat

Hi Mat
I know about the millions of threads on a GPU!
Did you test my code with the data files?
I want to implement breadth-first search with OpenACC using these 2 data files!
My sequential and OpenMP versions work fine, and now I want to write BFS in OpenACC!

Can anyone help me?

Did you test my code with the data files?

Given your initial version had issues, I did not go back and test your original code. Given the advice on this thread, I think it would be beneficial for you to go back and see if you can get it working. If you still have problems, then post your updated code and we can investigate how to fix them.

  • Mat

Hi Mat
Can I send you a PDF file that shows what I am doing in this code and what is wrong with the result?
Can you give me your email, or should I send it to trs@pgroup.com?
Thanks Mat

Sure. You can send it to TRS and they can forward it to me.

  • Mat