questions about a program

evanchong · July 31, 2015, 3:41pm

I tried to parallelize this code segment, but it fails to generate the correct answer.
Basically, I only want to parallelize the outermost for loop.
All the code inside the outermost loop is expected to run sequentially within each thread.
The code itself is functionally correct. The only problem is how to use this directive correctly.
Any ideas? This should be a simple model to parallelize.

#pragma acc parallel loop copyin(in_data[num_record],threads,num_record,num_feature,new_point[num_feature],k,z0,z1) \
    copyout(rst[threads*k]) private(in_data_copy)
    for (int i=0; i<threads; i++){
      seed = 19 + i;
      for (int j = 0; j< 4177; j++){
        in_data_copy[j] = in_data[j];

        for (int l =0; l <8; l++){
          rand = generateGaussianNoise(0,0,&z0,&z1,&generate,&seed);
          in_data_copy[j].record[l] = in_data[j].record[l] + rand;
        }
      }

      knn(in_data_copy,num_record,num_feature,new_point,k,rst+i*k);

    }

Also, a few things confuse me here.

Do I really need “copyin” and “copyout” here? When is it required to use them? I thought “copyin” and “copyout” might only be necessary when you want to share data across multiple “parallel” regions.
I used private to declare “in_data_copy” as private to each thread, yet when I declare the size using the [start:size] syntax, the compiler reports errors. The documentation says a “copy of item will be created for each parallel gang”; does that mean the copy is shared by the gang?

Thanks.

MatColgrove · July 31, 2015, 10:28pm

Do I really need “copyin” and “copyout” here? When is it required to use them?

I don’t quite understand the question. The only requirement is that the data be available on the device. If the compiler can determine how to copy over the data, it will. However, since you’re writing in C/C++, it’s more likely that you will need to add a data clause to indicate how much data to move, or a “present” clause to indicate that the data is already on the device.

“copyin” and “copyout” just indicate the direction in which to copy the data. “copy” is both directions, while “create” only allocates memory on the device and does not synchronize it with the host.
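As a rough illustration of the three clauses, here is a minimal sketch. The function `scale_on_device` and its arguments are made up for this example and are not from the original code. Note that C array sections are written as `[start:length]`.

```c
#include <assert.h>

/* Hypothetical helper for illustration only (not from the original post).
 * Computes out[i] = in[i] * factor + 1, using scratch as a temporary. */
void scale_on_device(const float *in, float *out, float *scratch,
                     int n, float factor)
{
    assert(n >= 0);

    /* copyin : host -> device when the region starts (read-only input)
     * copyout: device -> host when the region ends (results)
     * create : allocated on the device only, never transferred
     * Each array section is [start:length]. */
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n]) create(scratch[0:n])
    for (int i = 0; i < n; i++) {
        scratch[i] = in[i] * factor;
        out[i] = scratch[i] + 1.0f;
    }
}
```

Built without OpenACC enabled, the pragma is ignored and the loop runs on the host, which is a handy way to check the serial result first.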

I thought “copyin” and “copyout” might only be necessary when you want to share data across multiple “parallel” regions.

These are data clauses, not to be confused with a data region. A data region can span multiple compute regions (as well as subroutine calls) and has more to do with the lifetime of the device variables. Data clauses just specify the direction of the copy and the size of the data. There’s also an “update” directive, which can be used to synchronize device and host data from within a data region.
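To make the distinction concrete, here is a small sketch of one data region wrapping two compute regions, with an “update” in between. The function name and the arithmetic are invented for the example.

```c
#include <assert.h>

/* Hypothetical helper for illustration only.
 * Doubles every element, reads the intermediate values back on the host,
 * then adds one to every element. Returns the intermediate sum. */
float double_then_increment(float *a, int n)
{
    assert(n >= 0);
    float host_sum = 0.0f;

    /* Data region: a[] lives on the device for the whole block,
     * copied in at the opening brace and back out at the closing one. */
    #pragma acc data copy(a[0:n])
    {
        /* First compute region: data is already there, so use present. */
        #pragma acc parallel loop present(a[0:n])
        for (int i = 0; i < n; i++)
            a[i] *= 2.0f;

        /* Refresh the host copy in the middle of the data region. */
        #pragma acc update self(a[0:n])
        for (int i = 0; i < n; i++)
            host_sum += a[i];          /* runs on the host */

        /* Second compute region reuses the same device copy. */
        #pragma acc parallel loop present(a[0:n])
        for (int i = 0; i < n; i++)
            a[i] += 1.0f;
    }
    return host_sum;
}
```

Without the enclosing data region, each compute region would need its own data clauses and the array would be moved back and forth around each one.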

I used private to declare “in_data_copy” as private to each thread, yet when I declare the size using the [start:size] syntax, the compiler reports errors.

What was the error? From what I can tell “in_data_copy” is an array of structs. If it’s a fixed size struct, then you should be fine. But I’m guessing you have dynamic data members. In which case you can’t privatize them since the compiler has no way of knowing how big the struct is. Aggregate data types with dynamic data members are not supported within data clauses either. It’s the biggest limitation in OpenACC and one the standards committee is looking to address. But it’s a very difficult issue so it will take some time.

If you’re interested, my GTC2015 talk (https://www.youtube.com/watch?v=rWLmZt_u5u4) on OpenACC C++ Class Management touches upon the issue.
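For the fixed-size case, a sketch along these lines may help. Everything here is assumed for illustration: the `sample_t` type, the sizes `NREC` and `NFEAT`, and the stand-in arithmetic that replaces the noise generation and knn call from the original post.

```c
#include <assert.h>

#define NREC  16   /* assumed record count, fixed at compile time */
#define NFEAT 8    /* assumed features per record */

/* Hypothetical fixed-size struct: no pointers or dynamic members,
 * so the compiler knows exactly how big one element is. */
typedef struct { float record[NFEAT]; } sample_t;

/* For each trial t, builds a private perturbed copy of in[] and
 * stores the sum of all its features in out[t]. */
void perturb_and_sum(const sample_t *in, float *out, int ntrials)
{
    assert(ntrials >= 0);
    sample_t work[NREC];   /* whole array is privatized by name below */

    #pragma acc parallel loop gang private(work) copyin(in[0:NREC]) copyout(out[0:ntrials])
    for (int t = 0; t < ntrials; t++) {
        float sum = 0.0f;
        /* Inner loops run sequentially inside each outer iteration. */
        #pragma acc loop seq
        for (int j = 0; j < NREC; j++) {
            #pragma acc loop seq
            for (int l = 0; l < NFEAT; l++) {
                work[j].record[l] = in[j].record[l] + (float)t;
                sum += work[j].record[l];
            }
        }
        out[t] = sum;
    }
}
```

Because `work` has a compile-time size, it can be named in `private` without an array section. A struct holding a pointer to separately allocated data would not work this way.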

The documentation says a “copy of item will be created for each parallel gang”; does that mean the copy is shared by the gang?

If you privatize a variable on a gang loop, the variable will be shared by all workers and vectors within the same gang.
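A small sketch of what that sharing looks like in practice, assuming a made-up routine `row_sums_plus_one` and a fixed row width `NFEAT`: the array `tmp` is private to each gang, and the vector lanes inside that gang all work on the same copy.

```c
#include <assert.h>

#define NFEAT 8   /* assumed row width for this example */

/* Hypothetical routine for illustration only.
 * out[r] = sum over f of (in[r*NFEAT + f] + 1). */
void row_sums_plus_one(const float *in, float *out, int nrows)
{
    assert(nrows >= 0);
    float tmp[NFEAT];

    /* One copy of tmp per gang; rows are distributed across gangs. */
    #pragma acc parallel loop gang private(tmp) copyin(in[0:nrows*NFEAT]) copyout(out[0:nrows])
    for (int r = 0; r < nrows; r++) {
        /* Vector lanes of this gang fill different elements
         * of the gang's single tmp array. */
        #pragma acc loop vector
        for (int f = 0; f < NFEAT; f++)
            tmp[f] = in[r * NFEAT + f] + 1.0f;

        float sum = 0.0f;
        /* The same lanes then read the shared tmp back. */
        #pragma acc loop vector reduction(+:sum)
        for (int f = 0; f < NFEAT; f++)
            sum += tmp[f];

        out[r] = sum;
    }
}
```

If each vector lane needed its own scratch array instead, the `private` clause would go on the vector loop rather than the gang loop.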

Mat
