Loop carried dependence of a->x prevents parallelization

1446561717 · April 9, 2023, 2:40am

I ran into a problem when using nested loops to fill an array of complex values:

    cufftComplex* a = (cufftComplex*)malloc(Length_theta * M * sizeof(cufftComplex));
    #pragma acc kernels
    for (int i = 0; i < Length_theta; i++)
        for (int j = 0; j < M; j++)
        {
            double Theta = (-90 + i) * deg2rad;
            a[i * M + j].x = cos(2 * PI * f0 * d * sin(Theta) / c * j);
            a[i * M + j].y = sin(2 * PI * f0 * d * sin(Theta) / c * j);
            // if (i == 3 && j < 10) std::cout << Theta * 180.0 / PI << '\t' << a[i * M + j].x << '\t' << a[i * M + j].y << '\t' << '\n';
        }

The compiler feedback for this part is:

    157, Loop carried dependence of a->x prevents parallelization
         Loop carried dependence of a->x prevents vectorization
         Loop carried backward dependence of a->x prevents vectorization
    158, Loop is parallelizable
         Generating NVIDIA GPU code
        157, #pragma acc loop seq
        158, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */

1446561717 · April 9, 2023, 3:29am

What I want is for all Length_theta * M iterations to be computed in parallel, one per thread, because there is no data dependency between them. The compiler does not seem to see it that way, though.

MatColgrove · April 10, 2023, 4:09pm

With the “kernels” construct, the compiler must prove there are no dependencies before it can parallelize the loops. However, since you’re writing through a pointer with computed indices, the compiler can’t tell whether the accesses to “a” are independent across loop iterations, so it runs the outer loop sequentially.
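The independence that the compiler cannot prove here can be checked by hand. The following is only a host-side sketch, not part of the original code: `writes_are_disjoint` is a hypothetical helper that walks the same loop nest and confirms that every (i, j) pair maps to a different element `i*M + j`, which is the property that “independent” asserts.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if every (i, j) pair in the loop nest
// writes a distinct element a[i*M + j], i.e. no two iterations collide.
bool writes_are_disjoint(int Length_theta, int M)
{
    std::vector<char> touched(static_cast<std::size_t>(Length_theta) * M, 0);
    for (int i = 0; i < Length_theta; i++)
        for (int j = 0; j < M; j++)
        {
            std::size_t idx = static_cast<std::size_t>(i) * M + j;
            if (touched[idx])
                return false;   // two iterations hit the same element
            touched[idx] = 1;
        }
    return true;
}
```

Because j is always less than M, the index `i*M + j` is unique for each pair, so the check passes for any positive sizes.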

To fix this, use either “kernels loop independent” or the “parallel” construct, where “independent” is the default. “independent” asserts to the compiler that there are no dependencies between loop iterations.

Example:

    #pragma acc kernels loop independent collapse(2)
    for (int i = 0; i < Length_theta; i++)
        for (int j = 0; j < M; j++)
        {
            // ... loop body unchanged ...
        }

or

    #pragma acc parallel loop collapse(2)
    for (int i = 0; i < Length_theta; i++)
        for (int j = 0; j < M; j++)
        {
            // ... loop body unchanged ...
        }
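For reference, here is a minimal self-contained sketch of the second variant with the loop body from the original post filled in. Several things are assumptions made for illustration: `Complex2f` stands in for `cufftComplex` (two floats) so the sketch builds without the cuFFT headers, `make_steering` and its parameters are hypothetical names, and the `copyout` clause is added to make the data movement explicit. Without an OpenACC compiler flag the pragma is ignored and the loops run on the host.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for cufftComplex (a pair of floats); an assumption for this sketch.
struct Complex2f { float x; float y; };

// Hypothetical wrapper around the loop nest from the question.
// Fills a[i*M + j] with (cos(phase), sin(phase)) for every (i, j) pair.
std::vector<Complex2f> make_steering(int Length_theta, int M,
                                     double f0, double d, double c)
{
    const double PI = 3.14159265358979323846;
    const double deg2rad = PI / 180.0;

    std::vector<Complex2f> buf(static_cast<std::size_t>(Length_theta) * M);
    Complex2f* a = buf.data();

    // "parallel loop" treats the loops as independent by default;
    // collapse(2) merges both loops into Length_theta * M iterations.
    #pragma acc parallel loop collapse(2) copyout(a[0:Length_theta*M])
    for (int i = 0; i < Length_theta; i++)
        for (int j = 0; j < M; j++)
        {
            double Theta = (-90 + i) * deg2rad;
            double phase = 2 * PI * f0 * d * std::sin(Theta) / c * j;
            a[i * M + j].x = static_cast<float>(std::cos(phase));
            a[i * M + j].y = static_cast<float>(std::sin(phase));
        }
    return buf;
}
```

Computing the phase once per iteration also avoids evaluating the same expression twice, but that is a side benefit; the directive is what lets the compiler parallelize both loops.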

Hope this helps,
Mat

