PGCC-W-0155-External and Static variables are not supported in acc routine

Mxtv · December 22, 2020, 8:33am

Hello, I’m new to parallel programming and have been given the task of parallelizing this algorithm. I’ve tried several approaches suggested on this forum, but none of them work.
Could someone look at this particular case and suggest what needs to be done to get it to run properly?
The program consists of three files, which I have attached. The algorithm code is below:

 acc_init(acc_device_nvidia);


 begin = omp_get_wtime();
 
 RasterizedPolygon *rasterizedPolygons = new RasterizedPolygon[pcount];    

int start_row = 0;
int end_row = tiff_height;
int idBlock = 0;    
polygon *one_polygon;
ring *one_ring;  


  #pragma acc data  copy(rasterizedPolygons[0:pcount])\
                  copyin(pBlocks[0:blocksCount],adfGeoTransform)
              
                
  {

  #pragma acc parallel loop private ( start_row, end_row,idBlock)
  
for ( int linex=start_row; linex < end_row; linex++ )
{
   double dy = adfGeoTransform[3] + 0 * adfGeoTransform[4] + (linex+0.5) *
            adfGeoTransform[5];
    // find which block this scanline falls in
  
  
     #pragma acc loop private(block)
   
   int block;
    for ( block=0; block<blocksCount; block++)
    {
        if (dy >= pBlocks[block].min && dy < pBlocks[block].max)
        {
            idBlock = block;
            break;
        }
    }
    // search for polygons only in this block
    for (int polyId=0; polyId < pBlocks[idBlock].indexes.size(); polyId++)
    {
        int polygony = pBlocks[idBlock].indexes[polyId];
        one_polygon = polygons+polygony;
        int rcount = one_polygon->rings.size();
  
        for (int nring=0; nring <rcount; nring++)
        {
            one_ring = one_polygon->rings[nring];
           int vcount = one_ring->points.size();
           
            for ( int vertexz=0; vertexz < vcount-1; vertexz++)
            {
                double dx1, dy1, dx2, dy2;
                // d1 < d2
                dx1 = one_ring->points[vertexz]->x;
                dx2 = one_ring->points[vertexz+1]->x;

                dy1 = one_ring->points[vertexz]->y;
                dy2 = one_ring->points[vertexz+1]->y;

                if (dy1 > dy2)
                {
                
                  double pom1 = dx1;
                  dx1=dx2;
                  dx2=pom1;
                  
                  double pom2=dy1;
                  dy1=dy2;
                  dy2=pom2;
                
                   // myswap(dx1, dx2);
                    //myswap(dy1, dy2);
                }
                if ((dy < dy2)&&(dy>=dy1))
                {
                    // compute the x coordinate of the intersection
                   double ionx = (dy-dy1)*(dx2-dx1)/(dy2-dy1)+dx1;
                    // column number in pixels
                    int x = (ionx - adfGeoTransform[0] -
                        linex * adfGeoTransform[2] ) /
                        adfGeoTransform[1];
                    rasterizedPolygons[polygony].
                        addIntersect(linex, x);
                }
            }
        }
    }
}

}

The compiler displays errors:
rasterize(std::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::basic_string<char, std::char_traits<char>, std::allocator<char>>, int, int, int):
196, Generating copy(rasterizedPolygons[:pcount])
Generating copyin(pBlocks[:blocksCount],adfGeoTransform[:])
Accelerator kernel generated
Generating Tesla code
200, #pragma acc loop gang /* blockIdx.x */
210, #pragma acc loop seq
219, #pragma acc loop seq
225, #pragma acc loop seq
230, #pragma acc loop vector(128) /* threadIdx.x */
225, Loop is parallelizable
230, Loop is parallelizable
PGCC-W-0155-External and Static variables are not supported in acc routine - _ZN46_INTERNAL_24_librasterizationACC2_cpp_ce5c3146St19piecewise_constructE (librasterizationACC2.cpp: 465)

librasterizationACC2.cpp (12.1 KB) librasterizationACC2.h (1.4 KB) mainACC2.cpp (566 Bytes)

MatColgrove · December 22, 2020, 7:36pm

Hi Mxtv,

The compiler is parallelizing the code, but with implicit scheduling it is only vectorizing the innermost loop. You may want to add “gang vector” to your parallel loop directive so that only the outermost loop is parallelized. Once that’s working, you can experiment with parallelizing the inner loops.
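
As a rough illustration of the “gang vector” suggestion, here is a minimal sketch that puts all of the parallelism on the scanline loop and keeps the inner loop sequential. It assumes the polygon edges have been flattened into a plain array; `Edge` and `count_crossings` are hypothetical names that do not appear in the attached code, and the body only counts crossings rather than storing them.

```cpp
#include <vector>

// Hypothetical flat edge record (an assumption for this sketch);
// the posted code walks nested std::vector containers instead.
struct Edge { double x1, y1, x2, y2; };

// For each scanline, count how many edges cross its center line.
std::vector<int> count_crossings(const std::vector<Edge>& edges,
                                 const double* gt, int height) {
    std::vector<int> counts(height, 0);
    const Edge* e = edges.data();
    int n = static_cast<int>(edges.size());
    int* c = counts.data();

    // "gang vector" on the outer loop: each thread handles whole scanlines.
    #pragma acc parallel loop gang vector copyin(e[0:n], gt[0:6]) copy(c[0:height])
    for (int linex = 0; linex < height; ++linex) {
        // Same center-of-row formula as in the posted loop.
        double dy = gt[3] + (linex + 0.5) * gt[5];
        int hits = 0;

        // The inner loop runs sequentially inside each thread.
        #pragma acc loop seq
        for (int i = 0; i < n; ++i) {
            double ymin = e[i].y1 < e[i].y2 ? e[i].y1 : e[i].y2;
            double ymax = e[i].y1 < e[i].y2 ? e[i].y2 : e[i].y1;
            if (dy >= ymin && dy < ymax) ++hits;
        }
        c[linex] = hits;  // each iteration writes only its own element
    }
    return counts;
}
```

Without OpenACC enabled the pragmas are ignored and the function runs serially, which makes it easy to compare results before and after offloading.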

The PGCC-W-0155 message is a warning that one of your routines references a static variable (piecewise_construct, which is a constexpr). Depending on how this variable is being used, it may or may not be an issue.

I’m not sure where it’s coming from, since I can’t compile the attached code: it’s missing “librasterizationACC2.h”. Are you able to attach all dependent header files?

-Mat

Mxtv · December 23, 2020, 12:06pm

Hi Mat,

Thanks for such a quick reply. I have included the rest of the program files. Thank you in advance.

MatColgrove · December 28, 2020, 5:37pm

Thanks Mxtv. Installing the “gdal” library was too time-consuming, so I still haven’t been able to compile your program and don’t know where the reference to “piecewise_construct” is coming from. Compiling with “-P” and then inspecting the preprocessed (.i) file might give some clues.

You are using vectors, which are OK, but you’ll want to make sure to use CUDA Unified Memory (-gpu=managed) and restrict yourself to the access operator (operator[]). “addIntersect” uses push_back, which is not safe to call in parallel, so you’ll need to avoid push_back in order to safely parallelize this loop.
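
One possible way to avoid push_back, sketched below under some assumptions: give every polygon a fixed number of preallocated slots and reserve a slot with an atomic counter, so device code only ever writes through the access operator. `IntersectStore`, `record_rows`, and the fixed `capacity` are all hypothetical and are not taken from the attached files; the loop body is a stand-in for the real intersection test.

```cpp
#include <vector>

// Hypothetical fixed-capacity replacement for a per-polygon push_back list.
// Assumes an upper bound on intersections per polygon is known up front.
struct IntersectStore {
    int polygons;
    int capacity;
    std::vector<int> count;  // slots requested so far, per polygon
    std::vector<int> rows;   // rows[p * capacity + k]
    std::vector<int> cols;   // cols[p * capacity + k]
    IntersectStore(int p, int cap)
        : polygons(p), capacity(cap),
          count(p, 0), rows(p * cap, 0), cols(p * cap, 0) {}
};

// Stand-in for the scanline loop: every row records one entry for some polygon.
void record_rows(IntersectStore& s, int height) {
    int* count = s.count.data();
    int* rows = s.rows.data();
    int* cols = s.cols.data();
    int np = s.polygons;
    int cap = s.capacity;

    #pragma acc parallel loop gang vector \
        copy(count[0:np], rows[0:np*cap], cols[0:np*cap])
    for (int linex = 0; linex < height; ++linex) {
        int poly = linex % np;  // placeholder for the real polygon lookup
        int slot;
        // Atomically reserve the next free slot for this polygon.
        #pragma acc atomic capture
        slot = count[poly]++;
        if (slot < cap) {       // drop the entry if the polygon is full
            rows[poly * cap + slot] = linex;
            cols[poly * cap + slot] = 2 * linex;  // placeholder column value
        }
    }
}
```

The order of entries within a polygon is not deterministic when this runs in parallel, so they would need sorting afterwards if order matters. Note also that `count` can exceed `capacity` when entries are dropped, which gives a simple way to detect that the bound was too small.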

-Mat