Segmentation fault

Dear All

I am facing a problem while porting a large code with many subroutines using the PGI Accelerator directives. When I compile the code with pgf90 (without a target accelerator flag), the program works. I then added two !$acc region / !$acc end region directive pairs around two loops of one subroutine to convert them into two GPU compute regions. When I compile the code with the -ta=nvidia flag, I get the compiler feedback shown below for this particular subroutine.
When I then run the program, it terminates with a “Segmentation fault”.
I also tried to accelerate several other loops (each one individually) and always ran into the same problem.
What do I have to pay attention to? What could cause this fault?
Thank you very much in advance.

    162, Generating copyout(vort$p(1:3,1:knend))
         Generating compute capability 1.3 binary
    163, Loop carried dependence of 'uf$p' prevents parallelization
         Loop carried backward dependence of 'uf$p' prevents vectorization
         Complex loop carried dependence of 'vort$p' prevents parallelization
         Sequential loop scheduled on host
         Loop not vectorized/parallelized: contains call
    164, Loop is parallelizable
         Accelerator kernel generated
        164, !$acc do parallel, vector(3)
             CC 1.3 : 4 registers; 24 shared, 24 constant, 0 local memory bytes; 25 occupancy
    185, Generating copyin(pcell$p(1:8,1:kcend))
         Generating compute capability 1.3 binary
    195, Complex loop carried dependence of 'q$p' prevents parallelization
         Complex loop carried dependence of 'qt$p' prevents parallelization
         Complex loop carried dependence of 'vc$p' prevents parallelization
         Complex loop carried dependence of 'sixl$p' prevents parallelization
         Complex loop carried dependence of 'sixr$p' prevents parallelization
         Complex loop carried dependence of 'siyl$p' prevents parallelization
         Complex loop carried dependence of 'siyr$p' prevents parallelization
         Complex loop carried dependence of 'sizl$p' prevents parallelization
         Complex loop carried dependence of 'sizr$p' prevents parallelization
         Complex loop carried dependence of 'sjxl$p' prevents parallelization
         Complex loop carried dependence of 'sjxr$p' prevents parallelization
         Complex loop carried dependence of 'sjyl$p' prevents parallelization
         Complex loop carried dependence of 'sjyr$p' prevents parallelization
         Complex loop carried dependence of 'sjzl$p' prevents parallelization
         Complex loop carried dependence of 'sjzr$p' prevents parallelization
         Complex loop carried dependence of 'skxl$p' prevents parallelization
         Complex loop carried dependence of 'skxr$p' prevents parallelization
         Complex loop carried dependence of 'skyl$p' prevents parallelization
         Complex loop carried dependence of 'skyr$p' prevents parallelization
         Complex loop carried dependence of 'skzl$p' prevents parallelization
         Complex loop carried dependence of 'skzr$p' prevents parallelization
         Loop carried dependence of 'dudx$p' prevents parallelization
         Loop carried backward dependence of 'dudx$p' prevents vectorization
         Complex loop carried dependence of 'dudx$p' prevents parallelization
         Loop carried dependence of 'dudy$p' prevents parallelization
         Loop carried backward dependence of 'dudy$p' prevents vectorization
         Complex loop carried dependence of 'dudy$p' prevents parallelization
         Loop carried dependence of 'dudz$p' prevents parallelization
         Loop carried backward dependence of 'dudz$p' prevents vectorization
         Complex loop carried dependence of 'dudz$p' prevents parallelization
         Loop carried dependence of 'dvdx$p' prevents parallelization
         Loop carried backward dependence of 'dvdx$p' prevents vectorization
         Complex loop carried dependence of 'dvdx$p' prevents parallelization
         Loop carried dependence of 'dvdy$p' prevents parallelization
         Loop carried backward dependence of 'dvdy$p' prevents vectorization
         Complex loop carried dependence of 'dvdy$p' prevents parallelization
         Loop carried dependence of 'dvdz$p' prevents parallelization
         Loop carried backward dependence of 'dvdz$p' prevents vectorization
         Complex loop carried dependence of 'dvdz$p' prevents parallelization
         Loop carried dependence of 'dwdx$p' prevents parallelization
         Loop carried backward dependence of 'dwdx$p' prevents vectorization
         Complex loop carried dependence of 'dwdx$p' prevents parallelization
         Loop carried dependence of 'dwdy$p' prevents parallelization
         Loop carried backward dependence of 'dwdy$p' prevents vectorization
         Complex loop carried dependence of 'dwdy$p' prevents parallelization
         Loop carried dependence of 'dwdz$p' prevents parallelization
         Loop carried backward dependence of 'dwdz$p' prevents vectorization
         Complex loop carried dependence of 'dwdz$p' prevents parallelization
         Loop carried dependence of 'dtdx$p' prevents parallelization
         Loop carried backward dependence of 'dtdx$p' prevents vectorization
         Complex loop carried dependence of 'dtdx$p' prevents parallelization
         Loop carried dependence of 'dtdy$p' prevents parallelization
         Loop carried backward dependence of 'dtdy$p' prevents vectorization
         Complex loop carried dependence of 'dtdy$p' prevents parallelization
         Loop carried dependence of 'dtdz$p' prevents parallelization
         Loop carried backward dependence of 'dtdz$p' prevents vectorization
         Complex loop carried dependence of 'dtdz$p' prevents parallelization
         Loop carried dependence of 'dkdx$p' prevents parallelization
         Loop carried backward dependence of 'dkdx$p' prevents vectorization
         Complex loop carried dependence of 'dkdx$p' prevents parallelization
         Loop carried dependence of 'dkdy$p' prevents parallelization
         Loop carried backward dependence of 'dkdy$p' prevents vectorization
         Complex loop carried dependence of 'dkdy$p' prevents parallelization
         Loop carried dependence of 'dkdz$p' prevents parallelization
         Loop carried backward dependence of 'dkdz$p' prevents vectorization
         Complex loop carried dependence of 'pcell$p' prevents parallelization
         Sequential loop scheduled on host
         Generating copyout(k(1:8))
         Loop not vectorized/parallelized: contains call
    196, Loop is parallelizable
         Accelerator kernel generated
        196, !$acc do parallel, vector(8)
             CC 1.3 : 4 registers; 24 shared, 32 constant, 0 local memory bytes; 25 occupancy
    408, Loop not vectorized: may not be beneficial
         Loop unrolled 4 times
    410, Loop not vectorized: data dependency
    473, Loop not vectorized: data dependency
    699, Loop not vectorized: may not be beneficial
         Loop unrolled 4 times
    701, Loop not vectorized: data dependency
    728, Loop not vectorized: data dependency
    921, Loop not vectorized: data dependency
    950, Loop not vectorized: data dependency
    980, Loop not vectorized: data dependency
    995, Loop not vectorized: data dependency

Hi elephant,

The seg fault could be caused by any number of things, and I’m not sure exactly which. My best guess is that the compiler is generating some bad code because the outer loops are not parallelizable. It’s trying to run the outer loops sequentially on the host and the inner loops on the GPU.

To test this theory, try putting the ACC REGION directives only around the inner loops at lines 164 and 196. If it works, then most likely the compiler is doing a poor job of managing data between the inner device loop and the outer host loop.
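
As a rough illustration of that test, here is a minimal sketch with the region placed around the inner loop only. The program name, array name, bounds, and loop body are all placeholders I made up, not your actual code. The only thing taken from your post is the !$acc region / !$acc end region pair.

```fortran
program inner_region_sketch
  implicit none
  integer, parameter :: knend = 4     ! hypothetical extent, stands in for the real one
  real :: vort(3, knend)              ! hypothetical array, not the original vort
  integer :: i, j

  ! The outer loop is left outside the region, so it runs on the host.
  do j = 1, knend
!$acc region
    ! Only this inner loop is inside the accelerator region.
    do i = 1, 3
      vort(i, j) = real(i + j)        ! placeholder body
    end do
!$acc end region
  end do

  print *, sum(vort)
end program inner_region_sketch
```

Without -ta=nvidia the directive lines are plain comments, so the same source still builds and runs on the host for comparison.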

That said, my strategy here would be to ignore the seg fault for now and work on modifying the code so that the outer loops parallelize. According to the compiler feedback, the outer loops have a lot of loop-carried dependencies and also contain a call.
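
In case the term is unclear, here is a generic sketch of what a loop-carried dependence looks like next to a loop whose iterations are independent. This is not taken from your code. The arrays a, b, c and the size n are made up purely for illustration.

```fortran
program dependence_sketch
  implicit none
  integer, parameter :: n = 8         ! hypothetical size
  real :: a(n), b(n), c(n)
  integer :: i

  b = 1.0
  a(1) = b(1)

  ! Loop-carried dependence: iteration i reads a(i-1), which the
  ! previous iteration wrote, so the iterations must run in order.
  do i = 2, n
    a(i) = a(i-1) + b(i)
  end do

  ! Independent iterations: each c(i) depends only on b(i), so the
  ! iterations can run in any order or in parallel.
  do i = 1, n
    c(i) = 2.0 * b(i)
  end do

  print *, a(n), c(n)
end program dependence_sketch
```

Loops of the second kind are the ones the compiler can schedule on the GPU. Whether your loops can be rewritten that way depends on what they actually compute.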

Also, please feel free to send in the code to PGI Customer Service (trs@pgroup.com) so we can take a look at the seg fault, and if it is indeed a compiler error, then we can fix the problem.

Thanks,
Mat