ehehe… I should RTFM more often. :)
Anyway, I managed to port the code to C so that it compiles under pgcc, and I put it all in one file to make things easier for myself in the meantime. The issue is that when I tried the suggested pragmas I did not get the right output: in fact, I either got no output at all (the results were still initialized to zero) or values that were orders of magnitude off. Without the pragmas, I get exactly the same output as when compiling with MSVC 2008/2010, which is good, because it tells me the code is working.
I do want to mention, however, that even without the pragmas the code executes in ~17 seconds for the same data set on a single CPU thread, versus 58 seconds for the MSVC version. Regardless of that, OpenMP runs it in ~9 seconds.
I have sent the ported code to the support e-mail in case anyone can suggest why adding the pragmas does not produce the correct output. Perhaps the post implied some changes that I did not implement?
I compiled as:
pgcc -Minline -ta=nvidia,cc20 -acc -Minfo file.c
Here is the -Minfo output I got after adding the pragmas as suggested; this is the build that gives the bogus results:
PGC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Mismatched loop levels when adding syncs (ifmm.c: 466)
main:
368, time inlined, size=2, file ifmm.c (132)
462, Generating present_or_copy(aEph[0:Nobs])
Generating present_or_copy(aEth[0:Nobs])
Generating present_or_copy(aCobsZ[0:NgO])
Generating present_or_copy(aCobsY[0:NgO])
Generating present_or_copy(aCobsX[0:NgO])
Generating present_or_copy(acs32[0:Nobs])
Generating present_or_copy(acs23[0:Nobs])
Generating present_or_copy(acs22[0:Nobs])
Generating present_or_copy(acs13[0:Nobs])
Generating present_or_copy(acs12[0:Nobs])
Generating present_or_copy(akuZ[0:NgONgS])
Generating present_or_copy(akuY[0:NgONgS])
Generating present_or_copy(akuX[0:NgONgS])
Generating present_or_copy(arz[0:Nobs])
Generating present_or_copy(ary[0:Nobs])
Generating present_or_copy(arx[0:Nobs])
Generating present_or_copy(aRadio[0:NgONgS])
Generating present_or_copy(aAggregation_Jz[0:NgONgS])
Generating present_or_copy(aAggregation_Jy[0:NgONgS])
Generating present_or_copy(aAggregation_Jx[0:NgONgS])
Generating present_or_copy(negObs[0:NgO])
Generating present_or_copy(aNgS)
Generating present_or_copy(aNgO)
Generating present_or_copy(afactorJ)
Generating present_or_copy(k0)
Generating present_or_copy(coefEcartEphZ)
Generating present_or_copy(coefEcartEthZ)
Generating present_or_copy(coefEcartEphY)
Generating present_or_copy(coefEcartEthY)
Generating present_or_copy(coefEcartEphX)
Generating present_or_copy(prodEscal)
Generating present_or_copy(vvar4)
Generating present_or_copy(vvar3)
Generating present_or_copy(vvar2)
Generating present_or_copy(vvar1)
Generating present_or_copy(aux6)
Generating present_or_copy(aux5)
Generating present_or_copy(aux4)
Generating present_or_copy(aux3)
Generating present_or_copy(aux2)
Generating present_or_copy(aux1)
Generating present_or_copy(ap)
Generating present_or_copy(an)
Generating present_or_copy(am)
Generating present_or_copy(cont)
Generating present_or_copy(accumZ)
Generating present_or_copy(accumY)
Generating present_or_copy(accumX)
466, Accelerator kernel generated
468, #pragma acc loop gang /* blockIdx.x */
476, #pragma acc loop vector(256) /* threadIdx.x */
481, #pragma acc loop vector(256) /* threadIdx.x */
468, Scalar last value needed after loop for 'prodEscal' at line 619
Scalar last value needed after loop for 'prodEscal' at line 620
Scalar last value needed after loop for 'prodEscal' at line 621
Scalar last value needed after loop for 'aux3' at line 625
Scalar last value needed after loop for 'aux3' at line 631
Scalar last value needed after loop for 'aux3' at line 681
Scalar last value needed after loop for 'aux6' at line 625
Scalar last value needed after loop for 'aux6' at line 577
Scalar last value needed after loop for 'aux6' at line 578
Scalar last value needed after loop for 'aux2' at line 624
Scalar last value needed after loop for 'aux2' at line 630
Scalar last value needed after loop for 'aux2' at line 681
Scalar last value needed after loop for 'aux5' at line 624
Scalar last value needed after loop for 'aux5' at line 574
Scalar last value needed after loop for 'aux5' at line 575
Scalar last value needed after loop for 'aux1' at line 623
Scalar last value needed after loop for 'aux1' at line 629
Scalar last value needed after loop for 'aux1' at line 567
Scalar last value needed after loop for 'aux1' at line 568
Scalar last value needed after loop for 'aux1' at line 569
Scalar last value needed after loop for 'aux1' at line 681
Scalar last value needed after loop for 'aux4' at line 623
Scalar last value needed after loop for 'aux4' at line 571
Scalar last value needed after loop for 'aux4' at line 572
Accelerator restriction: scalar variable live-out from loop: accumX
Accelerator restriction: scalar variable live-out from loop: aux4
Accelerator restriction: scalar variable live-out from loop: aux1
Accelerator restriction: scalar variable live-out from loop: coefEcartEphX
Accelerator restriction: scalar variable live-out from loop: accumY
Accelerator restriction: scalar variable live-out from loop: aux5
Accelerator restriction: scalar variable live-out from loop: aux2
Accelerator restriction: scalar variable live-out from loop: coefEcartEphY
Accelerator restriction: scalar variable live-out from loop: coefEcartEthY
Accelerator restriction: scalar variable live-out from loop: accumZ
Accelerator restriction: scalar variable live-out from loop: aux6
Accelerator restriction: scalar variable live-out from loop: aux3
Accelerator restriction: scalar variable live-out from loop: coefEcartEphZ
Accelerator restriction: scalar variable live-out from loop: coefEcartEthZ
Accelerator restriction: scalar variable live-out from loop: prodEscal
Accelerator restriction: scalar variable live-out from loop: vvar4
Accelerator restriction: scalar variable live-out from loop: vvar3
Accelerator restriction: scalar variable live-out from loop: vvar2
Accelerator restriction: scalar variable live-out from loop: vvar1
Accelerator restriction: scalar variable live-out from loop: am
Accelerator restriction: scalar variable live-out from loop: cont
Accelerator restriction: scalar variable live-out from loop: an
Conditional loop will be executed in scalar mode
476, Accelerator restriction: induction variable live-out from loop: ap
477, Accelerator restriction: induction variable live-out from loop: am
Accelerator restriction: induction variable live-out from loop: ap
481, Complex loop carried dependence of '*(aAggregation_Jz).real' prevents parallelization
Loop carried dependence of '*(aAggregation_Jz).real' prevents parallelization
Complex loop carried dependence of '*(aAggregation_Jz).imag' prevents parallelization
Complex loop carried dependence of '*(aAggregation_Jy).real' prevents parallelization
Loop carried dependence of '*(aAggregation_Jy).real' prevents parallelization
Complex loop carried dependence of '*(aAggregation_Jy).imag' prevents parallelization
Complex loop carried dependence of '*(aAggregation_Jx).real' prevents parallelization
Loop carried dependence of '*(aAggregation_Jx).real' prevents parallelization
Complex loop carried dependence of '*(aAggregation_Jx).imag' prevents parallelization
Complex loop carried dependence of '*(arx)' prevents parallelization
Complex loop carried dependence of '*(aCobsX)' prevents parallelization
Complex loop carried dependence of '*(ary)' prevents parallelization
Complex loop carried dependence of '*(aCobsY)' prevents parallelization
Complex loop carried dependence of '*(arz)' prevents parallelization
Complex loop carried dependence of '*(aCobsZ)' prevents parallelization
Complex loop carried dependence of '*(akuZ)' prevents parallelization
Complex loop carried dependence of '*(akuY)' prevents parallelization
Complex loop carried dependence of '*(akuX)' prevents parallelization
Scalar last value needed after loop for 'prodEscal' at line 619
Scalar last value needed after loop for 'prodEscal' at line 620
Scalar last value needed after loop for 'prodEscal' at line 621
Complex loop carried dependence of '*(acs32)' prevents parallelization
Complex loop carried dependence of '*(acs22)' prevents parallelization
Complex loop carried dependence of '*(acs12)' prevents parallelization
Complex loop carried dependence of '*(acs23)' prevents parallelization
Complex loop carried dependence of '*(acs13)' prevents parallelization
Complex loop carried dependence of '*(aEph).real' prevents parallelization
Complex loop carried dependence of '*(aEth).real' prevents parallelization
Scalar last value needed after loop for 'aux3' at line 625
Scalar last value needed after loop for 'aux3' at line 631
Scalar last value needed after loop for 'aux3' at line 681
Complex loop carried dependence of '*(aEph).imag' prevents parallelization
Complex loop carried dependence of '*(aEth).imag' prevents parallelization
Scalar last value needed after loop for 'aux6' at line 625
Scalar last value needed after loop for 'aux6' at line 577
Scalar last value needed after loop for 'aux6' at line 578
Scalar last value needed after loop for 'aux2' at line 624
Scalar last value needed after loop for 'aux2' at line 630
Scalar last value needed after loop for 'aux2' at line 681
Scalar last value needed after loop for 'aux5' at line 624
Scalar last value needed after loop for 'aux5' at line 574
Scalar last value needed after loop for 'aux5' at line 575
Scalar last value needed after loop for 'aux1' at line 623
Scalar last value needed after loop for 'aux1' at line 629
Scalar last value needed after loop for 'aux1' at line 567
Scalar last value needed after loop for 'aux1' at line 568
Scalar last value needed after loop for 'aux1' at line 569
Scalar last value needed after loop for 'aux1' at line 681
Scalar last value needed after loop for 'aux4' at line 623
Scalar last value needed after loop for 'aux4' at line 571
Scalar last value needed after loop for 'aux4' at line 572
Accelerator restriction: scalar variable live-out from loop: accumX
Accelerator restriction: scalar variable live-out from loop: aux4
Accelerator restriction: scalar variable live-out from loop: aux1
Accelerator restriction: scalar variable live-out from loop: coefEcartEphX
Accelerator restriction: scalar variable live-out from loop: accumY
Accelerator restriction: scalar variable live-out from loop: aux5
Accelerator restriction: scalar variable live-out from loop: aux2
Accelerator restriction: scalar variable live-out from loop: coefEcartEphY
Accelerator restriction: scalar variable live-out from loop: coefEcartEthY
Accelerator restriction: scalar variable live-out from loop: accumZ
Accelerator restriction: scalar variable live-out from loop: aux6
Accelerator restriction: scalar variable live-out from loop: aux3
Accelerator restriction: scalar variable live-out from loop: coefEcartEphZ
Accelerator restriction: scalar variable live-out from loop: coefEcartEthZ
Accelerator restriction: scalar variable live-out from loop: prodEscal
Accelerator restriction: scalar variable live-out from loop: vvar4
Accelerator restriction: scalar variable live-out from loop: vvar3
Accelerator restriction: scalar variable live-out from loop: vvar2
Accelerator restriction: scalar variable live-out from loop: vvar1
Accelerator restriction: scalar variable live-out from loop: am
488, Accelerator restriction: induction variable live-out from loop: ap
Scalar last value needed after loop for 'prodEscal' at line 619
Scalar last value needed after loop for 'prodEscal' at line 620
Scalar last value needed after loop for 'prodEscal' at line 621
Scalar last value needed after loop for 'aux3' at line 625
Scalar last value needed after loop for 'aux3' at line 631
Scalar last value needed after loop for 'aux3' at line 681
Scalar last value needed after loop for 'aux6' at line 625
Scalar last value needed after loop for 'aux6' at line 577
Scalar last value needed after loop for 'aux6' at line 578
Scalar last value needed after loop for 'accumZ' at line 534
Scalar last value needed after loop for 'accumZ' at line 535
Scalar last value needed after loop for 'aux2' at line 624
Scalar last value needed after loop for 'aux2' at line 630
Scalar last value needed after loop for 'aux2' at line 681
Scalar last value needed after loop for 'aux5' at line 624
Scalar last value needed after loop for 'aux5' at line 574
Scalar last value needed after loop for 'aux5' at line 575
Scalar last value needed after loop for 'accumY' at line 537
Scalar last value needed after loop for 'accumY' at line 538
Scalar last value needed after loop for 'aux1' at line 623
Scalar last value needed after loop for 'aux1' at line 629
Scalar last value needed after loop for 'aux1' at line 567
Scalar last value needed after loop for 'aux1' at line 568
Scalar last value needed after loop for 'aux1' at line 569
Scalar last value needed after loop for 'aux1' at line 681
Scalar last value needed after loop for 'aux4' at line 623
Scalar last value needed after loop for 'aux4' at line 571
Scalar last value needed after loop for 'aux4' at line 572
Scalar last value needed after loop for 'accumX' at line 540
Scalar last value needed after loop for 'accumX' at line 541
Accelerator restriction: scalar variable live-out from loop: accumX
Accelerator restriction: scalar variable live-out from loop: aux4
Accelerator restriction: scalar variable live-out from loop: aux1
Accelerator restriction: scalar variable live-out from loop: coefEcartEphX
Accelerator restriction: scalar variable live-out from loop: accumY
Accelerator restriction: scalar variable live-out from loop: aux5
Accelerator restriction: scalar variable live-out from loop: aux2
Accelerator restriction: scalar variable live-out from loop: coefEcartEphY
Accelerator restriction: scalar variable live-out from loop: coefEcartEthY
Accelerator restriction: scalar variable live-out from loop: accumZ
Accelerator restriction: scalar variable live-out from loop: aux6
Accelerator restriction: scalar variable live-out from loop: aux3
Accelerator restriction: scalar variable live-out from loop: coefEcartEphZ
Accelerator restriction: scalar variable live-out from loop: coefEcartEthZ
Accelerator restriction: scalar variable live-out from loop: prodEscal
Accelerator restriction: scalar variable live-out from loop: vvar4
Accelerator restriction: scalar variable live-out from loop: vvar3
Accelerator restriction: scalar variable live-out from loop: vvar2
Accelerator restriction: scalar variable live-out from loop: vvar1
490, Accelerator restriction: induction variable live-out from loop: ap
Accelerator restriction: induction variable live-out from loop: am
491, Accelerator restriction: induction variable live-out from loop: ap
Accelerator restriction: induction variable live-out from loop: am
492, Accelerator restriction: induction variable live-out from loop: ap
Accelerator restriction: induction variable live-out from loop: am
494, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
500, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
Accelerator restriction: induction variable live-out from loop: am
502, Accelerator restriction: induction variable live-out from loop: am
Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
504, Accelerator restriction: induction variable live-out from loop: am
505, Accelerator restriction: induction variable live-out from loop: am
506, c_prod inlined, size=6, file ifmm.c (694)
511, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
Accelerator restriction: induction variable live-out from loop: am
513, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
Accelerator restriction: induction variable live-out from loop: am
515, Accelerator restriction: induction variable live-out from loop: am
516, Accelerator restriction: induction variable live-out from loop: am
517, c_prod inlined, size=6, file ifmm.c (694)
522, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
Accelerator restriction: induction variable live-out from loop: am
524, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
Accelerator restriction: induction variable live-out from loop: am
526, Accelerator restriction: induction variable live-out from loop: am
527, Accelerator restriction: induction variable live-out from loop: am
528, c_prod inlined, size=6, file ifmm.c (694)
532, Accelerator restriction: induction variable live-out from loop: am
Accelerator restriction: induction variable live-out from loop: ap
534, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
535, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
537, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
538, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
540, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
541, Accelerator restriction: induction variable live-out from loop: an
Accelerator restriction: induction variable live-out from loop: ap
543, Accelerator restriction: induction variable live-out from loop: an
544, Accelerator restriction: induction variable live-out from loop: ap
567, c_prod inlined, size=6, file ifmm.c (694)
568, c_prod inlined, size=6, file ifmm.c (694)
569, c_prod inlined, size=6, file ifmm.c (694)
619, c_prod inlined, size=6, file ifmm.c (694)
620, c_prod inlined, size=6, file ifmm.c (694)
621, c_prod inlined, size=6, file ifmm.c (694)
637, time inlined, size=2, file ifmm.c (132)
653, difftime inlined, size=2, file ifmm.c (83)
691, Accelerator restriction: induction variable live-out from loop: ap
PGC/x86-64 Windows 12.6-0: compilation completed with warnings
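
For reference, here is my reading of what the "scalar variable live-out from loop" and "loop carried dependence" messages are pointing at. This is only a minimal sketch under my own assumptions, not the actual ifmm.c code: the function names, the arrays `in` and `out`, and the scalars `tmp` and `acc` are made up for illustration. It also assumes C99 (for `restrict` and loop-local declarations) and that the `independent` and `reduction` clauses apply the way I think they do. The first function follows the pattern my code uses now (temporaries declared outside the loop). The second declares the temporary inside the loop body, marks the pointers `restrict`, and puts the accumulator in a reduction clause.

```c
#include <stddef.h>

/* Hypothetical stand-in for one of the loops in ifmm.c; all names here
   are illustrative only. */

/* Pattern as I have it now: 'tmp' is declared outside the loop, so the
   compiler has to assume its last value may be needed after the loop. */
double sum_squares_shared_tmp(const double *in, double *out, int n)
{
    double tmp = 0.0;   /* scalar declared outside the loop */
    double acc = 0.0;   /* accumulator, like accumX/accumY/accumZ */
    int i;
    for (i = 0; i < n; i++) {
        tmp = in[i] * in[i];
        out[i] = tmp;
        acc += tmp;
    }
    return acc;
}

/* Same computation restructured: 'tmp' lives inside the loop body, the
   pointers are declared restrict (assumption: the arrays never overlap),
   and the accumulator is named in a reduction clause. A compiler without
   OpenACC support simply ignores the pragma and runs the loop serially. */
double sum_squares_private_tmp(const double *restrict in,
                               double *restrict out, int n)
{
    double acc = 0.0;
    #pragma acc kernels loop independent reduction(+:acc) copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; i++) {
        double tmp = in[i] * in[i];  /* loop-local, so no value escapes the loop */
        out[i] = tmp;
        acc += tmp;
    }
    return acc;
}
```

Both functions should return the same sum and fill `out` identically when run on the host, so comparing them on a small input is a quick way to check the restructuring before trying it on the real loops. Whether this is the change that was implied in the earlier post, I do not know.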