I decided to try out pgprof with an accelerated kernel mainly for my own education and to see if there are bottlenecks I’m missing. I followed the example in the PGI Tools document:
> make runsorad-vector32.exe
pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -ta=nvidia,time -Minfo=ccff -c src/sorad.vector32.f
pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -ta=nvidia,time -Minfo=ccff -c src/sorad.orig.noaero.donottouch.f
pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -ta=nvidia,time -Minfo=ccff -c src/driver-check.f90
pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -ta=nvidia,time -Minfo=ccff sorad.vector32.o sorad.orig.noaero.donottouch.o driver-check.o -o runsorad-vector32.exe
> pgcollect -time runsorad-vector32.exe
...output from program...
> ls pg*out
pgpacc.out pgprof.out
> pgprof -exe runsorad-vector32.exe
At that point, things diverge. Instead of the two Accelerator columns shown in Figure 15.11, I get the normal two-column display. That also means none of my GPU kernel's profile data is shown. Likewise, the Accelerator sub-tab is always blank.
Yet there is a non-empty pgpacc.out file with cryptic information in it. Is there an extra flag or switch I need to pass so that pgprof reads the accelerator results?
Matt