Using the CUDA Visual Profiler and the Pgprof Profiler

Hi. I am having problems with both the CUDA Visual Profiler and the Pgprof Profiler. To show the problem, I compile a small test program (see below) as follows, using the PGI 10.5 compiler on 64-bit Windows 7 with a GTX 260:

pgfortran vecmult.f90 -Minfo=ccff -ta=nvidia -o vec

  1. CUDA Visual Profiler. When I start the profiling process, I get the following message:

=== Start profiling for session ‘Session1’ ===
Start program ‘C:/Workspace/vektormult/vec.exe’ run #1
licensed libpgacc.dll not found, exiting

What could this mean? Is it because I only have a trial version of the PGI compiler?

  2. Pgprof Profiler

After typing: pgcollect vec
and: pgprof -exe vec

Pgprof starts and I can see how much time was spent in the first (host) loop, but I can find no information about the data transfer time for the arrays or the time spent in the second (GPU) loop. There is no ‘accelerator region time’ or ‘accelerator kernel time’ row as shown here: http://www.pgroup.com/lit/articles/insider/v2n1a2.htm. Why not?

The program I used reads as follows:

program vecmult

real,dimension(:),allocatable :: A,B,C
integer :: N,M

! M = 2**24 elements
M=16777216
allocate(A(M))
allocate(B(M))
allocate(C(M))

do N=1,M
 A(N) = 3./real(N)
 B(N) = 2./real(N)
end do

!$acc region copyout(C(1:M))
do N=1,M
 C(N) = A(N)*B(N)
end do

!$acc end region

write(*,*) C(1)

end program
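
For reference, a rough cross-check that does not rely on either profiler would be to time the accelerator region by hand with the standard system_clock intrinsic. This is only a sketch under assumptions: the program name vecmult_timed and the variables t0, t1 and rate are additions that are not in the program above, and the measured interval is plain wall-clock time, so it lumps together device initialization, data transfers and the kernel itself rather than separating them as the profiler rows would.

```fortran
program vecmult_timed
  implicit none
  real, dimension(:), allocatable :: A, B, C
  integer :: N, M
  ! Hypothetical additions for timing: tick counts and ticks per second
  integer :: t0, t1, rate

  M = 16777216                 ! 2**24 elements, as in the program above
  allocate(A(M), B(M), C(M))

  ! Host loop: initialize the inputs
  do N = 1, M
    A(N) = 3./real(N)
    B(N) = 2./real(N)
  end do

  ! Wall-clock time around the whole accelerator region
  ! (includes device start-up and data movement, not just the kernel)
  call system_clock(t0, rate)
  !$acc region copyout(C(1:M))
  do N = 1, M
    C(N) = A(N)*B(N)
  end do
  !$acc end region
  call system_clock(t1)

  write(*,*) C(1)
  write(*,*) 'region wall time (s):', real(t1 - t0)/real(rate)
end program vecmult_timed
```

If the program is compiled without -ta=nvidia, the !$acc lines are treated as ordinary comments and the same interval measures the host version of the loop, which gives a baseline to compare against.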