Hi,
As far as I am aware, I should be able to profile an OpenACC application with nvprof, but whenever I try, nvprof reports that no kernels were profiled.
For example, take the vecaddmod example from the PGI OpenACC Getting Started Guide (corrected so that it compiles):
! Fortran OpenACC example from the PGI OpenACC Getting Started Guide
! Chapter 2.10.1 - Vector Addition on the GPU
! http://www.pgroup.com/doc/openacc_gs.pdf
module vecaddmod
  implicit none
contains
  subroutine vecaddgpu( r, a, b, n )
    real, dimension(:) :: r, a, b
    integer :: n
    integer :: i
    !$acc kernels loop copyin(a(1:n),b(1:n)) copyout(r(1:n))
    do i = 1, n
      r(i) = a(i) + b(i)
    enddo
  end subroutine
end module
program main
  use vecaddmod
  implicit none
  integer :: n, i, errs, argcount
  real, dimension(:), allocatable :: a, b, r, e
  character*10 :: arg1
  argcount = command_argument_count()
  n = 1000000 ! default value
  ! @note - Corrected operator = to ==
  if( argcount == 1 )then
    call get_command_argument( 1, arg1 )
    ! list-directed read of the integer argument
    read( arg1, * ) n
    if( n <= 0 ) n = 100000
  endif
  allocate( a(n), b(n), r(n), e(n) )
  do i = 1, n
    a(i) = i
    b(i) = 1000*i
  enddo
  ! compute on the GPU
  call vecaddgpu( r, a, b, n )
  ! compute on the host to compare
  do i = 1, n
    e(i) = a(i) + b(i)
  enddo
  ! compare results
  errs = 0
  do i = 1, n
    if( r(i) /= e(i) )then
      errs = errs + 1
    endif
  enddo
  print *, errs, ' errors found'
  ! exit with a non-zero status if any mismatches were found
  if( errs /= 0 ) call exit(errs)
end program
I saved this as f1.f90 and compiled it using
pgfortran -acc -fast -Minfo=accel -g f1.f90
Attempting to profile the resulting executable with nvprof gives the following output:
nvprof f1.exe
0 errors found
==3648== NVPROF is profiling process 3648, command: f1.exe
==3648== Profiling application: f1.exe
==3648== Profiling result:
No kernels were profiled.
==3648== API calls:
No API activities were profiled.
==3648== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
Running the executable with the environment variables
PGI_ACC_TIME=1
PGI_ACC_NOTIFY=1
set gives the following output:
0 errors found
PGI: "acc_shutdown" not detected, performance results might be incomplete.
Please add the call "acc_shutdown(acc_device_nvidia)" to the end of your application to ensure that the performance results are complete.
launch CUDA kernel file=C:\Users\ptheywood\SATGPU\vecaddmod\f1.f90 function=vecaddgpu line=14 device=0 threadid=1 num_gangs=7813 num_workers=1 vector_length=128 grid=7813 block=128
Accelerator Kernel Timing data
C:\Users\ptheywood\SATGPU\vecaddmod\f1.f90
  vecaddgpu NVIDIA devicenum=0
    time(us): 3,708
    13: compute region reached 1 time
        14: kernel launched 1 time
            grid: [7813] block: [128]
            device time(us): total=0 max=0 min=0 avg=0
    13: data region reached 1 time
        13: data copyin transfers: 5
            device time(us): total=2,449 max=1,213 min=5 avg=489
    17: data region reached 1 time
        17: data copyout transfers: 1
            device time(us): total=1,259 max=1,259 min=1,259 avg=1,259
Adding
call acc_shutdown(acc_device_nvidia)
before the final if statement does not resolve the PGI message either.
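For reference, here is a minimal sketch of that placement. It assumes the compiler's openacc module supplies acc_shutdown and acc_device_nvidia. The program name shutdown_sketch, the fixed array size, and the inlined loop are illustrative only and are not the exact contents of f1.f90:

```fortran
program shutdown_sketch
  use openacc   ! assumed source of acc_shutdown and acc_device_nvidia
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), r(n)
  integer :: i, errs

  do i = 1, n
    a(i) = i
    b(i) = 1000*i
  enddo

  ! same vector addition as vecaddgpu, inlined to keep the sketch short
  !$acc kernels loop copyin(a,b) copyout(r)
  do i = 1, n
    r(i) = a(i) + b(i)
  enddo

  ! compare against the host result
  errs = 0
  do i = 1, n
    if( r(i) /= a(i) + b(i) ) errs = errs + 1
  enddo
  print *, errs, ' errors found'

  ! shut the device down explicitly before the program exits
  call acc_shutdown(acc_device_nvidia)
  if( errs /= 0 ) call exit(errs)
end program
```

The only point of the sketch is the ordering: the shutdown call comes after all accelerator work and before any exit path.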
Version numbers are as follows:
pgfortran --version
pgfortran 15.10-0 64-bit target on x86-64 Windows -tp haswell
The Portland Group - PGI Compilers and Tools
Copyright (c) 2015, NVIDIA CORPORATION. All rights reserved.
nvprof --version
nvprof: NVIDIA (R) Cuda command line profiler
Copyright (c) 2012 - 2015 NVIDIA Corporation
Release version 7.5.18 (21)
Is there anything that I am missing?
Thanks,
Peter.