As far as I am aware, I should be able to profile an OpenACC application using nvprof, but whenever I attempt to do so, nvprof reports that no kernels were profiled.
I am using the vecaddmod example from the PGI OpenACC Getting Started Guide (corrected so that it compiles):
    ! Fortran OpenACC example from the PGI OpenACC Getting Started Guide
    ! Chapter 2.10.1 - Vector Addition on the GPU
    ! http://www.pgroup.com/doc/openacc_gs.pdf

    module vecaddmod
      implicit none
    contains
      subroutine vecaddgpu( r, a, b, n )
        real, dimension(:) :: r, a, b
        integer :: n
        integer :: i

        !$acc kernels loop copyin(a(1:n),b(1:n)) copyout(r(1:n))
        do i = 1, n
          r(i) = a(i) + b(i)
        enddo
      end subroutine
    end module

    program main
      use vecaddmod
      implicit none
      integer :: n, i, errs, argcount
      real, dimension(:), allocatable :: a, b, r, e
      character*10 :: arg1
      argcount = command_argument_count()
      n = 1000000 ! default value
      ! @note - Corrected operator = to ==
      if( argcount == 1 )then
        call get_command_argument( 1, arg1 )
        read( arg1, '(i)' ) n
        if( n <= 0 ) n = 100000
      endif
      allocate( a(n), b(n), r(n), e(n) )
      do i = 1, n
        a(i) = i
        b(i) = 1000*i
      enddo
      ! compute on the GPU
      call vecaddgpu( r, a, b, n )
      ! compute on the host to compare
      do i = 1, n
        e(i) = a(i) + b(i)
      enddo
      ! compare results
      errs = 0
      do i = 1, n
        if( r(i) /= e(i) )then
          errs = errs + 1
        endif
      enddo
      print *, errs, ' errors found'
      if( errs ) call exit(errs)
    end program
This is saved as f1.f90 and compiled using:
pgfortran -acc -fast -Minfo=accel -g f1.f90
Attempting to profile the resulting executable with nvprof produces the following output:
    nvprof f1.exe
    0 errors found
    ==3648== NVPROF is profiling process 3648, command: f1.exe
    ==3648== Profiling application: f1.exe
    ==3648== Profiling result:
    No kernels were profiled.
    ==3648== API calls:
    No API activities were profiled.
    ==3648== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
Running the executable with the PGI runtime's kernel launch notification and accelerator timing output enabled gives the following output:
    0 errors found
    PGI: "acc_shutdown" not detected, performance results might be incomplete.
     Please add the call "acc_shutdown(acc_device_nvidia)" to the end of your application to ensure that the performance results are complete.
    launch CUDA kernel file=C:\Users\ptheywood\SATGPU\vecaddmod\f1.f90 function=vecaddgpu line=14 device=0 threadid=1 num_gangs=7813 num_workers=1 vector_length=128 grid=7813 block=128

    Accelerator Kernel Timing data
    C:\Users\ptheywood\SATGPU\vecaddmod\f1.f90
      vecaddgpu  NVIDIA  devicenum=0
        time(us): 3,708
        13: compute region reached 1 time
            14: kernel launched 1 time
                grid:  block:
                 device time(us): total=0 max=0 min=0 avg=0
        13: data region reached 1 time
            13: data copyin transfers: 5
                 device time(us): total=2,449 max=1,213 min=5 avg=489
        17: data region reached 1 time
            17: data copyout transfers: 1
                 device time(us): total=1,259 max=1,259 min=1,259 avg=1,259
Adding the suggested call to acc_shutdown(acc_device_nvidia) prior to the final if statement does not resolve the PGI message either.
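For reference, here is a minimal sketch of where the acc_shutdown(acc_device_nvidia) call named in the PGI message would sit. It is a cut-down stand-alone program rather than the exact modified f1.f90, and it assumes the compiler supplies the OpenACC runtime module `openacc` (which declares `acc_shutdown` and the `acc_device_nvidia` constant). The program name `shutdown_sketch` and the fixed size `n = 1000` are illustrative only.

```fortran
! Minimal sketch (assumption: the "openacc" runtime module is available,
! e.g. when building with "pgfortran -acc").
program shutdown_sketch
  use openacc                      ! declares acc_shutdown, acc_device_nvidia
  implicit none
  integer, parameter :: n = 1000   ! illustrative size, not from the original
  real :: a(n), b(n), r(n)
  integer :: i, errs

  ! Initialise the inputs on the host
  do i = 1, n
    a(i) = i
    b(i) = 1000*i
  enddo

  ! Same accelerator region as in vecaddgpu
  !$acc kernels loop copyin(a(1:n),b(1:n)) copyout(r(1:n))
  do i = 1, n
    r(i) = a(i) + b(i)
  enddo

  ! Compare against the host result
  errs = 0
  do i = 1, n
    if( r(i) /= a(i) + b(i) ) errs = errs + 1
  enddo
  print *, errs, ' errors found'

  ! Explicitly shut down the device runtime before the program exits
  call acc_shutdown( acc_device_nvidia )

  if( errs /= 0 ) stop 1
end program
```

The call is placed after all accelerator work and after the results are printed, so nothing touches the device once the runtime has been shut down.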
Version numbers as follows:
    pgfortran --version
    pgfortran 15.10-0 64-bit target on x86-64 Windows -tp haswell
    The Portland Group - PGI Compilers and Tools
    Copyright (c) 2015, NVIDIA CORPORATION. All rights reserved.

    nvprof --version
    nvprof: NVIDIA (R) Cuda command line profiler
    Copyright (c) 2012 - 2015 NVIDIA Corporation
    Release version 7.5.18 (21)
Is there anything that I am missing?