NVHPC timeline question

unrue · October 18, 2021, 8:23am

Dear Nvidia users,

I'm using NVHPC with a very large code, and the timeline is very crowded. Is it possible, from a portion of the timeline, to tell which part of the code it refers to? I can see the kernel names, but in my case the same kernels are called from several places in the code.

In other words, how can I isolate in the timeline the portion of the code I'm working on, so I can compare the behaviour before and after my changes? For example, is it possible to analyze the behaviour of just one particular subroutine? I only have a global view, and without knowing which part of the code a given stretch of the timeline corresponds to, it is hard to tell whether my changes make things better or not.

Thanks.

MatColgrove · October 18, 2021, 2:49pm

Hi unrue,

I'm not 100% clear what you're asking, but I think you're asking whether, within an Nsight Systems profiling timeline, you can track CPU profiling. Nsight Systems does have some CPU profiling, but it's only post-mortem and isn't included in the timeline. The volume of samples required to add this support would be unwieldy.

Instead, you'll want to look at adding NVTX; see "NVIDIA Tools Extension Library (NVTX)" in the NVIDIA Nsight VSE Documentation.

NVTX lets you insert API calls in your code that mark start and stop points in the profile's timeline.

You can use NVTX with Fortran as well; see: https://developer.nvidia.com/blog/customize-cuda-fortran-profiling-nvtx/
Note that Mass' NVTX module is included with the compilers, so there's no need to write your own.

Hope this helps,
Mat

unrue · October 18, 2021, 2:58pm

Hi Mat,

thanks. I just want to check whether, in a particular part of the code, I'm getting overlap between computation and memory transfers, so I think these labels will do what I need. Thanks.

unrue · October 19, 2021, 12:53pm

Hi Mat,

does it work only with NVVP? I'm trying with Nsight Systems 2021.3.1, but my labels do not appear. Attached are the code and the visualizations.

add2s2_omp.f (1.9 KB)

MatColgrove · October 19, 2021, 9:26pm

Works for me, but I didn't use Mass' module; I used the one that comes with the compilers and linked with our wrapper library:

% nvfortran add2s2_omp.f -lnvhpcwrapnvtx -mp=gpu
% nsys profile ./a.out
Collecting data...
    7.399999        5.980000      -0.4800002
Processing events...
Capturing symbol files...
Saving temporary "/tmp/nsys-report-24e3-a504-7a0a-c7da.qdstrm" file to disk...

Creating final output files...
Import error: The importation timed out.
Skipping import of the QDSTRM file.
Report file moved to ".report1.qdstrm"

unrue · October 21, 2021, 7:04am

Hi Mat,

On my side it does not compile:

nvfortran -L/p/software/juwelsbooster/stages/2020/software/NVHPC/21.9-GCC-10.3.0/Linux_x86_64/21.9/compilers/lib/ -lnvhpcwrapnvtx -mp=gpu -Mcuda=cc80 -Minfo=all add2s2_omp.f -O2 -o add2s2_omp

/p/scratch/prcoe05/fatigati1/nek5000/test/Input/ReTau180/pnpn_omp/add2s2_omp.f:50: undefined reference to `nvtx_nvtxstartrange_'
/p/software/juwelsbooster/stages/2020/software/binutils/2.36.1-GCCcore-10.3.0/bin/ld: /p/scratch/prcoe05/fatigati1/nek5000/test/Input/ReTau180/pnpn_omp/add2s2_omp.f:67: undefined reference to `nvtx_nvtxendrange_'

(the nvhpcwrapnvtx library is present in the specified path:)

/p/software/juwelsbooster/stages/2020/software/NVHPC/21.9-GCC-10.3.0/Linux_x86_64/21.9/compilers/lib/libnvhpcwrapnvtx.a
/p/software/juwelsbooster/stages/2020/software/NVHPC/21.9-GCC-10.3.0/Linux_x86_64/21.9/compilers/lib/libnvhpcwrapnvtx.so

I also tried passing nvtx.f90 on the compile line; same error.

MatColgrove · October 21, 2021, 3:06pm

Do you still have the old "nvtx.mod" file, built from Mass' nvtx.f90, in this directory? I'm guessing the compiler is picking up that module rather than the one shipped with the compilers.

unrue · October 21, 2021, 3:19pm

Yes, exactly. Now it compiles fine, but my labels still do not appear in nsys-ui :/

MatColgrove · October 21, 2021, 4:10pm

Hmm, sorry, then I'm not sure. It works fine for me, and I have no idea why it doesn't for you.

unrue · October 25, 2021, 8:03am

Hi Mat,

now I can see my labels after adding

nvtx.f90 -L/p/software/juwelsbooster/stages/2020/software/NVHPC/21.9-GCC-10.3.0/Linux_x86_64/21.9/cuda/11.0/lib64 -lnvToolsExt

to my compilation string.

Now the problem is that, inside these labels, the code seems to do nothing but cudaFree. Attached is the nsys-ui output. This is the piece of code I want to check:

  call nvtxStartRange("MY_LABEL")

!$OMP TARGET DATA MAP(TOFROM:xbar,bbar,b,alpha) MAP(TO:xx,bb,w) 
!$OMP TARGET DATA use_device_ptr(xbar,xx,bb,bbar,b,w)
  do k = 2,m
     alpha_d = alpha(k)
     call cublasDaxpy(n, alpha_d, xx(:,k), 1, xbar, 1)
     call cublasDaxpy(n, alpha_d, bb(1,k), 1, bbar, 1)
     call cublasDaxpy(n, -alpha_d, bb(1,k), 1, b, 1)
  enddo
!$OMP END TARGET DATA

  do k = 1, m
     if(ifwt) then
        alpha(k) = vlsc3_omp(xx(1,k),w,b,n)
     else
        alpha(k) = vlsc2_omp(xx(1,k),b,n)
     endif
  enddo
!$OMP END TARGET DATA
  call gop(alpha,work,'+  ',m)

!$OMP TARGET DATA MAP(TOFROM:xbar,bbar,b) MAP(TO:xx,bb,alpha) 
!$OMP& use_device_ptr(xbar,xx,bb,bbar,b)

  do k = 1,m
     alpha_d = alpha(k)
     call cublasDaxpy(n, alpha_d, xx(:,k), 1, xbar, 1)
     call cublasDaxpy(n, alpha_d, bb(1,k), 1, bbar, 1)
     call cublasDaxpy(n, -alpha_d, bb(1,k), 1, b, 1)
  enddo
!$OMP END TARGET DATA 

  call nvtxEndRange

Any idea what the reason could be? Thanks.

MatColgrove · October 25, 2021, 6:55pm

Again, no idea. I went back and tried the earlier example using Mass' module, replacing the add2s2_omp calls with cublasSaxpy, and I still see the NVTX range in the proper spot in the profile.

I’m guessing it’s pilot error or a system issue, but I’d need a full reproducing example to be sure.

unrue · October 26, 2021, 6:26am

Hi Mat,

the problem is with the big code; the small example I sent you works. I could provide everything you need to run the big code, but my test case is about 1.6 gigabytes and I don't have a smaller one. Would that be possible?

wyphan · November 3, 2021, 7:13pm

When you profiled the code with Nsight Systems, did you tell it specifically to trace NVTX? For instance, to trace OpenACC, CUDA, and NVTX, disable sampling (to speed up execution), and show the summary tables, you would use:

$ nsys profile --stats=true --sample=none -t openacc,cuda,nvtx ./app

MatColgrove · November 3, 2021, 8:17pm

I thought NVTX was part of the default trace? OpenACC isn't, but that only adds profiles of the OpenACC runtime routines.

From "nsys --help profile":

    -t, --trace=
       Possible values are 'cuda', 'nvtx', 'cublas', 'cublas-verbose', 'cusparse', 'cusparse-verbose', 'mpi', 'oshmem', 'ucx', 'osrt', 'cudnn', 'opengl', 'opengl-annotations', 'nvvideo', 'openacc', 'openmp', 'vulkan', 'vulkan-annotations' or 'none'.
       Select the API(s) to trace. Multiple APIs can be selected, separated by commas only (no spaces).
       If '<api>-annotations' is selected, the corresponding API will also be traced.
       If 'none' is selected, no APIs are traced.
       Default is 'cuda,nvtx,osrt,opengl'. Application scope.

unrue · November 5, 2021, 6:35am

Yes,

this is my command line:

nsys profile -f true --trace=cuda,openmp,nvtx -o outputprofile

(OpenACC is not used in my code)

wyphan · November 5, 2021, 1:08pm

I see. My guess is that what's happening here is mismatched pairs of nvtxStartRange and nvtxEndRange calls. I personally add a comment to each nvtxEndRange call to track which range it pairs with, e.g.:

call nvtxStartRange("My label")
:
do i = 1, N
  call nvtxStartRange("Inner loop")
  :
  ! Main computation code goes here
  :
  call nvtxEndRange ! Inner loop
end do
:
call nvtxEndRange ! My label
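If the pairing is hard to verify by eye in a large code, a small bookkeeping module can catch mismatches at run time. The sketch below is a hypothetical helper, not part of NVHPC or NVTX; the names `range_guard`, `guard_start`, `guard_end`, `guard_depth` and `guard_errors` are made up for illustration. It keeps a stack of open labels and prints a message when a range is closed out of order or when nothing is open. It does not call NVTX itself; the assumption is that you call it right next to the `nvtxStartRange`/`nvtxEndRange` calls from the compiler-shipped `nvtx` module.

```fortran
! Hypothetical debugging helper (not part of NVHPC or NVTX): keeps a stack of
! open range labels and reports unmatched or out-of-order range ends.
module range_guard
  implicit none
  private
  public :: guard_start, guard_end, guard_depth, guard_errors

  integer, parameter :: max_depth = 64   ! deepest nesting tracked
  integer, parameter :: max_len = 64     ! labels are truncated to this length
  character(len=max_len) :: stack(max_depth)
  integer :: depth = 0                   ! number of currently open ranges
  integer :: nerr = 0                    ! number of problems reported so far

contains

  ! Call right next to nvtxStartRange(label) in real code.
  subroutine guard_start(label)
    character(len=*), intent(in) :: label
    if (depth >= max_depth) error stop 'range_guard: ranges nested too deeply'
    depth = depth + 1
    stack(depth) = label
  end subroutine guard_start

  ! Call right next to nvtxEndRange in real code; label is the range this
  ! end is meant to close (the same text as the "! My label" comment).
  subroutine guard_end(label)
    character(len=*), intent(in) :: label
    character(len=max_len) :: expected
    expected = label
    if (depth == 0) then
      nerr = nerr + 1
      write(*,'(a)') 'range_guard: end of "'//trim(expected)//'" with no open range'
      return
    end if
    if (stack(depth) /= expected) then
      nerr = nerr + 1
      write(*,'(a)') 'range_guard: closing "'//trim(expected)// &
                     '" but innermost open range is "'//trim(stack(depth))//'"'
    end if
    depth = depth - 1
  end subroutine guard_end

  ! Number of ranges still open; nonzero at program end means a missing end.
  integer function guard_depth()
    guard_depth = depth
  end function guard_depth

  ! Number of unmatched or out-of-order ends seen so far.
  integer function guard_errors()
    guard_errors = nerr
  end function guard_errors

end module range_guard
```

In use, you would pair `call nvtxStartRange("My label")` with `call guard_start("My label")`, and `call nvtxEndRange` with `call guard_end("My label")`, then print `guard_depth()` at the end of the run. A nonzero value means some range was opened but never closed, for instance because an early RETURN skipped the end call.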

unrue · November 8, 2021, 6:52am

Hi Wileam,

thanks for the suggestion, but I have just two calls, the first one before the loop and the second one after the loop, so there is no mismatch.
