I’m new to the NVIDIA HPC compilers. My goal is to start moving to GPUs (we have just acquired an A100 GPU). I first tried to profile a Fortran HPC code with the -pg option of pgfortran, but I can’t build the binary. I get:
/usr/bin/ld: cannot find -lpgnod_prof_g
and there are no libpgnod_prof_g* files under /opt/nvidia.
Installed from RPM on CentOS 8; this happens with both the 20.9 and 21.2 versions.
Any help welcome.
Sorry about that. Gprof profiling was disabled for a bit after we disabled the old pgprof profiling. It should be available again in an upcoming release.
In the meantime, you can try using Nsight Systems (nsys profile -b dwarf <a.out>). The CPU profile can be viewed by opening the generated profile in the GUI and selecting the CPU view (top-down, bottom-up, flat) in the “Events” tab. Note that it’s helpful to compile your code with “-g” or “-gopt” so that the DWARF symbols are available to the profiler.
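As a rough sketch of the build-and-profile workflow described above, the commands could look like the following. The file names `my_app.f90` and `./my_app` are placeholders, not taken from this thread. The script only prints the commands unless `RUN_PROFILE=1` is set, so it is safe to try on a machine without the SDK installed.

```shell
# Hypothetical file names; adjust to your project.
SRC=my_app.f90
EXE=./my_app

# -gopt emits debug (DWARF) information without disabling optimizations.
BUILD_CMD="pgfortran -gopt -o $EXE $SRC"
# -b dwarf selects DWARF-based backtraces for the CPU samples.
PROFILE_CMD="nsys profile -b dwarf $EXE"

echo "$BUILD_CMD"
echo "$PROFILE_CMD"

# Only execute when explicitly requested.
if [ "${RUN_PROFILE:-0}" = "1" ]; then
    $BUILD_CMD && $PROFILE_CMD
fi
```

The resulting report can then be opened in the Nsight Systems GUI as described above.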
Thanks for your answer. Indeed, I have a workshop this week and a GPU Hackathon with… NVIDIA support! I was trying to be ready before starting this morning…
I’ve used nsys, but it does not seem to be very stable… With my parallel code it ends with a memory error in the provided Open MPI libs (it fails on 32, 16, 8 and 4 processes). It runs on 2 processes but seems to be stuck after my mpi_finalize call…
Without nsys, the code runs fine whatever the number of cores.
[tenibre-gpu-0:432348] Signal: Bus error (7)
[tenibre-gpu-0:432348] Signal code: Non-existant physical address (2)
[tenibre-gpu-0:432348] Failing at address: 0x7fe1c4b2c280
[tenibre-gpu-0:432348] [ 0] /opt/nvidia/hpc_sdk/Linux_x86_64/20.9/comm_libs/openmpi/openmpi-3.1.5/lib/libopen-pal.so.40(+0x9cc3a)[0x7fe1ea2ebc3a]
Sorry, I’m not sure what’s going on here. Nsight-systems does have issues when the profile gets very large. You might need to increase the sampling period or run fewer time steps.
From “nsys profile --help”:
--sampling-period= Possible values are integers between 4000000 and 125000. The number of CPU Instructions Retired events counted before a CPU instruction pointer (IP) sample is collected. If configured, call stacks may also be collected. The smaller the sampling period, the higher the sampling rate. Note that lower sampling periods will increase overhead and significantly increase the size of the result file(s). Default is '1000000'. Application scope.
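To make the suggestion concrete, here is a minimal sketch of raising the sampling period so that fewer samples are collected. The executable name `./my_app` is a placeholder, and the range check simply mirrors the limits quoted in the help text above; other nsys versions may accept a different range. The command is only printed unless `RUN_PROFILE=1` is set.

```shell
# Sampling period: CPU instructions retired between samples (larger = fewer samples).
PERIOD=${PERIOD:-4000000}

# Range taken from the help text quoted above.
if [ "$PERIOD" -lt 125000 ] || [ "$PERIOD" -gt 4000000 ]; then
    echo "sampling period $PERIOD is outside 125000..4000000" >&2
    PERIOD=1000000   # fall back to the documented default
fi

# ./my_app is a placeholder for the real executable.
CMD="nsys profile --sample=cpu --sampling-period=$PERIOD ./my_app"
echo "$CMD"

# Only execute when explicitly requested.
if [ "${RUN_PROFILE:-0}" = "1" ]; then
    $CMD
fi
```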
Another fallback CPU profiler is “perf”, though I haven’t used it much myself and don’t know if it will handle multiple ranks. Or if you have access to Score-P or TAU, they can usually handle large MPI programs.
I’m assuming you’re attending the HZDR/Julich Hackathon? Julia and Mazhgan usually try to have folks from the profiler team available if you have questions, though with the time zone differences I’m not sure.
One thing I thought of: are you running “nsys profile mpirun” or “mpirun nsys profile”? Using nsys before mpirun is the correct form.
--sampling-period is only available in the 21.2 version, not in 20.9, and I have another problem with the 21.2 SDK version.
One of the problems was that nsys was filling /tmp with gigabytes of data. I set TMP, TMPDIR and TEMP to alternative storage and created a link from /tmp/nvidia to a larger storage area. I then launched the profiling on only one process with:
mpirun -np 1 nsys profile --stats=true --sample=cpu ./my_app : -np 15 ./my_app
and I have generated a report. Now I have to explore the data…
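For anyone hitting the same /tmp problem, the redirection described above might be sketched like this. The `SCRATCH` variable and the `nsys-tmp` directory name are placeholders I made up for illustration; by default the sketch uses a throwaway directory so it can be run safely. The /tmp/nvidia link is left commented out because it modifies /tmp and assumes that path does not already exist.

```shell
# Hypothetical scratch location on a larger file system; defaults to a throwaway directory.
SCRATCH="${SCRATCH:-$(mktemp -d)}"
mkdir -p "$SCRATCH/nsys-tmp"

# Point the usual temporary-directory variables at the larger storage.
export TMPDIR="$SCRATCH/nsys-tmp"
export TMP="$TMPDIR"
export TEMP="$TMPDIR"
echo "temporary files will go to $TMPDIR"

# Link /tmp/nvidia to the larger storage as well (uncomment to apply):
# mkdir -p "$SCRATCH/nvidia" && ln -s "$SCRATCH/nvidia" /tmp/nvidia
```

These variables need to be set in the environment from which the profiling run is launched.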
I get a similar error whenever I enable the -Minstrument compiler flag:
/usr/bin/ld: cannot find -lpgnod_prof_inst
Is this related?
Yes. -Minstrument will use the same instrumentation as is done with -pg.