Nsight does not recognize CUDA Fortran programs compiled by mpif90

My team previously had a set of MPI-parallel Fortran programs for fluid mechanics solutions, compiled with ifort. We have now rebuilt the original solver in CUDA Fortran, kept the corresponding MPI interfaces, compiled it with the mpif90 compiler, and successfully achieved GPU parallel computation. However, there is a bottleneck in computing performance. We tried to use Nsight Systems for performance testing, but currently Nsight cannot recognize the CUDA part of our program and cannot conduct a detailed analysis.
Is the MPI part of the program affecting this? Should I remove MPI, or is there some other reason? I need help.

Are you seeing an error, or are the CUDA Fortran kernels not showing up in the profile?

Nsight Systems should have no issue profiling CUDA Fortran kernels, so if the kernels are not showing up in the profile, it's more likely that they are not actually running on the GPU.

I'd recommend double-checking that the CUDA Fortran code is actually being invoked, and checking the return codes from the kernel launches to ensure that they are running successfully.
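For reference, a minimal sketch of that kind of error checking in CUDA Fortran (the kernel name "mykernel" and its launch configuration are placeholders for your own code; assumes the cudafor module from the NVIDIA HPC SDK):

```fortran
! Check both the launch itself and the asynchronous kernel execution.
integer :: istat

! call mykernel<<<grid, block>>>(args)   ! your kernel launch here

istat = cudaGetLastError()       ! catches launch/configuration errors
if (istat /= cudaSuccess) print *, 'launch error: ', cudaGetErrorString(istat)

istat = cudaDeviceSynchronize()  ! catches errors raised during execution
if (istat /= cudaSuccess) print *, 'kernel error: ', cudaGetErrorString(istat)
```

If every status comes back cudaSuccess, the kernels really are executing on the GPU and the problem lies in the profiling setup instead.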

“Is the MPI part of the program affecting this? Should I remove MPI, or is there some other reason? I need help.”

No need to remove MPI. It has no effect on the GPU code. Note that Nsight Systems can profile your MPI code as well when using the “--trace=mpi” option.

Thank you so much for your reply.

  1. As you said, the CUDA Fortran kernels are not displayed in the profile. However, I used “nvidia-smi” to check my program, and the GPU utilization while the program runs indicates that the GPU is being used successfully. The computation is also about 10 times faster than on a single CPU core.
  2. The profile currently shows no analysis of MPI or CUDA Fortran at all. I will try to recompile with the “--trace=mpi” option added.

Thanks for your help again.
Hongwei

“--trace” is a command-line option to nsys, so there is no need to recompile. You just need to rerun the profile.

Again, nsys should be able to profile CUDA Fortran kernels, so I suspect something else is going on. What might be helpful is if you can share more details, such as the command line you're using to profile the code. It should look something like:

nsys profile --trace=mpi,cuda mpirun -np 2 a.out

Thanks for your reminder. I think the problem may be with my command-line settings.


As shown in the screenshot above (not reproduced here), my working directory is “/home/hongwei/gpu/LESPDF/03_Ma_0.8_DNS/examples”. The compiled executable “LESFDF.exe” is in that directory. Normally, I use the script “simulation.sh” to invoke this executable when running the program.

The contents of simulation.sh are:

#!/bin/bash
# script for LESFDF.exe run

HOMDIR=~/gpu/LESPDF
EXEDIR=$HOMDIR/Programming/LESAPP3D/bin
EXEFIL=LESFDF.exe
export MPI_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/openmpi/openmpi-3.1.5/bin

# set the number of processors
nProcs=1

echo ""
echo "Start LESFDF.exe simulation"
echo "Numbers of Processors : $nProcs"
echo ""

WRKDIR=~/gpu/LESPDF/03_Ma_0.8_DNS/examples
cd $WRKDIR

mkdir download
cd download
mkdir HPDF_FLD HPDF_PTCL LES_FLD
cd ..

# prepare the LESFDF.exe file
if [ ! -x "$EXEFIL" ]; then
  cp $EXEDIR/$EXEFIL ./
  echo "$EXEFIL is updated"
fi

# reset the stack limit
ulimit -s unlimited

# run the program
$MPI_PATH/mpirun --use-hwthread-cpus -np $nProcs ./$EXEFIL > SCREEN &

As I described, I usually run my “simulation.sh” script directly, without adding anything else on the command line. I would like your help modifying this command line.

Following your suggestion, I preliminarily modified the command line, tested it, and got the following results:




The diagnostics report (screenshots above, not reproduced here) said something about my “paranoid level”, which I didn't quite understand. Adding the “--environment” option doesn't seem to work either.

Since I run exclusively on remote headless systems, I've never launched a profile from the GUI. That said, I wouldn't think you'd want to include “nsys profile …” as part of the target application, so try removing everything before “mpirun”. What I'm not sure about is whether the GUI's environment knows where to find mpirun or the LESFDF.exe binary; that could be why you're not profiling the application at all.

If you don't mind, let's try collecting the profile from the command line by updating your shell script to use nsys and then running “simulation.sh”:

nsys profile --trace=cuda,mpi --force-overwrite=true -o myprofile $MPI_PATH/mpirun --use-hwthread-cpus -np $nProcs ./$EXEFIL > SCREEN &

Rename the output file from “myprofile” to something more descriptive.

Then run “nsys stats myprofile.nsys-rep” to see the text version of the profile, or open the report in the GUI to see the timeline.

If this works, then I may need to send you over to the Nsight-Systems forum for help with configuring the GUI.
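As an aside, the “paranoid level” message you saw refers to the Linux kernel's perf_event_paranoid setting, which gates the CPU-sampling side of nsys (it does not affect CUDA kernel tracing). A quick way to check it, assuming a Linux system:

```shell
# Print the current perf event paranoid level.
# nsys CPU sampling generally requires a value of 2 or lower.
cat /proc/sys/kernel/perf_event_paranoid

# To lower it temporarily (needs root; 1 is shown only as an example):
# sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'
```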

Dear Mat,
Thank you so much for your help! It works!
Now I can generate a profile of my CUDA Fortran solver from the command line with “nsys” and view the report in the GUI. I can analyze the proportion of time spent in each kernel in the report. Thank you very much for your help!
Hongwei