Hi, I’m trying to use Nsight Systems to profile a GPU application on 4 A100 GPUs. The application runs as 4 OpenMPI ranks, and I launch it from the command line. The log files of the job show that the profiler was initialised in every MPI rank. However, whether a qdrep file is actually written at the end seems to be random, and in most cases none is created. The structure of the command is as follows:
nsys profile --trace=cuda,mpi,osrt,nvtx --output=/outputdir mpirun -np 4 -npernode 4 [mpi options] ./application [application options]
I have also tried swapping the order of nsys and mpirun, so that mpirun launches nsys profile, which in turn launches the application. As I understand it, this should create one qdrep file per rank, but still no files are created in the end. Are there any known solutions to this issue? Many thanks.
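For reference, here is a minimal sketch of what the per-rank form might look like. It is not the exact job script: the OUTDIR and APP defaults, the report_rank name prefix, and the RUN switch are hypothetical placeholders, and the [mpi options] and [application options] are left out. It assumes that nsys substitutes %q{ENV_VAR} patterns in the output name and that Open MPI sets OMPI_COMM_WORLD_RANK for each rank, so every rank writes to its own report file instead of competing for one path.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust to the real output directory and binary.
OUTDIR="${OUTDIR:-./nsys_reports}"
APP="${APP:-./application}"

# mpirun launches one nsys per rank; each nsys launches the application.
# %q{OMPI_COMM_WORLD_RANK} is left literal by the shell and is assumed to be
# replaced by nsys with the rank number set by Open MPI.
CMD="mpirun -np 4 -npernode 4 nsys profile --trace=cuda,mpi,osrt,nvtx --output=${OUTDIR}/report_rank%q{OMPI_COMM_WORLD_RANK} ${APP}"

# Print the command by default; only execute it when RUN=1 is set.
if [ "${RUN:-0}" = "1" ]; then
    mkdir -p "$OUTDIR"
    sh -c "$CMD"
else
    echo "$CMD"
fi
```

Run without RUN=1, this only prints the assembled command, which makes it easy to check the argument order and the output pattern before submitting the job.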