Nsight Systems not creating qdrep file

Hi, I’m trying to use Nsight Systems to profile a GPU application that runs on 4 A100 GPUs with 4 OpenMPI ranks, launched from the command line. The job's log files show that the profiler was initialised in each MPI rank. However, whether a qdrep file is created at the end seems random, and in most runs none is created. The command has the following structure:

nsys profile --trace=cuda,mpi,osrt,nvtx --output=/outputdir mpirun -np 4 -npernode 4 [mpi options] ./application [application options]

I have also tried swapping the order of nsys and mpirun, which as I understand it should create one qdrep file per rank, but still no files are created at the end. Are there any known solutions to this issue? Many thanks.
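For reference, the swapped ordering looks roughly like the sketch below, where mpirun starts one nsys session per rank and each session launches one copy of the application. This assumes the `%q{ENV_VAR}` substitution that nsys supports in `--output` file names and the `OMPI_COMM_WORLD_RANK` variable that OpenMPI exports to every process it launches; the `report.` prefix is only illustrative, and `[mpi options]` / `[application options]` are placeholders exactly as in the original command.

```shell
# Sketch: mpirun launches nsys once per rank, and nsys launches the application.
# nsys replaces %q{OMPI_COMM_WORLD_RANK} with the value of that environment
# variable (set by OpenMPI for each rank), so every rank writes its own report.
# The quotes stop the shell from touching the %q{...} pattern.
mpirun -np 4 -npernode 4 [mpi options] \
    nsys profile --trace=cuda,mpi,osrt,nvtx \
    --output='/outputdir/report.%q{OMPI_COMM_WORLD_RANK}' \
    ./application [application options]
```

With this layout each rank's report is finalised when that rank's nsys session exits, so a rank that never exits will still leave no report.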

Update: I have traced the issue to the fact that the process never seems to finish. The error log ends with:

CUDA: mca_common_cuda_fini, never completed initialization so skipping fini, ref_count is now 1

According to a comment in the OpenMPI source on GitHub:

    /* This call is in here to make sure the context is still valid.
     * This was the one way of checking which did not cause problems
     * while calling into the CUDA library.  This check will detect if
     * a user has called cudaDeviceReset prior to MPI_Finalize. If so,
     * then this call will fail and we skip cleaning up CUDA resources. */

So it seems that without the profiler the job finishes normally, but with the profiler attached it hangs, presumably in MPI_Finalize(), and never exits properly, so no report is ever written. Any ideas what the profiler might be doing to cause this?