This is another “might be a bug, might be a feature” question. In my ongoing testing (viz. this forum and my current near-spamming of it), a driver-kernel I made up seemed to show an interesting oddity. Usually, in my code, I tend to use MPI_Wtime as a Fortran timer since it’s fairly portable, looks at wall time, and is one of those few MPI bits that doesn’t care about MPI that much.
But I noticed that when I compile my driver with FC=mpif90 rather than FC=pgfortran, the -ta=nvidia,time timing information isn’t printed. My compile string is:
$(FC) -fast -Kieee -Minfo=all,accel -r4 -Mextend -Mpreprocess -Ktrap=fp -ta=nvidia,time
I then run my code, which is uniprocessor, without mpirun or anything, just ./command.exe. With pgfortran, I get the Accelerator Kernel Timing data; with mpif90, I do not. I’ve even removed every bit of MPI, from ‘use mpi’ to ‘MPI_Finalize(ierr)’, and the behavior is the same.
I suppose my question now is: is this expected behavior? That is, because of the nature of mpif90, is the wrapper shuttling the timing data to /dev/null or some other stream? Or should mpif90 output this data, albeit without any guarantee of when, where, or from how many processes if you run on >1 process (like most WRITEing with MPI)?
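In case it helps anyone reproduce this, one way to see what mpif90 actually runs is to ask the wrapper to print its underlying compile command. This is only a sketch under assumptions: I'm assuming the wrapper accepts either the MPICH-style `-show` or the Open MPI-style `-showme`, and `show_wrapper` is just a helper name I made up for illustration.

```shell
# Hypothetical helper: print the real compiler command behind an MPI wrapper.
# Assumes the wrapper understands -show (MPICH family) or -showme (Open MPI).
show_wrapper() {
  wrapper="$1"
  # Bail out politely if the wrapper is not on PATH.
  if ! command -v "$wrapper" >/dev/null 2>&1; then
    echo "$wrapper: not found"
    return 1
  fi
  # Try the MPICH spelling first, then fall back to the Open MPI one.
  "$wrapper" -show 2>/dev/null || "$wrapper" -showme
}

# Does nothing harmful if mpif90 is not installed.
show_wrapper mpif90 || true
```

If the printed command turns out to use a compiler other than pgfortran, that might explain the missing timing output, but that is a guess on my part and not something I have confirmed.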