Timing an NVSHMEM program

Hi all,

How should I time an NVSHMEM program?
Since the program runs on multiple PEs, putting a timer on each PE yields a separate time measurement per PE.

Should I just take MAX(timePE1, timePE2, ..., timePEn) as the overall kernel execution time of the NVSHMEM program?
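
To make the question concrete, here is a minimal sketch of the reduction I have in mind, written in plain C++ rather than with actual NVSHMEM calls. The helper name `overall_time_ms` is hypothetical, and the sketch assumes that each PE's elapsed time has already been collected on one PE and that all PEs started their timers from a common synchronization point (otherwise the maximum may not reflect the true wall-clock time).

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of NVSHMEM.
// per_pe_ms[i] holds the elapsed time in milliseconds measured on PE i.
// Returns the largest measurement, i.e. the time taken by the slowest PE.
double overall_time_ms(const std::vector<double>& per_pe_ms) {
    if (per_pe_ms.empty()) {
        throw std::invalid_argument("need at least one PE measurement");
    }
    return *std::max_element(per_pe_ms.begin(), per_pe_ms.end());
}
```

For example, calling `overall_time_ms({1.5, 4.25, 3.0})` would report the time of the second PE, since it is the slowest of the three.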

Thanks