Hi all,
How can I time an NVSHMEM program?
Since the program runs on multiple PEs, putting a timer on each PE returns multiple time measurements.
Should I just take
MAX(timePE1, timePE2, ..., timePEn)
to get the overall kernel execution time for the NVSHMEM program?
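To make the idea concrete, here is a minimal host-side sketch of the MAX step I have in mind. It assumes each PE has already measured its own elapsed time in milliseconds and that those values have been gathered into one vector; the helper name overall_time_ms and the vector are hypothetical, not part of NVSHMEM. In a real program I would presumably also need to synchronize the PEs (for example with nvshmem_barrier_all) before starting the timers so that they all start from the same point.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not an NVSHMEM API): given the elapsed time measured
// on each PE, in milliseconds, return the largest one. The slowest PE
// determines when the whole run is finished.
double overall_time_ms(const std::vector<double>& per_pe_ms) {
    assert(!per_pe_ms.empty());  // at least one PE must have reported a time
    return *std::max_element(per_pe_ms.begin(), per_pe_ms.end());
}
```

For example, with per-PE times of 1.5, 2.25 and 1.75 ms, this would report 2.25 ms as the overall time.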
Thanks