Time travel with PGIs system_clock implementation

ofuhrer1 · February 26, 2010, 11:33am

Hi all,

For a long time there has been a problem with timing our application (the COSMO numerical weather prediction and climate model) using the Portland Group compiler, since negative/strange values were sometimes present in the timing output. Now that I’ve finally found some time to investigate this issue, I’ve stumbled upon two of the sources of the problem. PGI’s system_clock command does not seem to be standard conformant. Two facts violate the standard (see example below):

The clock count is not always monotonically increasing
When a overflow in the counter occurs, the clock count does not restart with 0 but with -count_max

Any suggestions?
Oli

=================================

olifu@rosa1:~[226]> cat test_clock.f90
program test_clock
implicit none
integer :: icount,ir,im,iold
real*8 :: rtime
call system_clock (count=icount,count_rate=ir,count_max=im)
iold=icount
write(*,'(5A15)') 'current','old','rate (1/s)','max','deltaT (s)'
do
  ! get current count
  call system_clock (count=icount)
  ! normal way to compute elapsed time in COSMO
  if (icount >= iold) then
    rtime = dble(icount-iold)/dble(ir)
  else
    rtime = dble(im-(iold-icount))/dble(ir)
  end if
  ! only output when something strange or counter overflow
  if (icount < iold) then
    write(*,'(4I15,F15.7)') icount,iold,ir,im,rtime
  end if
  ! save for next loop iteration
  iold = icount
end do
end program test_clock

olifu@rosa1:~[227]> pgf90 test_clock.f90 -o test_clock

olifu@rosa1:~[228]> ./test_clock
        current            old     rate (1/s)            max     deltaT (s)
        2122468        2122469        1000000     2147483647   2147.4836460
        2990456        2990457        1000000     2147483647   2147.4836460
       16206258       16206259        1000000     2147483647   2147.4836460
       16806248       16806250        1000000     2147483647   2147.4836450
       17538238       17538239        1000000     2147483647   2147.4836460
       17738234       17738235        1000000     2147483647   2147.4836460
...
     2011928352     2011928353        1000000     2147483647   2147.4836460
     2042503892     2042503894        1000000     2147483647   2147.4836450
     2095707096     2095707097        1000000     2147483647   2147.4836460
     2142302397     2142302398        1000000     2147483647   2147.4836460
    -2147483648     2147483647        1000000     2147483647  -2147.4836480
    -2129233250    -2129233249        1000000     2147483647   2147.4836460
    -2128229264    -2128229263        1000000     2147483647   2147.4836460

MatColgrove · February 26, 2010, 8:54pm

Hi Oli,

The clock count is not always monotonically increasing

My best guess is that your process is being moved by the OS from one core to another. This would lead to this type of behavior. Try binding to a single core to see if it still occurs.

If it does, please let me know what system you’re using and compiler version. So far I’ve tired 8.0, 9.0, and 10.2 on a variety of systems, but haven’t been able to recreate it.

When a overflow in the counter occurs, the clock count does not restart with 0 but with -count_max

This is a bug. I’ve created a report (TPR#16667) and sent it to our engineers.

Note that in next weeks 10.3 release, we’ll have support for an INTEGER*8 version of SYSTEM_CLOCK (when compiled with -i8). This will increase the MAXCOUNT to 9223372036854775807.

Thanks,
Mat

ofuhrer1 · February 26, 2010, 9:58pm

Hi Matt,

Thanks for the quick reply.

The clock count is not always monotonically increasing
My best guess is that your process is being moved by the OS from one core to another. This would lead to this type of behavior. Try binding to a single core to see if it still occurs.

Your absolutely correct. Setting processor affinity (see below) removed the jitter for problem 1.

olifu@rosa3:~[159]> taskset -p 0x00000001 17913
pid 17913's current affinity mask: 3
pid 17913's new affinity mask: 1

I’ve been running this on a Cray XT5. You’re explanation also explains why part of the problem only occurred when running on the service nodes. The compute nodes have core affinity for each process setup by default.

When a overflow in the counter occurs, the clock count does not restart with 0 but with -count_max
This is a bug. I’ve created a report (TPR#16667) and sent it to our engineers.

Perfect. For the moment I will have to use a workaround. As we are not admins on these machines I’ll forward the information to the people in charge of the PGI compiler installation.

Thanks,
Oli

tull · May 5, 2010, 9:17pm

Your problem with SYSTEM_CLOCK not resetting to zero after MAXCOUNT
reached has been corrected in the current 10.4 release.

Thanks again for your submission.

regards,
dave

Topic		Replies	Views
Questions about system_clock() Legacy PGI Compilers	2	2649	January 5, 2020
call system_clock Legacy PGI Compilers	2	10924	April 5, 2005
CALL SYSTEM_CLOCK and COUNT_RATE Legacy PGI Compilers	2	3419	October 1, 2009
Code execution time measurements - strange Legacy PGI Compilers	2	5348	October 4, 2010
clock()? Legacy PGI Compilers	1	2056	June 14, 2011
How to get system time for a RNG seed? Legacy PGI Compilers	4	6514	October 12, 2009
Determining running timing problem Legacy PGI Compilers	16	7067	May 10, 2012
is mclock() supported in PGI fortran? Legacy PGI Compilers	1	2819	October 26, 2012
clock() function on titanX CUDA Programming and Performance	22	2483	March 12, 2022
Profiler timing routine? Legacy PGI Compilers	2	4309	June 13, 2012

Time travel with PGIs system_clock implementation

Related topics