Comparing CPU and GPU codes: which time function to use?

Hello,

I have a CPU code and a GPU code that give me the same results, and I would like to compare them in terms of speed. But I don’t know which time function I should use for both.

In fact, I am more interested in comparing the time of a for loop between the CPU and the GPU code. The for loop of the CPU code obviously contains only functions running on the CPU, but the for loop of the GPU code contains functions running on the CPU as well as kernels running on the GPU (plus transfers).

So what time function(s) could I use to compare the performance of a CPU code and a GPU code? For example, will cpu_time work for the GPU code, given that there are kernels running on the GPU?

Thank you for your attention.

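Here is a timer, dclock_64.s, written in assembler that is fairly good. It reads the CPU cycle counter and multiplies it by the clock period in seconds, so uncomment the .clock line that matches your CPU frequency.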

real*8 dclock, time1, time2

time1 = dclock()

! ... section to time ...

time2 = dclock() - time1   ! time2 is the elapsed time for the section

To compile and link:

pgfortran foo.f90 dclock_64.s -o foo

dave


        .file   "dclock-hammer.s"
        .align    8
        .data
# .clock:  .double 0.0000000075          # 133MHz
# .clock:  .double 0.00000000666667      # 150 MHz
# .clock:  .double 0.0000000060          # 166 MHz
# .clock:  .double 0.0000000050          # 200 MHz
# .clock:  .double 0d4.28571429183673480000e-09 # 233.3333 MHz
# .clock:  .double 0.00000000333333      # 300 MHz
# .clock:  .double 0.000000002857142     # 350 MHz
# .clock:  .double 0.0000000025          # 400 MHz
# .clock:  .double 0.00000000222223      # 450 MHz
# .clock:  .double 0.000000002           # 500 MHz
# .clock:  .double 0.0000000018181818    # 550 MHz
# .clock:  .double 0.0000000016666667    # 600 MHz
# .clock:  .double 0.0000000015384615    # 650 MHz
# .clock:  .double 0.0000000014285714    # 700 MHz
# .clock:  .double 0.0000000013658095    # 733 MHz
# .clock:  .double 0.0000000013333333    # 750 MHz
# .clock:  .double 0.0000000012500000    # 800 MHz
# .clock:  .double 0.000000001           # 1.0 GHz
# .clock:  .double 0.000000000750187     # 1.33GHz
# .clock:  .double 0.000000000714        # 1.4 GHz
# .clock:  .double 0.000000000666        # 1.5 GHz
# .clock:  .double 0.000000000625        # 1.6 GHz
# .clock:  .double 0.00000000059         # 1.7 GHz
# .clock:  .double 0.0000000005556       # 1.8 GHz
# .clock:  .double 0.0000000005          # 2.0 GHz
# .clock:  .double 0.000000000455        # 2.2 GHz
# .clock:  .double 0.000000000417        # 2.4 GHz
 .clock:  .double 0.000000000376        # 2.66 GHz
# .clock:  .double 0.000000000357        # 2.8 GHz
# .clock:  .double 0.0000000003333       # 3.0 GHz
# .clock:  .double 0.0000000003125       # 3.2 GHz
# .clock:  .double 0.0000000002777       # 3.6 GHz

.low:   .long 0x00000000
.high:  .long 0x00000000
        .text

        .globl   _DCLOCK, dclock, _dclock, _dclock_, dclock_
_DCLOCK:
dclock:
_dclock:
_dclock_:
dclock_:
        .byte   0x0f, 0x31

        movl    %eax, .low(%RIP)
        movl    %edx, .high(%RIP)

        fildll  .low(%RIP)
        fmull   .clock(%RIP)
        fstpl   -24(%rsp)
        movsd   -24(%rsp), %xmm0
        ret[i][/i]

It doesn't seem to work for me; I get this error at the end of compilation:

dclock_64.s:
dclock_64.s: Assembler messages:
dclock_64.s:56: Error: invalid character '[' in mnemonic

Is the problem perhaps with this line?

ret[i][/i]

Yes, the last line should be

ret

The [i][/i] added at the end is probably a side effect of inserting the “Code” tags.

I will keep an eye on that in the future.

dave

Nice, it works very well! It is more accurate than the cpu_time() function in Fortran (for the GPU code), thanks!
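
For readers who would rather not depend on an assembler file with a hard-coded clock frequency, a portable alternative is the standard Fortran system_clock intrinsic, which measures elapsed (wall-clock) time. The sketch below is only an illustration, not the method used above: the program name, the variable names and the summation loop are hypothetical stand-ins for the loop being compared, and it assumes a compiler that provides the iso_fortran_env module. It prints the cpu_time result next to the elapsed time so the two can be compared; cpu_time reports processor time on the host, which need not match elapsed time when part of the work runs on the GPU.

```fortran
program time_section
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: count_start, count_end, count_rate
  real(real64)   :: cpu_start, cpu_end, wall_seconds, cpu_seconds, total
  integer        :: i

  ! Ask how many clock ticks correspond to one second
  call system_clock(count_rate=count_rate)

  call system_clock(count_start)
  call cpu_time(cpu_start)

  ! Hypothetical stand-in for the loop being timed
  total = 0.0_real64
  do i = 1, 10000000
     total = total + sqrt(real(i, real64))
  end do
  ! If the timed section launches asynchronous GPU work, wait for the
  ! device to finish here, before reading the clocks again.

  call cpu_time(cpu_end)
  call system_clock(count_end)

  wall_seconds = real(count_end - count_start, real64) / real(count_rate, real64)
  cpu_seconds  = cpu_end - cpu_start

  ! Printing total keeps the compiler from removing the loop
  print *, 'checksum          =', total
  print *, 'elapsed time (s)  =', wall_seconds
  print *, 'cpu_time (s)      =', cpu_seconds
end program time_section
```

Using the same elapsed-time measurement around both the CPU loop and the GPU loop (including transfers) keeps the comparison like for like; the resolution of system_clock depends on the compiler and platform.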