Hi
I am facing performance problems of my code. I have several subroutine calls in a main program. all the subroutines are ported to the GPU. The time spent in the loops of the subroutines should be all in the order of 10^-3 to 10^-4 seconds. When I compile the program and run it several times, there are always three loops that are VERY slow (order of 1-2 seconds). Strange thing is that this times are jumping around, meaning that for one execution loop3 in subroutine1 has 2sec while as when I compile and execute the program again then loop6 in subroutine1 has about 2s…
Please check the following output:
**********************************************************
Time for Sub1,Loop1 [s] = 8.2000000000000001E-005
Time for Sub1,Loop2 [s] = 2.0699999999999999E-004
Time for Sub1,Loop3 [s] = 2.107871000000000 !!!
Time for Sub1,Loop4 [s] = 1.5999999999999999E-004
Time for Sub1,Loop5 [s] = 8.0599999999999997E-004
Time for Sub1,Loop6 [s] = 8.3400000000000002E-003
Time for Sub1,Loop7 [s] = 2.4399999999999999E-004
Time for Sub1,Loop8 [s] = 1.0300000000000000E-004
Time for Sub1,Loop9 [s] = 8.2999999999999998E-005
Time for Sub1,Loop10[s] = 2.2551999999999999E-002
**********************************************************
Time for Sub2,Loop1 [s] = 1.7500000000000000E-004
Time for Sub2,Loop2 [s] = 8.3999999999999995E-005
Time for Sub2,Loop3 [s] = 7.0999999999999991E-005
Time for Sub2,Loop4 [s] = 2.7799999999999998E-004
Time for Sub2,Loop5 [s] = 1.3799999999999999E-004
Time for Sub2,Loop6 [s] = 6.3000000000000000E-005
**********************************************************
Time for Sub3,Loop1 [s] = 8.6379999999999998E-003
Time for Sub3,Loop2 [s] = 0.2743080000000000 !!!
Time for Sub3,Loop3 [s] = 8.5999999999999990E-005
**********************************************************
Time for Sub4,Loop1 [s] = 1.5999999999999999E-004
Time for Sub4,Loop2 [s] = 1.7799999999999999E-004
Time for Sub4,Loop3 [s] = 2.636425000000000 !!!
**********************************************************
*********************************************************
Time for Sub1,Loop1 [s] = 6.7000000000000002E-005
Time for Sub1,Loop2 [s] = 1.5233000000000000E-002
Time for Sub1,Loop3 [s] = 1.1600000000000000E-004
Time for Sub1,Loop4 [s] = 1.3799999999999999E-004
Time for Sub1,Loop5 [s] = 2.4163000000000000E-002
Time for Sub1,Loop6 [s] = 3.7700000000000000E-004
Time for Sub1,Loop7 [s] = 1.9899999999999999E-004
Time for Sub1,Loop8 [s] = 6.9999999999999994E-005
Time for Sub1,Loop9 [s] = 6.4999999999999994E-005
Time for Sub1,Loop10[s] = 1.978296000000000 !!!
**********************************************************
Time for Sub2,Loop1 [s] = 1.6899999999999999E-004
Time for Sub2,Loop2 [s] = 7.4999999999999993E-005
Time for Sub2,Loop3 [s] = 7.3999999999999996E-005
Time for Sub2,Loop4 [s] = 2.5799999999999998E-004
Time for Sub2,Loop5 [s] = 1.3500000000000000E-004
Time for Sub2,Loop6 [s] = 6.7999999999999999E-005
**********************************************************
Time for Sub3,Loop1 [s] = 8.6379999999999998E-003
Time for Sub3,Loop2 [s] = 0.2743080000000000 !!!
Time for Sub3,Loop3 [s] = 8.5999999999999990E-005
**********************************************************
Time for Sub4,Loop1 [s] = 8.0800000000000002E-004
Time for Sub4,Loop2 [s] = 1.442023000000000 !!!
Time for Sub4,Loop3 [s] = 8.7599999999999993E-004
**********************************************************
Is it possible that from time to time the launch of a new kernel might lead to a 1 to 2 second breakdown of the program? That the GPU stands still? To me it seems like arbitrary behaviour where the slow loop is. I studyied also the compiler message carefouly and there is no device-host data transfer during execution!
Thank you very much!!!