Accelerating Exponentiation

While debugging my code, I started wondering about the loops I was looking at: should I accelerate code that involves exponentiation at all, whether with the pragmas or with CUDA? Say I have a code fragment like:

do ik=2,6
 do k=0,np
  do i=1,m
   aa(i,k,ik) = aa(i,k,ik-1)**6
  enddo
 enddo
enddo

Would this be worth accelerating?
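To make the question concrete, here is a minimal sketch of what the first loop nest might look like with directives. It uses OpenACC syntax (`!$acc data`, `!$acc parallel loop`). That is an assumption, because the post refers to the older PGI Accelerator pragmas, which are spelled differently. The module and subroutine names (`pow_outer_mod`, `pow6_ik_outer`), the double-precision type and the declared shape `aa(m, 0:np, 6)` are also hypothetical. Without an OpenACC-enabled compiler, the `!$acc` lines are plain comments and the loops run serially on the host.

```fortran
! Hypothetical module; OpenACC directives assumed in place of the
! original PGI Accelerator pragmas.
module pow_outer_mod
  implicit none
contains
  ! ik outermost: because step ik needs step ik-1, the host runs the
  ! ik loop sequentially and launches one parallel kernel per ik value.
  subroutine pow6_ik_outer(aa, m, np)
    integer, intent(in) :: m, np
    double precision, intent(inout) :: aa(m, 0:np, 6)  ! assumed shape
    integer :: i, k, ik
    ! Keep aa on the device across all kernel launches.
    !$acc data copy(aa)
    do ik = 2, 6
      ! One kernel per ik, parallel over the k and i loops.
      !$acc parallel loop collapse(2)
      do k = 0, np
        do i = 1, m
          aa(i, k, ik) = aa(i, k, ik-1)**6
        end do
      end do
    end do
    !$acc end data
  end subroutine pow6_ik_outer
end module pow_outer_mod
```

With ik outermost, each of the five ik iterations (2 through 6) launches its own kernel. The data region keeps `aa` resident on the device between those launches instead of copying it for every kernel.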

Or should I “unroll” the exponentiation so that it involves only multiplications:

do ik=2,6
 do k=0,np
  do i=1,m
   aa(i,k,ik) = aa(i,k,ik-1)*aa(i,k,ik-1)*aa(i,k,ik-1)*aa(i,k,ik-1)*aa(i,k,ik-1)*aa(i,k,ik-1)
  enddo
 enddo
enddo
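The fully unrolled product uses five multiplies, but x**6 can be built from three by squaring first: x2 = x*x, then x2*x2*x2. The sketch below puts that in an elemental function. The names `pow6_mod` and `pow6` and the double-precision type are hypothetical. The same two lines can just as well go inline in the loop body with a scalar temporary, which avoids needing an `!$acc routine seq` declaration when the function is called from an accelerated region. Many compilers already expand a small constant integer power like **6 into a multiply sequence, so any speedup should be measured rather than assumed. The result may also differ from `x**6` in the last bit because the rounding order differs.

```fortran
! Hypothetical module: x**6 via squaring, three multiplies instead of five.
module pow6_mod
  implicit none
contains
  elemental function pow6(x) result(y)
    double precision, intent(in) :: x
    double precision :: y
    double precision :: x2
    x2 = x * x        ! x**2 (1 multiply)
    y = x2 * x2 * x2  ! x**6 (2 more multiplies)
  end function pow6
end module pow6_mod
```

Inside the original loop, the equivalent inline form would be `t = aa(i,k,ik-1)*aa(i,k,ik-1)` followed by `aa(i,k,ik) = t*t*t`, where `t` is a local scalar.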

The compiler might do this automatically, or it might not. Perhaps the Accelerator pragmas prefer to see multiplies instead of powers, since a multiply is a “simple” floating-point instruction? Or perhaps this is the kind of thing a GPU just shouldn’t do?

As you can tell, I’m a scientist by trade, not a computer engineer, so I’m still getting used to “thinking” about how my program runs rather than just transcribing equations and brute-forcing them.

Hi Matt,

I’m going to guess that the “aa*aa*…” version is a bit faster in this case, but you’ll most likely want to do some experimentation.

Another thing you’ll want to experiment with is making “ik” the innermost loop. Because of the backward dependency on ik-1, the ik loop must run sequentially. As the outermost loop, it runs sequentially on the host and launches the CUDA kernel once per iteration (five times here). As the innermost loop, the sequential part runs inside a single CUDA kernel on the GPU.
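Here is a minimal sketch of that interchange. It again assumes OpenACC directives rather than the original PGI Accelerator spelling. Double precision, the declared shape `aa(m, 0:np, 6)`, and the names `pow_inner_mod` and `pow6_ik_inner` are hypothetical. The k and i loops are collapsed into one parallel kernel, and `!$acc loop seq` marks the ik chain as sequential work performed by each thread.

```fortran
! Hypothetical module; OpenACC directives assumed.
module pow_inner_mod
  implicit none
contains
  ! ik innermost: each (i,k) element carries its own sequential
  ! ik = 2..6 chain, so a single kernel launch covers the whole nest.
  subroutine pow6_ik_inner(aa, m, np)
    integer, intent(in) :: m, np
    double precision, intent(inout) :: aa(m, 0:np, 6)  ! assumed shape
    integer :: i, k, ik
    !$acc parallel loop collapse(2) copy(aa)
    do k = 0, np
      do i = 1, m
        ! Sequential dependency on ik-1, handled inside each thread.
        !$acc loop seq
        do ik = 2, 6
          aa(i, k, ik) = aa(i, k, ik-1)**6
        end do
      end do
    end do
  end subroutine pow6_ik_inner
end module pow_inner_mod
```

Each element’s ik chain depends only on that same (i,k) element, so no thread waits on another. Within one thread, successive ik accesses are m*(np+1) elements apart in memory. Adjacent threads (consecutive i) still touch adjacent addresses for a given ik.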

That said, with such a small ik loop count it may not matter much, so definitely try it both ways.

Hope this helps,
Mat