log() intrinsic uses too many registers

I want to implement this formula in a kernel:
z = a ^ b
where z, a, and b are all double precision. If I use
z = a ** b

then 3 more registers are required.

If I use
tmp = log(a)
z = exp(tmp * b)

then log() by itself requires about 10 registers. I’m not quite sure why log() uses so many registers. Is there a better way?
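For reference, the log/exp rewrite relies on the standard identity below, which holds only for a > 0 (log(a) is undefined for a ≤ 0). The second line is a first-order estimate that ignores the rounding error of exp itself: if the computed logarithm carries a relative error δ on the order of machine epsilon, that error is multiplied by b ln a, so the two formulations need not agree to the last bit for large |b ln a|. A general-purpose pow typically has to handle such special cases (negative bases, zeros, infinities) and keep this error small, which is one plausible reason it needs more registers; this is a general observation, not a description of the PGI or CUDA implementation.

$$ z = a^{b} = e^{\,b \ln a}, \qquad a > 0 $$
$$ e^{\,b \ln a\,(1+\delta)} = a^{b}\, e^{\,\delta\, b \ln a} \approx a^{b}\,\bigl(1 + \delta\, b \ln a\bigr) $$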

Thanks,
Tuan

Hi Tuan,

I’ll need a bit more information, since I’m not sure what you’re basing your conclusion on. Can you please explain why you think pow uses 3 more registers than required, and why you think log uses too many registers?

  • Mat

Hi Mat,
That was based on the ptxas output I get when I compile the program with -Mcuda=ptxinfo. Is this information supposed to be reliable?

Tuan

Hi Tuan,

I need an example of what you’re seeing. Also, please explain why you believe pow requires 3 more registers, and what you mean when you say log() requires about 10 registers by itself.

  • Mat

My little test program shows that the log/exp version uses 11 fewer registers than pow on sm_20 (22 vs. 33), and 2 fewer on sm_13 (14 vs. 16).

% cat testlog.cuf 
module cuda_gen
use cudafor
real*8, device, allocatable:: a_dev(:)
contains

attributes(global) subroutine testme (N,a,b)
use cudafor
integer, value :: N
real*8, value :: a,b
integer ix
#ifdef USE_LOG
real(8) :: tmp
#endif
! Global 1-based thread index; valid array indices are 1..N
ix = (blockidx%x-1)*blockdim%x + threadidx%x
if (ix.le.N) then
#ifdef USE_LOG
  tmp = log(a)
  a_dev(ix)=exp(tmp*b)
#else
  a_dev(ix)=a**b
#endif
endif

end  subroutine testme

end module cuda_gen

% pgf90 -Mcuda=ptxinfo,keepgpu -Mpreprocess -c testlog.cuf
ptxas info    : Compiling entry function 'testme' for 'sm_13'
ptxas info    : Used 16 registers, 24+16 bytes smem, 96 bytes cmem[0], 60 bytes cmem[1]
ptxas info    : Compiling entry function 'testme' for 'sm_20'
ptxas info    : Used 33 registers, 56 bytes cmem[0], 96 bytes cmem[2], 20 bytes cmem[16]
% pgf90 -Mcuda=ptxinfo,keepgpu -Mpreprocess -c testlog.cuf -DUSE_LOG
ptxas info    : Compiling entry function 'testme' for 'sm_13'
ptxas info    : Used 14 registers, 24+16 bytes smem, 96 bytes cmem[0], 56 bytes cmem[1]
ptxas info    : Compiling entry function 'testme' for 'sm_20'
ptxas info    : Used 22 registers, 56 bytes cmem[0], 96 bytes cmem[2], 20 bytes cmem[16]