complex sine?

Hi,

I am trying to port a scientific code to a GPU (Nvidia GTX 580). It has been working fine so far, but now I get the following error after including the sine of a complex number. From the documentation, it appeared to me that sine and cosine of complex numbers are supported.

jand@liard:~/CUDA/plane_wave_forward$ pgfortran -Mcuda cuda_plane_reflcoeff.f90
PGF90-F-0000-Internal compiler error. Unsupported procedure 0 (cuda_plane_reflcoeff.f90: 166)
PGF90/x86-64 Linux 11.10-0: compilation aborted

Line 166 in the code:
ztmp = SIN(CMPLX(1.,1.))

Edit: I am using CUDA Fortran, version 11.10, under Ubuntu 10.04 Linux.

Any help would be greatly appreciated, Jan

Hi Jan,

FYI, I needed to send this on to our compiler team since I’m not sure if Complex SIN has been implemented yet, and if not, when they are planning to add it. No word back yet, but I’ll keep you posted once I know more.

Thanks,
Mat

Thanks very much.

Jan

Hi Mat,

So far, I have just implemented the complex sine myself, which is fine too (though a real intrinsic could be quite a bit more efficient, I guess).

One problem I ran into now is that it’s not clear to me if I can write a function (e.g., complex sine) which I can call from the device kernel. Can you point me to some examples for calling functions from a kernel?

Thanks, Jan

If you want to call a function from an attributes(global) subroutine, you can write an attributes(device) function in the same module. attributes(device) procedures can’t be PURE, ELEMENTAL, or RECURSIVE, but other than that, they are just normal functions compiled for use on the device.
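As a rough illustration of that layout, here is a minimal sketch of a hand-written complex sine called from a kernel. It is not Jan's code and not a PGI sample; the names complex_trig_dev, csin_dev, and csin_kernel are made up for this example. It assumes the identity sin(a + ib) = sin(a)cosh(b) + i cos(a)sinh(b) and that the real SIN, COS, SINH, and COSH intrinsics are available in device code.

```fortran
module complex_trig_dev
  use cudafor
  implicit none
contains

  ! Hypothetical helper: complex sine built from real intrinsics,
  ! using sin(a + ib) = sin(a)*cosh(b) + i*cos(a)*sinh(b).
  attributes(device) function csin_dev(z) result(s)
    complex, intent(in) :: z
    complex :: s
    real :: a, b
    a = real(z)
    b = aimag(z)
    s = cmplx(sin(a)*cosh(b), cos(a)*sinh(b))
  end function csin_dev

  ! Kernel: one thread per element, calling the device function
  ! defined in the same module.
  attributes(global) subroutine csin_kernel(zin, zout, n)
    integer, value :: n
    complex, device :: zin(n), zout(n)
    integer :: i
    i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
    if (i <= n) zout(i) = csin_dev(zin(i))
  end subroutine csin_kernel

end module complex_trig_dev

program demo
  use cudafor
  use complex_trig_dev
  implicit none
  integer, parameter :: n = 8
  complex :: h_in(n), h_out(n)
  complex, device :: d_in(n), d_out(n)
  integer :: i

  do i = 1, n
     h_in(i) = cmplx(real(i), 1.0)
  end do

  d_in = h_in                                   ! host -> device copy
  call csin_kernel<<<1, n>>>(d_in, d_out, n)    ! one block of n threads
  h_out = d_out                                 ! device -> host copy

  write (*,*) 'device csin_dev: ', h_out
  write (*,*) 'host SIN       : ', sin(h_in)    ! host intrinsic for comparison
end program demo
```

Saved as a .cuf or .f90 file, this should build the same way as the original code (pgfortran -Mcuda). The two printed lines should agree to within single-precision rounding.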

Hi Jan,

> Can you point me to some examples for calling functions from a kernel?

The “sgemm.cuf” example that comes with the compilers uses a basic “device” subroutine, which might help. It’s located in “$PGI//<pgi_release>/etc/samples” (e.g. /opt/pgi/linux86-64/12.3/etc/samples/sgemm.cuf on 64-bit Linux).

- Mat

Thanks theMatt and Mat. The examples were helpful, I have it working now.

-Jan

Hi Mat,

I have now implemented my own trigonometric functions for complex numbers. However, please still let me know what the plans are to implement them as intrinsics. I am interested, since I am not very good at writing ultra-efficient code. The kind of computations I am doing involve hundreds of thousands of calls to TAN, ATAN, SIN, ASIN, COS, and ACOS, all for complex numbers. It would be great to have efficient intrinsics available for these in future releases.

Cheers, Jan

Hi,

I was wondering if there were any updates on complex number support for intrinsic functions.

Thanks, Jan

This isn’t official, but sin at least seems to work in accelerator pragma space:

(246) > cat test.F90 
program test

implicit none

complex, dimension(10,10) :: theta,sintheta,hostsintheta
integer :: i, j

do i = 1, 10
   do j = 1, 10
      theta(i,j) = cmplx(real(i),real(j))
   end do
end do

hostsintheta = sin(theta)

!$acc region
do i = 1, 10
   do j = 1, 10
      sintheta(i,j) = sin(theta(i,j))
   end do
end do
!$acc end region

write (*,*) 'sin(theta) on accelerator: ', sintheta(1,:)
write (*,*) 'sin(theta) on host       : ', hostsintheta(1,:)

end program test
(247) > pgfortran -ta=nvidia,time -Minfo=all test.F90
test:
     16, Generating copyout(sintheta(:,:))
         Generating copyin(theta(:,:))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     17, Loop is parallelizable
     18, Loop is parallelizable
         Accelerator kernel generated
         17, !$acc do parallel, vector(10) ! blockidx%x threadidx%x
         18, !$acc do parallel, vector(10) ! blockidx%y threadidx%y
             CC 1.0 : 15 registers; 40 shared, 160 constant, 56 local memory bytes
             CC 2.0 : 20 registers; 8 shared, 140 constant, 4 local memory bytes
(248) > ./a.out
 sin(theta) on accelerator:   (1.298458,0.6349639)  (3.165779,1.959601)  
 (8.471646,5.412681)  (22.97909,14.74480)  (62.44553,40.09216)  
 (169.7379,108.9861)  (461.3929,296.2565)  (1254.195,805.3091)  
 (3409.255,2189.057)  (9267.316,5950.475)
 sin(theta) on host       :   (1.298458,0.6349639)  (3.165779,1.959601)  
 (8.471645,5.412681)  (22.97909,14.74480)  (62.44552,40.09216)  
 (169.7379,108.9861)  (461.3929,296.2564)  (1254.195,805.3091)  
 (3409.255,2189.057)  (9267.316,5950.475)

Accelerator Kernel Timing data
/home/mathomp4/F90Files/ComplexSin/test.F90
  test
    16: region entered 1 time
        time(us): total=314182 init=313134 region=1048
                  kernels=27 data=145
        w/o init: total=1048 max=1048 min=1048 avg=1048
        18: kernel launched 1 times
            grid: [1]  block: [10x10]
            time(us): total=27 max=27 min=27 avg=27

Hi Jan,

It appears that this was just an oversight. We got these in for the Accelerator Model but missed CUDA Fortran. Our engineers will add it to our pre-release source base tomorrow and, assuming testing goes well, it will be available in 12.6.

- Mat

Great, thanks very much for the update.

Jan