Problems with the allocate statements in device subroutines

OceanCloud · September 25, 2013, 2:14pm

Hi,

I have been troubled by the allocate statements in device subroutines for quite a few days, and I still can’t figure it out.

I’ll give the compile environment first, since I suspect the compile options are the key to this problem.

Compile Environment:

PGI Visual Fortran 13.8 x64, Windows 7 x64, Visual Studio 2010

Fortran | Command Line
-gopt -Bstatic -Mbackslash -Mcuda=nofma,cuda5.0,cc35,cc3x,rdc -I"c:\program files\pgi\win64\13.8\include" -I"C:\Program Files\PGI\Microsoft Open Tools 11\include" -I"C:\Program Files (x86)\Windows Kits\8.0\Include\shared" -I"C:\Program Files (x86)\Windows Kits\8.0\Include\um" -Minform=inform

Linker | Command Line
-gopt -Bstatic -Mcuda=nofma,cuda5.0,cc35,cc3x,rdc -o "D:\Research\Programming\Routine\CUDA Fortran\test\x64\Debug\test.exe"

I wrote a test routine as follows:

module a_test

contains
  
  attributes(global) subroutine kernel(a,n)

    implicit none
    integer,value   :: n
    integer,device  :: a(n)
    integer :: i
    integer,device,allocatable :: b(:)
        
    if (threadidx%x==1) then
       allocate(b(n))
    end if
    
    call syncthreads()
          
    i=threadIdx%x
    b(i)=i
    a(i)=b(i)
    return
  end subroutine kernel

end module

program prog

   use a_test
   use cudafor
   implicit none

   integer,parameter  :: n=128
   integer,device     :: a_d(n)
   integer                   :: a(n)

   call kernel<<<1,128>>>(a_d,n)
   a=a_d
  
   write(*,*)a(10:20)

end program

When I built it, errors occurred at the link step.

error message:

Compiling Project  ...
Linking...
pgnvd-Error-Required tool nvlink was not found
pgnvd... looked for nvlink at c:\program files\pgi\win64/2013/cuda/5.0/bin\nvlink
child process exit with signal 1: c:\program files\pgi\win64\13.8\bin\pgnvd.exe
test build failed.

I searched the whole disk for the nvlink file and copied it to that folder. When I built the project again, errors occurred once more.

error message:

Linking...
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bty4b5LcTr5vB.s: Assembler messages:
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bty4b5LcTr5vB.s:4: Warning: value 0xffed truncated to 0xed
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bty4b5LcTr5vB.s:4: Warning: value 0xffba truncated to 0xba
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bty4b5LcTr5vB.s:5: Warning: value 0xff80 truncated to 0x80
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bty4b5LcTr5vB.s:6: Warning: value 0xffb3 truncated to 0xb3
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bty4b5LcTr5vB.s:13: Warning: value 0xff80 truncated to 0x80
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bty4b5LcTr5vB.s:15: Warning: value 0xffb7 truncated to 0xb7
test build failed.

Have I set the wrong compiler options, and what should I do about the nvlink file?

MatColgrove · September 27, 2013, 4:50pm

Hi OceanCloud,

Apologies for the late response. I was getting input from engineering and wanted to test the released 13.9 (using CUDA 5.5) before responding. Unfortunately, I’m still seeing various errors when building with either CUDA 5.0 or 5.5. Hence, I added a problem report (TPR#19604) and sent it on to engineering. There is no workaround at this point.

On a side note, the code isn’t correct in that every thread has its own local copy of “b”, but you only allocate it for one thread. I’m assuming you’re either just testing allocation, or that you really want “b” to be a shared array with a length of “n”. If it’s the latter, then the better way to do this is to use a shared automatic array and then pass in the size of the array, in bytes, as the third argument in the kernel launch.

For example:

module a_test

 contains

   attributes(global) subroutine kernel(a,n)

     implicit none
     integer,value   :: n
     integer,device  :: a(n)
     integer :: i
     integer,shared :: b(n)   ! automatic shared array, sized at launch

     call syncthreads()

     i=threadIdx%x
     b(i)=i
     a(i)=b(i)
     return
   end subroutine kernel

 end module

 program prog

    use a_test
    use cudafor
    implicit none

    integer,parameter  :: n=128
    integer,device     :: a_d(n)
    integer                   :: a(n)

    call kernel<<<1,128,n*4>>>(a_d,n)   ! third argument: shared memory in bytes (n 4-byte integers)
    a=a_d

    write(*,*)a(10:20)

 end program

Mat

OceanCloud · September 29, 2013, 2:27am

Many thanks, Mat

I just want to test the allocate statement.

So the allocation problem is caused by the compiler itself, and I’d be better off doing this on Linux, is that right?

MatColgrove · September 30, 2013, 5:02pm

We have better support for this on Linux (the nvlink error goes away), but I’m still seeing the same errors that I see on Windows. Why this particular small example fails while others do not, I’m not sure. I’ll need to wait for engineering to investigate TPR#19604.

Mat

tull · July 25, 2014, 12:16am

TPR 19604 - UF: CUDA Fortran device allocation fails to compile on Windows

should now work in the 14.6 and current 14.7 releases.

regards,
dave