Dynamic global memory allocation

Hi guys, I want to test dynamic global memory allocation. Here is the code:

module a_test
  integer,device,allocatable :: b(:)
  contains
    attributes(global) subroutine kernel(a,n)
      integer,value   :: n
      integer,device  :: a(n)
      integer  :: i
      !integer,device,allocatable :: b(:)
      if(threadidx%x==1) then
        allocate(b(n))
      endif
      call syncthreads()
      i=threadIdx%x
      b(i)=i
      a(i)=b(i)
    end subroutine
end module

program prog
   use a_test
   use cudafor
   implicit none

   integer,parameter  :: n=128
   integer,device     :: a_d(n)
   integer  :: a(n)

   call kernel<<<1,128>>>(a_d,n)
   a=a_d
  
   write(*,*)a(10:20)
end program

I compiled the code using PVF 13.7. The "-Mcuda" value is "-Mcuda=cc3.05,cuda5.0,rdc". The build fails with the following messages:

C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s: Assembler messages:
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:4: Warning: value 0xffed truncated to 0xed
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:4: Warning: value 0xffba truncated to 0xba
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:4: Warning: value 0xff98 truncated to 0x98
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:5: Warning: value 0xff80 truncated to 0x80
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:6: Warning: value 0xfff6 truncated to 0xf6
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:13: Warning: value 0xff80 truncated to 0x80
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:15: Warning: value 0xfff3 truncated to 0xf3
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:22: Warning: value 0xff80 truncated to 0x80
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:23: Warning: value 0xfff1 truncated to 0xf1
pgacclnk: spawnv failed: C:\Users\Administrator\AppData\Local\Temp\pgcudafat2d54bcRJcHcR3D.o

When I compile the code under CentOS using PGI Workstation, it builds fine. Under Windows, however, I can't compile it successfully, and I don't know why. Please help me.
PS: Is there any tool to debug CUDA Fortran code?

Hi uestc0626,

Apologies for the late response. I was getting input from engineering and wanted to test the released 13.9 (using CUDA 5.5) before responding. Unfortunately, I'm still seeing various errors when building on Windows with either CUDA 5.0 or 5.5. Hence, I added a problem report (TPR#19604) and sent it on to engineering. There is no workaround at this point.

What is your goal with this example? Are you just testing device allocation, or do you really want "b" to be a shared array with a length of "n"? If it's the latter, then the better way to do this is to use a shared automatic array and pass the size of the array, in bytes, as the third argument in the kernel launch (n*4 here, since each default integer takes 4 bytes).

For example:

module a_test

  contains

    attributes(global) subroutine kernel(a,n)

      implicit none
      integer,value   :: n
      integer,device  :: a(n)
      integer :: i
      integer,shared :: b(n)   ! shared automatic array, sized at launch

      call syncthreads()

      i=threadIdx%x
      b(i)=i
      a(i)=b(i)
      return
    end subroutine kernel

end module

program prog

  use a_test
  use cudafor
  implicit none

  integer,parameter  :: n=128
  integer,device     :: a_d(n)
  integer            :: a(n)

  ! third launch argument: shared memory size in bytes (n 4-byte integers)
  call kernel<<<1,128,n*4>>>(a_d,n)
  a=a_d

  write(*,*)a(10:20)

end program
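
As a hedged sketch of the same launch, the variant below names the byte count explicitly and checks the launch status on the host. The module name launch_check and the variable shmem_bytes are made up for illustration, and it assumes the default integer is 4 bytes. cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString come from the cudafor module.

```fortran
module launch_check
  use cudafor
  implicit none
contains
  ! Same idea as the kernel above: b is a shared automatic array of n integers.
  attributes(global) subroutine kernel(a, n)
    integer, value  :: n
    integer, device :: a(n)
    integer, shared :: b(n)
    integer :: i
    i = threadIdx%x
    if (i <= n) then        ! guard in case the block has more threads than n
      b(i) = i
      a(i) = b(i)
    end if
  end subroutine kernel
end module launch_check

program prog
  use launch_check
  implicit none
  integer, parameter :: n = 128
  integer, device    :: a_d(n)
  integer            :: a(n), istat, shmem_bytes

  ! Assumption: default integers are 4 bytes, so b(n) needs n*4 bytes.
  shmem_bytes = n * 4

  call kernel<<<1, n, shmem_bytes>>>(a_d, n)

  ! Report a bad launch configuration (for example, too much shared memory).
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) write(*,*) 'launch failed: ', trim(cudaGetErrorString(istat))

  ! Wait for the kernel and report any error raised while it ran.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) write(*,*) 'kernel failed: ', trim(cudaGetErrorString(istat))

  a = a_d
  write(*,*) a(10:20)
end program prog
```

If the byte count is too small for b(n), the kernel indexes past the shared memory it was given, so keeping the size in one named variable next to the launch makes that mismatch easier to spot.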
- Mat

TPR 19604 - UF: CUDA Fortran device allocation fails to compile on Windows

This should now work in the 14.6 release and in the current 14.7 release.

regards,
dave