Dynamic global memory allocation

linux0626 · September 26, 2013, 4:51am

Hi guys, I want to test dynamic global memory allocation. Here is the code:

module a_test
  integer,device,allocatable :: b(:)
  contains
    attributes(global) subroutine kernel(a,n)
        integer,value   :: n
        integer,device  :: a(n)
        integer  :: i
        !integer,device,allocatable :: b(:)
        if(threadidx%x==1) then
           allocate(b(n))
        endif
        call  syncthreads()
        i=threadIdx%x
        b(i)=i
        a(i)=b(i)
    end subroutine
end module

program prog
   use a_test
   use cudafor
   implicit none

   integer,parameter  :: n=128
   integer,device     :: a_d(n)
   integer  :: a(n)

   call kernel<<<1,128>>>(a_d,n)
   a=a_d
  
   write(*,*)a(10:20)
end program

I compiled the code using PVF 13.7. The “-Mcuda” value is “-Mcuda=cc3.05,cuda5.0,rdc”. The build fails with the following messages:

C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s: Assembler messages:
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:4: Warning: value 0xffed truncated to 0xed
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:4: Warning: value 0xffba truncated to 0xba
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:4: Warning: value 0xff98 truncated to 0x98
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:5: Warning: value 0xff80 truncated to 0x80
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:6: Warning: value 0xfff6 truncated to 0xf6
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:13: Warning: value 0xff80 truncated to 0x80
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:15: Warning: value 0xfff3 truncated to 0xf3
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:22: Warning: value 0xff80 truncated to 0x80
C:\Users\Administrator\AppData\Local\Temp\pgnvd3bzu6blDNefEUF.s:23: Warning: value 0xfff1 truncated to 0xf1
pgacclnk: spawnv failed: C:\Users\Administrator\AppData\Local\Temp\pgcudafat2d54bcRJcHcR3D.o

The same code compiles fine under CentOS with PGI Workstation, but under Windows I can’t compile it successfully, and I don’t know why. Please help me.
PS: Is there any tool to debug CUDA Fortran code?

MatColgrove · September 27, 2013, 4:56pm

Hi uestc0626,

Apologies for the late response. I was getting input from engineering and wanted to test the released 13.9 (using CUDA 5.5) before responding. Unfortunately, I’m still seeing various errors when building on Windows with either CUDA 5.0 or 5.5. Hence, I added a problem report (TPR#19604) and sent it on to engineering. There is no workaround at this point.

What is your goal with this example? Are you just testing device allocation, or do you really want “b” to be a shared array with a length of “n”? If it’s the latter, then the better way to do this is to use a shared automatic array and pass the size of the array, in bytes, as the third argument in the kernel launch configuration.

For example:

module a_test 

  contains 

    attributes(global) subroutine kernel(a,n) 

      implicit none 
      integer,value   :: n 
      integer,device  :: a(n) 
      integer :: i 
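      ! Automatic shared array: its storage comes from the dynamic
      ! shared memory requested in the kernel launch configuration
      integer,shared :: b(n) 

     call  syncthreads() 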

      i=threadIdx%x 
      b(i)=i 
      a(i)=b(i) 
     return 
    end subroutine kernel 

  end module 

  program prog 

     use a_test 
     use cudafor 
     implicit none 

     integer,parameter  :: n=128 
     integer,device     :: a_d(n) 
     integer                   :: a(n) 

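     ! Third launch argument: dynamic shared memory size in bytes
     ! (n default integers of 4 bytes each)
     call kernel<<<1,128,n*4>>>(a_d,n) 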
     a=a_d 

     write(*,*)a(10:20) 

  end program
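
The `n*4` in the launch above hard-codes a 4-byte default integer. As a minimal sketch, assuming your compiler supports the Fortran 2008 `storage_size` intrinsic (older PGI releases may not), the byte count could be computed instead. The names `probe` and `nbytes` are only illustrative and are not part of the example above:

```fortran
program shared_bytes
  implicit none

  integer, parameter :: n = 128
  integer :: probe    ! hypothetical variable with the same type as the shared array elements
  integer :: nbytes   ! hypothetical name for the shared memory size in bytes

  ! storage_size returns the element size in bits, so divide by 8 for bytes
  nbytes = n * storage_size(probe) / 8

  write(*,*) 'dynamic shared memory bytes: ', nbytes

  ! In the CUDA Fortran program the launch would then read:
  !   call kernel<<<1,n,nbytes>>>(a_d,n)
end program shared_bytes
```

This way the launch stays correct if the element type of “b” changes, or if the default integer size is changed with a compiler option.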

Mat

tull · July 25, 2014, 12:15am

TPR 19604 - UF: CUDA Fortran device allocation fails to compile on Windows

This should now work in the 14.6 and current 14.7 releases.

regards,
dave
