Hello!
I wrote a very simple example to test dynamic parallelism with PGI Visual Fortran 13.10. The source code is listed below:
Kernel.cuf:
module Kernel
  real,allocatable,device :: a_d(:,:)
  integer,parameter :: m=10,n=4
contains
  attributes(global) subroutine FatherKernel()
    implicit none
    integer :: i,istat
    i=blockDim%x*(blockIdx%x-1)+threadIdx%x
    if (i<=m) then
      call ChildKernel<<<2,2>>>(i)
      istat=cudaDeviceSynchronize()
    end if
  end subroutine
  attributes(global) subroutine ChildKernel(row)
    implicit none
    integer,value :: row
    integer :: i
    i=blockDim%x*(blockIdx%x-1)+threadIdx%x
    if (i<=n) then
      a_d(row,i)=row*10+i
    end if
  end subroutine
end module
Main.f90:
program Main
  use Kernel
  use cudafor
  implicit none
  real,allocatable :: a(:,:)
  integer :: istat
  allocate(a(m,n),a_d(m,n))
  a=0.0
  a_d=a
  call FatherKernel<<<2,5>>>()
  istat=cudaDeviceSynchronize()
  a=a_d
  print *,a
  deallocate(a,a_d)
  pause
  stop
end program
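For reference, here is a host-only sketch of the result I expect from the two kernels. It assumes each parent thread handles one row and its child launch fills columns 1 to n of that row. The program name ExpectedResult is only for illustration and is not part of the project:

```fortran
program ExpectedResult
  implicit none
  integer,parameter :: m=10,n=4   ! same sizes as in module Kernel
  real :: a(m,n)
  integer :: row,i
  ! Outer loop plays the role of FatherKernel (one parent thread per row),
  ! inner loop plays the role of ChildKernel (one child thread per column).
  do row=1,m
    do i=1,n
      a(row,i)=row*10+i
    end do
  end do
  print *,a
end program
```

This uses no CUDA extensions, so any Fortran compiler should build it. Its output is what I would compare against the values copied back from a_d.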
The compile options are:
-Bstatic -Mbackslash -Mcuda=cc35,ptxinfo,rdc -I"C:\Program Files\CULA\R17\include" -I"C:\Program Files (x86)\PGI\win32\2013\cuda\5.0\include" -I"C:\Program Files (x86)\PGI\win32\2013\cuda\5.5\include" -I"c:\program files (x86)\pgi\win32\13.10\include" -I"C:\Program Files\PGI\Microsoft Open Tools 11\include" -I"C:\Program Files (x86)\Windows Kits\8.0\Include\shared" -I"C:\Program Files (x86)\Windows Kits\8.0\Include\um" -fastsse -ta=nvidia,cc35 -Minform=warn -Minfo=accel
and the link options are:
-Bstatic -Mcuda=cc35,ptxinfo,rdc -ta=nvidia,cc35 -o "C:\Users\KanGuangyuan\Desktop\DynamicParallelism\DynamicParallelism\Win32\Release\DynamicParallelism.exe" cudadevrt.lib
However, the build fails at the link step. The output is:
Compiling Project …
Kernel.cuf
ptxas : info : 96 bytes gmem
ptxas : info : Function properties for cudaMalloc
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas : info : Function properties for cudaFuncGetAttributes
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas : info : Compiling entry function ‘kernel_fatherkernel_’ for ‘sm_35’
ptxas : info : Function properties for kernel_fatherkernel_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas : info : Used 17 registers, 320 bytes cmem[0]
ptxas : info : Compiling entry function ‘kernel_childkernel_’ for ‘sm_35’
ptxas : info : Function properties for kernel_childkernel_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas : info : Used 7 registers, 324 bytes cmem[0]
Main.f90
Linking…
reading response file C:\Users\KANGUA~1\AppData\Local\Temp\pgfortran2apGEbTLdblpRC.lnk
child process exit with signal 2: c:\program files (x86)\pgi\win32\13.10\bin\pgnvd.exe
DynamicParallelism build failed.
and an error dialog popped up with the message:
“nvlink.exe has stopped working”
How can I solve this? Does the PGI Visual Fortran compiler support dynamic parallelism?
Nightwish