Hi all,
I am experimenting with the kernel loop directives, the so-called CUF kernels, in CUDA Fortran to implement a naive matrix-by-matrix multiplication for arbitrary sizes. I use pgfortran from the PGI/18.4 Community Edition. The code, the compilation command and the error are pasted below.
I get the following compilation error, which I have a hard time understanding. It seems the compiler manages part of the job (the CUDA kernel is generated) but then fails in the device compiler. Do you have any thoughts?
Thanks
Ehsan
Code:
program main
  ! use cudafor
  implicit none
  integer, parameter :: sp = selected_real_kind(6)
  integer, parameter :: dp = selected_real_kind(15)
  integer, parameter :: n = 5500, m = 3400, p = 4000
  real(dp) :: a(n, m), b(m, p), c(n, p), builtin(n, p)
  real(dp), device :: a_dev(n, m), b_dev(m, p), c_dev(n, p), val_dev
  real(dp) :: val, err
  real(sp) :: tic, toc, dt
  integer :: i, j, k

  call random_number(a)
  call random_number(b)

  call cpu_time(tic)
  a_dev = a; b_dev = b  ! Host-to-Device transfer

  !$cuf kernel do (2) <<<(*,*) , (*,*)>>>
  do j = 1, p
    do i = 1, n
      val_dev = 0d0
      do k = 1, m
        val_dev = val_dev + a_dev(i,k)*b_dev(k,j)
      enddo
      c_dev(i, j) = val_dev
    enddo
  enddo

  c = c_dev  ! Device-to-Host transfer
  call cpu_time(toc)
  dt = toc - tic

  err = maxval(abs(matmul(a, b) - c))
  write(*, '(a, e23.16, a, f8.4)') 'max error occurred = ', err, &
                                   '; dt [sec] = ', dt
end program main
The compilation step and its output:
export CUFFLAGS='-Mcuda=cc6.0,cuda8.0 -Minfo=all'
pgfortran -g -O2 $CUFFLAGS -Minfo -c matmul_cuf.f90 -o matmul_cuf.o
main:
24, CUDA kernel generated
24, !$cuf kernel do <<< (*,*), (32,4) >>>
36, maxval reduction inlined
Generated vector simd code for the loop containing reductions
Generated 2 prefetch instructions for the loop
nvvmCompileProgram error: 9.
Error: /node_scratch/20825499.moab.tier2.leuven.vsc/pgcudafor8aie0ZbnCBfK.gpu (115, 10): parse stored value and pointer type do not match
PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (matmul_cuf.f90: 1)
PGF90/x86-64 Linux 18.4-0: compilation aborted
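A variant worth comparing, assuming the culprit is the `device`-attributed scalar `val_dev`: as declared, it is a single variable in GPU global memory shared by every thread, not a per-thread accumulator. The sketch below keeps the accumulator as an ordinary local scalar `val`, which, as far as I understand the CUF kernel rules, the compiler privatizes per thread. The module and routine names (`cuf_matmul_sketch`, `naive_matmul`) are hypothetical. The `!@cuf` lines are assumed to be compiled only when CUDA Fortran is enabled (e.g. with `-Mcuda`), and `!$cuf` is an ordinary comment to other compilers, so the same source should also build as plain serial Fortran for a CPU check.

```fortran
! Hypothetical sketch: CUF-kernel matmul with a thread-private accumulator.
module cuf_matmul_sketch
!@cuf use cudafor
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
contains
  subroutine naive_matmul(a, b, c)
    real(dp), intent(in)  :: a(:, :), b(:, :)
    real(dp), intent(out) :: c(:, :)
    real(dp), allocatable :: a_dev(:, :), b_dev(:, :), c_dev(:, :)
!@cuf attributes(device) :: a_dev, b_dev, c_dev
    real(dp) :: val          ! ordinary scalar, NOT device: assumed private per thread
    integer :: i, j, k, n, m, p

    n = size(a, 1); m = size(a, 2); p = size(b, 2)
    allocate(a_dev(n, m), b_dev(m, p), c_dev(n, p))
    a_dev = a; b_dev = b     ! host-to-device copies (plain copies without -Mcuda)

    ! Parallelize the j and i loops; the k loop runs sequentially in each thread
    !$cuf kernel do (2) <<<(*,*), (32,4)>>>
    do j = 1, p
      do i = 1, n
        val = 0.0_dp
        do k = 1, m
          val = val + a_dev(i, k) * b_dev(k, j)
        end do
        c_dev(i, j) = val
      end do
    end do

    c = c_dev                ! device-to-host copy
    deallocate(a_dev, b_dev, c_dev)
  end subroutine naive_matmul
end module cuf_matmul_sketch
```

Calling `naive_matmul(a, b, c)` from the host and comparing `c` against `matmul(a, b)` mirrors the error check in the original program; the block shape `(32,4)` simply reuses what the compiler chose in the `-Minfo` output above.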