gfortran now supports the OpenACC host_data directive, so I wanted to test DGEMM from cuBLAS. Based on the test case posted here
https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00976.html
I wrote a sample program that calls DGEMM from cuBLAS. The test case above calls SAXPY from cuBLAS; I could run it, and a DAXPY version of it as well.
program test
  use iso_c_binding
  implicit none
  integer(c_int), parameter :: N = 10
  integer(c_int) :: i, j
  real(c_double) :: x(N, N), y(N, N), z(N, N)
  character(kind=c_char) :: flag
  interface
    subroutine cublasdgemm(transa, transb, m, n, k, alpha, A, lda, B, &
                           ldb, beta, C, ldc) bind(c, name="cublasDgemm")
      use iso_c_binding
      character(kind=c_char) :: transa, transb
      integer(kind=c_int), value :: m, n, k
      real(c_double), value :: alpha
      type(*), dimension(*) :: A
      integer(kind=c_int), value :: lda
      type(*), dimension(*) :: B
      integer(kind=c_int), value :: ldb
      real(c_double), value :: beta
      type(*), dimension(*) :: C
      integer(kind=c_int), value :: ldc
    end subroutine cublasdgemm
  end interface
  do i = 1, N
    do j = 1, N
      x(i, j) = 4.0 * i
      y(i, j) = 3.0 + j
      z(i, j) = 0.0
    end do
  end do
  flag = 'N'
  !$acc data copyin (x, y) copy (z)
  !$acc host_data use_device (x, y, z)
  call cublasdgemm(flag, flag, n, n, n, 1.0_c_double, x, n, y, n, 0.0_c_double, z, n)
  !$acc end host_data
  !$acc end data
  write(*, *) z
  call dgemm(flag, flag, n, n, n, 1.0_c_double, x, n, y, n, 0.0_c_double, z, n)
  write(*, *) z
end program test
Unfortunately, I get this error:
** On entry to DGEMM parameter number 1 had an illegal value
The numbers printed after the cuBLAS call are also all zero.
It seems to me that there’s some mismatch in how the character arguments are passed, but I can’t figure it out. When I put a host DGEMM call with the same arguments at the end, it works perfectly.
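One possible explanation, offered only as an unverified sketch: the legacy cuBLAS C function cublasDgemm declares transa and transb as plain char passed by value, whereas a Fortran dummy argument without the value attribute is passed by reference. Under that assumption, the interface would carry value on the two character arguments as well. The module name cublas_dgemm_sketch is hypothetical and only there so the fragment compiles on its own.

```fortran
! Sketch only. Assumes the legacy cuBLAS C prototype takes the two
! transpose flags by value:
!   void cublasDgemm(char transa, char transb, int m, int n, int k,
!                    double alpha, const double *A, int lda, ...);
module cublas_dgemm_sketch   ! hypothetical name, not part of cuBLAS
  use iso_c_binding
  implicit none
  interface
    subroutine cublasdgemm(transa, transb, m, n, k, alpha, A, lda, B, &
                           ldb, beta, C, ldc) bind(c, name="cublasDgemm")
      use iso_c_binding
      ! Only change from the original interface: "value" is added here,
      ! so C receives the character itself rather than its address.
      character(kind=c_char), value :: transa, transb
      integer(kind=c_int), value :: m, n, k
      real(c_double), value :: alpha
      type(*), dimension(*) :: A
      integer(kind=c_int), value :: lda
      type(*), dimension(*) :: B
      integer(kind=c_int), value :: ldb
      real(c_double), value :: beta
      type(*), dimension(*) :: C
      integer(kind=c_int), value :: ldc
    end subroutine cublasdgemm
  end interface
end module cublas_dgemm_sketch
```

With this variant, the call inside the host_data region would stay exactly as written above; only the way flag is handed to the C side changes.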
Thanks for any help.
COMPILATION:
To compile this, I use gfortran 6.2, built following the instructions at this link
https://github.com/olcf/OLCFHack15
I then copy from
/usr/local/cuda/src/
the files
fortran_common.h
fortran.h
fortran.c
and then do
gcc -Wall -g -I/usr/local/cuda/include -I/usr/local/cuda/src -DCUBLAS_GFORTRAN -c fortran.c
to get the fortran.o file for the cuBLAS interface.
Then I do
gfortran -Wall -g test.f90 fortran.o -fopenacc -foffload=nvptx-none -foffload=-O3 -O3 -o gpu.x -L/usr/local/cuda/lib64 -lcublas -lcudart -lblas
This is the process I used to successfully run the SAXPY example from the first link.