OpenACC with DGEMM call error in gfortran

gfortran has host_data support now, so I wanted to test DGEMM from cuBLAS. Based on the test case posted here

https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00976.html

I wrote a sample program for DGEMM using cuBLAS. The test case above exercises SAXPY from cuBLAS; I could run it successfully, and a DAXPY version as well.

program test

      use iso_c_binding

      implicit none

      integer(c_int), parameter :: N = 10
      integer(c_int) :: i, j
      real(c_double) :: x(N, N), y(N, N), z(N, N)
      character(kind=c_char)     :: flag

      interface
         subroutine cublasdgemm(transa, transb, m, n, k, alpha, A, lda, B, &
                 ldb, beta, C, ldc) bind(c, name="cublasDgemm")
           use iso_c_binding
           character(kind=c_char)     :: transa, transb
           integer(kind=c_int), value :: m, n, k
           real(c_double), value      :: alpha
           type(*), dimension(*)      :: A
           integer(kind=c_int), value :: lda
           type(*), dimension(*)      :: B
           integer(kind=c_int), value :: ldb
           real(c_double), value      :: beta
           type(*), dimension(*)      :: C
           integer(kind=c_int), value :: ldc

         end subroutine cublasdgemm

      end interface

      do i = 1, N
         do j = 1, N
           x(i, j) = 4.0 * i
           y(i, j) = 3.0 + j
           z(i, j) = 0.0
         end do
      end do

      flag = 'N'

      !$acc data copyin (x, y) copy (z)

      !$acc host_data use_device (x, y, z)
      call cublasdgemm(flag, flag, n, n, n, 1.0_c_double, x, n, y, n, 0.0_c_double, z, n)
      !$acc end host_data

      !$acc end data

      write(*, *) z

      call dgemm(flag, flag, n, n, n, 1.0_c_double, x, n, y, n, 0.0_c_double, z, n)

      write(*, *) z

    end program test

Unfortunately, I get this error at run time:

** On entry to DGEMM  parameter number 1 had an illegal value

The values of z printed after the cuBLAS call are all zero.

It seems to me that there’s some mismatch in the character data type, but I can’t figure out what it is. The host BLAS DGEMM call that I put at the end, with the same variables, works perfectly.
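
One possibility I have not confirmed: as far as I can tell, the legacy cuBLAS C function cublasDgemm declares transa and transb as plain char arguments passed by value, whereas the interface above passes them by reference (no value attribute). A minimal sketch of the interface with only that change, assuming the by-value prototype is right; the module name cublas_legacy_sketch is hypothetical and only there so the block compiles on its own:

```fortran
! Sketch only: interface to the legacy cuBLAS C entry point cublasDgemm,
! ASSUMING its C prototype takes transa/transb as `char` by value.
module cublas_legacy_sketch   ! hypothetical module name
  use iso_c_binding
  implicit none

  interface
     subroutine cublasdgemm(transa, transb, m, n, k, alpha, A, lda, B, &
                            ldb, beta, C, ldc) bind(c, name="cublasDgemm")
       use iso_c_binding
       ! Only change from the interface in the test program: `value` on the
       ! two characters, so the character itself is passed, not its address.
       character(kind=c_char), value :: transa, transb
       integer(kind=c_int), value    :: m, n, k
       real(c_double), value         :: alpha
       type(*), dimension(*)         :: A
       integer(kind=c_int), value    :: lda
       type(*), dimension(*)         :: B
       integer(kind=c_int), value    :: ldb
       real(c_double), value         :: beta
       type(*), dimension(*)         :: C
       integer(kind=c_int), value    :: ldc
     end subroutine cublasdgemm
  end interface
end module cublas_legacy_sketch
```

If this is the cause, the call inside the host_data region would stay exactly as it is, since flag is already a length-1 character(kind=c_char).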

Thanks for any help.

COMPILATION:

To compile this I use gfortran 6.2, built following the instructions at this link

https://github.com/olcf/OLCFHack15

I then copy from

/usr/local/cuda/src/

the files

fortran_common.h
fortran.h
fortran.c

and then do

gcc -Wall -g -I/usr/local/cuda/include -I/usr/local/cuda/src -DCUBLAS_GFORTRAN -c fortran.c

to get the fortran.o file for the cuBLAS interface.

Then I do

gfortran -Wall -g test.f90 fortran.o -fopenacc -foffload=nvptx-none -foffload=-O3 -O3 -o gpu.x -L/usr/local/cuda/lib64 -lcublas -lcudart -lblas

This is the process I used to successfully run the SAXPY example in the first link.