Trouble with cublas and matrix inverse in Fortran

Hello!

I am trying to do a matrix inverse using CUDA Fortran. I’ve been working at it all night to no avail. Here is the program:


! Compile with "pgfortran -Mcuda=cuda8.0 inv1.cuf -Mcudalib=cublas -lblas"
PROGRAM inv1
use cudafor
use cublas





real,allocatable,device :: A(:),C(:)
INTEGER, allocatable,device :: ipiv(:)
INTEGER istat, status
type(cublasHandle) :: h
INTEGER :: n,k1,nk
INTEGER :: lda
INTEGER :: ldc
INTEGER, device :: info(:)
INTEGER :: batchCount
INTEGER DI, I, J
Real time1,time2
real :: mat(25000),mata(25000)
external cublasSgetrfBatched,cublasSgetriBatched
istat=cudaSetDevice(0)


! input matrix data to be inversed
OPEN(1,FILE="a1.txt")
READ(1,*) DI
n=DI
lda=DI
ldc=DI
write(*,*) 'Matrix size is : ',DI/2,'x',DI/2
nk = DI

k1=1
DO I=1,DI
DO J=1,DI
READ(1,*)mata(k1)
k1=k1+1
END DO
END DO
close(1)
! Allocate


! Allocate(A(nk))
! Allocate(C(nk))

A=mata

! initialize cublas
print *, 'Cublas starts'
! cublas SOLVES
allocate(ipiv(DI**2))

! start
call CPU_Time(time1)
print *, 'Starting CUBLAS (Host interface)'
status = cublasCreate(h)
! call cublas
status = cublasSgetrfBatched(h, nk/2, A, nk/2, ipiv, info,batchCount)
status = cublasSgetriBatched(h, nk/2, A, nk/2, ipiv, C, DI,info, batchCount)
! stop
mata = C
status=cudaDeviceSynchronize()
call CPU_Time(time2)
print *, 'Time spent GPU: ', time2-time1

print *,mata(1:4)
status = cublasDestroy(h)

END


And my error messages are:
[hodgess@gpu047 ~]$ pgf90 inv1.cuf -Mcuda=cuda8.0 -Mcudalib=cublas -lblas -mcmodel=medium
PGF90-S-0188-Argument number 3 to cublassgetrfbatched: type mismatch (inv1.cuf: 61)
PGF90-S-0188-Argument number 3 to cublassgetribatched: type mismatch (inv1.cuf: 62)
PGF90-S-0188-Argument number 6 to cublassgetribatched: type mismatch (inv1.cuf: 62)
0 inform, 0 warnings, 3 severes, 0 fatal for inv1

Argument 3 is A. I have tried declaring it both as a matrix and as a long vector. I am totally stumped and giving up. Any help would be much appreciated.

Sincerely,
Erin

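Are you inverting just one matrix? If so, you shouldn't be using the batched routines: they expect arrays of device pointers (one per matrix) rather than the data arrays themselves, which is why the compiler reports a type mismatch for A and C. You can see some Fortran-specific documentation here: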

https://www.pgroup.com/resources/docs/18.3/x86/pgi-cuda-interfaces/index.htm
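
If you do want to stay with the batched calls, here is a minimal sketch with batchCount = 1. It assumes the `cublas` module declares the matrix arguments as `type(c_devptr), device` arrays, which is how I read the documentation linked above. The program name, the variable names, and the 3x3 test matrix are all made up for illustration, and I have not run this against your exact compiler version, so treat it as a starting point rather than a verified fix.

```fortran
! Sketch only: inverts one n x n matrix with the batched cuBLAS routines.
! Assumes the cublas module interfaces take type(c_devptr), device arrays.
program inv_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 3
  real :: a(n,n), c(n,n)
  real, device :: a_d(n,n), c_d(n,n)
  type(c_devptr) :: aptr(1), cptr(1)              ! pointer arrays built on the host
  type(c_devptr), device :: aptr_d(1), cptr_d(1)  ! the same arrays in device memory
  integer, device :: ipiv_d(n), info_d(1)
  integer :: info(1), istat
  type(cublasHandle) :: h

  ! Hypothetical test matrix (diagonally dominant, hence invertible)
  a = 1.0
  a(1,1) = 4.0
  a(2,2) = 5.0
  a(3,3) = 6.0
  a_d = a

  ! The batched routines want one device pointer per matrix,
  ! not the matrix data itself.
  aptr(1) = c_devloc(a_d)
  cptr(1) = c_devloc(c_d)
  aptr_d = aptr
  cptr_d = cptr

  istat = cublasCreate(h)
  ! LU factorization in place, then the inverse written to c_d
  istat = cublasSgetrfBatched(h, n, aptr_d, n, ipiv_d, info_d, 1)
  istat = cublasSgetriBatched(h, n, aptr_d, n, ipiv_d, cptr_d, n, info_d, 1)
  istat = cudaDeviceSynchronize()

  info = info_d
  c = c_d
  if (info(1) /= 0) print *, 'getri reported info = ', info(1)

  ! a still holds the original matrix on the host (a_d now holds the LU
  ! factors), so this product should be close to the identity.
  print *, 'A * inv(A) ='
  print *, matmul(a, c)
  istat = cublasDestroy(h)
end program inv_sketch
```

Note that the sketch has no `external` statement for the cuBLAS functions, since the module already supplies their interfaces, and that `batchCount` is passed as the literal 1 rather than an unset variable. It should build with the same `-Mcuda` and `-Mcudalib=cublas` flags as in your compile line.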