cublasDmatinvBatched typemismatch

Hi,

I’m a beginner in OpenACC, CUDA and making a Fortran program using OpenACC. In the program, I’d like to use “cublasDmatinvBatched” to compute an inverse matrix of an N-by-N (N=6 or more) square matrix. After compiling, type mismatch error appeared.
NVFORTRAN-S-0188-Argument number 3 to cublasdmatinvbatched: type mismatch (InverseMatrix.f90: 28)
NVFORTRAN-S-0188-Argument number 5 to cublasdmatinvbatched: type mismatch (InverseMatrix.f90: 28)
Could you tell me how to solve it?
The program I made is shown below and I compiled with
“nvfortran -acc -Minfo=accel -O2 -cudalib=cublas InverseMatrix.f90 -o InverseMatrix”.


module Inverse
use cudafor
use openacc
use cublas
implicit none
contains

SUBROUTINE MATINV(N,A,Ainv)

type(cublasHandle) :: handle			!input
integer :: N											!input
double precision :: A(N*N)				!input
integer :: lda										!input
double precision :: Ainv(N*N)			!output
integer :: lda_inv								!input
integer :: infomatinv(N)					!output
integer :: batchCount = 1					!input
integer :: istat

lda = N
lda_inv = N

istat = cublasCreate(handle)

!$acc data copyin(N,A,lda,lda_inv) copyout(Ainv,infomatinv)
!$acc host_data use_device(N,A,lda,Ainv,lda_inv)

istat = istat + cublasDmatinvBatched(handle, N, A, lda, Ainv, lda_inv, infomatinv, batchCount)

!$acc end host_data
!$acc end data

istat = istat + cublasDestroy(handle)

RETURN
END SUBROUTINE

END module Inverse

Program InverseMatrix
use Inverse
implicit none

integer,parameter :: N = 6
integer :: i
double precision :: A(N*N),Ainv(N*N)

DO i=1,N*N
	A(i) = i
END DO

call MATINV(N,A,Ainv)
read(*,*)

END Program InverseMatrix


Hi shokun_k_7154,

Calling cuBLAS “batched” routines are bit more work since they take an array of C device pointers as arguments rather than CUDA Fortran “device” arrays.

Greg did a nice “how-to” on this a few years ago that may be helpful: CUDA Pro Tip: How to Call Batched cuBLAS routines from CUDA Fortran | NVIDIA Developer Blog

Note that Greg explicitly writes out the cuBLAS interfaces since he did this prior to us shipping the cublas interface module. You wont need to do this, rather you can see the interface we provide in our CUDA Fortran Interfaces documentation at: NVIDIA Fortran CUDA Library Interfaces Version 21.9 for ARM, OpenPower, x86

Hope this helps,
Mat

I remade a simple program using “cublasDmatinvBatched”.
I could compile it, but I couldn’t the value of “Ainv”.
Could you tell me how to get the value of it?

Program InverseMatrix
use cudafor
use cublas
use iso_c_binding
implicit none

integer,parameter :: N=3,	lda=N, lda_inv=N, batchCount=1
double precision :: A(N,N,batchCount), Ainv(N,N,batchCount)
type(c_devptr) :: devPtrA(batchCount)
type(c_devptr) :: devPtrAinv(batchCount)
type(cublashandle) :: h
integer :: infomatinv(N)
integer :: i, k, istat


A(:,:,:)=0.0d0

do k=1,batchCount
	do i=1,N
		A(i,i,k) = 2.0d0*k
	end do
end do

do k = 1, batchCount
	devPtrA(k) = c_devloc(A(:,:,k))
	devPtrAinv(k) = c_devloc(Ainv(:,:,k))
end do

istat = cublasCreate(h)
istat = cublasDmatinvBatched(h, N, devPtrA, lda, devPtrAinv, lda_inv, infomatinv, batchCount)
istat = cublasDestroy(h)

write(*,*) "A=", A
write(*,*) "Ainv=", Ainv

End Program InverseMatrix

The interface is expecting device arrays. While “devPtrA” contains device pointers, it is itself a host array. Hence you need to pass in a “type(c_devptr), device ::devPtrA_d(batchCount)” array instead. Same for “devPtrAinv” and an “integer, device” array for “infomatinv”.

Greg shows this in his example, but I updated your code as well with this change:

Program InverseMatrix
use cudafor
use cublas
use iso_c_binding
implicit none

integer,parameter :: N=3,       lda=N, lda_inv=N, batchCount=1
double precision :: A(N,N,batchCount), Ainv(N,N,batchCount)
type(c_devptr) :: devPtrA(batchCount)
type(c_devptr) :: devPtrAinv(batchCount)
type(c_devptr),device :: devPtrA_d(batchCount)
type(c_devptr),device :: devPtrAinv_d(batchCount)
type(cublashandle) :: h
integer :: infomatinv(N)
integer,device :: infomatinv_d(N)
integer :: i, k, istat

A(:,:,:)=0.0d0

do k=1,batchCount
        do i=1,N
                A(i,i,k) = 2.0d0*k
        end do
end do

do k = 1, batchCount
        devPtrA(k) = c_devloc(A(:,:,k))
        devPtrAinv(k) = c_devloc(Ainv(:,:,k))
end do
devPtrA_d(k)=devPtrA(k)
devPtrAinv_d(k)=devPtrAinv(k)

istat = cublasCreate(h)
istat = cublasDmatinvBatched(h, N, devPtrA_d, lda, devPtrAinv_d, lda_inv, infomatinv_d, batchCount)
istat = cublasDestroy(h)

write(*,*) "A=", A
write(*,*) "Ainv=", Ainv
End Program InverseMatrix

Thank you for making the program, but I couldn’t get the values of the inverse matrix in the program.
Also, I’d like to make a fortran program using OpenACC.
How do you think I should correct this program?

InverseMatrix3.f90 (1005 Bytes)

Thank you for making the program, but I couldn’t get the values of the inverse matrix in the program.

I think I forgot to add “device” versions of A and Ainv since I was more focused on the mechanics of getting the device pointers.

Using OpenACC to managed the device data would look something like the following:

% cat InverseMatrix.f90
Program InverseMatrix
use cudafor
use openacc
use cublas
use iso_c_binding
implicit none

integer,parameter :: N=3, lda=N, lda_inv=N, batchCount=1
double precision :: A(N,N,batchCount), Ainv(N,N,batchCount)
type(c_devptr) :: devPtrA(batchCount)
type(c_devptr) :: devPtrAinv(batchCount)
type(cublashandle) :: h
integer :: infomatinv(N)
integer :: i, k, istat

A(:,:,:)=0.0d0

do k=1,batchCount
        do i=1,N
                A(i,i,k) = 2.0d0*k
        end do
end do
!$acc enter data copyin(A) create(Ainv)
do k = 1, batchCount
         devPtrA(k) = acc_deviceptr(A(:,:,k))
         devPtrAinv(k) = acc_deviceptr(Ainv(:,:,k))
end do
!$acc enter data copyin(devPtrA,devPtrAinv,infomatinv)

istat = cublasCreate(h)
!$acc host_data use_device(devPtrA, devPtrAinv, infomatinv)
istat = cublasDmatinvBatched(h, N, devPtrA, lda, devPtrAinv, lda_inv, infomatinv, batchCount)
!$acc end host_data
istat = cublasDestroy(h)

!$acc update self(A,Ainv)
write(*,*) "A=", A
write(*,*) "Ainv=", Ainv

!$acc exit data delete(A,Ainv,devPtrA,devPtrAinv,infomatinv)
End Program InverseMatrix

% nvfortran -cuda -cudalib=cublas -acc InverseMatrix.f90 ; a.out
 A=    2.000000000000000         0.000000000000000
    0.000000000000000         0.000000000000000         2.000000000000000
    0.000000000000000         0.000000000000000         0.000000000000000
    2.000000000000000
 Ainv=   0.5000000000000000         0.000000000000000
    0.000000000000000         0.000000000000000        0.5000000000000000
    0.000000000000000         0.000000000000000         0.000000000000000
   0.5000000000000000