Calling cuBlas from a Fortran program

Joseph_A · June 2, 2020, 8:07am

Hi,

I’m trying to call cuBlas from a Fortran program, but somehow the code does not compile.
The error message is:

PGF90-S-0084-Illegal use of symbol cublasdcopy - attempt to use a SUBROUTINE as a FUNCTION (main.f90: 14)

What is wrong with this code?

Thank you for your help

The code is:

 PROGRAM test
       use cublas
       implicit none
       integer n,i,ierr
       type(cublasHandle) :: h
       real*8,device,allocatable :: x(:)
       real*8,device,allocatable :: y(:)
       real*8,device,allocatable :: z(:)
       n=10e6
       allocate(x(n))
       allocate(y(n))
       allocate(z(n))
       h = cublasGetHandle()
       ierr = cublasDcopy(h,10,x,1,y,1)
      end PROGRAM test

makefile

CC=pgfortran

OBJS=main.o
OPTS=-mp -tp=skylake -fast -mcmodel=medium -m64 -cpp -acc -Minfo=acc -ta=tesla:cc70 -Mcuda -Mcudalib=cublas

%.o: %.f90
        ${CC} ${OPTS} -c $<

all: myProgram
myProgram: main.o
        ${CC} ${OPTS} -o myProgram main.o
myProg:main.o
        ${CC} ${OPTS} -c $<

MatColgrove · June 2, 2020, 2:30pm

Hi Peter85,

Yes, this is a bit confusing.

cuBlas changed its interfaces a while ago. When you use the “cublas” module, you get the v1 interface, where “cublasdcopy” is a subroutine that does not take a handle as the first argument. If you call “cublasdcopy_v2” instead, you get the v2 interface, where it is a function that takes a handle. Alternatively, you can use the “cublas_v2” module instead of “cublas”, in which case “cublasdcopy” uses the v2 interface.
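As a minimal sketch of the second option (switching the module to “cublas_v2”), the original test could be written as below. The program name, the host array y_host, the fill value, and the use of cublasCreate/cublasDestroy in place of cublasGetHandle are illustrative assumptions, not part of the original post.

```fortran
program copy_v2
  use cublas_v2     ! v2 interface: cublasDcopy is a function that takes a handle
  implicit none
  integer, parameter :: n = 10
  integer :: ierr
  type(cublasHandle) :: h
  real(8), device, allocatable :: x(:), y(:)
  real(8) :: y_host(n)          ! host copy, only used to inspect the result

  allocate(x(n), y(n))
  x = 1.0d0                     ! CUDA Fortran assignment to a device array

  ierr = cublasCreate(h)                    ! create a handle explicitly
  ierr = cublasDcopy(h, n, x, 1, y, 1)      ! function call, returns a status
  y_host = y                                ! device-to-host copy

  write(*,*) 'status =', ierr, ' y(1) =', y_host(1)

  ierr = cublasDestroy(h)
  deallocate(x, y)
end program copy_v2
```

It should build with the same cuBLAS-related flags as the makefile above (-Mcuda -Mcudalib=cublas).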

The complete interfaces can be found in our CUDA Fortran Library Interfaces Guide (https://www.pgroup.com/resources/docs/18.3/pdf/pgi18cudaint.pdf). In particular, see pages 36 and 106.

Hope this helps,
Mat

Joseph_A · June 3, 2020, 8:35am

Thanks for the info. I will try it! Yes, it is confusing. Is it recommended to use the v2 interface?

MatColgrove · June 3, 2020, 9:17pm

Is it recommended to use the v2 interface?

Yes.

Joseph_A · June 5, 2020, 7:49am

I’m now trying to use cublasDgemmStridedBatched_v2, but I get the same error message (only cublasDcopy_v2 works).

 PROGRAM test
       use cublas_v2
       implicit none
       integer n,i,ierr
       type(cublasHandle) :: h
       real*8,device,allocatable :: x(:)
       real*8,device,allocatable :: y(:)
       real*8,device,allocatable :: z(:)
       real*8 a,b
       n=10
       a=1.0d0
       b=1.0d0
       allocate(x(n))
       allocate(y(n))
       allocate(z(n))
       h = cublasGetHandle()
       ierr = cublasDcopy_v2(h,10,x,1,y,1)
       ierr = cublasDgemmStridedBatched_v2(h,CUBLAS_OP_N,CUBLAS_OP_N,&
               1,1,1,a, x,1,1,y,1,1,0,b ,z,1,1,1)
       write(*,*)"Programend"
      end PROGRAM test

MatColgrove · June 5, 2020, 3:10pm

Hi Peter,

You have an extra argument in the call, which is why the generic procedure can’t be resolved. To fix it, remove the “0” in “1,1,0,b ,z”.
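With the stray argument removed, the program from the previous post would look roughly like the sketch below. The program name and the values assigned to x, y, and z are illustrative assumptions; the call itself is the original one minus the “0”.

```fortran
program test_fixed
  use cublas_v2
  implicit none
  integer :: n, ierr
  type(cublasHandle) :: h
  real(8), device, allocatable :: x(:), y(:), z(:)
  real(8) :: a, b

  n = 10
  a = 1.0d0
  b = 1.0d0
  allocate(x(n), y(n), z(n))
  x = 2.0d0      ! illustrative values, not from the original post
  y = 3.0d0
  z = 1.0d0

  h = cublasGetHandle()
  ! Argument order: handle, transa, transb, m, n, k, alpha,
  !                 A, lda, strideA, B, ldb, strideB, beta,
  !                 C, ldc, strideC, batchCount
  ierr = cublasDgemmStridedBatched_v2(h, CUBLAS_OP_N, CUBLAS_OP_N, &
         1, 1, 1, a, x, 1, 1, y, 1, 1, b, z, 1, 1, 1)
  write(*,*) 'status =', ierr
end program test_fixed
```

Counting the arguments against the order listed in the comment is a quick way to spot this kind of mismatch when the compiler reports that a generic procedure cannot be resolved.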

-Mat

Joseph_A · June 8, 2020, 1:16am

Thank you very much! It worked! I overlooked this extra parameter.

Joseph_A · June 8, 2020, 7:56am

I have another question regarding mixing cublas and OpenACC.
Do I have to call cudaDeviceSynchronize() after calling a cuBLAS function if
OpenACC kernels follow the cuBLAS call? Do cuBLAS and OpenACC both use the same stream?

Thank you for your help!

  !$acc host_data use_device(dBlocks_gpu,r,s)
  ierr = cublasDgemmStridedBatched_v2(h,CUBLAS_OP_N,CUBLAS_OP_N,&
                                   bSize,1,bSize,&
                                   1.0d0,dBlocks_gpu,bSize,mSize,&
                                   r,bSize,bSize, 0.0d0, s,bSize,bSize, n/bSize)
  ierr = cudaDeviceSynchronize()
  !$acc end host_data
   
  ! More OpenACC loops

MatColgrove · June 8, 2020, 1:13pm

The cuBlas call will block waiting for the return code. So while it doesn’t hurt, adding the cudaDeviceSynchronize isn’t needed.

-Mat

brentl · June 8, 2020, 4:39pm

Mat, this isn’t necessarily true. To be completely safe, you can run cuBLAS and your OpenACC kernels on the same stream. If you use an OpenACC async number of 5, for instance, you can do this:
ierr = cublasSetStream(h, acc_get_cuda_stream(5))
If you use the default stream everywhere, you will be fine. Or add cudaDeviceSynchronize as you said.
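A minimal sketch of the same-stream approach, assuming async queue 5 as in the line above. The program name, array names, sizes, and the loops are hypothetical, and cublasDcopy stands in for whatever cuBLAS routine is actually needed.

```fortran
program same_stream_sketch
  use cublas_v2
  use openacc
  implicit none
  integer, parameter :: n = 1000
  integer :: i, ierr
  type(cublasHandle) :: h
  real(8) :: x(n), y(n)

  ierr = cublasCreate(h)
  ! Make cuBLAS use the CUDA stream behind OpenACC async queue 5
  ierr = cublasSetStream(h, acc_get_cuda_stream(5))

  !$acc data create(x, y)

  ! OpenACC kernel queued on async queue 5
  !$acc parallel loop async(5)
  do i = 1, n
     x(i) = real(i, 8)
  end do

  ! cuBLAS call on the same stream, so it is ordered after the loop above
  !$acc host_data use_device(x, y)
  ierr = cublasDcopy(h, n, x, 1, y, 1)
  !$acc end host_data

  ! Another kernel on queue 5, ordered after the cuBLAS call
  !$acc parallel loop async(5)
  do i = 1, n
     y(i) = 2.0d0 * y(i)
  end do

  !$acc wait(5)           ! wait for everything queued on queue 5
  !$acc update self(y)
  !$acc end data

  write(*,*) 'y(n) =', y(n)
  ierr = cublasDestroy(h)
end program same_stream_sketch
```

Because every kernel and the cuBLAS call share one stream, the work runs in issue order and a single wait before the host reads the data is enough in this sketch.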
