How do I call the CUBLAS library from my CUDA Fortran code?

Hi,
I am a new user of CUDA Fortran.
I am using the same CUDA Fortran code as in this forum post: https://forums.developer.nvidia.com/t/how-to-use-or-call-cublas-library/131834/1

I compiled with “pgfortran -Mcuda -o test_cublasSgemm_gpu test_cublasSgemm.F90 C:\cuda\lib\cublas.lib”
and got this error:
“test_cublasSgemm.obj : error LNK2019: unresolved external symbol cublasSgemm referenced in function MAIN_
test_cublasSgemm_gpu.exe : fatal error LNK1120: 1 unresolved externals”


Mat’s recommendation was:

"because of Win32 calling conventions, you need to add an “@” followed by the size in bytes of the argument list to symbol decoration. For sgemm this is “@52”. "
“Also, Win64 uses a different calling convention and does not need the “@”.”
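The “@52” comes from the legacy cublasSgemm C prototype, which has 13 parameters (transa, transb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc). Under the Win32 stdcall convention each argument occupies one 4-byte stack slot (char and int values, float values and 32-bit pointers alike), so the argument list is 13 × 4 = 52 bytes. The small module below only spells out that arithmetic; its module and function names are made up for illustration, and whether a given compiler also adds a leading underscore to the decorated name is toolchain-specific.

```fortran
! Hypothetical helper: reproduces the Win32 stdcall "@N" suffix for cublasSgemm.
module stdcall_decoration
  implicit none
  private
  public :: sgemm_arg_bytes, decorated_name

  ! Parameters of the legacy C prototype
  !   void cublasSgemm(char transa, char transb, int m, int n, int k,
  !                    float alpha, const float *A, int lda,
  !                    const float *B, int ldb, float beta,
  !                    float *C, int ldc);
  character(len=6), parameter :: sgemm_args(13) = [character(len=6) :: &
       'transa', 'transb', 'm', 'n', 'k', 'alpha', 'A', 'lda', &
       'B', 'ldb', 'beta', 'C', 'ldc']

  ! On 32-bit x86 every argument is pushed as one 4-byte slot
  ! (chars are widened; ints, floats and pointers are already 4 bytes).
  integer, parameter :: slot_bytes = 4

contains

  ! Size in bytes of the cublasSgemm argument list on Win32.
  integer function sgemm_arg_bytes()
    sgemm_arg_bytes = size(sgemm_args) * slot_bytes
  end function sgemm_arg_bytes

  ! Builds "<base>@<nbytes>", e.g. "cublasSgemm@52".
  function decorated_name(base, nbytes) result(name)
    character(len=*), intent(in) :: base
    integer, intent(in) :: nbytes
    character(len=:), allocatable :: name
    character(len=16) :: digits
    write (digits, '(I0)') nbytes
    name = base // '@' // trim(digits)
  end function decorated_name

end module stdcall_decoration
```

A binding label has to be a constant, so on Win32 the result would be written out literally, e.g. bind(C, name='cublasSgemm@52'); on Win64 the plain name cublasSgemm is used.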

But I am using Windows 7 64-bit. What is the Win64 calling convention?

Please help.

Thanks,
Luzhang

Hi Luzhang,

One possibility is that the entry symbol name is not “cublasSgemm” in the library you’re using. If this is the case, then you’ll need to adjust the name in the ISO C Binding clause.

More likely, though, you have a version conflict. By default, the PGI 2011 compilers generate CUDA 3.2 objects. If you are using a CUDA 4.0 cublas library, then this could cause issues. To fix this, use the “-Mcuda=4.0” flag.

Finally, you can try using the cublas libraries that we ship with the compilers. They seem to work for me.

Note that I assume you modified the code to remove the conditional compilation (i.e. the #ifdef), since I get undefined external references to “sgemm_” without it.
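For reference, an explicit interface with an ISO C Binding clause of the kind described above might look like the sketch below. It assumes the legacy (handle-free) CUBLAS C prototype shown in the comment; the name= string in bind(C, ...) is the entry symbol, so that is the place to change it if the library exports a different name. The module name is made up, and the module should not be used in the same scope as “use cublas”, since both would define cublasSgemm. Compile with -Mcuda and link cublas.lib as in the commands in this thread.

```fortran
! Sketch: explicit CUDA Fortran interface to the legacy CUBLAS cublasSgemm.
! Legacy C prototype (scalars passed by value, matrices by pointer):
!   void cublasSgemm(char transa, char transb, int m, int n, int k,
!                    float alpha, const float *A, int lda,
!                    const float *B, int ldb, float beta,
!                    float *C, int ldc);
module cublas_sgemm_iface
  implicit none
  interface
    ! name= must match the symbol exported by the library being linked.
    subroutine cublasSgemm(transa, transb, m, n, k, alpha, A, lda, &
                           B, ldb, beta, C, ldc) bind(C, name='cublasSgemm')
      use iso_c_binding, only: c_char, c_int, c_float
      character(len=1, kind=c_char), value :: transa, transb
      integer(c_int), value :: m, n, k, lda, ldb, ldc
      real(c_float), value :: alpha, beta
      ! device attribute: the matrices must already be in GPU memory
      real(c_float), device, dimension(lda, *) :: A
      real(c_float), device, dimension(ldb, *) :: B
      real(c_float), device, dimension(ldc, *) :: C
    end subroutine cublasSgemm
  end interface
end module cublas_sgemm_iface
```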

Mat
PGI$ pgf90 -D_CUDAFOR -Mcuda=4.0 -V11.9 cublasSgemm.f90 -Mpreprocess C:\\Program\ Files\\PGI\\win64\\2011\\cuda\\4.0\\lib64\\cublas.lib
cublasSgemm.f90:
PGI$
PGI$
PGI$ pgf90 -D_CUDAFOR -Mcuda=4.0 -V11.9 cublasSgemm.f90 -Mpreprocess C:\\Program\ Files\\PGI\\win64\\2011\\cuda\\4.0\\lib64\\cublas.lib -o test40.exe
cublasSgemm.f90:
PGI$ pgf90 -D_CUDAFOR -Mcuda -V11.9 cublasSgemm.f90 -Mpreprocess C:\\Program\ Files\\PGI\\win64\\2011\\cuda\\3.2\\lib64\\cublas.lib -o test32.exe
cublasSgemm.f90:
PGI$ test40.exe
 Enter N:
1024
 Checking results....
 Total Time:    1.7000001E-02
 Total SGEMM gflops:     126.3226
 Done....
PGI$ test32.exe
 Enter N:
1024
 Checking results....
 Total Time:    1.5000000E-02
 Total SGEMM gflops:     143.1656
 Done....

Hi Mat,
Thank you for your reply.

Based on your recommendation, I compiled with

PGI$ pgf90 -D_CUDAFOR -Mcuda=3.2 -V11.7 cublasSgemm.F90  C:\PGI\win64\2011\cuda\3.2\lib64\cublas.lib -o test32.exe 
PGI$ test32.exe 
 Enter N: 
1024 
 Checking results.... 
 Total Time:    2.7000000E-02 
 Total SGEMM gflops:     79.53643 
 Done....



PGI$ pgf90 -D_CUDAFOR -Mcuda=4.0 -V11.7 cublasSgemm.F90  C:\PGI\win64\2011\cuda\4.0\lib64\cublas.lib -o test40.exe 
PGI$ test40.exe 
 Enter N: 
1024 
0: ALLOCATE: 4194304 bytes requested; ststus= 35<CUDA driver version is insufficient for CUDA runtime version

Would you please tell me how I can solve this problem?

Then I tried to use the cublas libraries that ship with the compilers. I modified the code by removing the interface and the conditional compilation (i.e. the #ifdef):

program cublasSgemm
use cudafor
use cublas
real, device, allocatable, dimension(:,:) :: dA, dB, dC
......
call sgemm('n','n', n, n, n, alpha, dA, n, dB, n, beta, dC, n)
......
end

Is this right?

thanks,
Luzhang

Hi Luzhang,

The error message says:

CUDA driver version is insufficient for CUDA runtime version

Hence, you need to update your CUDA device driver in order to use CUDA 4.0. Go to http://developer.nvidia.com/cuda-toolkit-40 and look for the “Developer Driver” links.
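Status 35 means the installed driver is older than the CUDA runtime the program was built against. One way to see both numbers is to ask the runtime, as in the sketch below. It assumes the cudaDriverGetVersion and cudaRuntimeGetVersion functions from the cudafor module; the module and subroutine names are made up. Both versions are encoded as 1000*major + 10*minor, so CUDA 4.0 is 4000 and CUDA 3.2 is 3020.

```fortran
! Sketch: compare the CUDA version supported by the installed driver with
! the version of the CUDA runtime this program was built against.
module cuda_version_check
  use cudafor
  implicit none
contains
  ! drv, rt: versions encoded as 1000*major + 10*minor (e.g. 4000 = CUDA 4.0).
  ! ok is .true. only if both queries succeed and the driver is new enough.
  subroutine query_cuda_versions(drv, rt, ok)
    integer, intent(out) :: drv, rt
    logical, intent(out) :: ok
    integer :: istat_drv, istat_rt
    drv = 0
    rt = 0
    istat_drv = cudaDriverGetVersion(drv)
    istat_rt  = cudaRuntimeGetVersion(rt)
    ok = istat_drv == cudaSuccess .and. istat_rt == cudaSuccess .and. drv >= rt
  end subroutine query_cuda_versions
end module cuda_version_check
```

If ok comes back .false., the driver needs updating to at least the runtime's version; drv/1000 and mod(drv, 100)/10 give its major and minor numbers.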

> Then I tried to use the cublas libraries that ship with the compilers. I modified the code by removing the interface and the conditional compilation (i.e. the #ifdef).
> Is this right?

No. You must have an explicit interface when calling CUDA routines. Without an interface, F77 calling conventions are used (every argument is passed by reference, as an address), which will result in errors.
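The difference is easiest to see with a scalar argument. The sketch below swaps in the C library function abs for CUBLAS so that it runs without a GPU; the module name is made up. Called through an implicit F77-style interface, Fortran would pass the address of the argument, so abs would receive an address instead of a number. The explicit bind(C) interface with the value attribute makes the compiler pass the integer itself, which is what the C prototype expects. The same applies to transa, m, n, k, alpha and the other scalar arguments of the legacy cublasSgemm.

```fortran
! Sketch: explicit interface to the C library function  int abs(int j);
! j is passed BY VALUE, which only an explicit interface can express.
module c_abs_binding
  use iso_c_binding, only: c_int
  implicit none
  interface
    function c_abs(j) result(r) bind(C, name='abs')
      import :: c_int
      ! value: pass the integer itself, not its address
      integer(c_int), value :: j
      integer(c_int) :: r
    end function c_abs
  end interface
end module c_abs_binding
```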

Hope this helps,
Mat

Hi Mat,
Thank you for your reply.

It works now that I have updated my CUDA device driver.

I still have a question. I removed the interface and the conditional compilation (i.e. the #ifdef) and placed the “use cublas” statement in the host code instead:

program cublasSgemm 
use cudafor 
use cublas 
real, device, allocatable, dimension(:,:) :: dA, dB, dC 
...... 
call sgemm('n','n', n, n, n, alpha, dA, n, dB, n, beta, dC, n) 
...... 
end

It works!

Then I used

call cublasSgemm('n','n', n, n, n, alpha, dA, n, dB, n, beta, dC, n)

It gives the same result as

call sgemm('n','n', n, n, n, alpha, dA, n, dB, n, beta, dC, n)

Would you please tell me the reason?
If I need to use other CUBLAS routines, do you have any information or suggestions?

thanks,
Luzhang

> Would you please tell me the reason?

When using the cublas module, “sgemm” is a generic interface which maps to cublasSgemm. However, unlike cublasSgemm, which must be called with device arrays, “sgemm” can be called with either host or device arrays.
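The same mechanism can be reproduced in user code. The sketch below is not the source of the cublas module; it defines a hypothetical generic my_sgemm whose two specific procedures differ only in the device attribute of their array arguments, which CUDA Fortran takes into account when resolving a generic call. The host version uses the matmul intrinsic and the device version forwards to cublasSgemm from the cublas module; both compute C = alpha*A*B + beta*C without transposes.

```fortran
! Sketch of a generic name resolved by the device attribute (hypothetical names).
module my_gemm_mod
  use cudafor
  use cublas            ! supplies the cublasSgemm interface
  implicit none
  private
  public :: my_sgemm

  interface my_sgemm
    module procedure my_sgemm_host, my_sgemm_dev
  end interface my_sgemm

contains

  ! Host arrays: C = alpha*A*B + beta*C using the matmul intrinsic.
  subroutine my_sgemm_host(m, n, k, alpha, A, B, beta, C)
    integer, intent(in) :: m, n, k
    real, intent(in) :: alpha, beta
    real, intent(in) :: A(m, k), B(k, n)
    real, intent(inout) :: C(m, n)
    C = alpha * matmul(A, B) + beta * C
  end subroutine my_sgemm_host

  ! Device arrays: the same operation, forwarded to CUBLAS on the GPU.
  subroutine my_sgemm_dev(m, n, k, alpha, A, B, beta, C)
    integer, intent(in) :: m, n, k
    real, intent(in) :: alpha, beta
    real, device, intent(in) :: A(m, k), B(k, n)
    real, device, intent(inout) :: C(m, n)
    call cublasSgemm('n', 'n', m, n, k, alpha, A, m, B, k, beta, C, m)
  end subroutine my_sgemm_dev

end module my_gemm_mod
```

A call such as my_sgemm(n, n, n, 1.0, a, b, 0.0, c) picks the host version for host arrays and the device version when a, b and c are declared with the device attribute.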

> If I need to use other CUBLAS routines, do you have any information or suggestions?

The complete list of CUBLAS routines can be found at http://developer.download.nvidia.com/compute/DevZone/docs/html/CUDALibraries/doc/CUBLAS_Library.pdf. It is my understanding that our cublas module has interfaces for all currently available CUBLAS routines. We also have an example of how to use CUBLAS in chapter 5 of the CUDA Fortran Users Guide: http://www.pgroup.com/doc/pgicudafortug.pdf
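As an example of another routine, the sketch below computes y = a*x + y on the GPU with cublasSaxpy. It assumes the cublas module provides cublasSaxpy with the same argument order as the legacy C prototype void cublasSaxpy(int n, float alpha, const float *x, int incx, float *y, int incy); the module and subroutine names are made up. The GPU work is the single cublasSaxpy call, and the rest is host/device copying.

```fortran
! Sketch: y = a*x + y via the legacy-style cublasSaxpy (hypothetical wrapper names).
module saxpy_demo
  use cudafor
  use cublas
  implicit none
contains
  ! x and y are host arrays; the computation itself runs on the GPU.
  subroutine gpu_saxpy(a, x, y)
    real, intent(in) :: a
    real, intent(in) :: x(:)
    real, intent(inout) :: y(:)
    real, device, allocatable :: dx(:), dy(:)
    integer :: n
    n = size(x)
    allocate(dx(n), dy(n))
    dx = x                                 ! host -> device
    dy = y
    call cublasSaxpy(n, a, dx, 1, dy, 1)   ! unit stride for x and y
    y = dy                                 ! device -> host
    deallocate(dx, dy)
  end subroutine gpu_saxpy
end module saxpy_demo
```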

Hope this helps,
Mat