Problem running test_cublas sample

Hi,

I have installed and am using PGI Visual Fortran. I would like to become as familiar as possible with CUDA Fortran so that I can use it in my project.
I tried to run the sample provided in the PGI package for testing the CUBLAS routine sgemm. The sample is located at “…\PGI\win64\13.3\samples\cudafor\test_cublasSgemm.F90”.
The code of this sample is as follows:

!
!
!      Copyright 2009-2010, STMicroelectronics, Incorporated.
!      All rights reserved.
!
!        STMICROELECTRONICS, INCORPORATED PROPRIETARY INFORMATION
! This software is supplied under the terms of a license agreement
! or nondisclosure agreement with STMicroelectronics and may not be
! copied or disclosed except in accordance with the terms of that
! agreement.

!
! An example of how to call the cublas single precision matrix multiply
! routine cublasSgemm
!
! Build for running on the host:
!   pgfortran -o test_cublasSgemm_host test_cublasSgemm.F90 -lblas
!
! Build for running on the gpu:
!   pgfortran -Mcuda -D_CUDAFOR -o test_cublasSgemm_gpu test_cublasSgemm.F90 -lcublas
!


program test_cublasSgemm
#ifdef _CUDAFOR
use cudafor

interface
  subroutine sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc ) bind(c,name='cublasSgemm')
   use iso_c_binding
   integer(c_int), value :: m, n, k, lda, ldb, ldc
   real(c_float), device, dimension(m,n) :: a, b, c
   real(c_float), value :: alpha, beta
   character(kind=c_char), value :: transa, transb
  end subroutine sgemm
end interface

real, device, allocatable, dimension(:,:) :: dA, dB, dC
#endif

real, allocatable, dimension(:,:) :: a, b, c
real :: alpha = 1.0e0
real :: beta  = 0.0e0
real :: t1, t2, tt, gflops
integer :: i, j, k, n   ! n was implicitly typed; declared explicitly here

print *, "Enter N: "
read(5,*) n

allocate(a(n,n), b(n,n), c(n,n))
#ifdef _CUDAFOR
allocate(dA(n,n), dB(n,n), dC(n,n))
#endif

a = 2.0e0
b = 1.5e0
c = -9.9e0

call cpu_time(t1)

#ifdef _CUDAFOR
dA = a
dB = b
if (beta .ne. 0.0) then
  dC = c
endif

call sgemm('n', 'n', n, n, n, alpha, dA, n, dB, n, beta, dC, n)
c = dC
#else
call sgemm('n', 'n', n, n, n, alpha, a, n, b, n, beta, c, n)
#endif

call cpu_time(t2)

print *, "Checking results...."

do j = 1, n
  do i = 1, n
    if (c(i,j) .ne. (3.0e0*real(n))) then
      print *, "error:  ",i,j,c(i,j)
    endif
  enddo
enddo

gflops = (real(n) * real(n) * real(n) * 2.0) / 1000000000.0
tt = t2 - t1
print *, "Total Time: ",tt
print *, "Total SGEMM gflops: ",gflops/tt
print *, "Done...."

end

I added cublas.lib to “Properties\Linker\Input\Additional Dependencies” and its location to “Properties\Linker\General\Additional Library Directories”. Both CUDA Fortran and Target NVIDIA Accelerator are also enabled.

Now, upon building the solution, I get the following error:

“test_cublasSgemm.obj : error LNK2019: unresolved external symbol sgemm_ referenced in function MAIN_
C:\Omid’s project\My SOCP Codes\Test\CUBLASTest\CUBLASTest\x64\Debug\CUBLASTest.exe : fatal error LNK1120: 1 unresolved externals
CUBLASTest build failed.”

I am working with PGI Visual Fortran 2010 on Windows 7 x64.
Could anybody please help me with this issue?

Many thanks,

Omid

Hi Omid,

Did you add the CUBLAS library to your link? (i.e. -lcublas)

  • Mat

Hi Mat,

Did you add the CUBLAS library to your link? (i.e. -lcublas)

As I mentioned before, I added “cublas.lib” to “Properties\Linker\Input\Additional Dependencies” and also its location to “Properties\Linker\General\Additional Library Directories”.

Is that what you mean? If not, would you please tell me how to do that?

Thank you,

Omid

Hi Omid,

As I mentioned before, I added “cublas.lib” to “Properties\Linker\Input\Additional Dependencies” and also its location to “Properties\Linker\General\Additional Library Directories”.

My fault, I missed this. This is correct.

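I suspect you didn’t compile with “-D_CUDAFOR”, so the interface isn’t being included. Without it, the call falls through to the host BLAS sgemm in the #else branch, which is why the linker is looking for sgemm_ rather than cublasSgemm. Try adding “_CUDAFOR” to your property page “Fortran->Preprocessor->Preprocessor Definitions”.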
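To see the effect of the definition in isolation, here is a minimal sketch. The file name guard_demo.F90 and the program itself are made up for illustration and are not part of the PGI samples. It only reports which branch the preprocessor kept, mirroring the #ifdef structure of test_cublasSgemm.F90.

```fortran
! guard_demo.F90 (hypothetical file name)
! The capital .F90 suffix asks the compiler to run the preprocessor first.
!
! Host path:  pgfortran -o guard_host guard_demo.F90
! GPU path:   pgfortran -D_CUDAFOR -o guard_gpu guard_demo.F90
program guard_demo
  implicit none
#ifdef _CUDAFOR
  ! In the real sample this branch pulls in the bind(c) interface,
  ! so a call to sgemm links against the C symbol cublasSgemm.
  print *, "GPU path: sgemm is bound to cublasSgemm"
#else
  ! In the real sample this branch has no interface, so a call to sgemm
  ! is an ordinary external Fortran call and the linker wants sgemm_.
  print *, "Host path: sgemm is the host BLAS routine"
#endif
end program guard_demo
```

If the second message prints in your Visual Studio build, the definition is not reaching the compiler.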

  • Mat

Hi Mat,

Thank you for your reply.

Try adding “_CUDAFOR” to your property page “Fortran->Preprocessor->Preprocessor Definitions”

This solves the problem and I can run the example now.
Many thanks for that.

Also, would you be able to help me with some other issues?

Firstly, I don’t really understand the difference between the two CUBLAS examples provided in the PGI package. One is “test_cublasSgemm.F90”, located in “C:\Program Files\PGI\win64\13.3\samples\cudafor”, and the other is “cublasTestSgemm.F90”, located in “C:\Program Files\PGI\win64\2013\cuda\CUDA-Fortran-SDK”. The former includes an interface for the “sgemm” routine, while the latter includes a “use cublas” statement with no interface. Could you please briefly explain the difference between the two?

Secondly, my ultimate goal is to use the CUSPARSE library. Is there an example of using it with the PGI compiler that I can look at? I ask because CUSPARSE is much more complicated than CUBLAS.

I appreciate your help.

Omid

The former includes an interface for the “sgemm” routine, while the latter includes a “use cublas” statement with no interface. Could you please briefly explain the difference between the two?

That’s the difference. One shows how to write an interface yourself, and the other uses the cublas interface module. The cublas module is something PGI created as a convenience for users, especially since NVIDIA kept changing the interface to CUBLAS.
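As a rough sketch of the module-based style, the program below replaces the hand-written interface with “use cublas”. It assumes the module exposes the legacy cublasSgemm entry point with character transpose flags, which I have not checked against the 13.3 module, so compare it with cublasTestSgemm.F90 on your installation. The program name and the fixed size n = 4 are illustrative only.

```fortran
! Build (same flags as the sample header):
!   pgfortran -Mcuda -o sgemm_with_module sgemm_with_module.f90 -lcublas
program sgemm_with_module
  use cudafor
  use cublas            ! PGI-supplied interfaces; no interface block needed
  implicit none
  integer, parameter :: n = 4
  real :: a(n,n), b(n,n), c(n,n)
  real, device :: dA(n,n), dB(n,n), dC(n,n)
  real :: alpha = 1.0e0
  real :: beta  = 0.0e0

  a = 2.0e0
  b = 1.5e0

  ! Host-to-device copies by assignment, as in the original sample
  dA = a
  dB = b

  ! Assumed legacy-style call; beta is zero, so dC is only written
  call cublasSgemm('n', 'n', n, n, n, alpha, dA, n, dB, n, beta, dC, n)

  ! Device-to-host copy of the result
  c = dC

  ! Each element should equal 2.0 * 1.5 * n
  print *, "max abs error: ", maxval(abs(c - 3.0e0*real(n)))
end program sgemm_with_module
```

The trade-off is that the module hides the binding details, whereas the explicit interface in test_cublasSgemm.F90 shows exactly which C symbol is called.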

Secondly, my ultimate goal is to use the CUSPARSE library. Is there an example of using it with the PGI compiler that I can look at? I ask because CUSPARSE is much more complicated than CUBLAS.

You’ll need to either add an interface for the CUSPARSE routines or use the C interface file NVIDIA provides. http://docs.nvidia.com/cuda/cusparse/index.html#topic_14
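For the first option, a hand-written interface could follow the same bind(c) pattern as the sgemm interface in test_cublasSgemm.F90. The sketch below covers only handle creation and destruction. The module name cusparse_min and the program are made up for illustration, and it assumes cusparseHandle_t is an opaque pointer and cusparseStatus_t maps to a C int, as in the CUSPARSE C header. I have not built this with PGI 13.3; link against the cusparse library the same way cublas.lib was added.

```fortran
module cusparse_min          ! hypothetical module name
  use iso_c_binding
  implicit none
  interface
    ! C prototype: cusparseStatus_t cusparseCreate(cusparseHandle_t *handle);
    integer(c_int) function cusparseCreate(handle) bind(c, name='cusparseCreate')
      import :: c_int, c_ptr
      type(c_ptr) :: handle           ! by reference, so C receives a pointer to the handle
    end function cusparseCreate
    ! C prototype: cusparseStatus_t cusparseDestroy(cusparseHandle_t handle);
    integer(c_int) function cusparseDestroy(handle) bind(c, name='cusparseDestroy')
      import :: c_int, c_ptr
      type(c_ptr), value :: handle    ! by value, so C receives the handle itself
    end function cusparseDestroy
  end interface
end module cusparse_min

program cusparse_handle_demo
  use cusparse_min
  implicit none
  type(c_ptr) :: h
  integer(c_int) :: stat

  stat = cusparseCreate(h)
  if (stat /= 0) then                 ! 0 is CUSPARSE_STATUS_SUCCESS
    print *, "cusparseCreate failed, status = ", stat
    stop 1
  end if
  print *, "CUSPARSE handle created"
  stat = cusparseDestroy(h)
end program cusparse_handle_demo
```

Routines that operate on device data would declare those array arguments with the device attribute, as the sgemm interface in the sample does for a, b, and c.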

  • Mat

The cublas module is something PGI created as a convenience for users, especially since NVIDIA kept changing the interface to CUBLAS.

Do we have access to this module? Where can I find it to make some changes? Is a similar module provided by PGI for CUSPARSE?

Thank you,

Omid

Do we have access to this module? Where can I find it to make some changes?

The cublas module file (.mod) can be found in the include directory. However, we do not ship the source for this file.

Is a similar module provided by PGI for CUSPARSE?

No. Ideally NVIDIA would provide one, since they manage CUSPARSE. You should, however, be able to use the interface file they provide. It’s an F77-style interface, but you should be able to skip the cuda_malloc/cuda_free calls and pass CUDA Fortran device variables directly to the routines. Granted, I haven’t tried it myself, but in theory it should just work.

  • Mat