CUBLAS drop-in replacement?

Is it possible to do a “drop-in” replacement of existing BLAS calls with CUBLAS calls?

examples:
call DGEMM(…) replace with:
call cublas_DGEMM(…)

and compile with: fortran.o cudart cublas

If it is not the case, then why not?

You really should read Appendix A of the CUBLAS user guide, where Fortran interoperability is discussed in detail. The short answer is no, you can’t do that as written: you need to turn on the “thunking” interface when you build. Even then, there are a lot of reasons why you shouldn’t port host BLAS code using the “thunking” interface for anything serious.

Thunking means “copy to the device, compute, copy back” on every call. So there’s overhead, especially for sequences of calls. For DGEMM and other BLAS-3 routines with large input sizes, the overhead won’t kill your speedups. Otherwise, you have to do something that involves coding, but you can do that with just CUBLAS; there is no need to learn CUDA. cublasSetMatrix(), cublasGetVector(), etc. are your friends. Essentially, you know better than the thunking wrappers when data actually needs to be transferred.
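To put rough numbers on why BLAS-3 tolerates the copy overhead and other routines do not, here is a minimal sketch in C that counts floating-point operations per byte transferred when every call copies its operands in and its result out. It assumes square n-by-n double-precision operands and models no actual timings; the function names are made up for this illustration and are not part of CUBLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Floating-point operations per byte moved across the bus when every call
 * copies its operands in and its result out (the thunking pattern).
 * Assumes double precision; bus speed and latency are not modeled. */

/* DGEMM, C = alpha*A*B + beta*C with n-by-n matrices: about 2*n^3 flops.
 * Copies: A, B and C to the device, C back, i.e. 4 matrices of n*n doubles. */
double dgemm_flops_per_byte(size_t n)
{
    assert(n > 0);
    double flops = 2.0 * (double)n * (double)n * (double)n;
    double bytes = 4.0 * (double)n * (double)n * sizeof(double);
    return flops / bytes;   /* n / 16 when sizeof(double) is 8 */
}

/* DAXPY, y = alpha*x + y with vectors of length n: 2*n flops.
 * Copies: x and y to the device, y back, i.e. 3 vectors of n doubles. */
double daxpy_flops_per_byte(size_t n)
{
    assert(n > 0);
    double flops = 2.0 * (double)n;
    double bytes = 3.0 * (double)n * sizeof(double);
    return flops / bytes;   /* 1 / 12 when sizeof(double) is 8, for any n */
}
```

For DGEMM the work done per byte copied grows linearly with n, so large problems amortize the transfers. For DAXPY the ratio stays fixed no matter how long the vectors are, which is why keeping data on the device across a sequence of calls matters.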

Hi avidday. Would you mind expanding on the thought “that there are a lot of reasons why you shouldn’t port host BLAS code using the “thunking” interface for anything serious”?

Thanks

Malcolm

When using CUBLAS_USE_THUNKING , a direct replacement should work, right?

It does not work for ZGEMM or CGEMM but is OK for DGEMM. Why?

I use

#define CUBLAS_USE_THUNKING 1

in the fortran.c program wrapper.

Would using

gcc -D__CUBLAS_USE_THUNKING

do the same?

With complex BLAS functions, you are relying on completely non-standardized interoperability between the FORTRAN COMPLEX type and cuComplex, whose type and functionality will vary depending on what compiler and platform you are using. The default is to try and use the C99/C++ complex type, but whether that actually works is completely compiler dependent.
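As a small illustration of the part of this that can be checked, the sketch below compares a C99 complex value with a plain struct of two doubles. The struct my_double_complex and the helper from_c99() are hypothetical stand-ins written for this example; they are not the real CUDA definitions. The sketch only covers memory layout. How a complex scalar is passed or returned across the Fortran/C boundary is a separate, compiler-dependent question that it does not answer.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Hypothetical stand-in for a device-side complex type: a plain struct of
 * two doubles (real part, imaginary part). NOT taken from the CUDA headers. */
typedef struct { double x; double y; } my_double_complex;

/* Reinterpret a C99 complex value as the struct by copying its bytes.
 * C99 guarantees that double _Complex is laid out like double[2], real
 * part first; the struct side is guarded by the size assertion below. */
my_double_complex from_c99(double _Complex z)
{
    my_double_complex out;
    assert(sizeof out == sizeof z);
    memcpy(&out, &z, sizeof out);
    return out;
}
```

Calling from_c99(1.5 + 2.5 * I) yields a struct with x equal to 1.5 and y equal to 2.5 on a C99 compiler. If the size assertion fires, or the compiler has no C99 complex type at all, the wrapper's assumptions about complex arguments are already broken on that platform.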

No, gcc -D__CUBLAS_USE_THUNKING would not do the same, but -DCUBLAS_USE_THUNKING should.
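The reason is that the preprocessor matches macro names exactly, so a name with leading underscores is a different macro. A minimal sketch, assuming the wrapper selects its code path with #ifdef or a similar test on CUBLAS_USE_THUNKING; both helper functions are hypothetical and exist only for this illustration:

```c
/* Stands in for the compiler flag -DCUBLAS_USE_THUNKING (or for the
 * "#define CUBLAS_USE_THUNKING 1" line quoted earlier in the thread). */
#define CUBLAS_USE_THUNKING 1

/* Returns 1 if the thunking code path was selected at compile time, else 0.
 * Hypothetical helper, not part of the fortran.c wrapper. */
int thunking_selected(void)
{
#ifdef CUBLAS_USE_THUNKING
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 only if the differently spelled macro is defined.
 * Defining CUBLAS_USE_THUNKING does not define this one. */
int underscore_variant_defined(void)
{
#ifdef __CUBLAS_USE_THUNKING
    return 1;
#else
    return 0;
#endif
}
```

Compiled as shown, thunking_selected() returns 1 and underscore_variant_defined() returns 0. Passing only -D__CUBLAS_USE_THUNKING on the command line, without the #define, would flip both results.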

Thank you for the info but:

That is not good. Is there a test suite somewhere that covers the complete CUBLAS library through both the C and Fortran interfaces?

I just noticed that some double-precision complex functions are not included in the CUBLAS library, namely ZCOPY and ZAXPY.
Will they be included in a future version?
This may be why I cannot get my program to work, since I have to mix CUBLAS and BLAS functions in the same program. This is just a guess on my part.

Not a guess anymore. Mixing does not work in my case.

When is the complete CUBLAS library coming?

Good question!

MMB