CUDA Pro Tip: How to Call Batched cuBLAS routines from CUDA Fortran

Originally published at:

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can leverage the power of GPUs using Fortran. When dealing with small arrays and matrices, one method of exposing parallelism on the GPU is to execute the same cuBLAS call on multiple independent systems simultaneously. While you can do this manually by calling multiple cuBLAS…

Great article! Just one question: is this possible without PGI's CUDA Fortran? I've been trying to implement these batched routines from Fortran, but I don't have access to the PGI compiler. I've made some progress with Fortran interfaces to CUDA/cuBLAS library routines, but I can't get the batched routines to work.

Thanks for the feedback and your question, Austin.

To use the batched cuBLAS routines from regular Fortran, you'd need to write CUDA C code to manage the array of device pointers and then write some C stubs callable from Fortran.

Hello Greg. I am trying to compile code that calls cuBLAS from Fortran in Visual Studio 2013, adding -lcublas to the linker options, but I haven't been able to get it to link. Could you please give any recommendation on this? Thank you very much.

Hi Ivonne:

Rather than using "-lcublas", try adding "cublas.lib" to the link. "-lcublas" looks for "libcublas.lib", which doesn't exist.

Another option is to use "-defaultlib:cublas".

I hope one of these resolves your issue.

- Greg