F90 accelerator directives + cublas dgemv

Would anyone have an example of using PGI accelerator directives in Fortran 90 in conjunction with a cublas dgemv call?

From searching the forum, I believe this is possible, but I was not able to find any examples.

I’d imagine that it would be possible to place the array data on the device using the data region directives, call cublas dgemv, and then copy the result out.

Hi Sarom,

The PGI Accelerator Model will recognise CUDA Fortran device variables. So here, you would call CUBLAS dgemv using CUDA Fortran and then just use the device array in the compute region. No need to use a data region.

Note the CUDA Fortran SDK has an example of calling sgemm. (/opt/pgi/linux86-64/2012/cuda/CUDA-Fortran-SDK/cublasTestSgemm.F90).
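As a rough illustration of the approach described above, here is a minimal sketch rather than a tested recipe. It assumes the PGI-supplied cublas module exposes a legacy-style cublasDgemv interface that accepts device arrays, and that the file is built with something like "pgfortran -Mcuda -ta=nvidia" and linked against CUBLAS. The program name, array names, and sizes are made up for the example.

```fortran
program dgemv_with_acc
  use cudafor   ! CUDA Fortran runtime: device attribute, host/device copies
  use cublas    ! assumed: PGI module providing the cublasDgemv interface
  implicit none
  integer, parameter :: m = 512, n = 256     ! illustrative sizes
  real(8) :: a(m,n), x(n), y(m)              ! host arrays
  real(8), device :: a_d(m,n), x_d(n), y_d(m) ! device arrays
  integer :: i

  a = 1.0d0
  x = 2.0d0

  ! Host-to-device copies are plain assignments in CUDA Fortran
  a_d = a
  x_d = x
  y_d = 0.0d0

  ! y_d = 1.0 * a_d * x_d + 0.0 * y_d, computed on the GPU
  call cublasDgemv('N', m, n, 1.0d0, a_d, m, x_d, 1, 0.0d0, y_d, 1)

  ! Accelerator compute region working directly on the device array,
  ! so no data region or copy clauses are needed for y_d
  !$acc region
  do i = 1, m
     y_d(i) = 2.0d0 * y_d(i)
  end do
  !$acc end region

  ! Device-to-host copy of the result
  y = y_d
  print *, 'y(1) =', y(1), ' expected ', 4.0d0 * n
end program dgemv_with_acc
```

Each row of a_d times x_d sums n products of 1.0 and 2.0, and the compute region doubles that, which is where the 4*n check comes from.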

  • Mat

Hi Mat,

Does the compiler flag

-ta=nvidia,wait

have any effect on asynchronous routines like cublasSetVectorAsync if it is set to wait?

And is cublasDgemv_v2 an asynchronous call?

I have several Dgemv to do and I’m interested in trying to overlap communication with computation and take advantage of multiple streams.

have any effect on asynchronous routines like cublasSetVectorAsync if it is set to wait?

No.

And is cublasDgemv_v2 an asynchronous call?

I don’t know for sure, but I doubt it. What do the CUBLAS docs say?

  • Mat

After some digging through the web and some testing, I found that control returns to the CPU immediately after a cublas call.
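One way to see this for yourself is to time the call on its own and then time it again after an explicit synchronisation. The sketch below is only a rough check, under the same assumption that the PGI cublas module provides a cublasDgemv interface taking device arrays. The program name and sizes are made up, and no particular timings are implied.

```fortran
program dgemv_sync_check
  use cudafor   ! provides cudaDeviceSynchronize
  use cublas    ! assumed: PGI module providing the cublasDgemv interface
  implicit none
  integer, parameter :: m = 2048, n = 2048   ! illustrative sizes
  real(8), device :: a_d(m,n), x_d(n), y_d(m)
  real(8) :: y(m)
  integer :: istat, t0, t1, t2, rate

  a_d = 1.0d0
  x_d = 1.0d0
  y_d = 0.0d0

  ! Warm-up call so library initialisation is not included in the timing
  call cublasDgemv('N', m, n, 1.0d0, a_d, m, x_d, 1, 0.0d0, y_d, 1)
  istat = cudaDeviceSynchronize()

  call system_clock(t0, rate)
  call cublasDgemv('N', m, n, 1.0d0, a_d, m, x_d, 1, 0.0d0, y_d, 1)
  call system_clock(t1)             ! host has control again; GPU may still be busy
  istat = cudaDeviceSynchronize()   ! block until all queued GPU work is done
  call system_clock(t2)

  y = y_d
  print *, 'time to return from call (s):', real(t1 - t0) / real(rate)
  print *, 'time until GPU finished (s): ', real(t2 - t0) / real(rate)
  print *, 'y(1) =', y(1)
end program dgemv_sync_check
```

If the first time is noticeably smaller than the second, the call returned before the GPU work finished, so any host-side timing or overlap scheme needs an explicit synchronisation point before it relies on the result.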