Linking CUBLAS/CUFFT to existing applications?

I would like to ask whether it is possible to take an existing application that uses BLAS and FFTW and recompile it so that it uses CUBLAS and CUFFT rather than the BLAS and FFT libraries from Intel MKL, ACML and so on, without changing the source code. If this is possible, should one expect a considerable enhancement in performance?

Thanks a lot,

For CUFFT, you would either have to change the source or create a wrapper. The types and functions are named slightly differently and CUFFT expects the data to already be on the GPU.

Performance may be better or worse, depending on the FFT size and how often data is copied between RAM and GPU memory. Typically you need to keep as much of the data and computation on the GPU as possible to get performance increases.
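
To make that trade-off concrete, here is a rough back-of-the-envelope model in C, not a measurement. It assumes the offloaded time is simply transfer time plus GPU compute time, and it ignores latency, FFT plan creation, and any overlap of copies with computation. The names offload_time and offload_pays_off and all of their parameters are hypothetical; you would plug in numbers measured on your own system.

```c
#include <assert.h>

/* Estimated wall time in seconds for one offloaded call:
   upload + GPU compute + download. Every input is a value the
   caller has measured; nothing here queries real hardware. */
double offload_time(double bytes_up, double bytes_down,
                    double bandwidth_bytes_per_s, double gpu_compute_s)
{
    assert(bandwidth_bytes_per_s > 0.0);
    return (bytes_up + bytes_down) / bandwidth_bytes_per_s + gpu_compute_s;
}

/* Returns 1 when the estimated offloaded time beats the CPU time,
   0 otherwise. */
int offload_pays_off(double cpu_s, double bytes_up, double bytes_down,
                     double bandwidth_bytes_per_s, double gpu_compute_s)
{
    return offload_time(bytes_up, bytes_down,
                        bandwidth_bytes_per_s, gpu_compute_s) < cpu_s;
}
```

If a wrapper that copies the data on every call does not pay off under this model, the only remaining option is to keep the data resident on the GPU across several calls, which sets both byte counts to zero for all but the first and last call.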

Thanks a lot. Two more questions though.

  1. Do you happen to know if this is the case for CUBLAS as well?

  2. Is creating a wrapper straightforward to do? Can I find information in the Programming Guide or somewhere else? (Our aim is to see whether we can accelerate existing applications without touching the source code, which was written by others.)

Thanks a lot again,


CUBLAS has two different interfaces:

Thunking (define CUBLAS_USE_THUNKING when compiling fortran.c):
These wrappers allow interfacing to existing Fortran applications without any changes to the application. During each call, the wrappers allocate GPU memory, copy the source data from CPU memory space to GPU memory space, call CUBLAS, and finally copy the results back to CPU memory space and deallocate the GPU memory. As this process causes very significant call overhead, these wrappers are intended for light testing, not for production code.

Non-Thunking (default):
These wrappers are intended for production code and substitute device pointers for vector and matrix arguments in all BLAS functions. To use these interfaces, existing applications need to be modified slightly to allocate and deallocate data structures in GPU memory space (using CUBLAS_ALLOC and CUBLAS_FREE) and to copy data between GPU and CPU memory spaces (using CUBLAS_SET_VECTOR, CUBLAS_GET_VECTOR, CUBLAS_SET_MATRIX, and CUBLAS_GET_MATRIX).
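
The difference between the two styles can be sketched on the host alone. The C code below is a mock, not CUBLAS: "device" memory is ordinary heap memory, and every mock_* function is a hypothetical stand-in for the helper named in its comment. It only counts how many bytes would cross the CPU/GPU boundary when a SAXPY (y = alpha*x + y) is applied repeatedly, assuming unit strides and n > 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Bytes that have crossed the pretend CPU/GPU boundary so far. */
size_t mock_bytes_copied = 0;

/* Stand-ins for CUBLAS_ALLOC and CUBLAS_FREE (hypothetical mocks). */
static float *mock_alloc(int n) { return malloc((size_t)n * sizeof(float)); }
static void mock_free(float *d) { free(d); }

/* Stand-in for CUBLAS_SET_VECTOR: host to "device" copy. */
static void mock_set_vector(int n, const float *host, float *dev)
{
    memcpy(dev, host, (size_t)n * sizeof(float));
    mock_bytes_copied += (size_t)n * sizeof(float);
}

/* Stand-in for CUBLAS_GET_VECTOR: "device" to host copy. */
static void mock_get_vector(int n, const float *dev, float *host)
{
    memcpy(host, dev, (size_t)n * sizeof(float));
    mock_bytes_copied += (size_t)n * sizeof(float);
}

/* Stand-in for the GPU SAXPY, operating on "device" pointers only. */
static void mock_device_saxpy(int n, float alpha, const float *dx, float *dy)
{
    for (int i = 0; i < n; ++i)
        dy[i] += alpha * dx[i];
}

/* Thunking style: takes host pointers, so every single call pays for
   allocation, two uploads, one download and deallocation. */
void thunk_saxpy(int n, float alpha, const float *x, float *y)
{
    assert(n > 0);
    float *dx = mock_alloc(n), *dy = mock_alloc(n);
    assert(dx && dy);
    mock_set_vector(n, x, dx);
    mock_set_vector(n, y, dy);
    mock_device_saxpy(n, alpha, dx, dy);
    mock_get_vector(n, dy, y);
    mock_free(dx);
    mock_free(dy);
}

/* Non-thunking style: the caller's code uploads once, runs `calls`
   SAXPYs on data that stays on the "device", and downloads once. */
void resident_saxpy(int n, float alpha, const float *x, float *y, int calls)
{
    assert(n > 0);
    float *dx = mock_alloc(n), *dy = mock_alloc(n);
    assert(dx && dy);
    mock_set_vector(n, x, dx);
    mock_set_vector(n, y, dy);
    for (int c = 0; c < calls; ++c)
        mock_device_saxpy(n, alpha, dx, dy);
    mock_get_vector(n, dy, y);
    mock_free(dx);
    mock_free(dy);
}
```

Calling thunk_saxpy k times copies 3*n*k floats in this mock, while resident_saxpy with calls = k copies 3*n floats regardless of k. That factor of k, plus the repeated allocation, is the call overhead that makes the thunking wrappers unsuitable for production code.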

The names are still different from the standard BLAS.
You could find more info at: