cufft/cublas wrappers for Fortran: how to create cufft wrappers?

thorgh · May 26, 2011, 9:36am

Hello everyone.

I use Fortran (Intel Fortran Compiler) for my scientific computations and have recently started using CUDA to speed them up. I don’t have any problems with cublas, but to feed gemm with data I need to perform some Fourier transforms. The next logical step is to calculate all the Fourier transforms on the GPU. Unfortunately, no Fortran wrappers for cufft are provided with the CUDA Toolkit. I have tried to write something similar to the cublas Fortran wrappers, but I failed, since I have very little knowledge of C.

So, first I have a question about the cublas memory allocation and data-copy functions (cublas_alloc, cublas_(s/g)et_vector). Would it be possible to use them to feed data to the cufft functions? (I think it would not be a problem, but I’d like to hear confirmation from somebody more experienced.)

Also, maybe someone has already tried to write wrappers of this kind and can share them? All I need is cufft_plan1d, cufft_destroy and cufftExecZ2Z. For simplicity I would like to use them together with the cublas Fortran wrappers.

As I said, I have tried to do this on my own, but to be honest I don’t know how to handle the plan (cufftHandle type). In the code below I have left out almost all parts concerning the plan (so as not to leave blanks I’ve put “cufftHandle_plan” in those places), since everything I tried was obviously wrong and ended in SIGSEGV.

(fortran.c)

int CUFFT_EXECZ2Z (_cufftHandle_plan_, const devptr_t *idata, devptr_t *odata,const int *direction)
{
    cuComplex *i = (cuComplex *)(*idata);
    cuComplex *o = (cuComplex *)(*odata);
    return (int)cufftExecZ2Z(_cufftHandle_plan_, i,o,*direction);
}

int CUFFT_PLAN1D (_cufftHandle_plan_, const int *nx, const int *type, const int *count)
{
    return (int) cufftPlan1d(_cufftHandle_plan_, *nx, *type, *count);
}

int CUFFT_DESTROY (unsigned int *plan)
{
    return (int) cufftDestroy (plan);
}

(fortran.h)

int CUFFT_PLAN1D (unsigned *plan, const int *nx, const int *type, const int *count);
int CUFFT_DESTROY (const unsigned int *plan);
int CUFFT_EXECZ2Z (const unsigned int *plan, const devptr_t *idata, devptr_t *odata,const int *direction);

I would be grateful for any help, but please take into account in your explanations that I’m not very familiar with C.

mfatica · May 26, 2011, 1:58pm

I wrote a CUFFT wrapper for CUDA Fortran some time ago, but you should be able to adapt it to the Intel compiler:
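The wrapper referred to above is not reproduced in this thread. As a separate illustration of how the plan could be handled, here is a minimal sketch in the style of the cublas fortran.c wrappers. It relies on several assumptions that are not from the thread: the Fortran compiler passes every argument by reference, the symbol names (upper case, no trailing underscore) match what the compiler emits, devptr_t is an integer wide enough to hold a device address (declared here as size_t), and the addresses passed in are ordinary device pointers. cufftHandle is an integer typedef in cufft.h, so on the Fortran side the plan would be a plain integer variable of matching size.

```c
/* Sketch only: thin Fortran-callable wrappers around three cuFFT calls.
 * Symbol naming and devptr_t are assumptions; adapt them to your compiler. */
#include <stddef.h>
#include <cufft.h>

typedef size_t devptr_t;  /* assumption: integer that can hold a device address */

/* Fortran passes the plan variable by reference; cufftPlan1d stores the
 * newly created handle through that pointer. */
int CUFFT_PLAN1D(cufftHandle *plan, const int *nx, const int *type,
                 const int *batch)
{
    return (int)cufftPlan1d(plan, *nx, (cufftType)*type, *batch);
}

/* cufftExecZ2Z takes the handle by value and double-precision complex
 * device pointers, so the handle is dereferenced and the integer
 * addresses are cast to cufftDoubleComplex *. */
int CUFFT_EXECZ2Z(const cufftHandle *plan, const devptr_t *idata,
                  const devptr_t *odata, const int *direction)
{
    cufftDoubleComplex *in  = (cufftDoubleComplex *)(*idata);
    cufftDoubleComplex *out = (cufftDoubleComplex *)(*odata);
    return (int)cufftExecZ2Z(*plan, in, out, *direction);
}

/* cufftDestroy also takes the handle by value. */
int CUFFT_DESTROY(const cufftHandle *plan)
{
    return (int)cufftDestroy(*plan);
}
```

From Fortran, type would be passed as the integer value of CUFFT_Z2Z and direction as CUFFT_FORWARD (-1) or CUFFT_INVERSE (1), both defined in cufft.h. Each wrapper returns the cufftResult code as an integer, where 0 is CUFFT_SUCCESS.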

thorgh · May 28, 2011, 12:05pm

Thank you for the reply. I had tried to compile your wrapper before I started this topic, but the allocation of device memory there is compiler-specific, so I would have to make many modifications. Instead I wrote a single C function to which I pass 1D input/output arrays (the input array contains N functions to transform) and call the cufft routines from there. Unfortunately my data are zero-padded (about 8 times more zeros than actual data), and the calculations and memory transfers take quite a long time. Furthermore, I need only part of the output, so calculating the Fourier transform from the definition (which I perform with gemm) is two times faster in my case (but still slower than using an FFT on the CPU).

As far as I can see, FFT routines that do not need zero-padded input to achieve higher sampling in the frequency domain are almost nonexistent…
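To make the comparison concrete, the relation between the zero-padded FFT and the transform "from the definition" can be written out. The symbols x_n, X_k, W and K are notation introduced here for illustration and do not appear in the thread: x_n is one input function of length M, padded with zeros to length M_p, and K is the number of output samples actually needed.

```latex
X_k = \sum_{n=0}^{M_p-1} x_n \, e^{-2\pi i\, k n / M_p}
    = \sum_{n=0}^{M-1} x_n \, e^{-2\pi i\, k n / M_p},
\qquad \text{since } x_n = 0 \text{ for } n \ge M .

% For K selected outputs and N input functions stored as the columns of A (M x N):
X = W A, \qquad W_{kn} = e^{-2\pi i\, k n / M_p}, \qquad W \in \mathbb{C}^{K \times M}.
```

The matrix form is a single gemm call with roughly K M N complex multiply-adds, while N padded FFTs cost on the order of N M_p log2 M_p operations and produce all M_p outputs. Which one wins depends on K, M and the padding ratio.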

mfatica · May 28, 2011, 5:46pm

Don’t transfer the zeros; add the padding on the GPU instead.

If you have N functions to transform, each of length M, stored on the CPU in a matrix A_cpu(M,N), you can define a matrix A_gpu(Mp,N), where Mp is the new padded length.

You can either:

zero A_gpu and use cudaMemcpy2D to copy A_cpu into A_gpu, since it lets you specify a stride (pitch) for both source and destination,

or

transfer A_cpu to a temporary array on the GPU and then use a custom kernel to fill A_gpu.

Once you have A_gpu, use the standard CUFFT and then transfer the results back (if you need the full range use cudaMemcpy; otherwise you can select a subset with cudaMemcpy2D).
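The first option can be pictured with a host-only stand-in. copy_2d and pad_columns are hypothetical helpers written for this sketch, not CUDA functions. copy_2d takes the same dst, dpitch, src, spitch, width and height arguments (all in bytes) as cudaMemcpy2D, so on the GPU the two calls inside pad_columns would become cudaMemset and cudaMemcpy2D with cudaMemcpyHostToDevice. Because Fortran arrays are column-major, each column of A_cpu is contiguous and plays the role of a "row" in cudaMemcpy2D terms.

```c
#include <stddef.h>
#include <string.h>

/* Host-side stand-in with the same argument meaning as cudaMemcpy2D:
 * copy `height` rows of `width` bytes each; consecutive rows start
 * `spitch` bytes apart in src and `dpitch` bytes apart in dst. */
static void copy_2d(void *dst, size_t dpitch, const void *src, size_t spitch,
                    size_t width, size_t height)
{
    for (size_t r = 0; r < height; ++r)
        memcpy((char *)dst + r * dpitch, (const char *)src + r * spitch, width);
}

/* Zero the padded array, then place each length-m column of `a`
 * (m x n, column-major) at the start of a length-mp column of `a_pad`.
 * `elem` is the size of one element in bytes. */
static void pad_columns(void *a_pad, size_t mp, const void *a, size_t m,
                        size_t n, size_t elem)
{
    memset(a_pad, 0, mp * n * elem);       /* GPU: cudaMemset              */
    copy_2d(a_pad, mp * elem,              /* GPU: cudaMemcpy2D(...,       */
            a, m * elem,                   /*        cudaMemcpyHostToDevice) */
            m * elem, n);
}
```

For Z2Z transforms elem would be sizeof(cufftDoubleComplex). Only M*N elements cross the bus instead of Mp*N. The same call shape works in the other direction for retrieving a subset of the results: with width set to K*elem, only the first K outputs of each transform are copied back.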
