Fortran and cuFFT

jam11 · August 13, 2009, 2:26am

What is the best way to call the cuFFT functions from an existing fortran program which uses the fftw3 library calls.

The last problem I am having is that the fortran compiler is case-insensitive for the generated function names. I can get rid of the underscore with a compiler option but all functions are lower-case only so they are not similar to the cuFFT library names. There is a fortran wrapper for the cublas library but not for the cuFFT library.
Is there a fortran wrapper for the cuFFT library planed?

OOps!
Forget about that, I was not thinking…
nvcc is still needed

jam11 · August 14, 2009, 2:53pm

Ok. I did progress a bit on this problem.

However, I have an error message which is not in the latest cuFFT user guide. Maybe it is somewhere else.

In calling cufftPlan3D:

cufft: Failed to find applicable transform

Can someone show me what this error means?

jam11 · August 14, 2009, 4:44pm

And why does this piece compile properly?

From SDK2.3

from simpleCUFFT:

// CUFFT plan

cufftHandle plan;

cufftSafeCall(cufftPlan1d(&plan, new_size, CUFFT_C2C, 1));

this is too many arguments if I try in my code.

mfatica · August 14, 2009, 4:55pm

cufftPlan1d(&plan, new_size, CUFFT_C2C, 1)

That is the correct calling sequence for 1D transform.

jam11 · August 14, 2009, 5:21pm

Silly me. I did not notice the 1d and I am working with 3d.

How about my previous question on the error message?

mfatica · August 14, 2009, 5:28pm

Are you taking in account that Fortran is passing arguments by reference and C by value?
Take a look at the Fortran example that calls CUDA code in this presentation (http://gpgpu.org/wp/wp-content/uploads/2009/06/03-Toolkit.pdf )

jam11 · August 14, 2009, 6:03pm

The arguments passing should be OK. The presentation is very short on the F->CUDA. I would have to change the subroutine too much to follow this approach and I don’t know how much ripple effect it would have on the others subs. I think I already modify as much as I could at this stage.

jam11 · August 19, 2009, 2:03pm

Back to Fortran to Cuda calls

Which is the correct way?

Fortran
real*8 ::r(0:2,0:3,0:2)
…call fun ( i,j,k , r)
Cuda
void fun_ ( int *i , int *j, int *k, cufftReal *r);
In this case, the array r is 1d and I have to use cufftPlan1d

Is the elements order correct? Will the fft be OK?

Fortran
real*8 ::r(0:2,0:2,0:2)
…call fun ( i,j,k , r)
Cuda

void fun_ ( int *i , int *j, int *k, cufftReal r[2][3][2]);
In that case, the array r is 3d so I can use cufftPlan3d, but I cannot have variables to specify the array sizes. Can I use r in the function?

Do I have to modify the elements order to agree with fortran? will the fft be OK?

On the return form the fft, do I have to modify the array order before going back the the calling fortran program?

jam11 · September 19, 2009, 3:20pm

And the solution is :

do the host memory allocation on the fortran side using this:

[codebox]module cuda_alloc

interface

! cudaMallocHost

integer (C_INT) function cudaMallocHost(buffer, size) bind(C,name=“cudaMallocHost”)

use iso_c_binding

implicit none

type (C_PTR) :: buffer

integer (C_LONG), value :: size

end function cudaMallocHost

! cudaFreeHost

integer (C_INT) function cudaFreeHost(buffer) bind(C,name=“cudaFreeHost”)

use iso_c_binding

implicit none

type (C_PTR), value :: buffer

end function cudaFreeHost

end interface

end module cuda_alloc

[/codebox]

call the cuFFT program using this subroutine :

[codebox]

subroutine r2c_3d( Nx, Ny, Nz, r, c)

use iso_c_binding

use cuda_alloc

implicit none

integer Nx, Ny, Nz,res

real*8 :: r(0:Nx-1, 0:Ny-1, 0:Nz-1)

complex*16 :: c(0:Nx/2, 0:Ny-1, 0:Nz-1)

complex*16 , parameter :: size1=dcmplx(0.0,0.0)

real*8 , parameter :: size2=dble(0.0)

type(C_PTR) :: cptr_cd, cptr_rd

complex*16 , dimension(:,:,:), pointer :: cd

real*8 , dimension(:,:,:), pointer :: rd

res = cudaMallocHost ( cptr_cd,NxNyNz*sizeof(size1)/2+1 )

call c_f_pointer ( cptr_cd,cd , [Nx/2+1,Ny,Nz] )

res = cudaMallocHost ( cptr_rd, NxNyNz*sizeof(size2) )

call c_f_pointer ( cptr_rd,rd , [Nx,Ny,Nz ] )

rd = r

call fft_3d_r2c( Nx, Ny, Nz, rd, cd)

c= cd

res = cudaFreeHost(cptr_rd)

res = cudaFreeHost(cptr_cd)

end subroutine r2c_3d

subroutine c2r_3d(Nx, Ny, Nz, c, r)

use iso_c_binding

use cuda_alloc

implicit none

integer Nx, Ny, Nz,res

real*8 :: r(0:Nx-1, 0:Ny-1, 0:Nz-1)

complex*16 :: c(0:Nx/2, 0:Ny-1, 0:Nz-1)

complex*16 , parameter :: size1=dcmplx(0.0,0.0)

real*8 , parameter :: size2=dble(0.0)

type(C_PTR) :: cptr_cd, cptr_rd

complex*16 , dimension(:,:,:), pointer :: cd

real*8 , dimension(:,:,:), pointer :: rd

res = cudaMallocHost ( cptr_cd,NxNyNz*sizeof(size1) )

call c_f_pointer ( cptr_cd,cd , [Nx/2+1,Ny,Nz] )

res = cudaMallocHost ( cptr_rd, NxNyNz*sizeof(size2) )

call c_f_pointer ( cptr_rd,rd , [Nx,Ny,Nz ] )

cd = c

call fft_3d_c2r( Nx, Ny, Nz, cd ,rd)

r = rd

res = cudaFreeHost(cptr_rd)

res = cudaFreeHost(cptr_cd)

end subroutine c2r_3d

[/codebox]

3 ) and finally this is the .cu code

[codebox]

#include <stdio.h>

#include <stdlib.h>

#include “cufft.h”

#include “cuda.h”

#include “cutil_inline.h”

extern “C” void fft_3d_r2c_( int *nx ,int *ny, int *nz, cufftDoubleReal *h_a, cufftDoubleComplex *h_b)

{

unsigned long size1 = sizeof(cufftDoubleReal)* (nx)(ny)(*nz); // calulate memory size

    unsigned long size2 = sizeof(cufftDoubleComplex)* (*nx/2+1)*(*ny)*(*nz);                       //  calulate memory size

cufftDoubleReal *d_ic=NULL; // define pointer to device mem input

    cufftDoubleComplex *d_oc=NULL;                                                                 //  define pointer to device mem output

    cufftHandle plan;

cutilSafeCall( cudaMalloc((void **)&d_ic, size1)); // allocate mem on device

cutilSafeCall( cudaMalloc((void **)&d_oc, size2)); // allocate mem on device

cutilSafeCall( cudaMemcpy(d_ic,h_a, size1 , cudaMemcpyHostToDevice)); //copy host array to device

cufftSafeCall( cufftPlan3d(&plan, (*nz),(*ny),(*nx) ,CUFFT_D2Z )); // perform 3d fft complex to complex

cufftSafeCall( cufftExecD2Z(plan, d_ic, d_oc )); // out of place

cutilSafeCall( cudaMemcpy(h_b,d_oc, size2 , cudaMemcpyDeviceToHost)); // copy back to host array

cudaFree(d_ic);

    cudaFree(d_oc);

    cufftDestroy(plan);

}

extern “C” void fft_3d_c2r_( int* nx ,int *ny, int *nz, cufftDoubleComplex *h_a, cufftDoubleReal *h_b)

{

    unsigned long size1 = sizeof(cufftDoubleComplex)* (*nx)*(*ny)*(*nz);                           //  calulate memory size

    unsigned long size2 = sizeof(cufftDoubleReal)* (*nx)*(*ny)*(*nz);                              //  calulate memory size

    cufftDoubleComplex *d_ic=NULL;                                                                 //  define pointer to device mem input

    cufftDoubleReal *d_oc=NULL;                                                                    //  define pointer to device mem output

    cufftHandle plan;

cutilSafeCall( cudaMalloc((void **)&d_ic, size1)); // allocate mem on device

cutilSafeCall( cudaMalloc((void **)&d_oc, size2)); // allocate mem on device

cutilSafeCall( cudaMemcpy(d_ic,h_a, size1 , cudaMemcpyHostToDevice)); //copy host array to device

cufftSafeCall( cufftPlan3d(&plan, *nz,*ny,*nx ,CUFFT_Z2D )); // perform 3d fft complex to complex

cufftSafeCall( cufftExecZ2D(plan, d_ic, d_oc )); // out of place

cutilSafeCall( cudaMemcpy(h_b,d_oc, size2 , cudaMemcpyDeviceToHost)); // copy back to host array

cudaFree(d_ic);

    cudaFree(d_oc);

    cufftDestroy(plan);

}

[/codebox]

Note that only the latest gnufortran ( after aug , 2009) would work properly.

It is critical to swap the indices nx and nz in cufftPlan3d.

Topic		Replies	Views
CUFFT FOR FORTRAN HOW TO USE CUDA Programming and Performance	6	10890	September 13, 2010
Questions about cuFFT for 3D matrix, arrayFire GPU-Accelerated Libraries	5	1645	October 12, 2021
CUFFT bug in Cuda 4.0 Release Candidate 2 CUDA Programming and Performance	8	1637	May 5, 2011
Using cuSolverDN in FORTRAN code GPU-Accelerated Libraries	8	4200	November 18, 2015
Problems using CUDA with Fortran CUDA Programming and Performance	8	5806	February 11, 2010
fortran cuda interface passing pointer from fortran and allocating memory on device CUDA Programming and Performance	8	9926	May 14, 2010
cudaMemcpy error Segmentation fault when executed CUDA Programming and Performance	7	3139	September 6, 2009
cuFFT and fftw CUDA Programming and Performance	10	4138	August 25, 2010
CUDA FFT different from Matlab FFT CUDA Programming and Performance	32	9303	March 29, 2011
cufftPlan2d() returns CUFFT_INVALID_VALUE, should be impossible CUDA Programming and Performance	13	15655	January 24, 2011

Fortran and cuFFT

Related topics