Problem with calculation of spectral derivatives using CUFFT

Hello everyone,

I am interested in calculating higher order spectral derivatives using CUFFT in Fortran. While the first derivative is correctly calculated, higher derivatives are producing oscillations at the edges of my domain.

You can see that a couple of points are off at 0 and 6.28 and that behavior is even worse for higher derivatives.

The relevant part of my code is the following where I am trying to calculate the second derivative of the 1D array a_d

``````! Execute FFTs
istat=cufftExecD2Z(plan,a_d,b_d,CUFFT_FORWARD)
if(istat.ne.0) write(*,*) 'problem with forward fft'

!second derivative
!\$cuf kernel do <<<ceiling(real(nz)/1024)),1024,0,stream>>>
do k=1,nz

if(k.ge.1.and.k.le.nz/2) then
kz=dble(k-1)
else
kz=dble(-nz+k-1)
endif

if(k.eq.nz/2+1) kz=0.d0

b_d(i)=b_d(i)*cmplx(0.d0,kz)*cmplx(0.d0,kz)

enddo

istat=cufftExecZ2D(plan2,b_d,a_d,CUFFT_INVERSE)
if(istat.ne.0) write(*,*) 'problem with inverse fft'
``````

a_d is expected to have the second derivative after the inverse fft and after normalization. The core of the result is good but as you can see on the plot above there are issues at the edges.

I would like to ask if you have seen that behavior before using the cufft library and if you have any ideas what may cause that problem.

VT

Your ExecZ2D and D2Z calls look suspicious ( there should be no direction since it is implicit). Without a full code, it is difficult to find the source of your problem, you may have set up the FFT plans incorrectly.

Thank you mfatica. Here is my code::

``````program main2
use cufft
use iso_c_binding
implicit none
double precision, dimension(:,:,:), allocatable :: val_tmp
double precision, dimension(:,:,:), allocatable :: val
double precision :: dummy
double precision, parameter :: pi=dacos(-1.d0)
double precision, parameter :: z=2.d0*pi,zr=2.d0*pi,zl=0.d0,b_domain=2.d0*pi/(zr-zl)
integer :: i,j,k
integer, parameter :: nx=100,ny=100,nz=100
double precision, parameter :: dz=z/dble(nz-1)
integer :: ierr,iproc,source,tag,istat
integer :: ndims=3,iii
integer, dimension(3) :: nxyz_glb
double precision, allocatable              :: a(:),deriv1(:),deriv2(:)
double precision, allocatable,device       :: val_d(:,:,:)
double complex, allocatable                :: b(:)
double precision, allocatable,device       :: a_d(:)
double precision                           :: kz
double complex, allocatable,device         :: b_d(:),b2_d(:)
integer, parameter                         :: total_ffts=nx*ny,n=nx*ny*nz
integer                                    :: plan, plan2
!type(cudaEvent)                            :: start, stop
real                                       :: time, timesum
integer         :: stream,dev
integer, allocatable :: map_cpu_2_gpu(:)
integer:: ngpus ,deviceid
integer:: localrank,localcomm

!create plan for the fft
istat=cufftPlan1D(plan,nz,CUFFT_D2Z,1)
if(istat.ne.0) write(*,*) 'problem with forward fft plan'

!crteate plan  the ifft
istat=cufftPlan1D(plan2,nz,CUFFT_Z2D,1)
if(istat.ne.0) write(*,*) 'problem with ifft plan'

allocate(a(nz))

! create signal with wavenumbers 1 and 4
do i=1,nz
a(i)=cos(2.d0*pi*dble(i-1)/(dble(nz-1)))+cos(4*2.d0*pi*dble(i-1)/(dble(nz-1)))
enddo

! allocate arrays where fft and ifft will be performed 1-d
allocate(a_d(nz),b_d(nz))

allocate(deriv1(nz),deriv2(nz))

a_d=a

! Execute FFTs
istat=cufftExecD2Z(plan,a_d,b_d,CUFFT_FORWARD)
if(istat.ne.0) write(*,*) 'problem with forward fft'

!2nd derivative
!\$cuf kernel do <<<ceiling(real(nz)/1024),1024>>>
do i=1,nz

k=i
if(k.ge.1.and.k.le.nz/2) then
kz=dble(k-1)
else
kz=dble(-nz+k-1)
endif

if(k.eq.nz/2+1) kz=0.d0

b_d(i)=b_d(i)*cmplx(0.d0,kz)*cmplx(0.d0,kz)

enddo

istat=cufftExecZ2D(plan2,b_d,a_d,CUFFT_INVERSE)
if(istat.ne.0) write(*,*) 'problem with inverse fft'

deriv2=a_d
!need to divide by nz

open(9,file='fft.dat')
do k=1,nz
write(9,*) dz*(k-1),  deriv2(k)/dble(nz)
enddo
close(9)

end
``````

VT

Your code does not compile, the interfaces to cufftexecd2z/z2d are wrong, as I told you before you have an extra argument at the end that is not needed ( cufft already knows the direction from d2z or z2d name).

! Execute FFTs
istat=cufftExecD2Z(plan,a_d,b_d)
istat=cufftExecZ2D(plan2,b_d,a_d)

it is all you need.

There are few other issues in your code:

1. If you transform with D2Z, your input is a real array(NZ) but the output is a complex (NZ/2+1).

! allocate arrays where fft and ifft will be performed 1-d
allocate(a_d(nz),b_d(nz/2+1))

1. The wave numbers should be multiplied by 2*pi and the loop should go only up to nz/2+1. Also leave the configuration call to the compiler, 1024 threads may not be a good choice

!\$cuf kernel do <<<*,*>>>
do i=1,nz/2+1
k=i
if(k.ge.1.and.k.le.nz/2) then
kz=2.d0*pi*dble(k-1)
else
kz=2.d0*pi*dble(-nz+k-1)
endif

1. Your initial array is causing the issue at the boundary. If you initialize like this,
you will get a good match ( deriv_ref is the analytical derivative).

! create signal with wavenumbers 1 and 4
do i=1,nz
a(i)=cos(2.d0*pi*dble(i-1)/(dble(nz)))+cos(4*2.d0*pi*dble(i-1)/(dble(nz)))
deriv_ref(i)=-4.d0*pi*pi*cos(2.d0*pi*dble(i-1)/(dble(nz)))-64.d0*pi*pi*cos(4*2.d0*pi*dble(i-1)/(dble(nz)))
enddo

1. I would also put the scaling 1/nz in the derivative.

If you compare the computed derivative to the reference, you will see an absolute error of 10^-5

Thank you mfatica. Your suggestions worked.
VT

On a second thought, the signal shouldn’t be initialized as
cos(2.d0pidble(i-1)/(dble(nz-1)))+cos(42.d0pi*dble(i-1)/(dble(nz-1))) so we can generate a signal in the whole domain 2pi?

Do a simple experiment.
Your signal by construction should have a transform where only k=1 and k=4 have a peak of 1/2 ( or since cufft does not scale N/2)., corresponding to the two cosines.
Now transform your definition and the one I used and see which one is giving the expected results.