Problem with calculation of spectral derivatives using CUFFT

Hello everyone,

I am interested in calculating higher order spectral derivatives using CUFFT in Fortran. While the first derivative is correctly calculated, higher derivatives are producing oscillations at the edges of my domain.
Capture

You can see that a couple of points are off at 0 and 6.28 and that behavior is even worse for higher derivatives.

The relevant part of my code is the following where I am trying to calculate the second derivative of the 1D array a_d

! Execute FFTs
    istat=cufftExecD2Z(plan,a_d,b_d,CUFFT_FORWARD)
    if(istat.ne.0) write(*,*) 'problem with forward fft'


   !second derivative
    !$cuf kernel do <<<ceiling(real(nz)/1024)),1024,0,stream>>>
    do k=1,nz

      if(k.ge.1.and.k.le.nz/2) then
        kz=dble(k-1)
      else
        kz=dble(-nz+k-1)
      endif

      if(k.eq.nz/2+1) kz=0.d0

      b_d(i)=b_d(i)*cmplx(0.d0,kz)*cmplx(0.d0,kz)

    enddo

    
    istat=cufftExecZ2D(plan2,b_d,a_d,CUFFT_INVERSE)
    if(istat.ne.0) write(*,*) 'problem with inverse fft'

a_d is expected to have the second derivative after the inverse fft and after normalization. The core of the result is good but as you can see on the plot above there are issues at the edges.

I would like to ask if you have seen that behavior before using the cufft library and if you have any ideas what may cause that problem.

Thank you in advance
VT

Your ExecZ2D and D2Z calls look suspicious ( there should be no direction since it is implicit). Without a full code, it is difficult to find the source of your problem, you may have set up the FFT plans incorrectly.

Thank you mfatica. Here is my code::

program main2
  use cufft
  use iso_c_binding
  implicit none
  double precision, dimension(:,:,:), allocatable :: val_tmp
  double precision, dimension(:,:,:), allocatable :: val 
  double precision :: dummy
  double precision, parameter :: pi=dacos(-1.d0)
  double precision, parameter :: z=2.d0*pi,zr=2.d0*pi,zl=0.d0,b_domain=2.d0*pi/(zr-zl)
  integer :: i,j,k
  integer, parameter :: nx=100,ny=100,nz=100
  double precision, parameter :: dz=z/dble(nz-1) 
  integer :: ierr,iproc,source,tag,istat
  integer :: ndims=3,iii
  integer, dimension(3) :: nxyz_glb
  double precision, allocatable              :: a(:),deriv1(:),deriv2(:)
  double precision, allocatable,device       :: val_d(:,:,:)
  double complex, allocatable                :: b(:)
  double precision, allocatable,device       :: a_d(:)
  double precision                           :: kz
  double complex, allocatable,device         :: b_d(:),b2_d(:)
  integer, parameter                         :: total_ffts=nx*ny,n=nx*ny*nz
  integer                                    :: plan, plan2
  !type(cudaEvent)                            :: start, stop
  real                                       :: time, timesum
  integer         :: stream,dev
  integer, allocatable :: map_cpu_2_gpu(:)
  !type (cudaDeviceprop):: prop
  integer:: ngpus ,deviceid
  integer:: localrank,localcomm

 
  !create plan for the fft
  istat=cufftPlan1D(plan,nz,CUFFT_D2Z,1)
  if(istat.ne.0) write(*,*) 'problem with forward fft plan'

  !crteate plan  the ifft
  istat=cufftPlan1D(plan2,nz,CUFFT_Z2D,1)
  if(istat.ne.0) write(*,*) 'problem with ifft plan'

  allocate(a(nz))
  
  ! create signal with wavenumbers 1 and 4
  do i=1,nz
    a(i)=cos(2.d0*pi*dble(i-1)/(dble(nz-1)))+cos(4*2.d0*pi*dble(i-1)/(dble(nz-1)))
  enddo

! allocate arrays where fft and ifft will be performed 1-d
   allocate(a_d(nz),b_d(nz))

   allocate(deriv1(nz),deriv2(nz))

   a_d=a

    ! Execute FFTs
    istat=cufftExecD2Z(plan,a_d,b_d,CUFFT_FORWARD)
    if(istat.ne.0) write(*,*) 'problem with forward fft'


   !2nd derivative
    !$cuf kernel do <<<ceiling(real(nz)/1024),1024>>>
    do i=1,nz


      k=i
      if(k.ge.1.and.k.le.nz/2) then
        kz=dble(k-1)
      else
       kz=dble(-nz+k-1)
      endif

      if(k.eq.nz/2+1) kz=0.d0


      b_d(i)=b_d(i)*cmplx(0.d0,kz)*cmplx(0.d0,kz)

    enddo


    istat=cufftExecZ2D(plan2,b_d,a_d,CUFFT_INVERSE)
    if(istat.ne.0) write(*,*) 'problem with inverse fft'

    deriv2=a_d
    !need to divide by nz

     open(9,file='fft.dat')
         do k=1,nz
           write(9,*) dz*(k-1),  deriv2(k)/dble(nz)
         enddo
close(9)

     
 end

I really appreciate your help
VT

Your code does not compile, the interfaces to cufftexecd2z/z2d are wrong, as I told you before you have an extra argument at the end that is not needed ( cufft already knows the direction from d2z or z2d name).

! Execute FFTs
istat=cufftExecD2Z(plan,a_d,b_d)
istat=cufftExecZ2D(plan2,b_d,a_d)

it is all you need.

There are few other issues in your code:

  1. If you transform with D2Z, your input is a real array(NZ) but the output is a complex (NZ/2+1).

! allocate arrays where fft and ifft will be performed 1-d
allocate(a_d(nz),b_d(nz/2+1))

  1. The wave numbers should be multiplied by 2*pi and the loop should go only up to nz/2+1. Also leave the configuration call to the compiler, 1024 threads may not be a good choice

!$cuf kernel do <<<*,*>>>
do i=1,nz/2+1
k=i
if(k.ge.1.and.k.le.nz/2) then
kz=2.d0*pi*dble(k-1)
else
kz=2.d0*pi*dble(-nz+k-1)
endif

  1. Your initial array is causing the issue at the boundary. If you initialize like this,
    you will get a good match ( deriv_ref is the analytical derivative).

! create signal with wavenumbers 1 and 4
do i=1,nz
a(i)=cos(2.d0*pi*dble(i-1)/(dble(nz)))+cos(4*2.d0*pi*dble(i-1)/(dble(nz)))
deriv_ref(i)=-4.d0*pi*pi*cos(2.d0*pi*dble(i-1)/(dble(nz)))-64.d0*pi*pi*cos(4*2.d0*pi*dble(i-1)/(dble(nz)))
enddo

  1. I would also put the scaling 1/nz in the derivative.

If you compare the computed derivative to the reference, you will see an absolute error of 10^-5

Thank you mfatica. Your suggestions worked.
VT

On a second thought, the signal shouldn’t be initialized as
cos(2.d0pidble(i-1)/(dble(nz-1)))+cos(42.d0pi*dble(i-1)/(dble(nz-1))) so we can generate a signal in the whole domain 2pi?

Do a simple experiment.
Your signal by construction should have a transform where only k=1 and k=4 have a peak of 1/2 ( or since cufft does not scale N/2)., corresponding to the two cosines.
Now transform your definition and the one I used and see which one is giving the expected results.