Segmentation fault (core dumped)

I’m a newbie to CUDA Fortran.I have searched the forum for this error “Segmentation fault (core dumped)” and read some replies but it didn’t solve the problem.If anyone can provide me with hints or solutions to the problem,it would be grateful.

The source code is:

module precision

! Precision control

integer, parameter, public :: Single = kind(0.0) ! Single precision

integer, parameter, public :: Double = kind(0.0d0) ! Double precision

integer, parameter, public :: fp_kind = Double

!integer, parameter, public :: fp_kind = Single

module cufft

integer, public :: CUFFT_FORWARD = -1

integer, public :: CUFFT_R2C = Z'2a' ! Real to Complex (interleaved)

integer, public :: CUFFT_D2Z = Z'6a' ! Double to Double-Complex

interface cufftPlan1d

subroutine cufftPlan1d(plan, nx, type, batch) bind(C,name='cufftPlan1d')

use iso_c_binding

integer(c_int):: plan

integer(c_int),value:: nx, batch,type

end subroutine cufftPlan1d

end interface cufftPlan1d

interface cufftDestroy

subroutine cufftDestroy(plan) bind(C,name='cufftDestroy')

use iso_c_binding

integer(c_int),value:: plan

end subroutine cufftDestroy

end interface cufftDestroy

interface cufftExecD2Z

subroutine cufftExecD2Z(plan, idata, odata, direction) &

bind(C,name='cufftExecD2Z')

use iso_c_binding

use precision

integer(c_int),value:: direction

integer(c_int),value:: plan

complex(fp_kind),device:: idata(*),odata(*)

end subroutine cufftExecD2Z

end interface cufftExecD2Z

end module cufft

program fft_test

use precision

use cufft

complex(fp_kind) ,allocatable:: a(:),b(:)

complex(fp_kind),device,allocatable:: a_d(:),b_d(:)

integer:: n

integer:: plan

integer :: x

n=2048

a=1

! allocate arrays on the host

allocate (a(n),b(n))

! allocate arrays on the device

allocate (a_d(n))

allocate (b_d(n))

DO x=1,1024

a(x)=a(x)+a(x)*6+1

!copy arrays to device

a_d=a

! Print initial array

print *, "Array A:"

print *, a(x)

! Initialize the plan

call cufftPlan1D(plan,n,CUFFT_D2Z,1)

! Execute FFTs

call cufftExecD2Z(plan,a_d,b_d,CUFFT_FORWARD)

! Copying results back to host

b=b_d

! Printing initial array

print *, "Array B"

print *, b(x)

!releasing memory on the host

deallocate (a,b)

!releasing memory on the device

deallocate (a_d,b_d)

! Destroying the plan

call cufftDestroy(plan)

end DO

end program fft_test

Very generically speaking a segfault indicates that there is a memory access inside the host portion of the code that is to an address that does not belong to the process. This could be an access out of bounds (e.g. the codes allocates an array of N elements, then accesses element N+1), or in the context of CUDA, an access from host code to device data other than an implicit copy. Your first task will be to narrow down which portion of the code gives rise to the error. If a debugger is available, run inside the debugger. It will stop at the point the segfault occurs. Otherwise you can place WRITEs in your code to see where the program terminates.

I have scanned through the code and did not spot any obvious problems, I am not familiar with the CUFFT bindings though. You might want to try and simplify to a program without CUFFT calls to make sure the framework works, then add calls back in one by one.

A colleague with significantly more experience with CUDA Fortran makes the following observations:

(1) Recent versions of CUDA Fortran already provide a CUFFT module which can be used with

use cufft

(2) The code as posted has a syntax error, in that module precision is not properly closed with

end module precision

This should have resulted in a syntax error reported by the compiler (and in fact does so with the CUDA Fortran we have installed here).

In addition to those mentioned above, there are a few other errors.

The array “a” is assigned a value of “1” before it is allocated.

cufft*() are functions, not subroutines, and have an integer return values.

You declare the data as complex, but are doing real-to-complex transforms. Fortran is strongly typed and does not permit such things – the compiler will complain about not finding a generic procedure. Also, CUFFT_FORWARD is used as an argument only for complex-to-complex transforms (C2C or Z2Z), for real-to-complex and complex-to-real transforms the direction is clear.

The following:

module precision

  ! Precision control

integer, parameter, public :: Single = kind(0.0) ! Single precision

  integer, parameter, public :: Double = kind(0.0d0) ! Double precision

!integer, parameter, public :: fp_kind = Double

  integer, parameter, public :: fp_kind = Single

end module precision

program fft_test

  use precision

  use cufft

  implicit none

  complex(fp_kind) ,allocatable:: a(:),b(:)

  complex(fp_kind),device,allocatable:: a_d(:),b_d(:)

integer :: n

  integer:: plan

  integer :: x, ierr

n=2048

  ! allocate arrays on the host and device

  allocate (a(n),b(n),a_d(n),b_d(n))

  a=1

DO x=1,1024

     a(x)=a(x)+a(x)*6+1

  end DO

!copy arrays to device

  a_d=a

! Initialize the plan

  ierr = cufftPlan1D(plan,n,CUFFT_C2C,1)

! Execute FFTs

  ierr = cufftExecC2C(plan,a_d,b_d,CUFFT_FORWARD)

  ierr = cufftExecC2C(plan,b_d,b_d,CUFFT_INVERSE)

! Copying results back to host

  b=b_d

! Printing initial array

  print *, 'Max error: ', maxval(abs(a-b/n))

!releasing memory on the host and device

  deallocate (a,b,a_d,b_d)

! Destroying the plan

  ierr = cufftDestroy(plan)

end program fft_test

compiles and runs fine:

~/Desktop $ pgf90 cufftForum.cuf -lcufft

~/Desktop $ ./a.out 

 Max error:    3.4385216E-06

I changed it to single precision since I am using a laptop.

Thank you.The code is executing well.

@fortran

@njuffa

I can think i have an older version on the machine and it doesn’t have a built-in cufft module.

I went back and checked and i had actually missed the “end module precision” while writing the code here.

But now the problem that i’m facing is that the results of the FFT that are being published are wrong.I checked the results by coding for the same program in IDL and the CUDA results are showing a lot of deviation.

Any insights into why this is happening would be of great help!

The code is as:

module precision

! Precision control

integer, parameter, public :: Single = kind(0.0) ! Single precision

integer, parameter, public :: Double = kind(0.0d0) ! Double precision

!integer, parameter, public :: fp_kind = Double

integer, parameter, public :: fp_kind = Single

end module precision

module cufft

integer, public :: CUFFT_FORWARD = -1

integer, public :: CUFFT_R2C = Z'2a' ! Real to Complex (interleaved)

integer, public :: CUFFT_D2Z = Z'6a' ! Double to Double-Complex

interface cufftPlan1d

subroutine cufftPlan1d(plan, nx, type, batch) bind(C,name='cufftPlan1d')

use iso_c_binding

integer(c_int):: plan

integer(c_int),value:: nx, batch,type

end subroutine cufftPlan1d

end interface cufftPlan1d

interface cufftDestroy

subroutine cufftDestroy(plan) bind(C,name='cufftDestroy')

use iso_c_binding

integer(c_int),value:: plan

end subroutine cufftDestroy

end interface cufftDestroy

interface cufftExecR2C

subroutine cufftExecR2C(plan, idata, odata, direction) &

bind(C,name='cufftExecR2C')

use iso_c_binding

use precision

integer(c_int),value:: direction

integer(c_int),value:: plan

complex(fp_kind),device:: idata(*),odata(*)

end subroutine cufftExecR2C

end interface cufftExecR2C

end module cufft

program fft_test

use precision

use cufft

real(fp_kind) ::x1

real,parameter :: pi=3.1415

real(fp_kind),allocatable ::a (:)

complex(fp_kind) ,allocatable:: b(:)

complex(fp_kind),device,allocatable:: b_d(:)

real(fp_kind),device,allocatable :: a_d(:)

integer:: n,plan,i

n=1024

! allocate arrays on the host

allocate (a(n),b(n))

! allocate arrays on the device

allocate (a_d(n))

allocate (b_d(n))

DO x=1,1024

     x1=i*pi/180

     a(i)=sin(x1)+cos(x1)

  end DO

!copy arrays to device

a_d=a

! Print initial array

print *, "Array A:"

print *, a

! Initialize the plan

call cufftPlan1D(plan,n,CUFFT_R2C,1)

! Execute FFTs

call cufftExecR2C(plan,a_d,b_d,CUFFT_FORWARD)

! Copy results back to host

b=b_d

print *, "Array B"

print *, b

!release memory on the host

deallocate (a,b)

!release memory on the device

deallocate (a_d,b_d)

! Destroy the plan

call cufftDestroy(plan)

end program fft_test

This code shouldn’t compile, I am not sure how you are getting results.

In the cufftExecR2C() interface both arrays are declared as complex, but for a real-to-complex transform the input array should be real.

Also, real-to-complex/complex-to-real transforms shouldn’t have CUFFT_FORWARD as an argument.

If you want to have the functionality of switching between single and double precision via changing only fp_kind, you can define a generic interface for cufftExec that will resolve to either cufftExecR2C if the arrays are single precision or cufftExecD2Z when they are double precision:

interface cufftExec

     subroutine cufftExecR2C(plan, idata, odata) &

          bind(C,name='cufftExecR2C')

       use iso_c_binding

       use precision

       integer(c_int), value:: plan

       real(Single), device :: idata(*)

       complex(Single), device:: odata(*)

     end subroutine cufftExecR2C

subroutine cufftExecD2Z(plan, idata, odata) &

          bind(C,name='cufftExecD2Z')

       use iso_c_binding

       use precision

       integer(c_int), value:: plan

       real(Double),device:: idata(*)

       complex(Double),device:: odata(*)

     end subroutine cufftExecD2Z

  end interface cufftExec

In the main code you would just call cufftExec after initializing the appropriate plan:

if (kind_fp == Single) then

     call cufftPlan1D(plan,n,CUFFT_R2C,1)  

  else

     call cufftPlan1D(plan,n,CUFFT_D2Z,1)  

  end if

  call cufftExec(plan,a_d,b_d)

Hope this helps.