Hello,
I’ve recently run into the following error, as reported by cudaGetErrorString(cudaGetLastError()):
Unspecified launch failure
I realize this usually points to the GPU equivalent of a “segmentation fault,” i.e. an out-of-bounds memory access, but I can’t explain it that way either. Here are snippets of the code in question:
The kernel:
attributes(global) subroutine fft_kernel( Nq1, Nq2, Ngrid, Na, Nmode, Nind, Nline, &
                                          AqqRealDev, AqqImgDev, phaseDev, TermIndexDev, AindDev )
  implicit none
  integer, parameter :: nspace = 3
  integer, value :: Ngrid, Na, Nmode, Nind, Nq1, Nq2, Nline
  integer :: ii, jj, kk, inz
  real*4 :: phasefactor
  real*4, dimension(Ngrid, Na, Na) :: phaseDev
  real*4, dimension(Nline,0:2*nspace) :: TermIndexDev
  real*4, dimension(-Nind:Nind) :: AindDev
  real*4, dimension(Nmode,Nmode,Nmode) :: AqqRealDev
  real*4, dimension(Nmode,Nmode,Nmode) :: AqqImgDev
  real*4 :: tmp

  inz = blockIdx%x * blockDim%x + threadIdx%x
  ii = TermIndexDev(inz, 1)
  jj = TermIndexDev(inz, 2)
  kk = TermIndexDev(inz, 3)
  phasefactor = phaseDev(Nq1, TermIndexDev(inz, 4), TermIndexDev(inz, 6)) &
              + phaseDev(Nq2, TermIndexDev(inz, 5), TermIndexDev(inz, 6))
  tmp = AindDev(TermIndexDev(inz, 0)) * cos(phasefactor)
  AqqRealDev(ii, jj, kk) = AqqRealDev(ii,jj,kk) + tmp
end subroutine fft_kernel
I’ve verified that the code executes up until the last assignment to AqqRealDev. I tried changing the last few lines to:
  tmp = AindDev(TermIndexDev(inz, 0)) * cos(phasefactor)
  tmp2 = AqqRealDev(ii,jj,kk)
  tmp3 = tmp2 + tmp
  AqqRealDev(ii,jj,kk) = tmp3
If I comment out the last line, the code executes without error. If I run it as above, I get the unspecified launch failure again.
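For reference, here is a minimal sketch of how an error like this is typically picked up on the host after a launch. It is not the actual host code from this program: the stand-in kernel `touch`, the module name `check_demo`, the array size, and the launch configuration of 256 threads per block are all placeholders chosen for illustration, and the call to cudaDeviceSynchronize is an assumption about how the check might be structured.

```fortran
module check_demo
  use cudafor
  implicit none
contains
  ! Hypothetical stand-in kernel: adds 1.0 to each element of a.
  attributes(global) subroutine touch(a, n)
    integer, value :: n
    real :: a(n)
    integer :: i
    ! Global thread index (blockIdx and threadIdx start at 1 in CUDA Fortran).
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = a(i) + 1.0
  end subroutine touch
end module check_demo

program main
  use cudafor
  use check_demo
  implicit none
  integer, parameter :: n = 1024     ! placeholder problem size
  real, device :: a_d(n)
  integer :: istat

  a_d = 0.0
  call touch<<<(n + 255) / 256, 256>>>(a_d, n)

  ! Errors detected at launch time (e.g. an invalid configuration).
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch: ', cudaGetErrorString(istat)

  ! Kernels run asynchronously, so wait for completion to see
  ! errors raised while the kernel was executing.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel: ', cudaGetErrorString(istat)
end program main
```

Checking both right after the launch and after a synchronization helps separate a bad launch configuration from a failure inside the kernel body.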
Ideas? Have I missed something glaringly obvious?