Unspecified launch failure

Greg_Poirier · October 5, 2010, 3:47pm

Hello,

I’ve recently run into the error “unspecified launch failure,” as reported by cudaGetErrorString(cudaGetLastError()).

I realize that this usually amounts to a segmentation fault on the device, but I can’t explain it that way either. Here are snippets of the code in question:

The kernel:

    attributes(global) subroutine fft_kernel( Nq1, Nq2, Ngrid, Na, Nmode, Nind, Nline, &
                                              AqqRealDev, AqqImgDev, phaseDev, TermIndexDev, AindDev )

        implicit none
        integer, parameter :: nspace = 3
        integer, value  :: Ngrid, Na, Nmode, Nind, Nq1, Nq2, Nline
        integer :: ii, jj, kk, inz
        real*4                              :: phasefactor
        real*4, dimension(Ngrid, Na, Na) :: phaseDev
        real*4, dimension(Nline,0:2*nspace) :: TermIndexDev
        real*4, dimension(-Nind:Nind) :: AindDev
        real*4, dimension(Nmode,Nmode,Nmode) :: AqqRealDev
        real*4, dimension(Nmode,Nmode,Nmode) :: AqqImgDev
        real*4                          :: tmp

        inz = blockIdx%x * blockDim%x + threadIdx%x
        ii = TermIndexDev(inz, 1)
        jj = TermIndexDev(inz, 2)
        kk = TermIndexDev(inz, 3)

        phasefactor = phaseDev(Nq1, TermIndexDev(inz, 4), TermIndexDev(inz, 6)) &
                    + phaseDev(Nq2, TermIndexDev(inz, 5), TermIndexDev(inz, 6))

        tmp = AindDev(TermIndexDev(inz, 0)) * cos(phasefactor)
        AqqRealDev(ii, jj, kk) = AqqRealDev(ii,jj,kk) + tmp

    end subroutine fft_kernel

I’ve verified that the code executes up until the last assignment to AqqRealDev. I tried changing the last few lines to:

    tmp = AindDev(TermIndexDev(inz, 0)) * cos(phasefactor)
    tmp2 = AqqRealDev(ii,jj,kk)
    tmp3 = tmp2 + tmp
    AqqRealDev(ii,jj,kk) = tmp3

If I comment out the last line, the code executes without error. If I run it as above, I get the unspecified launch failure again.

Ideas? Have I missed something glaringly obvious?
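
One thing that may be worth ruling out, offered as a guess rather than a diagnosis: in CUDA Fortran, threadIdx and blockIdx are 1-based, so blockIdx%x * blockDim%x + threadIdx%x starts at blockDim%x + 1 instead of 1 and can run past Nline. Separately, removing the final store may let the compiler discard the loads that feed it, so a clean run without that line does not by itself show that the reads are in bounds. Below is a minimal sketch of a guarded index calculation, assuming one thread per row of TermIndexDev. The names index_sketch, guarded_index, and outDev are made up for illustration and are not part of the original code.

```fortran
module index_sketch
    use cudafor
    implicit none
contains
    ! Hypothetical kernel: records the global row index each thread would use.
    attributes(global) subroutine guarded_index(Nline, outDev)
        integer, value :: Nline
        integer, dimension(Nline) :: outDev
        integer :: inz

        ! blockIdx and threadIdx start at 1 in CUDA Fortran,
        ! so subtract 1 from the block index before scaling.
        inz = (blockIdx%x - 1) * blockDim%x + threadIdx%x

        ! Threads in a partially filled last block must not touch memory.
        if (inz <= Nline) then
            outDev(inz) = inz
        end if
    end subroutine guarded_index
end module index_sketch
```

The same two changes (the "- 1" on the block index and the inz <= Nline guard) could be dropped into fft_kernel. If TermIndexDev really stores indices as real*4, wrapping each subscript in int(...) would also make the conversion explicit, since standard Fortran expects integer subscripts.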

Greg_Poirier · October 6, 2010, 6:06am

Further information…

I compiled in device emulation mode and checked everything in the debugger. All of my device variables have reasonable memory addresses and are indexable. If I run the compiled CUDA Fortran code, it executes the kernel, returns from it, and then seg faults when copying from device back to host. So I can only assume that something inside the kernel is still faulting, but I can’t figure out where. It may be an off-by-one or something similar. I’ll post more after I narrow it down.

As always, tips are appreciated.
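
Because kernel launches are asynchronous, an error raised inside the kernel is often reported only by the next synchronizing call, which would fit a failure that appears during the device-to-host copy. Here is a hedged sketch of a host-side check that separates launch errors from execution errors. The module and routine names launch_check and check_kernel are hypothetical, and cudaDeviceSynchronize is assumed to be available (toolkits from around 2010 used cudaThreadSynchronize instead).

```fortran
module launch_check
    use cudafor
    implicit none
contains
    ! Hypothetical helper: call immediately after a kernel launch.
    subroutine check_kernel(label)
        character(len=*), intent(in) :: label
        integer :: istat

        ! Errors detected at launch time (for example, a bad configuration).
        istat = cudaGetLastError()
        if (istat /= cudaSuccess) then
            write(*,*) trim(label), ' launch: ', trim(cudaGetErrorString(istat))
        end if

        ! Block until the kernel finishes so execution errors surface here
        ! rather than at a later copy back to the host.
        istat = cudaDeviceSynchronize()
        if (istat /= cudaSuccess) then
            write(*,*) trim(label), ' execution: ', trim(cudaGetErrorString(istat))
        end if
    end subroutine check_kernel
end module launch_check
```

Calling check_kernel('fft_kernel') right after the launch should show whether the failure belongs to the kernel itself or to the copy that follows it.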
