Seg fault during second cuda kernel call

sanf · January 26, 2012, 7:17am

Hi,

The code gets segmentation fault in the initial stage of execution. After debugging the code with cuda-gdb, it shows following message:

The debugging info is as follows:

(cuda-gdb) step 

Single stepping until exit from function cudaSetupArgument, 

which has no line number information. 

[Launch of CUDA Kernel 0 (AssignInputs<<<(1,1,1),(1,1,1)>>>) on Device 0] 

[Launch of CUDA Kernel 1 (AssignNodes<<<(1,1,1),(1,1,1)>>>) on Device 0] 

Program received signal CUDA_EXCEPTION_10, Device Illegal Address. 

[Switching focus to CUDA kernel 1, grid 2, block (0,0,0), thread (0,0,0), device 0, sm 1, warp 1, lane 0] 

0x000000000d5a9520 in AssignNodes<<<(1,1,1),(1,1,1)>>> () 

(cuda-gdb) step 

Single stepping until exit from function _Z11AssignNodesP5_rootPdPi, 

which has no line number information. 

Program terminated with signal CUDA_EXCEPTION_10, Device Illegal Address. 

The program no longer exists.

Can somebody tell what went wrong here? Is there any tool to debug further?

Thanks

cmaster.matso · January 26, 2012, 11:54am

Could You provide Your kernel and host code, should it be not a close source External Image ?

Regards,
MK

sanf · January 27, 2012, 8:15am

Following is the code snippet:

{

R *r, *r_d;

P *p;

N *n;    // R is the main struct. P & N are sub structures of R. Also, N is linkedlist

// The linked list N is created inside CUDA by using the arrays which are copied from host

double *nAr,*nAr_d;

int numNAr;

nAr = (double *) malloc(1*sizeof(double));

while(!feof(fp))

{

       nAr = (double *)realloc(nAr, (i+1)*sizeof(double));

       fscanf(fp, "%lf", &nAr[i]);

       i=i+1;

}

fclose(fp);

numNAr = i/2;

cudaMalloc( (void **) &nAr_d, numNAr * 2 * sizeof(double));

cudaMemcpy(nAr_d, nAr, numNAr * 2* sizeof(double), cudaMemcpyHostToDevice);

AssignKernel2<<<1,1>>> (root_dev, nAr_d, numNAr);

cudaThreadSynchronize();

....

}

The code for AssignKernel is as follows:

__global__ void AssignKernel2(R *r_d,double *nAr_d,int numNAr_d)

{

        int i,iN, dInc, numNb, nT=0;

        double    bSX, bSY, bSZ;

N *n;

        P *p;

        p = r_d->p;

r_d->nNKM = 0;

p->nC = 0;

r_d->nNQ = (N *)NULL;

n = (N *) malloc(1*sizeof(N));

r_dev->nKs = (N **) malloc(((numNAr_d) + 5) * sizeof(  N *));

for(i=0; i<numNAr_d; i++)

        {

                p->nC = para->nC + 1;

if(nT >= r_d->nNKM)

                {

                        ENKs(r_d, r_d->nNKM + 1);

                }

                n->mT.idx = nT;

n->numNb = 2;

                n->x = nAr_d[2*i];

                n->y = nAr_d[2*i+1];

                n->x = n->x * p->b;

                n->y = n->y * p->b;

}

}

The AssignKernel2 is the 2nd cuda kernel. There is no error in 1st cuda kernel call.

The cuda-gdb shows following information:

215			while(fgetc(fp)!='.');

(cuda-gdb) step 10

dim3 (this=0x7fffe70672b0, vx=1, vy=1, vz=1) at /usr/local/cuda/bin/../include/vector_types.h:497

497	    __host__ __device__ dim3(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1) : x(vx), y(vy), z(vz) {}

(cuda-gdb) step

dim3 (this=0x7fffe70672c0, vx=1, vy=1, vz=1) at /usr/local/cuda/bin/../include/vector_types.h:497

497	    __host__ __device__ dim3(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1) : x(vx), y(vy), z(vz) {}

(cuda-gdb) list

492	/*DEVICE_BUILTIN*/

493	struct dim3

494	{

495	    unsigned int x, y, z;

496	#if defined(__cplusplus)

497	    __host__ __device__ dim3(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1) : x(vx), y(vy), z(vz) {}

498	    __host__ __device__ dim3(uint3 v) : x(v.x), y(v.y), z(v.z) {}

499	    __host__ __device__ operator uint3(void) { uint3 t; t.x = x; t.y = y; t.z = z; return t; }

500	#endif /* __cplusplus */

501	};

(cuda-gdb) step

0x0000000000401800 in __device_stub__Z12AssignKernel1P5_rootPd (__par0=0x0, __par1=0x7fffe70676e0) at Main.cudafe1.stub.c:1

1	#include "crt/host_runtime.h"

(cuda-gdb) step

0x0000000000401810	2	#include "Main.fatbin.c"

(cuda-gdb) step

0x00000000004017a0 in ?? () at ll_kernel.cu:114

warning: Source file is more recent than executable.

114	/*

(cuda-gdb) step

0x00000000004017b0	1	#include "crt/host_runtime.h"

(cuda-gdb) step

0x00002b3bf5963930 in ?? () from /usr/local/cuda/lib64/libcudart.so.4

(cuda-gdb) step

Single stepping until exit from function cudaSetupArgument, 

which has no line number information.

[Launch of CUDA Kernel 0 (AssignKernel1<<<(1,1,1),(1,1,1)>>>) on Device 0]

Number of Ns = 100 

[Launch of CUDA Kernel 1 (AssignKernel2<<<(1,1,1),(1,1,1)>>>) on Device 0]

Failed to read the virtual PC on CUDA device 0 (error=10).

(cuda-gdb) step

Failed to read the virtual PC on CUDA device 0 (error=10).

(cuda-gdb) continue 

Continuing.

Failed to read the virtual PC on CUDA device 0 (error=10).

(cuda-gdb) q

The program is running.  Exit anyway? (y or n) y

Here its not showing any line number, which is causing the error. How to get that info?

The cuda-memcheck output is as follows:

# cuda-memcheck ./gpu_debug.exe 

========= CUDA-MEMCHECK

Number of Ns = 100 

========= Error: process didn't terminate successfully

========= Out-of-range Shared or Local Address

=========     in ll_kernel.cu:AssignKernel2

=========     by thread (0,0,0) in block (0,0,0)

=========

========= ERROR SUMMARY: 1 error

Please let me know how to diagnose it further.

vyas · February 2, 2012, 1:59am

Hi sanf, could you post the entire source of the application? Also, which version of the toolkit and the driver are you using ? Finally, have you tried enabling cuda memcheck within cuda-gdb ? You can do this by typing “set cuda memcheck on” at the cuda-gdb prompt before starting the application.

tera · February 2, 2012, 1:37pm

Compile your code with [font=“Courier New”]-G0[/font] before starting cuda-gdb for more useful output from it. And make sure you do not touch the source code after compiling and before starting cuda-gdb, or cuda-gdb will show you the wrong source code.

Topic		Replies	Views
second kernel call results in segmentation fault and other annoying problems CUDA Programming and Performance	6	2165	March 15, 2009
kernel call issue CUDA Programming and Performance	1	3462	December 23, 2009
Segmentation fault (core dumped) CUDA Programming and Performance	4	13098	May 13, 2017
Segmentation Fault in kernel.cu CUDA Programming and Performance	6	5161	July 13, 2011
CUDA segmentation Fault Error in Cudastream CUDA Programming and Performance	3	2748	April 24, 2017
Cuda-gdb segmentation fault when stepping through program CUDA-GDB nvbugs	4	1868	March 7, 2023
CUDA C++ Segmentation Fault CUDA Programming and Performance	7	14814	October 1, 2017
segmentation fault at the first cudaMalloc with --device-emulation everything was fine CUDA Programming and Performance	10	4326	January 25, 2010
Correct output with emulation mode, wrong with GPU/Execution CUDA Programming and Performance	6	3340	March 25, 2010
Segmentation Fault on calling cudaMalloc - I can't figure out why CUDA Programming and Performance	1	2051	November 12, 2015

Seg fault during second cuda kernel call

Related topics