kernel only executes successfully once, then cudaMemcpy segfaults

sundog314 · March 31, 2009, 3:56pm

Hi all-

Looking for some advice from the gurus…I have a host function that repeatedly calls a kernel. The kernel is executed once at the start of the program, then iteratively in a for-loop. The basic flow is as follows:

/////////////////////////////////////
// initial call
cudaMalloc( (void**)&in, sizeof(in) );
cudaMemcpy( in_d, in, sizeof(in), cudaMemcpyHostToDevice );
cudaMalloc( (void**)&out, sizeof(out) );
cudaMemcpy( out_d, out, sizeof(out), cudaMemcpyHostToDevice );
dim3 dimBlock( BLOCKSIZE );
dim3 dimGrid( GRIDSIZE );
my_kernel<<<dimGrid,dimBlock>>>( in_d, out_d );
cudaMemcpy( out, out_d, sizeof(out), cudaMemcpyDeviceToHost );
cudaFree(in_d);
cudaFree(out_d);

// do something…
result[0] = func(out);

// iterative call
for ( i=0; i<n; i++ ) {
cudaMalloc( (void**)&in, sizeof(in) );
cudaMemcpy( in_d, in, sizeof(in), cudaMemcpyHostToDevice ); // **** segfaults here ****
cudaMalloc( (void**)&out, sizeof(out) );
cudaMemcpy( out_d, out, sizeof(out), cudaMemcpyHostToDevice );
dim3 dimBlock( BLOCKSIZE );
dim3 dimGrid( GRIDSIZE );
my_kernel<<<dimGrid,dimBlock>>>( in_d, out_d );
cudaMemcpy( out, out_d, sizeof(out), cudaMemcpyDeviceToHost );
cudaFree(in_d);
cudaFree(out_d);
// do something…
result[i] = func(out);
}
/////////////////////////////////////

under gdb I get the following backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x000000000080b490 in cudaMemcpy () from /usr/local/cuda/lib/libcudart.so
Missing separate debuginfos, use: debuginfo-install gcc.x86_64 glibc.x86_64 zlib.x86_64
(gdb) where
#0 0x000000000080b490 in cudaMemcpy () from /usr/local/cuda/lib/libcudart.so
#1 0x000000000042060f in ga_host (obs=0x1001100, qs=0x1001000, wv=0x1000f00, xopt=0x7fffc7869b90, fopt=0x7fffc7869d54) at ga_host.cu:278
#2 0x0000000000414064 in invert (proc_id=1, scanline=278, pvec=0x7fffc7869de0) at ga_main.cu:494
#3 0x0000000000414503 in main () at ga_main.cu:70

I can’t seem to figure out why it would crap out only on the first time through the loop, especially since I explicitly free all the device memory I allocate and copy through the host. I’d appreciate any input whatsoever…I’m so close to getting my first CUDA code up and running I can taste it. Thanks!

st3fan82 · March 31, 2009, 4:35pm

Sorry, I am not a guru. ;)
First, you call cudaMalloc on the pointer “in”, but in_d seems to be your device pointer. Where did you allocate it?
Besides, cudaMalloc only allocates device memory; you should use new or malloc for host memory.

And before the second call you have deallocated the memory with cudaFree().
I think this should be the problem.
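
To make the points above concrete, here is a minimal sketch of the allocation pattern, not the original poster's actual code. It assumes the buffers hold n floats, and the kernel body (doubling each element) and the function name run_iterations are hypothetical stand-ins. The device pointers in_d and out_d are the ones passed to cudaMalloc, the host pointers are never touched by it, and sizes are given in bytes rather than as sizeof of a pointer. Allocating once outside the loop is a design choice here, not something the thread requires.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for my_kernel: doubles each input element.
__global__ void my_kernel(const float* in_d, float* out_d, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out_d[idx] = 2.0f * in_d[idx];
}

// Runs the kernel `iters` times on host buffers `in` and `out`, each
// holding `n` floats. Returns cudaSuccess or the first error seen.
cudaError_t run_iterations(const float* in, float* out, int n, int iters)
{
    const size_t bytes = n * sizeof(float);  // byte count, not sizeof(pointer)
    float* in_d = NULL;                      // device pointers, kept separate
    float* out_d = NULL;                     // from the host pointers in/out

    // cudaMalloc receives the address of the DEVICE pointer.
    cudaError_t err = cudaMalloc((void**)&in_d, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void**)&out_d, bytes);
    if (err != cudaSuccess) { cudaFree(in_d); return err; }

    const int block = 256;                       // assumed block size
    const int grid = (n + block - 1) / block;    // enough blocks to cover n

    for (int i = 0; i < iters; i++) {
        err = cudaMemcpy(in_d, in, bytes, cudaMemcpyHostToDevice);
        if (err != cudaSuccess) break;
        my_kernel<<<grid, block>>>(in_d, out_d, n);
        err = cudaGetLastError();                // catches launch failures
        if (err != cudaSuccess) break;
        err = cudaMemcpy(out, out_d, bytes, cudaMemcpyDeviceToHost);
        if (err != cudaSuccess) break;
        // ... use `out` on the host here, e.g. result[i] = func(out) ...
    }

    // Free the device buffers once, after the last iteration.
    cudaFree(in_d);
    cudaFree(out_d);
    return err;
}
```

With host arrays allocated by malloc or declared on the stack, a call would look like run_iterations(in, out, n, iters).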

Jamie_K · March 31, 2009, 9:14pm

You have several problems. I think some of your CUDA calls are probably returning error codes. Check that first.

Does the first time (outside the loop) give you correct results?
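
One common way to check those return codes is to wrap every runtime call in a small macro. The sketch below is only an illustration: CUDA_CHECK, noop_kernel and alloc_copy_launch are hypothetical names, not part of the CUDA API or the original code. It also uses cudaDeviceSynchronize, which assumes a newer toolkit; toolkits from the time of this thread used cudaThreadSynchronize instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: report file, line and the runtime's error string
// for any call that does not return cudaSuccess, then exit.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Hypothetical empty kernel, used only to show how a launch is checked.
__global__ void noop_kernel() {}

// Allocates a device buffer, copies `n` floats to it, launches a kernel,
// and checks every step. Returns cudaSuccess if nothing failed.
cudaError_t alloc_copy_launch(const float* host, int n)
{
    float* dev = NULL;
    const size_t bytes = n * sizeof(float);

    CUDA_CHECK(cudaMalloc((void**)&dev, bytes));
    CUDA_CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));

    noop_kernel<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());        // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(dev));
    return cudaSuccess;
}
```

Had the calls in the original snippet been wrapped like this, a failing cudaMalloc or cudaMemcpy would print its error string at the first bad call.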
