Hi all-
Looking for some advice from the gurus. I have a host function that repeatedly launches a kernel: once at the start of the program, then iteratively in a for loop. The basic flow is as follows:
/////////////////////////////////////
// initial call
cudaMalloc( (void**)&in, sizeof(in) );
cudaMemcpy( in_d, in, sizeof(in), cudaMemcpyHostToDevice );
cudaMalloc( (void**)&out, sizeof(out) );
cudaMemcpy( out_d, out, sizeof(out), cudaMemcpyHostToDevice );
dim3 dimBlock( BLOCKSIZE );
dim3 dimGrid( GRIDSIZE );
my_kernel<<<dimGrid,dimBlock>>>( in_d, out_d );
cudaMemcpy( out, out_d, sizeof(out), cudaMemcpyDeviceToHost );
cudaFree(in_d);
cudaFree(out_d);
// do something…
result[0] = func(out);
// iterative call
for ( i=0; i<n; i++ ) {
    cudaMalloc( (void**)&in, sizeof(in) );
    cudaMemcpy( in_d, in, sizeof(in), cudaMemcpyHostToDevice ); // **** segfaults here ****
    cudaMalloc( (void**)&out, sizeof(out) );
    cudaMemcpy( out_d, out, sizeof(out), cudaMemcpyHostToDevice );
    dim3 dimBlock( BLOCKSIZE );
    dim3 dimGrid( GRIDSIZE );
    my_kernel<<<dimGrid,dimBlock>>>( in_d, out_d );
    cudaMemcpy( out, out_d, sizeof(out), cudaMemcpyDeviceToHost );
    cudaFree(in_d);
    cudaFree(out_d);
    // do something…
    result[i] = func(out);
}
/////////////////////////////////////
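Since the declarations aren't shown, this is a guess rather than a diagnosis. In the code shown, `in_d` and `out_d` are never assigned: each `cudaMalloc` writes the new device address into `in` and `out` instead. So every `cudaMemcpy` targets an uninitialized device pointer, the host pointers (or the start of the host arrays) get overwritten with device addresses, the `cudaFree` calls release pointers that were never allocated, and the real allocations leak. That is undefined behavior, which can appear to work once and then crash on a later call. Also, if `in` and `out` are pointers rather than arrays, `sizeof(in)` is the size of the pointer (typically 8 bytes on x86_64), not of the data. A minimal sketch of one round trip with both issues fixed, assuming `in` and `out` are host arrays of `n` floats; `run_once` and the doubling kernel are hypothetical stand-ins for the real code, and 256 stands in for `BLOCKSIZE`:

```cuda
#include <stddef.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for my_kernel: doubles each input element.
__global__ void my_kernel(const float *in_d, float *out_d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // the last block may run past n
        out_d[i] = 2.0f * in_d[i];
}

// One host -> device -> host round trip. `in` and `out` are host arrays of
// n floats (an assumption); in_d and out_d are distinct device pointers.
cudaError_t run_once(const float *in, float *out, int n)
{
    // Byte count from the element count: sizeof(in) on a pointer would give
    // the size of the pointer, not of the data it points to.
    const size_t bytes = (size_t)n * sizeof(float);
    float *in_d = NULL, *out_d = NULL;

    // Allocate into the device pointers; `in` and `out` are never overwritten.
    cudaError_t err = cudaMalloc((void **)&in_d, bytes);
    if (err == cudaSuccess)
        err = cudaMalloc((void **)&out_d, bytes);

    if (err == cudaSuccess)
        err = cudaMemcpy(in_d, in, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        const int block = 256;                    // assumed BLOCKSIZE
        const int grid = (n + block - 1) / block; // enough blocks to cover n
        my_kernel<<<grid, block>>>(in_d, out_d, n);
        err = cudaGetLastError();                 // reports launch errors
    }

    // Blocking copy back; it is ordered after the kernel in the default stream.
    if (err == cudaSuccess)
        err = cudaMemcpy(out, out_d, bytes, cudaMemcpyDeviceToHost);

    // Free exactly what was allocated; cudaFree(NULL) does nothing.
    cudaFree(in_d);
    cudaFree(out_d);
    return err;
}
```

The key point is the pairing: `cudaMalloc` fills the `_d` pointer, `cudaMemcpy` moves data between the host array and that `_d` pointer, and `cudaFree` releases the same `_d` pointer.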
Under gdb I get the following backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x000000000080b490 in cudaMemcpy () from /usr/local/cuda/lib/libcudart.so
Missing separate debuginfos, use: debuginfo-install gcc.x86_64 glibc.x86_64 zlib.x86_64
(gdb) where
#0 0x000000000080b490 in cudaMemcpy () from /usr/local/cuda/lib/libcudart.so
#1 0x000000000042060f in ga_host (obs=0x1001100, qs=0x1001000, wv=0x1000f00, xopt=0x7fffc7869b90, fopt=0x7fffc7869d54) at ga_host.cu:278
#2 0x0000000000414064 in invert (proc_id=1, scanline=278, pvec=0x7fffc7869de0) at ga_main.cu:494
#3 0x0000000000414503 in main () at ga_main.cu:70
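The backtrace puts the fault inside `cudaMemcpy` itself, i.e. in host-side library code, which suggests a bad pointer argument rather than a failing kernel. Every runtime call also returns a `cudaError_t` that the code above discards; checking those return values may surface the problem earlier, possibly already in the initial call. A minimal sketch of one common checking pattern; `cuda_ok`, `CUDA_OK`, and `upload_checked` are hypothetical names, not part of the CUDA API:

```cuda
#include <stdio.h>
#include <stddef.h>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): print a failed runtime
// call with its source location. Returns 1 on success, 0 on failure.
static int cuda_ok(cudaError_t err, const char *call, const char *file, int line)
{
    if (err == cudaSuccess)
        return 1;
    fprintf(stderr, "%s:%d: %s failed: %s\n",
            file, line, call, cudaGetErrorString(err));
    return 0;
}

// Stringizes the wrapped call so the message shows exactly what failed.
// Do not wrap a <<<...>>> launch: its unparenthesized comma would split the
// macro argument. Check cudaGetLastError() after the launch instead.
#define CUDA_OK(call) cuda_ok((call), #call, __FILE__, __LINE__)

// Allocate, upload, and free one device buffer, checking every step.
// Returns 1 only if all calls succeeded.
int upload_checked(const float *host, int n)
{
    const size_t bytes = (size_t)n * sizeof(float);
    float *dev = NULL;

    // && short-circuits, so the copy is skipped if the allocation fails.
    int ok = CUDA_OK(cudaMalloc((void **)&dev, bytes))
          && CUDA_OK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));

    // cudaFree(NULL) is a no-op, so this is safe even if cudaMalloc failed.
    ok = CUDA_OK(cudaFree(dev)) && ok;
    return ok;
}
```

Note that a check like this cannot catch a segfault; it only reports errors the runtime returns, which is why it helps most when applied to every call from the start.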
I can’t seem to figure out why it would crap out on the very first pass through the loop when the identical initial call runs fine, especially since I explicitly free all the device memory I allocate and copy through the host. I’d appreciate any input whatsoever. I’m so close to getting my first CUDA code up and running I can taste it. Thanks!
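Freeing everything on every iteration is not required: the device buffers can be allocated once before the initial call, reused on every pass, and freed once after the loop, which also avoids repeated `cudaMalloc`/`cudaFree` overhead. Separately, in the code shown the loop starts at `i=0`, so its first pass overwrites `result[0]` from the initial call; if that isn't intended, one option is to number the initial call 0 and the loop passes from 1. A minimal sketch under stated assumptions: `in` and `out` are host arrays of `n` floats, `run_all` is a hypothetical name, the kernel and `func` are hypothetical stand-ins (add 1, then sum), and 256 stands in for `BLOCKSIZE`:

```cuda
#include <stddef.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for my_kernel: adds 1 to each element.
__global__ void my_kernel(const float *in_d, float *out_d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out_d[i] = in_d[i] + 1.0f;
}

// Hypothetical stand-in for func(): sums the host-side output.
static float func(const float *out, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += out[i];
    return s;
}

// Pass 0 is the initial call and passes 1..n_iter are the loop, so result
// needs n_iter + 1 slots and result[0] is never overwritten. Device buffers
// are allocated once up front and freed once at the end.
cudaError_t run_all(const float *in, float *out, int n, int n_iter, float *result)
{
    const size_t bytes = (size_t)n * sizeof(float);
    const int block = 256;                    // assumed BLOCKSIZE
    const int grid = (n + block - 1) / block;
    float *in_d = NULL, *out_d = NULL;

    cudaError_t err = cudaMalloc((void **)&in_d, bytes);
    if (err == cudaSuccess)
        err = cudaMalloc((void **)&out_d, bytes);

    for (int k = 0; err == cudaSuccess && k <= n_iter; ++k) {
        // A real code would presumably update `in` between passes.
        err = cudaMemcpy(in_d, in, bytes, cudaMemcpyHostToDevice);
        if (err == cudaSuccess) {
            my_kernel<<<grid, block>>>(in_d, out_d, n);
            err = cudaGetLastError();
        }
        if (err == cudaSuccess)
            err = cudaMemcpy(out, out_d, bytes, cudaMemcpyDeviceToHost);
        if (err == cudaSuccess)
            result[k] = func(out, n);
    }

    cudaFree(in_d);                           // no-op if never allocated
    cudaFree(out_d);
    return err;
}
```

With the allocations hoisted out of the loop, the device pointers stay valid for every pass, so there is no per-iteration allocate/free pairing to get wrong.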