Bizarre nvcc segfault

Hi all,

I’m porting a program to run on GPUs, and while debugging some kernel failures my code suddenly started segfaulting in a file I wasn’t working on. The build is set up so that my ./configure flags decide whether the CUDA implementation is compiled in. The file fun_function.c is identical with and without CUDA and runs just fine when compiled with gcc, but compiling it with nvcc seems to cause a segfault when I run the program. Here’s the relevant code and the output when it fails. I think you will agree with me that this is very strange (nothing GPU-based is actually called here):

fun_function.c (fun_function() is called from main.c)

#include <stdio.h>

void fun_function(void)
{
  int fin = 2;
  int start;

  for (start = 1; start <= fin; start++)
  {
    printf("loop commenced\n");
    /* [blah blah blah] */
    printf("JUST finished loop\n");
    printf("endloop: start = %i, fin = %i\n", start, fin);
  }
  printf("end function");
}

and the terminal output:

loop commenced
JUST finished loop
endloop: start = 1, fin = 2
loop commenced
JUST finished loop
endloop: start = 2, fin = 2
Segmentation fault

There is absolutely no code between printf("endloop: start = %i, fin = %i\n", start, fin); and printf("end function"); so the program seems to be segfaulting when control returns to the for loop? And again, this piece of code runs perfectly when compiled as

gcc -O3 -c fun_function.c

instead of

nvcc -O3 -arch=sm_20 -c fun_function.c

Is this a bug in nvcc? If anyone has seen anything similar: is there something behind the scenes that could be breaking, or do you think I could just reinstall CUDA entirely to get rid of this? (I’m running CUDA 4.0 on Ubuntu 10.10 with a GTX 460, using the most up-to-date driver from the NVIDIA website, a 275-series release rather than the 270-series one listed on the CUDA-specific webpage.)

Thanks!

S

Just tested a fresh Ubuntu/CUDA install and got the same problem.

EDIT: The beginning of a (long) valgrind output:

[…]
endloop: start = 2, fin = 2
==22897== Invalid read of size 4
==22897==    at 0x8051D5A: main (in /home/socrates/Documents/program_folder/bin/program)
==22897==  Address 0x43546a4 is 8 bytes after a block of size 4 alloc'd
==22897==    at 0x40251F2: calloc (vg_replace_malloc.c:467)
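To read this first valgrind message: the address lies 8 bytes past the end of a 4-byte calloc block, so it is at byte offset 4 + 8 = 12 from the start of the block. Assuming the elements are 4 bytes wide (which matches "Invalid read of size 4"), that corresponds to index 3 of an array that only has room for one element. The arithmetic as a small sketch (the function name is hypothetical, not from the program):

```c
#include <stddef.h>

/* Hypothetical helper: translate valgrind's
 * "N bytes after a block of size S" into the array index that was touched.
 * block_size  = S (bytes allocated)
 * bytes_after = N (distance past the END of the block)
 * elem_size   = assumed element width in bytes (must be non-zero)
 */
size_t element_index(size_t block_size, size_t bytes_after, size_t elem_size)
{
    /* Offset from the start of the block, then convert bytes to elements. */
    return (block_size + bytes_after) / elem_size;
}

/*
 * Usage, with the numbers from the message:
 *   element_index(4, 8, 4) gives index 3,
 * so the allocation would have needed at least 4 elements,
 * e.g. calloc(4, sizeof(int)) instead of calloc(1, sizeof(int)).
 */
```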

EDIT: valgrind was picking up a memory error elsewhere in the program. I fixed that, and this is now the beginning of the valgrind output:

endloop: start = 2, fin = 2
==27093== Invalid read of size 4
==27093==    at 0x40279F6: memcpy (mc_replace_strmem.c:635)
==27093==    by 0x48C8A3B: ??? (in /usr/lib/libcuda.so.275.21)
==27093==    by 0x48EAA35: ??? (in /usr/lib/libcuda.so.275.21)
==27093==    by 0x489A430: ??? (in /usr/lib/libcuda.so.275.21)
==27093==    by 0x489A770: ??? (in /usr/lib/libcuda.so.275.21)
==27093==    by 0x48774FB: ??? (in /usr/lib/libcuda.so.275.21)
==27093==    by 0x4879762: ??? (in /usr/lib/libcuda.so.275.21)
==27093==    by 0x486C13C: cuMemcpy3D_v2 (in /usr/lib/libcuda.so.275.21)
==27093==    by 0x4073DEB: ??? (in /usr/local/cuda/lib/libcudart.so.4.0.17)
==27093==    by 0x405C302: ??? (in /usr/local/cuda/lib/libcudart.so.4.0.17)
==27093==    by 0x407F36A: cudaMemcpy3D (in /usr/local/cuda/lib/libcudart.so.4.0.17)
==27093==    by 0x805D344: cuda_calloc_3d_array(void****, unsigned int, unsigned int, unsigned int, unsigned int) (array_cuda.cu:80)

Why do I get a segfault from a completely different function between these two printf statements?
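One possible explanation, offered as an assumption rather than a diagnosis: when stdout is a terminal it is normally line-buffered, so the final printf("end function"), which has no trailing newline, can still be sitting in the buffer when the process is killed by the segfault. If that is what happens here, the crash may occur after fun_function() has already returned, which would fit the cuda_calloc_3d_array frame in the trace. A minimal sketch of a tracing helper that flushes after every message (the name trace_line is hypothetical):

```c
#include <stdio.h>

/* Hypothetical helper: write msg plus a newline and flush immediately,
 * so the text reaches the terminal even if the program crashes right after.
 * Returns 0 on success and EOF on a write or flush error.
 */
int trace_line(FILE *out, const char *msg)
{
    if (fputs(msg, out) == EOF || fputc('\n', out) == EOF)
        return EOF;
    return fflush(out);
}

/*
 * Usage inside fun_function(), replacing the last printf:
 *   trace_line(stderr, "end function");
 * stderr is unbuffered by default, and the explicit fflush covers stdout too.
 */
```

If "end function" shows up once the output is flushed, the segfault is not happening between the two printf calls, and the function named in the valgrind trace is the place to look.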