I am currently running CUDA on an 8800 GTS under SLAMD64 (a 64-bit Slackware distribution) using the SUSE Linux Enterprise Desktop driver and Toolkit. When I went from using the driver
NVIDIA-Linux-x86_64-100.14.11-pkg2.run to
NVIDIA-Linux-x86_64-177.73-pkg2.run
and the toolkit:
NVIDIA_CUDA_Toolkit_1.0_sled_x86_64.run to
NVIDIA_CUDA_Toolkit_2.0_sled10sp1_x86_64.run
a CUDA routine that previously ran successfully now fails with a segmentation fault. The routine uses close to 97% of the GPU's available memory, and when running under valgrind I get
“Conditional jump or move depends on uninitialised value” in cudaMemset and cudaFree, which hadn’t occurred before. Was there a change to the driver/Toolkit between 1.0 and 2.0 that might cause this error? Also, are there further things I might do to diagnose the problem?
An unspecified launch error is, more often than not, a segfault. Check the error codes to confirm that your memory was actually allocated, and if that doesn’t help, compile with -deviceemu and run under valgrind.
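A minimal sketch of the error-code check suggested above, written against a recent CUDA runtime rather than the 2.0 toolkit discussed in this thread. The macro name CHECK_CUDA, the kernel fill, and the buffer size are hypothetical and only there for illustration; cudaDeviceSynchronize() is the current name for what older toolkits called cudaThreadSynchronize(). The point is that a kernel launch returns no error code itself, so it helps to check both immediately after the launch and again after synchronizing.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file and line if a runtime call fails.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel standing in for the real routine.
__global__ void fill(float *data, size_t n, float value)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;   // bounds check avoids out-of-range writes
}

int main()
{
    const size_t n = 1 << 20;     // illustrative size, not from the thread
    float *d_data = NULL;

    // Allocation and memset are checked individually.
    CHECK_CUDA(cudaMalloc((void **)&d_data, n * sizeof(float)));
    CHECK_CUDA(cudaMemset(d_data, 0, n * sizeof(float)));

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    fill<<<blocks, threads>>>(d_data, n, 1.0f);

    CHECK_CUDA(cudaGetLastError());        // launch and configuration errors
    CHECK_CUDA(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CHECK_CUDA(cudaFree(d_data));
    return 0;
}
```

With checks in both places, a failure that would otherwise show up later in an unrelated call (such as cudaMemset or cudaFree) is reported at the launch that caused it.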
All my CUDA calls are wrapped in the CUDA_SAFE_CALL() macro, and I received no error codes during memory allocation. I compiled with -deviceemu and ran the code under valgrind; it executed successfully. The problem seems to be sensitive to GPU memory usage. A case that used ~470 MB of the 670 MB of GPU memory ran successfully, while a slightly different case that used ~485 MB segfaulted. What should I try next? I can send the code and the input dataset if that would be helpful.
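Since the failure appears to depend on how much GPU memory is in use, one thing that might help narrow it down is logging the free device memory around the large allocation. The sketch below assumes a toolkit newer than 2.0, where cudaMemGetInfo() is part of the runtime API; on older toolkits I believe the driver API call cuMemGetInfo() serves the same purpose. The helper name report_memory and the 485 MB request (taken from the failing case above) are illustrative only. Whether the newer driver actually leaves less memory available is a guess, not something established in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print free and total device memory with a label.
static void report_memory(const char *label)
{
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: cudaMemGetInfo failed: %s\n",
                label, cudaGetErrorString(err));
        return;
    }
    const double mb = 1024.0 * 1024.0;
    printf("%s: %.1f MB free of %.1f MB\n",
           label, free_bytes / mb, total_bytes / mb);
}

int main()
{
    // Illustrative request size matching the failing case (~485 MB).
    const size_t request = (size_t)485 * 1024 * 1024;
    void *d_buf = NULL;

    report_memory("before cudaMalloc");

    // Check the return value directly instead of relying on a wrapper macro.
    cudaError_t err = cudaMalloc(&d_buf, request);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc(%zu bytes) failed: %s\n",
                request, cudaGetErrorString(err));
        return 1;
    }

    report_memory("after cudaMalloc");

    cudaFree(d_buf);
    return 0;
}
```

Comparing the "before" figure under both driver versions would show whether the amount of memory available to the application changed. It may also be worth confirming that CUDA_SAFE_CALL() is actually active in your build; as far as I recall, the cutil version of that macro only performs its check in debug builds, so a failed allocation could pass silently in a release build.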