Segmentation Fault in kernel.cu

Hi all, I’m still very green in CUDA programming.
I tried to parallelize some modules using the common CUDA steps:

  1. allocate memory on the device (cudaMalloc)
  2. copy from the host to the device (cudaMemcpy)
  3. do whatever computation I need on the device
  4. copy the result back to the host
  5. free the allocated memory (cudaFree)

I’ve successfully compiled kernel.cu (creating kernel.o), but when I run the executable
I get a Segmentation Fault.
Can you help me with this? What is the most common cause of a “Segmentation Fault” in a CUDA program?
Is it the same as in C?
Thanks in advance…
I appreciate your help…
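
For reference, the five steps listed in the question can be sketched roughly as below. This is only a minimal illustration, not the poster's code: the kernel `scale`, the macro `CUDA_CHECK`, and the function `run_scale` are hypothetical names chosen for the example. Checking every return code is an assumption about good practice here; it tends to turn a silent failure into a readable message before any bad pointer gets used.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: stop with a readable message when a runtime call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical kernel: doubles each element in place.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard: the last block may extend past the end of the array
        data[i] *= 2.0f;
}

// Runs the five steps on n floats and returns the number of wrong results.
int run_scale(int n)
{
    assert(n > 0);  // a zero-sized grid would be an invalid launch
    size_t bytes = (size_t)n * sizeof(float);

    float *h = (float *)malloc(bytes);
    if (h == NULL) {
        fprintf(stderr, "host malloc failed\n");
        exit(EXIT_FAILURE);
    }
    for (int i = 0; i < n; ++i)
        h[i] = (float)i;

    float *d = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d, bytes));                   // step 1
    CUDA_CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));  // step 2

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n
    scale<<<blocks, threads>>>(d, n);                             // step 3
    CUDA_CHECK(cudaGetLastError());  // reports launch configuration errors

    CUDA_CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));  // step 4
    CUDA_CHECK(cudaFree(d));                                      // step 5

    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f * (float)i)
            ++errors;
    free(h);
    return errors;
}

int main(void)
{
    int errors = run_scale(1000);
    printf("mismatches: %d\n", errors);
    return errors != 0;
}
```

If a program built along these lines still segfaults, the message printed by the check (or its absence) usually narrows down whether the failing call is a CUDA call or ordinary host code.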

Hi!

I think (99% sure) you are accessing unallocated memory.
Please post your kernel code.

The best way to start with CUDA is: take the vectorAdd example from the SDK, change the memory accesses, the operations, … and enable the CUDA profiler.

Regards!

Hard to tell without some code.
I think that’s not a CUDA problem at all :)
Maybe we could solve the problem with the corresponding code fragments.

Thanks for the replies.

Actually I can’t post the code here because it’s part of my internship work this summer at a design house near my campus, so it’s confidential and copyrighted.
Moreover, the code doesn’t stand alone. It is used by other files (main.c and others) and it uses many functions from another file (let’s say additional.c).
My friend suggested some rough debugging: putting a “printf” message on line after line to find where the segmentation fault occurs.
I’ve done that and found where it is. It is right before a very big “while” loop which contains
cudaMalloc, cudaMemcpy, cudaFree, and a kernel launch.
But I still don’t know what to do. I feel I’ve arranged the code so that it’s tidy enough…
Does this relate to the block and grid sizes I chose? I use 256 threads per block…

Hi!

Oops, cudaMalloc() in a loop? Check that every cudaMalloc call has a matching cudaFree call.
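
To illustrate the pairing mentioned above, here is a small sketch with two hypothetical loop shapes: one that allocates and frees inside every iteration, and one that allocates once before the loop and frees once after it. The function names, the helper `ok`, and the 1 MiB buffer size are all assumptions made for the example; `cudaMemset` merely stands in for the real work. A leaked allocation does not crash by itself, but once device memory runs out `cudaMalloc` starts failing, and ignoring that return code leaves later calls working with an invalid pointer.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed runtime call and return false.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Variant A: allocate inside the loop and free in the same iteration.
// Returns the number of iterations that completed.
int loop_with_paired_free(int iterations, size_t bytes)
{
    assert(bytes > 0);
    int done = 0;
    for (int it = 0; it < iterations; ++it) {
        void *d_buf = NULL;
        if (!ok(cudaMalloc(&d_buf, bytes), "cudaMalloc"))
            break;
        bool good = ok(cudaMemset(d_buf, 0, bytes), "cudaMemset");
        // The matching cudaFree runs whether or not the work succeeded.
        if (!ok(cudaFree(d_buf), "cudaFree") || !good)
            break;
        ++done;
    }
    return done;
}

// Variant B: allocate once, reuse the buffer, free once after the loop.
// Returns the number of iterations that completed.
int loop_with_hoisted_alloc(int iterations, size_t bytes)
{
    assert(bytes > 0);
    void *d_buf = NULL;
    if (!ok(cudaMalloc(&d_buf, bytes), "cudaMalloc"))
        return 0;
    int done = 0;
    for (int it = 0; it < iterations; ++it) {
        if (!ok(cudaMemset(d_buf, 0, bytes), "cudaMemset"))
            break;
        ++done;
    }
    ok(cudaFree(d_buf), "cudaFree");
    return done;
}

int main(void)
{
    const size_t bytes = 1 << 20;  // 1 MiB per buffer, an arbitrary size
    printf("paired:  %d iterations\n", loop_with_paired_free(100, bytes));
    printf("hoisted: %d iterations\n", loop_with_hoisted_alloc(100, bytes));
    return 0;
}
```

When the buffer size does not change between iterations, the second shape is usually the simpler one to audit, since there is exactly one allocation and one free to match up.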

Hmm, I’ve checked every cudaMalloc and cudaMemcpy statement and made sure that all of them use the correct size. I ran it with one particular main.c and it works. However, when I run it with the other main.c (the real target), it shows a segmentation fault again. When I debugged it with cuda-gdb, it pointed to a certain line, but I’m not quite sure whether that is the location of the bug. Moreover, debugging with cuda-gdb is a rather mind-boggling task. It is text based, and it can only run when all X11 features are killed (which I’ve also done).
What techniques do you usually use with cuda-gdb? Do the same methods as in gdb also apply in cuda-gdb?
Thanks a lot…

Any particular reason why cuda-gdb doesn’t work for you? Is it only because it is text based?

Yes, cuda-gdb is an extension of gdb that works for debugging both the host side and the device side of a CUDA program.

Here’s the manual:

If your program is segfaulting, it is most likely due to a mistake in your host code. Running with cuda-gdb should show you where the segfault happens.

If it’s happening in the CUDA library, it is most likely due to an incorrect argument passed to the library. Look at the stack trace and examine the host parameters.
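
As one illustration of examining the host parameters, the sketch below wraps `cudaMemcpy` in a hypothetical function, `checked_copy_to_host`, that rejects a null pointer or a copy larger than the destination host buffer before the library ever sees them. The wrapper, its capacity parameter, and `demo` are assumptions made for this example, not part of the CUDA API. Copying more bytes than the host buffer holds is one plausible way to get a segfault whose stack trace ends inside the library.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical wrapper: validates host-side arguments before calling
// cudaMemcpy. Returns 0 on success, -1 on a rejected or failed copy.
int checked_copy_to_host(void *h_dst, size_t h_capacity,
                         const void *d_src, size_t bytes)
{
    assert(h_capacity > 0);  // a zero-capacity destination is a caller bug
    if (h_dst == NULL || d_src == NULL) {
        fprintf(stderr, "copy rejected: null pointer\n");
        return -1;
    }
    if (bytes > h_capacity) {
        // Writing past the end of the host buffer would corrupt host memory.
        fprintf(stderr, "copy rejected: %zu bytes into a %zu-byte buffer\n",
                bytes, h_capacity);
        return -1;
    }
    cudaError_t err = cudaMemcpy(h_dst, d_src, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return 0;
}

// Returns 1 when a valid copy succeeds and an oversized copy is rejected.
int demo(void)
{
    const size_t n = 256;
    const size_t bytes = n * sizeof(int);

    int *d = NULL;
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess)
        return 0;
    if (cudaMemset(d, 0, bytes) != cudaSuccess) {
        cudaFree(d);
        return 0;
    }

    int *h = (int *)malloc(bytes);
    if (h == NULL) {
        cudaFree(d);
        return 0;
    }

    int good = checked_copy_to_host(h, bytes, d, bytes) == 0;
    int rejected = checked_copy_to_host(h, bytes, d, 2 * bytes) == -1;

    // d holds a device address: read results through h, never through d[0]
    // on the host.
    good = good && h[0] == 0;

    free(h);
    cudaFree(d);
    return good && rejected;
}

int main(void)
{
    printf("demo %s\n", demo() ? "passed" : "failed");
    return 0;
}
```

The same idea applies in the other direction: keeping the size of each buffer next to its pointer makes it easier to compare the two when the debugger stops inside a copy.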