I am trying to use CUDA for hardware acceleration of an existing image processing application. The code was originally written in C++. The issue I am having is that once my code enters the CUDA kernel, I get a segmentation fault. When I run it in gdb, I get the following output:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6a27960 (LWP 25161)]
0xb669983e in ?? () from /usr/lib/libcuda.so
I tried to read the documentation about how to debug, but I got confused. I did try building and running the code with emu=1 and dbg=1; in that mode the code seems to run, but it never terminates.
I read somewhere that my problem might have to do with memory fragmentation, since the C wrapper function I call sits inside a for loop that executes a very large number of times (I am not sure how many, but it runs continuously). As a result, cudaMalloc() and cudaFree() are called constantly, which might fragment device memory. The suggested solution was to write my own malloc and free functions, but I am new to this and have no idea where to start.
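To make the "write my own malloc and free" suggestion concrete, here is a minimal sketch of what I understand it to mean: allocate one buffer, keep it across loop iterations, and only reallocate when a larger size is needed. All names here (ScratchBuffer, AllocFn, FreeFn, host_alloc, host_free, count_allocations) are hypothetical and not from my actual code. So that it compiles without the CUDA toolkit, the allocator hooks use the host heap; in the real application they would presumably be thin wrappers around cudaMalloc() and cudaFree(). This only removes the repeated allocation; I do not know whether it addresses the actual cause of the segfault.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator hooks. With CUDA these would wrap cudaMalloc() and
// cudaFree(); here they are plain function pointers so the sketch is testable
// on the host.
using AllocFn = void* (*)(std::size_t bytes);
using FreeFn  = void (*)(void* ptr);

// Grow-only scratch buffer: allocates on first use, reallocates only when a
// larger size is requested, and frees once in the destructor.
class ScratchBuffer {
public:
    ScratchBuffer(AllocFn alloc, FreeFn release)
        : alloc_(alloc), release_(release) {
        assert(alloc_ != nullptr && release_ != nullptr);
    }

    ~ScratchBuffer() {
        if (ptr_ != nullptr) release_(ptr_);
    }

    // The buffer owns its memory, so copying is disabled.
    ScratchBuffer(const ScratchBuffer&) = delete;
    ScratchBuffer& operator=(const ScratchBuffer&) = delete;

    // Returns a buffer of at least `bytes` bytes, or nullptr if the
    // underlying allocation failed (or nothing has been requested yet).
    void* get(std::size_t bytes) {
        if (bytes > capacity_) {
            if (ptr_ != nullptr) release_(ptr_);
            ptr_ = alloc_(bytes);
            capacity_ = (ptr_ != nullptr) ? bytes : 0;
            if (ptr_ != nullptr) ++allocations_;
        }
        return ptr_;
    }

    // Number of successful calls to the underlying allocator so far.
    int allocations() const { return allocations_; }

private:
    AllocFn alloc_;
    FreeFn release_;
    void* ptr_ = nullptr;
    std::size_t capacity_ = 0;
    int allocations_ = 0;
};

// Host stand-ins for the device allocator (assumption, see above).
static void* host_alloc(std::size_t bytes) { return std::malloc(bytes); }
static void host_free(void* ptr) { std::free(ptr); }

// Simulates the per-frame loop and returns how many real allocations
// happened, or -1 if an allocation failed.
int count_allocations(int frames, std::size_t frame_bytes) {
    ScratchBuffer buffer(host_alloc, host_free);
    for (int i = 0; i < frames; ++i) {
        void* p = buffer.get(frame_bytes);
        if (p == nullptr) return -1;
        // ... copy the frame in, launch the kernel, copy the result out ...
    }
    return buffer.allocations();
}
```

With this shape, the wrapper function would take the ScratchBuffer by reference (or keep it as a long-lived object) instead of calling the allocator on every iteration, so a loop over many same-sized frames performs a single allocation.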
Does anyone have any idea what the problem might actually be, or any suggestions on how to solve it?