Hi all,
I am currently working on a CUDA-based application written in C++.
Roughly, my app does the following:
1.) acquire a 2-dimensional image
2.) apply some filters to enhance quality
3.) run a detection algorithm, and
4.) create a binary representation of objects missing/present in the image.
Steps 2.) to 4.) are implemented as CUDA kernels, i.e. each of them runs in parallel on the GPU.
At some points in my code, I call cudaThreadSynchronize().
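For readers less familiar with this kind of setup, here is a minimal, hypothetical sketch of one cycle of steps 2 to 4, followed by the copy back to the host. It is not the actual application code. The kernel bodies are trivial placeholders, and the names filterKernel, detectKernel, binarizeKernel and runCycle are made up for illustration. The sketch uses cudaDeviceSynchronize(), which replaces the deprecated cudaThreadSynchronize() in current CUDA releases.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical placeholder for step 2 (filtering): here just a copy.
__global__ void filterKernel(const float* in, float* out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;  // the grid may overhang the image edges
    out[y * w + x] = in[y * w + x];
}

// Hypothetical placeholder for step 3 (detection): keep pixels above 0.5.
__global__ void detectKernel(const float* in, float* score, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float v = in[y * w + x];
    score[y * w + x] = v > 0.5f ? v : 0.0f;
}

// Hypothetical placeholder for step 4: 1 = object present, 0 = missing.
__global__ void binarizeKernel(const float* score, unsigned char* mask, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    mask[y * w + x] = score[y * w + x] > 0.0f ? 1 : 0;
}

// One cycle of steps 2 to 4 on an image already in device memory,
// followed by copying the binary mask back to the host.
// Returns the first CUDA error encountered instead of aborting.
cudaError_t runCycle(const float* d_img, float* d_tmp, float* d_score,
                     unsigned char* d_mask, unsigned char* h_mask, int w, int h)
{
    assert(w > 0 && h > 0);
    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);

    // Launches return immediately; on the same (default) stream the three
    // kernels still execute one after another, in launch order.
    filterKernel<<<grid, block>>>(d_img, d_tmp, w, h);
    detectKernel<<<grid, block>>>(d_tmp, d_score, w, h);
    binarizeKernel<<<grid, block>>>(d_score, d_mask, w, h);

    // Reports errors in the launch itself (e.g. an invalid configuration).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Waits for all preceding device work and returns errors raised
    // while those kernels were running.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) return err;

    size_t bytes = static_cast<size_t>(w) * h * sizeof(unsigned char);
    return cudaMemcpy(h_mask, d_mask, bytes, cudaMemcpyDeviceToHost);
}
```

runCycle expects the image and the intermediate buffers to have been allocated with cudaMalloc beforehand. It returns the first error it sees, so the caller decides how to react.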
The problem is that my software crashes after I run the “cycle” (steps 1 to 4) several times on the same input image, and the point of failure seems random: sometimes the crash occurs after 3 cycles, sometimes after 50, sometimes after 213.
I spent quite some time debugging to pinpoint the line that produces the crash. As far as I can tell, the crash always occurs at the same line:
CUDA_SAFE_CALL(cudaMemcpy(h_Array,d_Array,size,cudaMemcpyDeviceToHost));
I tried “isolating” that line by calling cudaThreadSynchronize() right before and after it, but that didn’t help either.
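Here is one general reason, offered as a hedged note rather than a diagnosis, why a failure can keep surfacing at this particular line. Kernel launches are asynchronous, so an error that happens inside a kernel (for example an out-of-bounds write) is typically reported by the next call that waits for the device. A blocking device-to-host cudaMemcpy is such a call, so it can report a fault that an earlier kernel caused. The sketch below checks errors right after each launch so that a fault is attributed to the kernel that raised it. CHECK_CUDA, CHECK_LAUNCH, the DEBUG_SYNC_AFTER_LAUNCH switch, scaleKernel and scaleOnDevice are hypothetical names. CHECK_CUDA only imitates the general idea of CUDA_SAFE_CALL and does not reproduce its exact definition.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical checking macro, similar in spirit to CUDA_SAFE_CALL:
// print the failing call, its location and the error string, then exit.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,      \
                         __LINE__, #call, cudaGetErrorString(err_));      \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// cudaGetLastError() reports errors in the launch itself. With
// DEBUG_SYNC_AFTER_LAUNCH defined, the extra synchronize also reports
// faults that occur while the kernel runs, at this point in the code,
// instead of at the next blocking call such as cudaMemcpy.
#ifdef DEBUG_SYNC_AFTER_LAUNCH
#define CHECK_LAUNCH()                                                    \
    do {                                                                  \
        CHECK_CUDA(cudaGetLastError());                                   \
        CHECK_CUDA(cudaDeviceSynchronize());                              \
    } while (0)
#else
#define CHECK_LAUNCH() CHECK_CUDA(cudaGetLastError())
#endif

// Hypothetical example kernel: multiply each element by a factor.
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;  // guard for the last, partial block
}

// Scales n floats on the device and copies the result into h_out.
void scaleOnDevice(const float* h_in, float* h_out, int n, float factor)
{
    assert(n > 0);  // a zero-sized grid would be an invalid launch
    float* d = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    CHECK_CUDA(cudaMalloc(&d, bytes));
    CHECK_CUDA(cudaMemcpy(d, h_in, bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    scaleKernel<<<(n + threads - 1) / threads, threads>>>(d, n, factor);
    CHECK_LAUNCH();  // check right after the launch, not only at the copy

    CHECK_CUDA(cudaMemcpy(h_out, d, bytes, cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(d));
}
```

Compiling with `nvcc -DDEBUG_SYNC_AFTER_LAUNCH` serializes host and device after every launch. This makes the code slower but points error messages at the kernel that actually failed.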
I should also mention that memory leaks are reported when it crashes. However, after spending a lot of time checking the non-CUDA C++ code for leaks, I’m now convinced that they are merely a symptom of the crash, not its cause.
Does anyone have an idea what could cause this weird, non-deterministic behaviour? Any hint is highly appreciated.
Thanks a lot in advance,
-cfm-