Well…
In my CUDA program, there is a kernel function containing a for loop that does float computations.
The loop typically iterates many times, e.g. 2^20.
Actually, the exponent depends on the input data, but in any case it is a very big number.
The problem is that my program sometimes does not execute the kernel function correctly.
(The MPI version of the program always works, so I believe the algorithm itself is correct.)
I have checked, and I now know that some threads do not finish the for loop normally.
That is, they quit the loop suddenly, and the point where they stop differs from run to run.
Run your program under cuda-memcheck to find invalid memory accesses. How long does the program run? Could you be triggering the watchdog timer?
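To narrow it down, something along these lines may help. It is only a rough sketch, not your code: longLoopKernel and reportIfError are made-up names, the loop body is a placeholder, and device 0 and the launch sizes are assumptions. It prints whether the run time limit (the watchdog) is enabled on the device, then checks for errors both right after the launch and after the kernel has finished. If the watchdog killed the kernel, the synchronize call should report a launch timeout.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: one long float loop per thread.
__global__ void longLoopKernel(float *out, unsigned int iterations)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (unsigned int i = 0; i < iterations; ++i) {
        acc += 1.0e-6f * (float)(i & 0xffu);
    }
    out[tid] = acc;
}

// Hypothetical helper: prints the error text and returns true on failure.
static bool reportIfError(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) {
        return false;
    }
    fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
    return true;
}

int main()
{
    // Ask the runtime whether a run time limit applies to kernels on device 0.
    cudaDeviceProp prop;
    if (reportIfError(cudaGetDeviceProperties(&prop, 0), "cudaGetDeviceProperties")) {
        return 1;
    }
    printf("%s: kernel run time limit %s\n", prop.name,
           prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");

    const int threads = 256;  // assumed launch configuration
    const int blocks = 64;
    float *d_out = nullptr;
    if (reportIfError(cudaMalloc((void **)&d_out, threads * blocks * sizeof(float)),
                      "cudaMalloc")) {
        return 1;
    }

    longLoopKernel<<<blocks, threads>>>(d_out, 1u << 20);

    // Launch errors show up immediately; execution errors (including a
    // watchdog timeout) only show up once the host waits for the kernel.
    bool failed = reportIfError(cudaGetLastError(), "kernel launch");
    failed = reportIfError(cudaDeviceSynchronize(), "kernel execution") || failed;

    cudaFree(d_out);
    return failed ? 1 : 0;
}
```

If the limit is reported as enabled and the failures only appear with large inputs, the watchdog is a likely suspect. If the errors point at memory instead, the cuda-memcheck output should show where.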
Because this is my first CUDA program, especially one for big data, I have not run into this kind of GPU programming problem before.
To be honest, I didn’t know about the watchdog timer before your comment.