Limitations on repetitive computation?

Hello Experts,
I'd appreciate it if anybody could clarify the following question.

The situation is as follows.
I launch a kernel in which I perform the following steps:

  • Perform a million computations in parallel (using random number functions from CURAND) and write the results to memory on the device.
  • Repeat the above step for a certain number of iterations.
    The program computes results successfully for, say, 10 million paths × 10 iterations.
    However, if I increase the number of iterations to, say, 1000, the program exits … I can see: cudaError_enum at memory location 0x0017f160…
    Why does this error occur?
    Does it have anything to do with a limit on kernel instruction size?
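The `cudaError_enum` message names the type of the error the CUDA runtime raised, not which error it was, so a useful first step is to check the return value of every CUDA call and to check kernel launches explicitly. The sketch below is not the poster's code: `simulate` and `run_simulation` are hypothetical stand-ins. Launch-configuration errors are reported by `cudaGetLastError`, while errors that happen while the kernel runs (such as an invalid memory access) only surface at the next synchronizing call, here `cudaDeviceSynchronize`.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Print any CUDA runtime error with its message and source location,
// then return it so the caller decides whether to continue.
static cudaError_t report(cudaError_t err, const char* what, const char* file, int line) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s:%d: %s failed: %s (code %d)\n",
                     file, line, what, cudaGetErrorString(err), (int)err);
    }
    return err;
}
#define CUDA_CHECK(call) report((call), #call, __FILE__, __LINE__)

// Hypothetical stand-in for the simulation kernel: one thread per path.
__global__ void simulate(float* results, size_t nPaths) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < nPaths) results[i] = 1.0f;  // placeholder computation
}

// Allocate, launch, and surface both launch errors and execution errors.
cudaError_t run_simulation(size_t nPaths) {
    float* d_results = nullptr;
    cudaError_t err = CUDA_CHECK(cudaMalloc(&d_results, nPaths * sizeof(float)));
    if (err != cudaSuccess) return err;

    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((nPaths + threads - 1) / threads);
    simulate<<<blocks, threads>>>(d_results, nPaths);

    err = CUDA_CHECK(cudaGetLastError());                         // bad launch configuration
    if (err == cudaSuccess) err = CUDA_CHECK(cudaDeviceSynchronize());  // faults during execution

    CUDA_CHECK(cudaFree(d_results));
    return err;
}
```

With every call wrapped this way, the program prints the actual error string (for example an allocation failure or an illegal address) instead of only the exception type.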

Thank you.


I’m not aware of any iteration limits; I’d advise double-checking for memory leaks.
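Besides leaks, the total allocation is worth budgeting explicitly. Assuming (the post does not say) that each path stores one `float` per iteration, 10 million paths × 10 iterations needs 400 MB, but 10 million paths × 1000 iterations needs 40 GB, far more than a compute capability 2.1 card provides. The sketch below uses hypothetical helper names: it computes the requirement in `size_t`, compares it with the free memory reported by `cudaMemGetInfo`, and logs free memory so that a leak shows up as a steady decline from one iteration to the next.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: bytes needed if every path keeps one float result
// per iteration. size_t arithmetic avoids overflowing a 32-bit int.
size_t result_bytes(size_t nPaths, size_t nIters) {
    return nPaths * nIters * sizeof(float);
}

// True if `bytes` is no larger than the device memory currently free.
bool fits_on_device(size_t bytes) {
    size_t freeB = 0, totalB = 0;
    if (cudaMemGetInfo(&freeB, &totalB) != cudaSuccess) return false;
    return bytes <= freeB;
}

// Print free device memory; calling this once per iteration makes a leak
// visible as free memory that keeps shrinking.
void log_free_memory(int iter) {
    size_t freeB = 0, totalB = 0;
    if (cudaMemGetInfo(&freeB, &totalB) == cudaSuccess) {
        std::printf("iteration %d: %zu of %zu MB free\n",
                    iter, freeB >> 20, totalB >> 20);
    }
}
```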

I’m having a similar problem. I’m performing some operations on a big array inside a kernel, and everything seems right. But when I repeat the same operations 100 or more times, the program just freezes, sometimes taking the whole computer with it.

Did you find anything that could cause an instruction overload or something similar?

Problems often have a very different cause than they appear to. The bug may be there all along but only manifest itself at a certain number of iterations.
There is a limit on the maximum number of instructions per kernel (I can’t recall exactly what it is), but it applies to the size of the compiled code, which does not grow when a loop runs more iterations, so this is more likely a bug. I’ve run simulations of a similar kind with more paths and iterations without any problems.
I’d advise you to run your program through cuda-memcheck. If you compile your program with debug info, cuda-memcheck will even show you the line number where a memory access violation occurs.
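To follow that advice, compile with device debug info (`nvcc -G`, or `-lineinfo` for an optimized build) and run the executable under `cuda-memcheck` (newer toolkits replace it with `compute-sanitizer`). One bug of exactly the "only shows up past some iteration count" kind, offered here as a hypothesis rather than a diagnosis, is computing the flat result index `iter * nPaths + path` in a 32-bit `int`: with 10 million paths, only the first 214 iterations have every index below `INT_MAX`, so 10 iterations work and 1000 do not. The sketch computes the index in `size_t` and guards the write; `result_index`, `iters_fitting_in_int` and `store_iteration` are hypothetical names.

```cuda
#include <cassert>
#include <climits>
#include <cstddef>

// Flat index of (iteration, path) in an iteration-major results array.
// Doing this arithmetic in int overflows once iter * nPaths exceeds INT_MAX,
// typically producing a wrapped (often negative) offset.
__host__ __device__ inline size_t result_index(size_t iter, size_t path, size_t nPaths) {
    return iter * nPaths + path;
}

// Number of leading iterations whose indices all fit in a 32-bit int.
size_t iters_fitting_in_int(size_t nPaths) {
    return ((size_t)INT_MAX + 1) / nPaths;
}

// Store one result per path for iteration `iter`; the guard keeps the
// surplus threads of the last block from writing past the end.
__global__ void store_iteration(float* results, size_t iter, size_t nPaths, size_t nIters) {
    size_t path = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (path < nPaths && iter < nIters) {
        results[result_index(iter, path, nPaths)] = 0.0f;  // placeholder value
    }
}
```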

Is the CUDA device running your kernel also drawing your GUI? Usually if your computer appears to freeze when running a long kernel, it is because the display can’t update until the kernel is finished. There is a ~5 second watchdog timer that will terminate your kernel if it goes too long. (Note that this does not apply to devices which are not running your monitor.)
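If the watchdog is the cause, one common approach is to split the iterations across several short kernel launches and keep each path's CURAND state in global memory between them, so every random stream continues where it left off. This is a sketch under assumptions: the paths are independent, one `float` of state per path is enough, and `init_rng`, `advance` and `run_in_chunks` are hypothetical names; `curand_init` and `curand_normal` are the CURAND device API.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// One-time setup: each path gets its own subsequence of the same seed.
__global__ void init_rng(curandState* states, unsigned long long seed, size_t nPaths) {
    size_t path = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (path < nPaths) curand_init(seed, path, 0, &states[path]);
}

// Advance every path by `steps` iterations, then save the RNG state so the
// next launch continues the same random stream.
__global__ void advance(curandState* states, float* value, size_t nPaths, int steps) {
    size_t path = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (path >= nPaths) return;
    curandState local = states[path];  // work on a register copy
    float v = value[path];
    for (int s = 0; s < steps; ++s) v += curand_normal(&local);
    states[path] = local;
    value[path] = v;
}

// Run `totalIters` iterations as short launches of at most `chunk` (> 0)
// iterations each, so no single launch runs long enough to hit a watchdog.
cudaError_t run_in_chunks(size_t nPaths, int totalIters, int chunk) {
    curandState* d_states = nullptr;
    float* d_value = nullptr;
    cudaError_t err = cudaMalloc(&d_states, nPaths * sizeof(curandState));
    if (err == cudaSuccess) err = cudaMalloc(&d_value, nPaths * sizeof(float));
    if (err == cudaSuccess) err = cudaMemset(d_value, 0, nPaths * sizeof(float));

    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((nPaths + threads - 1) / threads);
    if (err == cudaSuccess) {
        init_rng<<<blocks, threads>>>(d_states, 1234ULL, nPaths);
        err = cudaGetLastError();
    }
    for (int done = 0; err == cudaSuccess && done < totalIters; done += chunk) {
        int steps = (totalIters - done < chunk) ? totalIters - done : chunk;
        advance<<<blocks, threads>>>(d_states, d_value, nPaths, steps);
        err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // surface this chunk's errors
    }
    cudaFree(d_states);
    cudaFree(d_value);
    return err;
}
```

Launches in the same stream already run one after another; the synchronize after each chunk is there so that an error is attributed to the chunk that caused it.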

Dear all,

Thanks for your suggestions so far. However, the problem still remains.
@Seibert: Yes, I did check that possibility. Therefore, I used my second GPU to perform all the computations. However, the situation is the same.

Since I am using a GPU with compute capability 2.1, concurrent execution of kernels is possible. Maybe I should check that possibility.
Please let me know how I can achieve that. The documentation says:
“… Programmers can globally disable asynchronous kernel launches for all CUDA applications running on a system by setting the CUDA_LAUNCH_BLOCKING environment variable to 1. This feature is provided for debugging purposes only and should never be used as a way to make production software run reliably…”

So, how can I do this for practical purposes (i.e., outside of debugging)?
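In production code CUDA_LAUNCH_BLOCKING is not needed: kernels launched into the same stream execute one after another in launch order, and on compute capability 2.x concurrent kernel execution only happens between kernels in different streams. The host then waits explicitly wherever it needs results, for example with `cudaDeviceSynchronize`, `cudaStreamSynchronize`, or, as in this sketch, a recorded event, which also yields the elapsed GPU time. `step` and `run_ordered` are hypothetical names.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical per-iteration kernel.
__global__ void step(float* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Launch `iters` kernels into one stream. They run in launch order and never
// overlap each other; the host blocks only at cudaEventSynchronize.
cudaError_t run_ordered(float* d_data, size_t n, int iters, float* elapsedMs) {
    cudaStream_t stream;
    cudaEvent_t start, stop;
    cudaError_t err = cudaStreamCreate(&stream);
    if (err != cudaSuccess) return err;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    cudaEventRecord(start, stream);
    for (int it = 0; it < iters; ++it)
        step<<<blocks, threads, 0, stream>>>(d_data, n);
    cudaEventRecord(stop, stream);

    err = cudaGetLastError();                                   // any launch failure
    if (err == cudaSuccess) err = cudaEventSynchronize(stop);   // wait for all launches
    if (err == cudaSuccess) cudaEventElapsedTime(elapsedMs, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    return err;
}
```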

I wanted to check for memory leaks, so I tried to use the CUDA Visual Profiler. However, I could not. For 15 to 17 runs everything is fine, but on the 18th run the Visual Profiler gives an error stating that column-1 could not be found!?! The .csv files are 0 KB!!!

Awaiting your responses.