Limitation on recursive operations in a kernel?

Hello Experts,
I would appreciate it if anybody could clarify the following question.

The situation is as follows. I launch a kernel in which I perform the following steps:

  • Perform a million computations (using random number functions from CURAND) in parallel and write the results to memory locations on the device.
  • Repeat the above step for a certain number of iterations.

The program calculates results successfully for, say, 10 million paths * 10 iterations.
However, if I increase the number of iterations to, say, 1000, the program exits … I can see: cudaError_enum at memory location 0x0017f160…

Why does this error occur?
Does it have anything to do with a limit on kernel instruction size?
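
The setup described above can be sketched roughly as follows. This is only a minimal illustration, not the actual code: the names `simulate_paths` and `run_simulation`, the seed, the block size of 256, and the use of `curand_normal` are all assumptions made for the example. The host function checks the launch and the kernel execution separately and prints the error string, which is one way to see which error lies behind the `cudaError_enum` message.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical kernel: each thread owns one path and repeats the random
// draw n_iters times, writing the result to device memory every iteration.
__global__ void simulate_paths(float *results, unsigned int n_paths,
                               unsigned int n_iters, unsigned long long seed)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n_paths) return;

    curandState state;
    curand_init(seed, idx, 0, &state);   // one subsequence per path

    for (unsigned int it = 0; it < n_iters; ++it) {
        results[idx] = curand_normal(&state);   // assumed computation
    }
}

// Hypothetical host helper: launches the kernel and returns the first
// CUDA error encountered (cudaSuccess if everything ran).
cudaError_t run_simulation(unsigned int n_paths, unsigned int n_iters)
{
    float *d_results = nullptr;
    cudaError_t err = cudaMalloc(&d_results, n_paths * sizeof(float));
    if (err != cudaSuccess) return err;

    const unsigned int threads = 256;   // assumed block size
    const unsigned int blocks = (n_paths + threads - 1) / threads;
    simulate_paths<<<blocks, threads>>>(d_results, n_paths, n_iters, 1234ULL);

    err = cudaGetLastError();            // errors from the launch itself
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();   // errors raised while the kernel runs

    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_results);
    return err;
}
```

Calling `run_simulation(10000000u, 10u)` and then `run_simulation(10000000u, 1000u)` would reproduce the two cases in the question, with the second call printing a readable error string if it fails.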

Thank you.

Regards,
PanPizza.