Memory failure when using gangs loop

Hi,

I have a nested loop where I want gang and worker parallelism on the outer loop and vector parallelism on the inner loop. The program runs fine when I set num_gangs to 1, but fails if it is larger, e.g. 2.

Failing in Thread:6
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution

Unfortunately, when I try to build a simple example that replicates the error, it does not occur.

Any idea where this could come from would be highly appreciated.

Kind regards,
Rob

Hi Rob,

Hard to tell since this is a generic error, but given that it only fails when you use more gangs, my best guess would be a heap or stack overflow, or possibly the use of a large amount of private data.

Does the compute region contain allocations or subroutine calls with automatic arrays?
Are there many subroutine calls with multiple call depths?
Does the kernel have large private arrays?

-Mat

Hi Mat,

There are many subroutine calls, sometimes with pretty deep call depths. There are no large private arrays or allocations.
Any idea how to debug it? I tried cuda-gdb but only got a backtrace to a return statement that just returns a boolean, so it didn't seem to be the culprit.

Regards,
Rob

While not definitive, this does point to a stack overflow. Though, per the CUDA docs, you should see a “stack overflow” error in cuda-gdb if this were the case.

I’d start by increasing the stack size, either by calling “cudaDeviceSetLimit” or via the environment variable “NV_ACC_CUDA_STACKSIZE”. There’s still a hard limit on the stack size. I’ve not been able to confirm its exact value since I believe it’s device dependent; I think it’s 64MB, but you can try higher values as well.

Just in case you do have an automatic array declared someplace, you might also want to try a larger heap size, again either by calling “cudaDeviceSetLimit” or via the environment variable “NV_ACC_CUDA_HEAPSIZE”. There’s no software limit on the heap size.
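
As a rough sketch of the environment-variable route, something like the following could be set in the shell before launching the program. The variable names are the ones mentioned above; the values are only illustrative starting points, and I'm assuming here that they are given as plain byte counts, so please check the compiler documentation for the accepted format. The executable name in the last comment is hypothetical.

```shell
# Assumption: values are plain byte counts; adjust to whatever format your
# compiler version documents.

# Per-thread device stack size (illustrative value: 64 KB).
export NV_ACC_CUDA_STACKSIZE=65536

# Device heap size, relevant if a device routine uses automatic arrays
# (illustrative value: 64 MB).
export NV_ACC_CUDA_HEAPSIZE=67108864

# Then run the program as usual from the same shell, e.g.:
#   ./my_app        (hypothetical executable name)
```

If the failure goes away with a larger stack value but not with a larger heap value (or the other way around), that already tells you which of the two is overflowing.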

If neither of these seems to help, I’d start commenting out the calls to try and narrow down which level or particular routine is causing the issue. It’s still possible that the cause is something else, like an out-of-bounds access, so if you can narrow down where the error is coming from, it may help determine the root cause.

Hope this helps,
Mat