I am using TotalView to debug some software. I assume it uses cuda-memcheck-style functionality to check for memory errors.

The debugger has stopped in one of my kernels with a “Lane User Stack Overflow” error.

What does this imply? The line that it stops on is:

d_usKernel[i*cols + j] = cuCmulf(d_usKernel[i*cols + j], d_rr[i*cols + j]);

Here i and j are 863 and 57 respectively, and these arrays are 1025x1024 (rows x cols).

So what should I be looking into to fix this problem?

So apparently this error can indicate an out-of-bounds array access. It seems like I shouldn't have this problem with these arrays, since they are allocated as 1024x1024, and I don't get an error doing that operation.
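
To rule out an out-of-bounds index, a guard along these lines could be placed around the failing line. This is only a minimal sketch: the helper names inBounds and flatIndex, the HD macro, and passing rows and cols as arguments are assumptions for illustration, not part of my actual kernel.

```cpp
#include <cstddef>

// Expands to __host__ __device__ under nvcc so the helpers can be called
// from a kernel; expands to nothing for a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// True when (i, j) addresses an element of a rows x cols row-major array.
// (Hypothetical helper, not from the original kernel.)
HD inline bool inBounds(int i, int j, int rows, int cols) {
    return i >= 0 && i < rows && j >= 0 && j < cols;
}

// Row-major flattened index, computed in size_t so that i * cols cannot
// overflow a 32-bit int for large arrays.
// (Hypothetical helper, not from the original kernel.)
HD inline std::size_t flatIndex(int i, int j, int cols) {
    return static_cast<std::size_t>(i) * static_cast<std::size_t>(cols)
         + static_cast<std::size_t>(j);
}
```

Inside the kernel, the multiply would then only run when inBounds(i, j, rows, cols) is true, using flatIndex(i, j, cols) in place of i*cols + j. If the error still appears with the guard in place, the index is probably not the cause.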