Limitation to number of loop iterations?

fender177 · June 6, 2011, 2:08am

Hi everyone,

I have a simple for loop going from 0 to N (N being 7000 to 10000). The code does some work on a CSR packed matrix (it multiplies the elements in each column by their corresponding vector elements), each time incrementing tid (where tid = threadIdx.x + offset) by the number of elements in a particular column (offset). The for loop seems to work just fine. However, when the number of elements in my CSR packed matrix is greater than 1,000,000, it seems to bomb out towards the end of the matrix: it starts returning zeros instead of the correct results.

I’m using a GTX 570 with 1.28GB global memory. In this particular instance, shared memory is not being used for calculations - though I plan to move what I can to shared memory after I solve this issue.

Any help or insight would be greatly appreciated.

seibert · June 6, 2011, 4:32pm

Are you checking error codes from CUDA functions? (That way you can tell the kernel has aborted without having to deduce it from output.)

It is quite likely that you have hit the watchdog timer that prevents a kernel from running longer than a few seconds if the same GPU is also being used to render the GUI display.
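One way to see whether the run-time limit applies to a given GPU is to query its device properties. The following is a minimal sketch, assuming the GPU of interest is device 0; `kernelExecTimeoutEnabled` is a field of `cudaDeviceProp` in the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: the GPU driving the display is device 0

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // kernelExecTimeoutEnabled is nonzero when kernels on this device
    // are subject to a run-time limit (the watchdog).
    printf("%s: kernel run-time limit is %s\n", prop.name,
           prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    return 0;
}
```

If the limit is reported as disabled, or the kernel finishes well within a second, the watchdog is an unlikely cause.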

fender177 · June 6, 2011, 4:45pm

I wasn’t checking error codes from CUDA functions. I’ll check that out tonight and see what the actual error is. I have the Windows “feature” that kills GPU programs that run longer than two seconds turned off (I did this when I installed Nsight). But I wouldn’t be surprised if you are correct; after all, the GPU is powering my display as well. Do you know if there is a built-in NVIDIA watchdog timer, beyond that of the Windows watchdog timer?

Thanks for your response, much appreciated!

fender177 · June 6, 2011, 10:10pm

I implemented the following code to capture any errors:

cudaError_t err = cudaGetLastError();

printf("Last error message: %s\n", cudaGetErrorString(err));

But unfortunately, it just reports back “no error”. I placed the above code after the kernel invocation call and again after the memcpy… btw, the program is taking under 100ms to do all of this work.

Thoughts would be appreciated.
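For reference, kernel launches are asynchronous, so a fuller check usually looks at two points: right after the launch (for launch and configuration errors) and after a synchronizing call (for errors raised while the kernel runs). A check placed after a blocking memcpy, as described above, would generally surface execution errors as well; the sketch below only makes the two points explicit. The `CUDA_CHECK` macro and the `scale` kernel are hypothetical stand-ins, not the code from this thread. `cudaDeviceSynchronize` is available from CUDA 4.0; older toolkits use `cudaThreadSynchronize` instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e_));                          \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel (not the CSR kernel from the post).
// The bounds guard keeps threads past the end of the array from writing.
__global__ void scale(float *data, int n, float factor) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        data[tid] *= factor;
    }
}

int main() {
    const int n = 1 << 20;  // about one million elements
    float *d_data = NULL;

    CUDA_CHECK(cudaMalloc((void **)&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover n
    scale<<<blocks, threads>>>(d_data, n, 2.0f);

    CUDA_CHECK(cudaGetLastError());       // launch/configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // errors during kernel execution

    CUDA_CHECK(cudaFree(d_data));
    printf("kernel completed without error\n");
    return 0;
}
```

If both checks pass and the tail of the output is still zero, the runtime has not reported a fault, so the next thing to look at is probably whether the launch configuration and indexing actually cover every element.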
