Limit on the number of loop iterations?

Hi everyone,

I have a simple for loop going from 0 to N (N being 7,000 - 10,000). The code does some work on a CSR-packed matrix (it multiplies the elements in each column by their corresponding vector elements), each time incrementing tid (where tid = threadIdx.x + offset) by the number of elements in a particular column (offset). The for loop seems to work just fine; however, when the number of elements in my CSR-packed matrix exceeds 1,000,000, it seems to bomb out towards the end of the matrix. And by "bomb out" I mean it starts returning zeros instead of the correct results.

I’m using a GTX 570 with 1.28 GB of global memory. In this particular instance, shared memory is not being used for calculations - though I plan to move what I can to shared memory after I solve this issue.
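To make the setup concrete, the loop is roughly of this shape (the names and layout here are illustrative, not my actual code):

```cuda
// Hypothetical sketch of the loop described above -- the real code isn't
// shown, so names and data layout are guesses. Each thread walks the packed
// value array, scaling every stored element by the vector entry for its
// column (colIdx[] holds each element's column index, CSR-style).
__global__ void scaleByColumn(float *vals, const int *colIdx,
                              const float *vec, int nnz)
{
    // Grid-stride loop: each thread starts at its global index and
    // advances by the total thread count until all nnz elements are done.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < nnz;
         i += blockDim.x * gridDim.x)
    {
        vals[i] *= vec[colIdx[i]];
    }
}
```

One reason to structure it this way: a grid-stride loop covers any nnz regardless of the launch configuration, whereas a launch sized for a smaller matrix can silently leave the tail elements untouched (which would show up as zeros at the end).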

Any help or insight would be greatly appreciated.

Are you checking error codes from CUDA functions? (That way you can tell the kernel has aborted without having to deduce it from output.)
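A common pattern is to wrap every runtime call in a checking macro, for example (this is a sketch of a conventional idiom, not an official API):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Checking macro: aborts with the error string and file/line location
// as soon as any CUDA runtime call returns something other than success.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",                   \
                    cudaGetErrorString(err_), __FILE__, __LINE__);         \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Usage:
// CUDA_CHECK(cudaMalloc(&d_buf, bytes));
// CUDA_CHECK(cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost));
```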

It is quite likely that you have hit the watchdog timer that prevents a kernel from running longer than a few seconds if the same GPU is also being used to render the GUI display.
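On Windows the timeout is governed by the TDR (Timeout Detection and Recovery) registry values. Something like the following raises the timeout to 10 seconds (the values shown are examples; a reboot is required, and tools like Nsight can set these for you):

```
Windows Registry Editor Version 5.00

; TdrDelay = timeout in seconds before Windows resets the GPU.
; TdrLevel = 0 disables timeout detection entirely.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000000a
```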

I wasn’t checking error codes from CUDA functions - I’ll check that out tonight and see what the actual error is. I have the Windows “feature” that kills GPU programs running longer than two seconds turned off (I did this when I installed Nsight). But I wouldn’t be surprised if you are correct; after all, the GPU is powering my display as well. Do you know if there is a built-in NVIDIA watchdog timer, beyond the Windows watchdog timer?

Thanks for your response, much appreciated!

I implemented the following code to capture any errors:

cudaError_t err = cudaGetLastError();

printf("Last error message: %s\n", cudaGetErrorString(err));

But unfortunately, it just reports back “no error”. I placed the above code after the kernel invocation and again after the memcpy… By the way, the program takes under 100 ms to do all of this work.
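One detail worth noting: kernel launches are asynchronous, so cudaGetLastError() called immediately after the launch only catches launch-configuration errors - the kernel may not have run yet. Synchronizing first makes the check see execution errors too. A minimal sketch (the kernel here is a placeholder):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real one.
__global__ void dummyKernel() {}

int main()
{
    dummyKernel<<<1, 1>>>();

    // Catches launch errors (e.g. invalid grid/block dimensions).
    cudaError_t err = cudaGetLastError();
    printf("Launch status: %s\n", cudaGetErrorString(err));

    // Blocks until the kernel finishes, then reports any error that
    // occurred during execution (e.g. an out-of-bounds access).
    err = cudaDeviceSynchronize();
    printf("Execution status: %s\n", cudaGetErrorString(err));
    return 0;
}
```

A subsequent cudaMemcpy also synchronizes implicitly, so checking its return value catches the same class of errors.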

Thoughts would be appreciated.