CUDA limit for loops? Too large a number of iterations?

Hi!

I’m currently working on a CUDA program which essentially executes the same loop over and over.

That is, every thread is doing something like this:

  • generate random number*
  • read from shared memory array
  • write to shared memory array

I’ll keep it at this pseudo-code level (roughly like the sketch below), since everything works quite well for most inputs.
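To make that concrete, here is a rough CUDA sketch of what each thread does per iteration; the array size, iteration count, and the LCG-style “random” step are placeholders for illustration, not my actual code:

  __global__ void iterateKernel(float *globalOut, int nIterations)
  {
      // one shared array per block; 256 is a placeholder for my real size
      __shared__ float sharedBuf[256];
      unsigned int tid = threadIdx.x;

      // seed the "random" state from clock() and the thread index (see footnote below)
      unsigned int state = (unsigned int)clock() ^ (tid * 2654435761u);

      sharedBuf[tid] = 0.0f;
      __syncthreads();

      for (int i = 0; i < nIterations; ++i)
      {
          // generate random number (cheap LCG stand-in for my clock()-based hack)
          state = state * 1664525u + 1013904223u;
          unsigned int idx = state % 256;

          // read from shared memory array
          float v = sharedBuf[idx];
          __syncthreads();

          // write to shared memory array
          sharedBuf[tid] = v + (state & 0xFF) * (1.0f / 255.0f);
          __syncthreads();
      }

      globalOut[blockIdx.x * blockDim.x + tid] = sharedBuf[tid];
  }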

However, if the program runs for too long, my whole memory seems to be obliterated.

Now I’m not really sure if this has anything to do with some internal CUDA looping limit, but currently it’s my best guess. So if anyone can tell me whether such a limit really exists, the help would be appreciated!
Basically I just need to know whether this could be the source of my problem, or if there is no such limit and I have to look elsewhere.

*the random thing works with clock() calls… I’m not sure whether this can overflow, but this should be irrelevant… I think.
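As far as I know, clock() on the device returns a 32-bit cycle counter that simply wraps around, which should be harmless as long as the value is only folded into a seed or hash, e.g. something like this (the constants are arbitrary):

  // clock() wrap-around is harmless if the value is only mixed into a hash/seed
  __device__ unsigned int cheapRandom(unsigned int &state)
  {
      state ^= (unsigned int)clock();          // mix in the (possibly wrapped) counter
      state = state * 1664525u + 1013904223u;  // LCG step, constants are arbitrary
      return state;
  }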

There’s a 5-second limit on CUDA kernels.

Are you running on Windows or Linux?

Regarding a loop limit - I don’t know. I’ve never seen this discussed in the forum.

I’d guess that you have a memory leak or buffer overrun or something of the sort. What about running the emulated version in Valgrind?
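Something along these lines, assuming your source file is called myapp.cu (file names are placeholders; -deviceemu was the device-emulation switch in nvcc of that era):

  # build in device-emulation mode with debug info
  nvcc -deviceemu -g -o myapp myapp.cu

  # run the emulated binary under Valgrind to catch out-of-bounds accesses and leaks
  valgrind --leak-check=full ./myapp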


@Sarnath: 5 second limit? What exactly do you mean by that? I ask, because the (kernel) program already runs successfully for inputs which produce run times up to ten seconds.
And I’m currently working under Linux.

@kristleifur: I guess you could be right… I’m wondering though why it runs without disturbance on slightly smaller problems, which do exactly the same thing, just not as often…

But I guess it’s time to get my hands dirty with this and give the emulator a shot.

I think that is the 5-second watchdog thingy… I heard that if you execute your program under Windows on a GPU that is also driving your monitor, the program is killed after 5 seconds by a watchdog. This could also be the case under Linux, but I don’t know that for sure.

It seems to be the same on Linux - if I program a bad kernel, my system hangs for around 5 seconds.

But I have seen discussions in this forum saying there is no 5-second watchdog on Linux…

If you search on Google, people say there is some watchdog on Linux, but I don’t know whether it’s a standard thing or something you need to install.

With a single GPU:

  1. There IS a 5-second watchdog when using X Windows on Linux
  2. There is NO 5-second watchdog when NOT using X Windows on Linux (text-only console without X running in the background)

With multiple GPUs:
… I don’t know because I don’t have a multi-GPU system :(
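For what it’s worth, if your CUDA version is recent enough to expose kernelExecTimeoutEnabled in cudaDeviceProp, you can just ask the runtime whether a watchdog applies to a given device. A minimal host-side check would look something like this:

  #include <cstdio>
  #include <cuda_runtime.h>

  int main()
  {
      int deviceCount = 0;
      cudaGetDeviceCount(&deviceCount);

      for (int d = 0; d < deviceCount; ++d)
      {
          cudaDeviceProp prop;
          cudaGetDeviceProperties(&prop, d);
          // kernelExecTimeoutEnabled is 1 if a run-time limit (watchdog) applies to this device
          printf("Device %d (%s): watchdog %s\n", d, prop.name,
                 prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
      }
      return 0;
  }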

Thanks for the clarification. I think I will install Linux again or just kill my X Windows…

There’s a way to stop X Windows from starting every time you boot your machine… No need to re-install.

Check out /etc/inittab OR ask some Linux expert.

GDM or KDM are usually the daemons that keep X11 alive. On my distribution, Ubuntu, I kill X11 by doing ‘sudo /etc/init.d/gdm stop’. If you want to semi-permanently disable X11 on bootup, you’ll have to munge the ‘/etc/rc?.d’ dirs, or your distribution’s equivalent. Check ‘runlevel’ if you want to see what condition your condition is in. (’/etc/inittab’ is another way of configuring bootup services AFAIK.)
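To make that concrete (paths and service names depend on the distribution, so treat these as examples):

  # stop X immediately on Ubuntu with GDM
  sudo /etc/init.d/gdm stop

  # on inittab-based distros, boot to a text runlevel by default:
  # edit /etc/inittab so the default runlevel line reads
  id:3:initdefault: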

My computer restarts without any warning: the screen just goes black and the machine reboots. Is this the watchdog? I would expect it to kill the process, but here I get a soft reset. Is there any way to change the behavior of the watchdog?

The watchdog should only kill the process, not reset the computer. I’ve never seen a CUDA program crash reset the whole system. Are you using the latest drivers? Have you upgraded to the latest motherboard BIOS?

See http://forums.nvidia.com/index.php?showtopic=58436, where a Linux machine hangs and then reboots because of a CUDA program.

Yes, well what I meant is that I’ve never seen such a reboot personally with my own eyes. The person in that post is just asking for major problems when they try to run 120 infinite loops all at the same time.

Yes, and I don’t see any explanation for stopping the reboots. Only Wumpus came up with something I don’t quite understand: the termination criteria…

Nevermind, nobody seems to see my point.

  1. When was the last time you ran 120 processes all running infinite loops on your CPU? Was the system responsive enough that you could actually kill them? The head node on a cluster I use regularly runs up a load average of only 10 by being a file server for the nodes, and even that is enough to make the system so unresponsive that I cannot even run “top” after logging in.

The point is, when you push the system so far out of the realm of normal operating procedure you have to expect something to break. CUDA is very stable and the watchdog performs well when you run a reasonable number of applications/threads at once.

I saw instant reboots when I used driver 169.09. Driver versions 169.07 and 169.12 were OK. Probably unrelated though.