Need to remove timeouts and the "launch timed out and was terminated" message

I have been given an assignment to calculate the number of numbers between 0 and 1111111111111111111111111 in which the sum of the digits does not exceed 25 (i.e., “12345” = 1+2+3+4+5 = 15, which is OK, but “999” = 9+9+9 = 27, which is bad). I don’t need help with the solution. I have a solution written in C, but a loop from 0 to a number that large will probably take a YEAR on my PC. I’m trying to use CUDA to see if it will help.

I ported my C code and got it running under CUDA, but I keep getting this error after about 5 or 10 seconds:
“cudaSafeCall() Runtime API error : the launch timed out and was terminated.”

How do I turn off timeouts? I want the thing to run, no matter how long it takes. Ideally, I’d like to poll my kernel every 60 seconds to make sure it’s still going, but it should never time out.

-greg-

You cannot turn them off. If your card runs a display manager, it will have a 5 second watchdog timer on any kernel. The only workarounds are to use a dedicated card for computation, to not use a display manager at all (both are possible under Linux), or to break the work up into small, short-running kernels and run them many times. The second strategy should be feasible here: give each kernel launch a subset of the search space and run lots of launches. You can keep the accumulating results in GPU memory and periodically copy them back and write them to disk. Depending on what card you are using, you can even overlap the copying and kernel execution so that the copying and disk I/O are effectively free.
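
For what it’s worth, here is a minimal sketch of that chunking idea (the names, the <<<256, 256>>> launch shape and the toy 64-bit range are my own assumptions for illustration; the real 25-digit bound does not even fit in a 64-bit integer). Each launch covers a small slice of the range so it finishes well under the watchdog limit, and the running total stays in device memory between launches:

[code]
/* Hypothetical sketch of the "many short kernels" workaround discussed above.
   The range below is a toy 64-bit example; the actual 25-digit search space
   would not fit in 64-bit integers (and is far too large to brute-force anyway). */
#include <cstdio>
#include <cuda_runtime.h>

__device__ int digitSum(unsigned long long n)
{
    int s = 0;
    while (n) { s += (int)(n % 10); n /= 10; }
    return s;
}

/* Each thread tests a strided subset of [start, end). */
__global__ void countChunk(unsigned long long start, unsigned long long end,
                           unsigned long long *result)
{
    unsigned long long stride = (unsigned long long)gridDim.x * blockDim.x;
    unsigned long long local  = 0;
    for (unsigned long long i = start + (unsigned long long)blockIdx.x * blockDim.x + threadIdx.x;
         i < end; i += stride)
        if (digitSum(i) <= 25)
            ++local;
    atomicAdd(result, local);   /* accumulate into the running total on the device */
}

int main()
{
    const unsigned long long limit = 10000000000ULL;  /* toy upper bound */
    const unsigned long long chunk = 100000000ULL;    /* keep each launch well under 5 s */

    unsigned long long *dResult, hResult = 0;
    cudaMalloc((void **)&dResult, sizeof(*dResult));
    cudaMemset(dResult, 0, sizeof(*dResult));

    for (unsigned long long start = 0; start < limit; start += chunk) {
        unsigned long long end = (start + chunk < limit) ? start + chunk : limit;
        countChunk<<<256, 256>>>(start, end, dResult);
        cudaDeviceSynchronize();   /* each short launch returns before the watchdog fires */
        /* Periodically cudaMemcpy the running total back here and checkpoint it to disk. */
    }

    cudaMemcpy(&hResult, dResult, sizeof(hResult), cudaMemcpyDeviceToHost);
    printf("count = %llu\n", hResult);
    cudaFree(dResult);
    return 0;
}
[/code]

Shrink the chunk size if a single launch still gets anywhere near 5 seconds on your card; checkpointing the copied-back total to disk lets you resume after an interruption.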

You don’t need CUDA for that. In fact, you don’t even need a PC. And I’m almost sure you were supposed to solve this with paper and pencil.

Although this is an interesting approach, I doubt your instructor had this solution in mind. :) I think you can solve this problem recursively as well without CUDA.
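
In case it is useful, here is a rough sketch of that recursive idea as a digit-by-digit count with memoization (all names are mine, not anything from this thread, and it relies on the GCC/Clang-specific unsigned __int128 type because the answer does not fit in 64 bits). It walks the 25-digit bound from the most significant digit, tracking the digit sum used so far and whether the prefix is still pinned to the bound:

[code]
/* Hypothetical sketch of a recursive digit-by-digit count with memoization.
   Counts integers in [0, N] with digit sum <= 25, where N is the 25-ones bound. */
#include <stdio.h>

#define DIGITS 25
#define MAXSUM 25

static const char *bound = "1111111111111111111111111"; /* the 25-ones limit */

/* memoization for the "prefix already below the bound" case */
static unsigned __int128 memo[DIGITS + 1][MAXSUM + 1];
static int seen[DIGITS + 1][MAXSUM + 1];

/* Count ways to fill positions pos..DIGITS-1 given `sum` already used.
   `tight` == 1 means the prefix chosen so far equals the bound's prefix. */
static unsigned __int128 count(int pos, int sum, int tight)
{
    if (sum > MAXSUM) return 0;
    if (pos == DIGITS) return 1;
    if (!tight && seen[pos][sum]) return memo[pos][sum];

    int hi = tight ? bound[pos] - '0' : 9;
    unsigned __int128 total = 0;
    for (int d = 0; d <= hi; d++)
        total += count(pos + 1, sum + d, tight && d == hi);

    if (!tight) { seen[pos][sum] = 1; memo[pos][sum] = total; }
    return total;
}

/* printf has no format for __int128, so print it digit by digit */
static void print_u128(unsigned __int128 v)
{
    char buf[40];
    int i = 39;
    buf[i] = '\0';
    if (v == 0) buf[--i] = '0';
    while (v) { buf[--i] = (char)('0' + (int)(v % 10)); v /= 10; }
    puts(buf + i);
}

int main(void)
{
    print_u128(count(0, 0, 1));  /* numbers in [0, bound] with digit sum <= 25 */
    return 0;
}
[/code]

This finishes in a fraction of a second, since only 25 × 26 (position, sum) states ever need to be computed once the prefix drops below the bound.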

Ha! There aren’t enough pencils in the WORLD, but maybe I can make a font that looks like pencil scribblings and write an app to print them out :-)

I know, but I want to learn CUDA :-)

Today I learned there’s a 5-second timeout thingy! Who knows WHAT I’ll learn about CUDA tomorrow! Thanks.

Well, that’s annoying, but you gave me some good ideas. I guess my Mac has a display manager. I’ll install Ubuntu Linux on it and try it there. If that doesn’t work, I think I just need a smarter approach to this problem. I’m trying to learn CUDA programming, so this HAS been useful so far…

Thank you!

To somewhat improve the estimate from your first post: assuming one loop iteration per clock cycle and perfect scaling across four cores, a decent 3 GHz quad-core will take roughly 1.11e24 / 3e9 / 3600 / 24 / 365 / 4 ≈ 3 million years for the loop (unless the compiler optimizes away empty loops).
Don’t become too frustrated if you can’t optimize the CUDA version of the loop to finish within your lifetime.

http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx

This is one Windows feature that is superior to Linux… under Linux the watchdog timer is hardwired and cannot be disabled the way it can be in Windows.

I should probably say the watchdog is hardwired in the X server, not Linux per se. Though in fact even that may be wrong… it may be in NVIDIA’s Linux drivers.