Tesla and Watchdog

The watchdog is NOT applicable for TESLA series of products, right?

i.e.

I can run a kernel with theoretically unbounded execution time on a TESLA without fear of it being killed by the watchdog timer, am I right?

I think it would be wise to say whether you are asking about Linux or Windows.

The watchdog is on only if the device is used for display. The Tesla can’t be used for display, so the watchdog isn’t activated on a Tesla, I guess.
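Rather than guessing, one way to check is to ask the runtime whether a run-time limit applies to each device. This is only a minimal sketch, assuming a toolkit recent enough to provide `cudaDeviceGetAttribute` with `cudaDevAttrKernelExecTimeout`; the helper `timeoutLabel` is a made-up name for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: turn the attribute value into a readable label.
static const char* timeoutLabel(int enabled) {
    return enabled ? "watchdog ON" : "watchdog OFF";
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        int timeout = 0;
        // Skip devices we cannot query instead of aborting the whole listing.
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        if (cudaDeviceGetAttribute(&timeout, cudaDevAttrKernelExecTimeout, dev)
                != cudaSuccess) continue;
        std::printf("device %d (%s): %s\n", dev, prop.name, timeoutLabel(timeout));
    }
    return 0;
}
```

If a card reports the watchdog as off, long kernels on it should not be subject to the display timeout; what it actually prints for a given card and driver setup is something to check on your own machine.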

But you can also have a second card running the display and your normal CUDA card running CUDA kernels for more than 5 seconds. There are a lot of threads here talking about that, for example

How does Linux or Windows matter? I think with X on, Linux also has the watchdog problem.

But since TESLA has NO graphics hardware in it, the watchdog should not matter to it at all, right?

Well, Linux and Windows differ a lot in the graphics area. If your graphics card is not listed in your X configuration file, you will never have watchdog trouble, since X does not know about the card. You would then have to create the device nodes yourself by running a script.

I remember that, for easy setup, the Tesla cards were added to the X configuration file, so X does the driver loading and device node creation for you.

With Windows, it could be that every device controlled by the graphics driver is also subject to the watchdog. I have no clue, but according to Tim that is luckily not true.

TESLA as an X device ??? Grrr…

Can someone from NVIDIA clarify? I have a call with a customer tonight. It would be nice to make a sharp statement on TESLA rather than dish out sloppy statements.

Please help.

Moreover, in the thread mentioned by Tim, I see that people have run into watchdog issues even when the graphics card was NOT the primary display… Can someone from NVIDIA help here?

Thank you

Well, you don’t NEED to make Tesla an X device. Just boot the machine in runlevel 3 and use the shell script that NVIDIA provides in the release notes (does nobody read these?) to create the device nodes. If you never even load X, there cannot possibly be a timeout.

Thanks for your reply.

What irks me is that TESLA is a compute-only device, so why should I have to put up with all these sundry issues?

I think NVIDIA’s display driver is the one that supports TESLA. I see that on their download page: the driver says it supports TESLA, blah blah…

I really wish NVIDIA would clarify this. Anyone out there… Helllooooo… Heluuuuuuuu… Can’t you hear the voice of this drunken monk???

These issues still exist because the Tesla is still a “GPU”. NVIDIA only added a small amount of silicon to the G80 series to add compute capabilities. The primary market is still the graphics card market. Even though Tesla is marketed as a compute-only device, the silicon is essentially the same as the G80.

If you are really that concerned about these watchdog issues, test it out yourself and see what results you get. Otherwise, my Tesla system will be operational in ~2 weeks and I can post the results.

Edit: I forgot to mention: if you want a true compute-only device and not a GPU, check out ClearSpeed.

I have heard about ClearSpeed. Unfortunately, I don’t have a TESLA card with me at the moment.

Anyway, whether they add silicon or reduce it, you still can’t display anything on a TESLA, right?

I would see TESLA as a “Super-Threaded Co-Processor” rather than a GPU. What do you think?

Tesla is just a graphics card without the possibility of connecting a monitor. No outputs.

If the watchdog is only enabled on GPUs with an output, then there should be no problem. I don’t know what happens if you have a Tesla and an 8800GTX that both use the same driver, but I think that if you have a (dirty word) ATI 3950 or whatever they have :P there shouldn’t be a problem.

But then your Tesla will not work in Windows, since you cannot load more than one graphics driver…

That is why I don’t work with Windows for this kind of thing.

I wouldn’t entirely say ClearSpeed is not a GPU. Remember, it tried to be one: last time it failed as the PixelFusion GPU, and it has since had sort-of IEEE-compliant floating point units added. But yes, it (still) doesn’t display anything.
Of course, it doesn’t always compute much either, since the compute capability is generally out of reach for actual programmers unless you run very large DGEMMs. And though they seem to be in the habit of publishing fiction, particularly about other GPUs (they don’t like to remember themselves as one?), it’s probably believable where they publish that 2 of their now 4-year-old chips on a ~$6,000 card slow MOLPRO down compared to a CPU, but that if you group 12 together in a $70K box it does actually get faster than a CPU (and, no display).

I just finished building a system with 2x Tesla and 1x GeForce FX 5200 PCI (to power the display) on an EVGA 780i SLI board, running Fedora Core 8.

I made a simple kernel that does busy work and attempted to check for a watchdog while running X, and was able to get a Tesla to attempt a computation for a very long time (minutes, at least) before I gave up and killed the process. No watchdog, it appears.
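For anyone who wants to repeat this kind of test, a busy-work kernel along these lines could look roughly like the sketch below. It is not the code used for the run above, just an illustration under some assumptions: the device supports `clock64()`, the reported clock rate is close enough to the real one for a rough duration, and the names `busyWork` and `cyclesFor` are invented here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Spin until roughly `cycles` clock ticks have passed, then record the count.
__global__ void busyWork(long long cycles, long long* elapsed) {
    long long start = clock64();
    long long now = start;
    while (now - start < cycles) {
        now = clock64();
    }
    *elapsed = now - start;  // write a result so the loop is not optimized away
}

// Hypothetical helper: convert a target duration to clock ticks (rate in kHz).
static long long cyclesFor(double seconds, int clockKHz) {
    return static_cast<long long>(seconds * clockKHz * 1000.0);
}

int main(int argc, char** argv) {
    int dev = (argc > 1) ? std::atoi(argv[1]) : 0;            // device index
    double seconds = (argc > 2) ? std::atof(argv[2]) : 10.0;  // target run time

    if (cudaSetDevice(dev) != cudaSuccess) {
        std::printf("could not select device %d\n", dev);
        return 1;
    }

    int clockKHz = 0;
    if (cudaDeviceGetAttribute(&clockKHz, cudaDevAttrClockRate, dev) != cudaSuccess) {
        std::printf("could not read the clock rate of device %d\n", dev);
        return 1;
    }

    long long* dElapsed = nullptr;
    if (cudaMalloc(&dElapsed, sizeof(long long)) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }

    busyWork<<<1, 1>>>(cyclesFor(seconds, clockKHz), dElapsed);
    cudaError_t err = cudaGetLastError();               // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // run-time errors

    if (err == cudaErrorLaunchTimeout) {
        std::printf("kernel was killed by the watchdog\n");
    } else if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
    } else {
        std::printf("kernel ran for about %.1f s with no watchdog timeout\n", seconds);
    }

    cudaFree(dElapsed);
    return err == cudaSuccess ? 0 : 1;
}
```

Built with `nvcc`, it takes the device index and the target run time in seconds as optional arguments, so the same binary can be pointed at the display card and at the Tesla in turn.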

Thanks for your time. Glad to know you were able to run for minutes… I myself have run on an 8800 GTX for nearly 9 minutes without a watchdog timeout… But sometimes, it strikes…

So, if you could run your computation for nearly 25 mins and still not get a watchdog timeout – I would be happy about it…

Best Regards,

Sarnath