As part of my continuing effort to make more of my internal tools for system testing available to you guys, here’s a burn-in test I wrote for GT200-based systems. It performs DGEMMs on every capable device simultaneously until device memory is filled and will repeat if you want. It also checks the results of each individual DGEMM to help you track down general stability problems. Time to completion varies widely with options, so feel free to take a look.
It requires CUDA 2.1, because it uses the ability to poll for an active watchdog timer (you can guess who the major proponent of this was). Like most of what I do, it’s Linux only for the moment, although I’m in the process of porting it to Windows. Compile with
2.1 final is out (STILL probably not on the website, but check the CUDA announcements forum for a link). These tools will eventually be included somewhere; I'm just trying to figure out the right place for them.
Tim, excellent tool!
I had thought about making a burn-in test myself, but I am very lazy and never did anything.
Do you think DGEMM has good cascading behavior, so that one small error in memory or compute gets magnified enough to make the error obvious?
I thought I might use an FFT as a basis since a single sample error would create a delta function on input, which propagates to all frequencies of the FFT. (Hmm, but that wouldn’t magnify the magnitude of the error, ideally it should be a nice feedback that makes it grow.)
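To make the propagation question a bit more concrete, here is a rough CPU-only sketch in plain Python (not the burn-in tool's code, and not CUBLAS or CUFFT). It corrupts one element of A in a naive matrix product and one input sample of a naive DFT, then measures how far the damage spreads. The matrix contents, sizes, and function names are all made up for illustration.

```python
import cmath

def matmul(a, b):
    """Naive dense product C = A * B for lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def changed_matmul_entries(n, row, col, delta):
    """Count entries of C that differ after corrupting A[row][col] by delta."""
    # Small integer-valued test matrices (illustrative only); B has no zeros.
    a = [[float(i + j + 1) for j in range(n)] for i in range(n)]
    b = [[float(i * j + 1) for j in range(n)] for i in range(n)]
    good = matmul(a, b)
    a[row][col] += delta
    bad = matmul(a, b)
    return sum(1 for i in range(n) for j in range(n) if good[i][j] != bad[i][j])

def dft_error_magnitudes(n, index, delta):
    """Per-bin |error| after corrupting one input sample by delta."""
    x = [float(t % 3) for t in range(n)]
    good = dft(x)
    x[index] += delta
    bad = dft(x)
    return [abs(g - b) for g, b in zip(good, bad)]

if __name__ == "__main__":
    print(changed_matmul_entries(8, 2, 5, 1.0))
    print(dft_error_magnitudes(8, 3, 1.0))
```

In this sketch a single bad element of A touches only one row of C (n of the n*n outputs), each entry shifted by delta times the matching element of B. The DFT spreads a single bad sample to every bin, but with magnitude equal to delta, so neither operation amplifies the error on its own in a single pass.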
Big extra points to anyone who whips up a script to iterate over various memory and shader clocks and use this test to make a Shmoo plot of your card’s stability regions.
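A possible skeleton for such a script, in Python. The part that actually sets the clocks and launches the burn-in test is left as a callable you supply (`run_test` is a hypothetical hook, not something the tool provides), since how you change clocks depends on your driver and tools. The sketch only handles the grid sweep and an ASCII rendering of the pass/fail regions.

```python
def shmoo(mem_clocks, shader_clocks, run_test):
    """Call run_test(mem, shader) at every grid point.

    run_test is a user-supplied (hypothetical) callable that should set the
    clocks, run the burn-in test, and return True if it passed.
    """
    return {(m, s): bool(run_test(m, s))
            for m in mem_clocks for s in shader_clocks}

def render(results, mem_clocks, shader_clocks):
    """ASCII Shmoo plot: rows are shader clocks (highest first), columns are memory clocks."""
    width = max(len(str(c)) for c in list(mem_clocks) + list(shader_clocks))
    mems = sorted(mem_clocks)
    lines = []
    for s in sorted(shader_clocks, reverse=True):
        cells = " ".join(("P" if results[(m, s)] else "F").rjust(width) for m in mems)
        lines.append(f"{str(s).rjust(width)} | {cells}")
    # Bottom axis: memory clock labels under their columns.
    lines.append(" " * width + " | " + " ".join(str(m).rjust(width) for m in mems))
    return "\n".join(lines)

if __name__ == "__main__":
    # Stand-in runner with made-up limits; replace with one that really
    # sets clocks and checks the burn-in test's result.
    def fake_run(mem, shader):
        return mem <= 1100 and shader <= 1400

    mem = [1000, 1100, 1200]
    shader = [1300, 1400, 1500]
    print(render(shmoo(mem, shader, fake_run), mem, shader))
```

Keeping the runner separate means the sweep logic can be tried out with a fake before it is pointed at real hardware. The clock values in the usage example are placeholders, not recommendations.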
This raises the question: Is there a way to install CUDA on Linux without an X installation on the system? The nvidia driver installer insists on it by default. Is there a switch to override? There is often no reason for a headless compute server to run X.
Change the default runlevel in /etc/inittab from 5 to 3. Then xdm won’t start. Since X also creates the /dev/nvidia* devices for you, you’ll have to use the script in the Release Notes to create these device files at boot time.
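For reference, here is a rough Python sketch of what such a boot-time step has to do. It is not the Release Notes script itself. It only builds the mknod commands, assuming the usual layout where the NVIDIA devices are character devices with major number 195, /dev/nvidiaN uses minor N, and /dev/nvidiactl uses minor 255. The GPU count is passed in by hand here (the function name and that parameter are my own), and the nvidia kernel module has to be loaded before the nodes are any use.

```python
NVIDIA_MAJOR = 195   # character-device major number used by the NVIDIA driver
CONTROL_MINOR = 255  # minor number of /dev/nvidiactl

def device_node_commands(gpu_count):
    """Return the mknod invocations for gpu_count GPUs plus the control node.

    gpu_count is supplied by the caller in this sketch; a boot script would
    have to detect it itself. Nothing is executed here.
    """
    cmds = [
        ["mknod", "-m", "666", f"/dev/nvidia{i}", "c", str(NVIDIA_MAJOR), str(i)]
        for i in range(gpu_count)
    ]
    cmds.append(
        ["mknod", "-m", "666", "/dev/nvidiactl", "c",
         str(NVIDIA_MAJOR), str(CONTROL_MINOR)]
    )
    return cmds

if __name__ == "__main__":
    # Print the commands for a two-GPU box; run them as root at boot.
    for cmd in device_node_commands(2):
        print(" ".join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try without root. Mode 666 lets any user open the devices, which you may want to tighten on a shared machine.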
No it doesn’t. I’ve installed the stock nvidia driver dozens of times on boxes without X installed.
It asks if you want to update some OpenGL library and it doesn’t really matter if you say yes or no. The library can be installed even if no one can use it.