As part of my continuing effort to make more of my internal tools for system testing available to you guys, here’s a burn-in test I wrote for GT200-based systems. It performs DGEMMs on every capable device simultaneously until device memory is filled and will repeat if you want. It also checks the results of each individual DGEMM to help you track down general stability problems. Time to completion varies widely with options, so feel free to take a look.
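To give a rough idea of what checking an individual DGEMM result involves, here is a minimal host-side sketch. It is not taken from dgemmSweep.cu, and the tool itself may verify results differently. The function names hostDgemm and countMismatches, the square column-major layout, and the relative tolerance are all assumptions made for illustration. A tolerance is used because a GPU result can differ from a host result in the last bits (different summation order, fused multiply-add), which is not by itself a stability problem.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference DGEMM on the host: C = alpha * A * B + beta * C.
// All matrices are n x n and stored column-major, the layout CUBLAS uses.
void hostDgemm(std::size_t n, double alpha, const std::vector<double>& a,
               const std::vector<double>& b, double beta, std::vector<double>& c)
{
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < n; ++row) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[k * n + row] * b[col * n + k];
            }
            c[col * n + row] = alpha * sum + beta * c[col * n + row];
        }
    }
}

// Hypothetical checker: counts the elements of `result` that differ from
// `expected` by more than `relTol`, relative to max(|expected|, 1).
// NaN in either input is counted as a mismatch.
std::size_t countMismatches(const std::vector<double>& expected,
                            const std::vector<double>& result, double relTol)
{
    assert(expected.size() == result.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const double diff = std::fabs(expected[i] - result[i]);
        const double scale = std::max(std::fabs(expected[i]), 1.0);
        // Written as !(diff <= ...) so that a NaN diff fails the check.
        if (!(diff <= relTol * scale)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

In a real run, `result` would be the matrix copied back from the device after the CUBLAS call, and any nonzero mismatch count would flag that particular DGEMM as suspect.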
It requires CUDA 2.1, because it uses the ability to poll for an active watchdog timer (you can guess who the major proponent of this was). Like most of what I do, it’s Linux-only for the moment, although I’m in the process of porting it to Windows. Compile with:
nvcc -o dgemmSweep -arch sm_13 dgemmSweep.cu -lcublas
Feedback is welcome.