Multi-GPU Stress Testing


I am having a rack-mounted multi-GPU system built, and one of the things that was offered was a certain amount of burn-in testing to ensure that the system is stable before delivery. The only real condition was that I tell them which tests I want run. So, I was wondering if anyone knows of a good CUDA multi-GPU stress test that I can ask them to run to hammer the cards. Bonus points if it monitors temperatures and calculation stability, or can run off a bootable USB drive.

So far the best I have found is but the last comment was quite a while ago, so I am throwing the question out to you guys.


I would think that a multi-node (CPU/GPU) system is merely a collection of single nodes, with the option of collective operation, that is, dependency and inter-dependency between the nodes.

I would also think that a stress test is mostly concerned with temperature and power stability at max load.

Hence, I would conclude that:
a) inter-dependency is not really critical as part of the test
b) any test that can push all nodes to max load at the same time should suffice
c) I am not sure whether max load would imply a memory-intensive or a compute-intensive task
I have heard that double precision (DP) can be power intensive; I may have heard wrong.
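
For the temperature and power side, here is a minimal Python sketch of the kind of polling that could run alongside whatever load generator ends up being used. It assumes `nvidia-smi` is installed and on the PATH. The function names (`parse_gpu_csv`, `over_limit`, `sample_gpus`) and the 85 C default threshold are made up for illustration, not taken from any existing tool, so treat it as a starting point rather than a finished burn-in monitor.

```python
import csv
import io
import subprocess

# Properties requested from nvidia-smi via --query-gpu.
QUERY_FIELDS = ["index", "temperature.gpu", "power.draw", "utilization.gpu"]


def _to_float(cell):
    """Convert a cell to float; return None for values like '[N/A]'."""
    try:
        return float(cell)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    readings = []
    for row in csv.reader(io.StringIO(text)):
        values = [cell.strip() for cell in row]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip blank or malformed lines
        idx, temp, power, util = values
        readings.append({
            "index": int(idx),
            "temp_c": _to_float(temp),
            "power_w": _to_float(power),
            "util_pct": _to_float(util),
        })
    return readings


def over_limit(readings, max_temp_c=85.0):
    """Return indices of GPUs hotter than max_temp_c (hypothetical limit)."""
    return [r["index"] for r in readings
            if r["temp_c"] is not None and r["temp_c"] > max_temp_c]


def sample_gpus():
    """Take one reading from every GPU by calling nvidia-smi once."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

Calling `sample_gpus()` in a loop with a short sleep while the load is running, and logging the result of `over_limit(...)`, would show whether every card stays at high utilization and whether any of them crosses the chosen temperature limit. The right limit depends on the specific cards, so the default here is only a placeholder.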