Testing 4 Tesla T4s

Hi,

I am trying to test four Tesla T4 GPUs at the same time. I am using CUDA 11 with driver version 450.51.06 on Ubuntu 20.04.

I am using a Supermicro X11 motherboard. This is the command I use to start nbody:
./nbody -benchmark -numdevices=2 -numbodies=100000

When I check nvidia-smi, after about 5 minutes cards 2 and 3 drop out of the test and show 0% usage.

Does anyone have any idea why this happens? Any help would be appreciated. Let me know if I missed something.

You say you want to use four Tesla T4s simultaneously, but the nbody invocation line shows -numdevices=2 instead of -numdevices=4.

Assuming that’s just a typo, it is not clear exactly how the GPUs “drop out of the test”. GPUs suddenly becoming inoperable after a brief runtime usually means one of two things: (1) overheating (you can check the temperature with `nvidia-smi`), or (2) an insufficient power supply.
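
To tell the two causes apart, it may help to log temperature and power draw for every GPU while nbody runs. The following is only a sketch: the field list, the 5-second interval, the log file name `gpu_health.csv`, and the helper name `log_gpu_health` are my own choices for illustration, not something from your setup.

```shell
#!/bin/sh
# Sketch: log per-GPU health while the benchmark runs in another terminal.
# FIELDS, INTERVAL, LOGFILE, and the function name are illustrative assumptions.

# Properties to sample; these are standard nvidia-smi --query-gpu fields.
FIELDS="timestamp,index,name,temperature.gpu,power.draw,utilization.gpu,clocks.sm"
INTERVAL=5                 # seconds between samples (assumed value)
LOGFILE="gpu_health.csv"   # hypothetical output file

log_gpu_health() {
    # Bail out early if the driver utilities are not installed.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found in PATH" >&2
        return 1
    fi
    # Print one CSV row per GPU every INTERVAL seconds and keep a copy on disk.
    nvidia-smi --query-gpu="$FIELDS" --format=csv -l "$INTERVAL" | tee "$LOGFILE"
}
```

Call `log_gpu_health` in a second terminal before starting nbody and stop it with Ctrl-C once the cards drop out. If the temperature of cards 2 and 3 climbs steadily until the moment their utilization falls to 0%, cooling is the likely culprit; if they vanish while still cool, look at the power supply.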

These devices are passively cooled. Is adequate airflow across the GPUs provided by the fans in the enclosure? Each GPU is specified for a power draw of 75W. Your system has four of them, and, guessing at the other system components, you will likely need a power supply unit rated for 750W for this system. What is the power rating of the PSU actually used?
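
As a rough illustration of how a figure like 750W can come about, here is a back-of-the-envelope budget. Only the GPU count and the 75W per GPU come from the discussion above; the CPU and "everything else" numbers and the 80% load target are placeholder assumptions that you should replace with the values for your actual hardware.

```shell
#!/bin/sh
# Sketch of a PSU sizing estimate. Only NUM_GPUS and GPU_W come from the thread;
# CPU_W, OTHER_W, and LOAD_PCT are assumed placeholders, not measured values.

NUM_GPUS=4
GPU_W=75        # per-GPU power draw in watts
CPU_W=200       # ASSUMPTION: total CPU power in watts
OTHER_W=100     # ASSUMPTION: memory, drives, fans, motherboard
LOAD_PCT=80     # ASSUMPTION: keep sustained load at or below 80% of the PSU rating

GPU_TOTAL_W=$((NUM_GPUS * GPU_W))
SYSTEM_W=$((GPU_TOTAL_W + CPU_W + OTHER_W))
PSU_W=$((SYSTEM_W * 100 / LOAD_PCT))

echo "GPU load:              ${GPU_TOTAL_W} W"
echo "Estimated system load: ${SYSTEM_W} W"
echo "Suggested PSU rating:  ${PSU_W} W"
```

With these placeholder values the GPUs alone account for 300W, the estimated system load is 600W, and the suggested rating comes out at 750W. A system with two high-end CPUs would need more.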