Dell T3600 reboots when running MATLAB GPU code on a GTX TITAN X

Hi all,

First-time poster here; I hope I have included all the information you need. I am debugging this problem for one of our professors.
The machine in question is a Dell Precision T3600 with the latest BIOS, A14 09/29/2014.

64GB memory
GeForce GTX TITAN X
1TB SSD
Ubuntu 14.04.3
CUDA 7.5
Driver Version: 352.39
MATLAB R2015a

The driver and CUDA were installed from NVIDIA's CUDA repository (CUDA Toolkit downloads page, NVIDIA Developer):
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda

When we run the sample MATLAB GPU code below on the GeForce GTX TITAN X, the machine reboots: the screen goes black and it restarts without any warning!

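Sample MATLAB code:
----snip----
n = 13000;                          % matrix dimension
clz = 'single';                     % single precision: 4 bytes per element
% Random n-by-n matrix on the GPU with 100 added to the diagonal;
% 100*eye(...) is built in host memory and added on the GPU.
A = gpuArray.rand(n, n, clz) + 100*eye(n, n, clz);
b = gpuArray.rand(n, 1, clz);       % random right-hand side on the GPU
x = A\b;                            % solve A*x = b on the GPU
----snip----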

The same code on the same machine with the same driver, but with a Quadro 4000 installed, reports an out-of-memory error (see below).

----snip----
Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.

Error in test (line 3)
A = gpuArray.rand(n, n, clz) + 100*eye(n, n, clz);
----snip----
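A rough size estimate helps explain why the two cards behave differently. The sketch below assumes 4 bytes per `single` element and assumes that while line 3 runs, the GPU holds about three n-by-n matrices at once (the `gpuArray.rand` result, a device copy of the host-side `100*eye(n, n, clz)` term, and their sum); the exact number of temporaries MATLAB allocates is an assumption, and `matrix_mib` and `peak_mib` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a back-of-the-envelope GPU memory estimate.

# MiB needed for one n-by-n single-precision matrix (4 bytes per element).
matrix_mib() {
  local n=$1
  echo $(( n * n * 4 / 1048576 ))
}

# Assumed peak while evaluating line 3: roughly three n-by-n matrices on the GPU.
peak_mib() {
  local n=$1
  echo $(( 3 * n * n * 4 / 1048576 ))
}

for n in 7000 10000 13000; do
  echo "n=$n one matrix: $(matrix_mib "$n") MiB, assumed peak: $(peak_mib "$n") MiB"
done
```

For n = 13000 this gives about 644 MiB per matrix and roughly 1934 MiB at the assumed peak. That is close to the 2 GB on a Quadro 4000 but far below the 12 GB on a GTX TITAN X, so an out-of-memory error is plausible on the Quadro and not expected on the Titan X.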

If I remove the Quadro 4000, reinstall the TITAN X, and change n in the sample code to 7000, it works with no reboot. Further testing gave the following results:

n = 7000 - works
n = 8000 - works
n = 9000 - works
n = 10000 - reboot
n = 13000 - reboot
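All of these sizes fit well within the Titan X's 12 GB, so the threshold is unlikely to be a memory limit; what grows quickly is the amount of work. As a rough guide, assuming `A\b` on this square matrix uses an LU factorization costing about (2/3)n^3 floating-point operations, the sketch below estimates the work per run. `lu_gflop` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate cost of an LU-based dense solve, ~(2/3)*n^3 flops,
# reported in GFLOP (integer division, so results are rounded down).
lu_gflop() {
  local n=$1
  echo $(( 2 * n * n * n / 3 / 1000000000 ))
}

# Sizes from the test results above.
for n in 7000 8000 9000 10000 13000; do
  echo "n=$n approx. $(lu_gflop "$n") GFLOP"
done
```

Going from n = 9000 (about 486 GFLOP) to n = 10000 (about 666 GFLOP) is only about 37% more work, so this does not explain a sharp threshold by itself; it only shows that larger n keeps the GPU under full load for longer.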

So what is causing the reboot? Shouldn't I be getting an out-of-memory error, or some other error, instead of a reboot? Any suggestions?

nvidia-bug-report.log.gz (71.1 KB)

The power supply is 635W, the largest available for this model.

The power supply does seem like the most likely culprit. NVIDIA recommends a 600W minimum system power supply for the Titan X, so 635W is cutting it close. Do you have a different power supply you could try?

paulg_ca, did you try blacklisting the nouveau driver and booting with the nomodeset kernel option in GRUB?
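A minimal sketch of that suggestion on Ubuntu 14.04, assuming the standard /etc/modprobe.d and GRUB locations; the file name blacklist-nouveau.conf is just a common choice. These commands change the boot configuration and need root, and the NVIDIA driver packages may already blacklist nouveau, so it is worth checking /etc/modprobe.d first.

```shell
# Stop nouveau from loading and from setting display modes
echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf

# Rebuild the initramfs so the blacklist also applies during early boot
sudo update-initramfs -u

# To boot with nomodeset, edit /etc/default/grub so the kernel line includes it, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
# then regenerate the GRUB configuration and reboot
sudo update-grub
```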

How can 600W be the recommended minimum for a single card? As far as I know, an 8-pin connector can deliver at most 150W and a 6-pin connector 75W. A Titan X has one of each (225W), plus 75W from the PCIe slot, so in total the card could draw up to 300W if my assumptions are correct, and in practice probably less: its TDP is 250W. That should leave enough for the CPU, a spinning disk, the motherboard, etc.
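For reference, the reference GTX TITAN X board uses one 6-pin and one 8-pin power connector. The sketch below writes out the in-spec budget, assuming the usual PCIe limits of 75 W for the x16 slot, 75 W per 6-pin and 150 W per 8-pin connector. `card_max_w` is a hypothetical helper, and the headroom line uses the 635 W rating quoted in this thread without accounting for PSU efficiency or rail limits.

```shell
#!/usr/bin/env bash
# In-spec PCIe power limits, in watts.
SLOT_W=75         # PCIe x16 slot
SIX_PIN_W=75      # per 6-pin PCIe power connector
EIGHT_PIN_W=150   # per 8-pin PCIe power connector

# Maximum in-spec board power for a card with the given number of 6-pin and 8-pin connectors.
card_max_w() {
  local six=$1 eight=$2
  echo $(( SLOT_W + six * SIX_PIN_W + eight * EIGHT_PIN_W ))
}

PSU_W=635           # T3600 PSU rating quoted in this thread
TITAN_X_TDP_W=250   # rated board power of the GTX TITAN X

echo "1x 6-pin + 1x 8-pin: $(card_max_w 1 1) W max"
echo "2x 8-pin:            $(card_max_w 0 2) W max"
echo "Nominal PSU headroom at TDP: $(( PSU_W - TITAN_X_TDP_W )) W"
```

This prints 300 W, 375 W and 385 W respectively; on paper the total wattage looks sufficient, which is why the next replies focus on PSU faults and 12 V rail capacity instead.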

I would also suspect the PSU, though. It could be faulty and unable to deliver steady power.

Personally, I use a 550W PSU for my 2x 8-pin GTX 580 and a Core i7, both overclocked by 25%. It has been rock solid for 4 years, even under heavy load. It's an 80 Plus Silver certified unit.

The problem was the PSU! The T3600 can't drive the GTX TITAN X. Thank you for your suggestions.

The 12V rail amperage is the only thing that really matters in a power supply; total wattage is a red herring. Most manufacturers end up recommending 200-300W more than you actually need, because there is a good chance the 12V rail won't supply the >80% of total wattage that it realistically should.

(Speaking as someone happily running a Titan X on a 450W PSU: it has 36A on 12V, and 12V × 36A = 432W that can actually be used by the power-hungry parts like the CPU and GPU.)
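The rail arithmetic can be written out as a small sketch. It assumes a single combined 12 V rail and takes the 250 W TDP as the GPU's draw; it ignores transient spikes above TDP and, for PSUs that split 12 V across several rails, the separate per-rail current limits. `rail_w` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Power a 12 V rail can deliver: P = V * I (watts).
rail_w() {
  local volts=$1 amps=$2
  echo $(( volts * amps ))
}

TITAN_X_TDP_W=250   # rated board power of the GTX TITAN X

capacity=$(rail_w 12 36)   # the 450 W PSU with 36 A on 12 V described above
echo "12 V capacity: ${capacity} W"
echo "Left after the GPU's TDP: $(( capacity - TITAN_X_TDP_W )) W"
```

This gives 432 W of 12 V capacity and about 182 W left for the CPU and other 12 V loads, which is the comparison that matters for a given PSU rather than its headline wattage.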

Hey paulg_ca, I have exactly the same problem. Which PSU did you buy? Would the T5600's 825W PSU work?

Hi paulg_ca and 201power,

did you manage to find a suitable Dell PSU that fits into the tower? I already contacted Dell support, and they state that the 635W PSU is already the largest for the T3600 line. Has anyone tried fitting a different PSU, such as the T5600's unit mentioned above or one of the 1000W server PSUs, into a T3600?