System crash when running CUDA 7.5 on Titan X (Ubuntu 14.04)

Hi. I’m experiencing some strange crashes when using CUDA to train machine learning models on a Titan X. Instead of failing with an error message or segmentation fault, my machine shuts down with no error message or logs. Has anyone here experienced something similar? It would be useful to know what kinds of errors can cause a crash like this.

Thanks,
Páidí

Perhaps an insufficiently dimensioned power supply?

Thanks, I think this might be the case. My power supply is 500W and the recommended minimum power supply for the Titan X is 600W. I only encounter the crashes when pushing very large chunks of data through larger models, so it would make sense if these were putting more pressure on the power supply.

Hi, I have exactly the same problem with a GTX 980 Ti and CUDA 7.5. My Ubuntu 16.04 machine crashes when I’m running slightly larger volumes. My power supply is 630W though…

How did you solve your problem?

Thanks!

“Crashes” how?

reboot…

Spontaneous reboots could be a power issue. Note that statements about minimum recommended PSU wattage are just rough guidelines, as there could be any number of power-drawing components in a given system, besides the GPU, that we don’t know about. Plus there are PSUs of different quality, and like all electronics, PSUs age. In particular electrolytic capacitors will degrade, more so in hot environments.

The first thing to check is if there are any obvious interruptions in the power supply, such as power cable plugs not plugged in all the way, or PCIe cards not properly seated and secured in their sockets. There could also be the possibility of cold solder joints, but unless you already know what those are, it would be difficult to explain how to spot them. Plus I have not encountered one in many years; they seem to be rare in modern electronics manufacturing.

As for PSU recommendations, I usually suggest using those rated 80 PLUS Platinum (they are very efficient and often of superior quality overall), and sizing them such that the total nominal wattage of all components in the system (CPU, GPU, system memory, mass storage, other I/O) sums to about 60% of the PSU’s rated wattage.
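As a rough sketch of that sizing rule in Python: the 60% target comes from the post above, while the function name and the component wattages are made-up illustrative values, not measurements from any system in this thread.

```python
def recommended_psu_watts(component_watts, target_load=0.60):
    """Return the PSU rating at which the summed nominal draw
    equals target_load (a fraction) of the PSU's rated wattage."""
    total = sum(component_watts.values())
    return total / target_load


# Hypothetical nominal wattages, for illustration only.
example_system = {
    "cpu": 140,
    "gpu": 250,
    "memory": 30,
    "storage": 20,
    "other_io": 40,
}

total = sum(example_system.values())
rating = recommended_psu_watts(example_system)
print(f"Total nominal draw: {total} W")
print(f"Suggested PSU rating: about {rating:.0f} W")
```

Plugging in the nominal figures from each component's spec sheet gives a starting point only; PSU quality and age still matter, as noted earlier in the thread.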

Check your power pin connection.

Thanks for the replies. I think the problem is indeed a power shortage. I’m running the same GPU code on another machine and it doesn’t cause any problems whatsoever.

Although I’m within the recommended limit (600W) for my GPU (I’ve got 630W), my 2 CPUs have 12 threads each, so they are power hungry… there must be a shortage somehow.

Regarding the connection, I actually split my 6-pin into 8 + 6, so this might affect it somehow. I have no spare power connectors because of the 2 massive CPUs.

Thanks again,
regards
Dan

This is the likely culprit: you are running your PSU massively outside specifications. The 8-pin connectors are used for power supply strands specified for 150W each, while the 6-pin connectors are used for power supply strands specified for 75W each. By splitting in the manner you describe, you are trying to load a part of the PSU designed to deliver 75W with up to 225W. That is begging for trouble, big time!

I would strongly suggest getting a PSU with the appropriate number and type of power connections.
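The arithmetic behind that warning can be written out as a small Python sketch. The 75W and 150W figures are the per-connector ratings quoted above; the function and dictionary names are purely illustrative.

```python
# Per-connector ratings as quoted in this thread (watts).
CONNECTOR_RATING_W = {"6-pin": 75, "8-pin": 150}


def split_load(source, outputs):
    """Return (worst-case demand in W, source rating in W, overload factor)
    when one source connector is split to feed several output connectors."""
    demand = sum(CONNECTOR_RATING_W[c] for c in outputs)
    rating = CONNECTOR_RATING_W[source]
    return demand, rating, demand / rating


demand, rating, factor = split_load("6-pin", ["8-pin", "6-pin"])
print(f"6-pin split into 8-pin + 6-pin: up to {demand} W "
      f"on a {rating} W strand ({factor:.1f}x the rating)")
```

This only compares nominal ratings. The actual draw depends on the GPU load, which is consistent with the crashes showing up only on larger workloads.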

Thanks! I was not sure about plugging it in this way, but the IT guys were certain that it’d be OK :) Anyway, I’m looking at the Corsair CP-9020044 860W, 80 PLUS as you suggested previously.

Splitting an 8-pin connection into an 8-pin and a 6-pin may work fine with a high quality PSU, by exploiting the engineering margins built into the PSU design, but even that should not be more than a temporary solution, IMHO. Splitting a 6-pin connection on the other hand …

I can’t give recommendations on specific PSU models, but a handy overview of the vast majority of 80 PLUS specified PSUs on the market can be found at http://www.plugloadsolutions.com/80PlusPowerSupplies.aspx

The whole point of having 2 power connectors is to deliver more power from the PSU to the GPU, not just to put fancier wiring inside the case. If a single 6-pin cable were enough to power your GPU, the card would have exactly that one connector.

While some PSUs may deliver more power on the 6-pin cable than the standard requires, you should consult the PSU manual to check that. Overall, if your PSU doesn’t have enough cables, it is precisely because it can’t deliver enough power, not because the PSU maker didn’t have a few extra wires at hand.