Restarts when running TensorFlow

https://github.com/tensorflow/tensorflow/issues/8858

Is NVIDIA aware of this? It seems like a driver issue.
I was able to reduce the restarts by limiting the card's power to 150 W, as suggested in the linked issue. But the machine still restarts sometimes during intensive TensorFlow training.
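
For anyone who wants to try the same mitigation, here is a minimal sketch of setting the cap from Python. It assumes `nvidia-smi` is on the PATH, that the script runs with root privileges, and that 150 W falls inside the range the driver allows for the card. The helper names `build_power_limit_cmd` and `apply_power_limit` are made up for illustration.

```python
import subprocess


def build_power_limit_cmd(watts, gpu_index=0):
    """Return the nvidia-smi argument list that caps one GPU's power limit."""
    if watts <= 0:
        raise ValueError("power limit must be a positive number of watts")
    # -i selects the GPU by index, -pl sets the power limit in watts.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(int(watts))]


def apply_power_limit(watts, gpu_index=0):
    """Run the command (hypothetical helper).

    Usually needs root; nvidia-smi exits with an error if the value is
    outside the card's supported range, which check=True turns into an
    exception.
    """
    subprocess.run(build_power_limit_cmd(watts, gpu_index), check=True)


# Show the command that would be run for the 150 W cap mentioned above.
print(" ".join(build_power_limit_cmd(150)))
```

Calling `apply_power_limit(150)` applies the cap to GPU 0. As far as I know the setting does not persist across reboots, so it has to be reapplied after each restart.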

Dismayed to have my brand new computer run into this issue when running intensive training workloads.
Thanks.

The original poster in that thread subsequently (4-5 months later) reported this:

“Now I feel confident that the issue is related to power supply.
I changed the power supply from ‘Corsair CX750 Builder Series ATX 80 PLUS’ to ‘Cooler Master V1000’ and don’t get the system crash anymore.”

Later, another commenter said:

“Our issue turned out to be an issue with power supply as well”

Later, a third commenter said:

“this is a power supply issue,”

Your own data point shows that the behavior changes when you modify the power demand of the card.

This doesn’t look like a driver issue to me.

The same code works on Windows, on the same machine. That is why I said that I “felt” that it was a driver issue.
Just because the newer power supply works doesn’t mean that there is no issue; it could just be that the higher-capacity power supply was able to absorb the power spike.
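
One rough way to look at this is to log the reported power draw against the limit while training runs. The sketch below assumes `nvidia-smi` is on the PATH and that the card reports `power.draw` (some cards return `[N/A]`, which this code does not handle). The helper names are made up for illustration. Polling like this is probably too coarse to catch very short transients, so a clean log would not rule out a spike.

```python
import subprocess

# Query fields supported by nvidia-smi; output is one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def parse_power_line(line):
    """Parse one 'draw, limit' CSV line into a (draw_w, limit_w) tuple of floats."""
    draw, limit = (float(field.strip()) for field in line.split(","))
    return draw, limit


def sample_power():
    """Return a list of (draw_w, limit_w) tuples, one per GPU (hypothetical helper)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_power_line(line) for line in out.splitlines() if line.strip()]
```

Calling `sample_power()` in a loop with a short sleep while the training job runs gives a simple draw-versus-limit log to compare between Linux and Windows.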

Has NVIDIA actually investigated this?

I don’t know. If you’d like NVIDIA to investigate something, the approach I recommend is to file a bug at http://developer.nvidia.com

Sorry, where exactly do I file the bug? Is this the one?
http://surveys.nvidia.com/index.jsp?pi=6e7ea6bb4a02641fa8f07694a40f8ac6

You can use that link if you wish. It logs to an internal system, although it is not the same as our internal bug database. If you want to “file a bug” as I indicated, use these steps:

  1. Go to developer.nvidia.com
  2. If you are not already a registered developer, register as one (“Join” link in the upper right hand corner).
  3. Wait until your registration is approved. Typically less than 48 hours.
  4. Once registration is approved, log in using your credentials. (login link in upper right corner)
  5. In the upper right hand corner, use the drop-down by your name to click on “my account”
  6. On the left hand side click on “My Bugs”
  7. On the right hand side click on “Submit a new bug”

I don’t have any particular reason to recommend one approach over the other for this issue. If it were me personally, I would use the bug system, but that is because I use the bug system (internally) every day.

Thanks, I filed the bug report.