Hello,
When I train a neural network on the GPU, the PC sometimes restarts without any message. I am running PyTorch on Ubuntu 16.04.
Please help.
Regards,
Milton
First of all, check the kernel logs for anything logged just before the reboot. Then check the power draw while the GPU is under heavy load, using the nvidia-smi command. You may be approaching the maximum load that your PSU can supply.
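One way to watch the power draw is sketched below. It assumes the NVIDIA driver's nvidia-smi is on your PATH. The function name watch_gpu_power is made up for illustration, and some cards report "[N/A]" for some of the queried fields.

```shell
#!/bin/sh
# Hypothetical helper: print per-GPU power draw, power limit and temperature
# every N seconds (default 1) until interrupted with Ctrl-C.
watch_gpu_power() {
    interval="${1:-1}"
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
        return 1
    fi
    # --query-gpu selects the fields, --format=csv prints one row per GPU,
    # -l repeats the query every $interval seconds.
    nvidia-smi --query-gpu=index,name,power.draw,power.limit,temperature.gpu \
               --format=csv -l "$interval"
}

# Usage: run in a second terminal while training, e.g.
#   watch_gpu_power 1
```

If the draw sits close to the reported power limit right before a restart, that points toward the power supply rather than the software.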
Your PSU should have a safety margin of around 150 W. For example, if your GPU can draw 200 W, the box may say "Recommended minimum 450 W PSU". That means you really want a 450 + 150 = 600 W PSU.
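The arithmetic above can be written as a tiny helper. This is only a sketch of the rule of thumb: the function name psu_target_watts is hypothetical, and the 150 W default margin is the figure suggested in this reply, not a vendor specification.

```shell
#!/bin/sh
# Hypothetical helper: PSU wattage to aim for, given the "recommended minimum"
# printed on the GPU box and a safety margin in watts (default 150).
psu_target_watts() {
    recommended_min="$1"
    margin="${2:-150}"
    echo $(( recommended_min + margin ))
}

psu_target_watts 450   # prints 600
```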
Hello,
Thanks for the reply. I have two 1080 Ti cards, whose power draw is (320 W + 250 W). My PSU is 850 W. Even when I use a single GPU, the PC still restarts on its own sometimes.
How can I view the kernel log?
Regards,
Milton
If you are on systemd, use journalctl.
For example, if the last reboot was within the last 5 hours, type (as root):
journalctl --since "5 hours ago"
If you are not using systemd, check /var/log/kern.log (the exact file name varies by distribution).
You can always consult your distribution's documentation on reading log files, so if neither option works, just ask them for instructions on log locations :)
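To look only at what the kernel logged before the crash, something like the sketch below may help. It assumes the journal is persistent (on some distributions it is kept only in memory by default, in which case logs from previous boots are lost). The function name prev_boot_kernel_log is hypothetical, and the fallback path is an assumption that holds on Debian-style systems.

```shell
#!/bin/sh
# Hypothetical helper: show the last N kernel messages (default 50)
# from the boot before the current one.
prev_boot_kernel_log() {
    lines="${1:-50}"
    if command -v journalctl >/dev/null 2>&1; then
        # -k: kernel messages only; -b -1: previous boot; -n: last N lines
        journalctl -k -b -1 -n "$lines" --no-pager
    elif [ -r /var/log/kern.log ]; then
        # Assumed path; this file spans several boots, so check the timestamps.
        tail -n "$lines" /var/log/kern.log
    else
        echo "no kernel log found; check your distribution's documentation" >&2
        return 1
    fi
}

# Usage (as root, or as a user allowed to read the system logs):
#   prev_boot_kernel_log 100
```

A sudden reboot caused by a power problem often leaves nothing in the log at all, so an empty or abruptly truncated log is itself a hint.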