My computer restarts while running a deep learning style transfer notebook with PyTorch 1.8a on a 3090 under Ubuntu Linux. The problem is that I don't know how to get an error trace after the restart. It doesn't seem to be temperature related either: I have installed sensors and the temperature readings are low or normal.
My last attempt was to run it again with breakpoints set in VS Code inside the training loop. After a few epochs the computer restarted anyway, even though the breakpoints were being hit, and I am not sure which operation was running at that moment. This also suggests the temperature was under control, because the code was only advancing at the speed of a human checking data and pressing continue to reach the next breakpoint.
I have raised an issue ("PC restart without trace executing nb", Issue #51850 in pytorch/pytorch on GitHub), but they believe the problem comes from the wrapper library (fastai). I believe it is caused by the combination of the latest approved drivers for Ubuntu and PyTorch 1.8 compiled from source.
So, does anyone know how to trace, debug, print, or otherwise instrument my program so that I can find the exact place where this restart is triggered? It has been pretty hard to pinpoint.
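One approach that might help narrow it down is a minimal sketch like the one below: write a "breadcrumb" line to a file and force it to disk before and after each step, so that after the reboot the last line in the file shows roughly where execution stopped. The names `breadcrumb`, `run_steps`, and `breadcrumbs.log` are hypothetical, not part of PyTorch or fastai. CUDA operations run asynchronously, so in the real training loop a `torch.cuda.synchronize()` call after each step would probably be needed for the breadcrumb to match the GPU operation that was actually executing.

```python
import os
import tempfile


def breadcrumb(message, path="breadcrumbs.log"):
    """Append one line to `path` and ask the OS to write it to disk now.

    flush() empties Python's buffer; os.fsync() asks the kernel to push the
    data to the storage device, so the line has a good chance of surviving
    a sudden restart.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(message + "\n")
        f.flush()
        os.fsync(f.fileno())


def run_steps(steps, path="breadcrumbs.log"):
    """Run (name, callable) pairs, logging a breadcrumb around each one.

    In a real training loop each callable would be one stage (forward,
    loss, backward, optimizer step), followed by torch.cuda.synchronize()
    (assumption: not called here so the sketch runs without a GPU).
    """
    for index, (name, step) in enumerate(steps):
        breadcrumb(f"start {index} {name}", path)
        step()
        breadcrumb(f"done {index} {name}", path)


if __name__ == "__main__":
    # Stand-in steps; replace with the real stages of the training loop.
    log_path = os.path.join(tempfile.mkdtemp(), "breadcrumbs.log")
    run_steps([("forward", lambda: None), ("backward", lambda: None)], log_path)
    with open(log_path, encoding="utf-8") as f:
        print(f.read())
```

After a restart, a final `start ...` line with no matching `done ...` line would point at the step that was in progress. This does not explain why the machine restarts, it only helps localize it.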