Under JetPack 4.2, when I initialize TensorFlow and load a model, the Xavier AGX sometimes shuts down abruptly, and I cannot find any diagnostics to investigate the problem.
According to jtop, the temperature is around 50 °C, so a hardware shutdown due to overheating seems unlikely.
Also system memory and other resource usage seem normal.
I have tried both the default and MAXN power modes (the target was flashed under the default configuration, not the MAXN variant).
Perhaps it ran out of RAM. Use another computer, e.g., via ssh or serial console, and monitor something like htop or: watch -n 1 free -h
…see whether free memory approaches zero near the moment of shutdown.
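If the board powers off before you can read the last screen of htop, it may help to log memory to a file instead. Below is a rough sketch, not a tested diagnostic tool: it reads /proc/meminfo once per interval and forces each line to disk with fsync so the last reading has a better chance of surviving an abrupt shutdown. The function names, the log path mem_log.txt and the one-second interval are my own placeholders.

```python
import os
import time


def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of integer values.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def log_memory(path="mem_log.txt", interval=1.0, max_samples=None):
    """Append MemAvailable and SwapFree to `path` every `interval` seconds.

    `path`, `interval` and `max_samples` are hypothetical defaults.
    With max_samples=None the loop runs until the process is killed.
    """
    count = 0
    with open(path, "a") as log:
        while max_samples is None or count < max_samples:
            with open("/proc/meminfo") as f:
                info = parse_meminfo(f.read())
            log.write("%s MemAvailable=%d kB SwapFree=%d kB\n" % (
                time.strftime("%H:%M:%S"),
                info.get("MemAvailable", -1),
                info.get("SwapFree", -1),
            ))
            # Flush Python's buffer and ask the kernel to write to disk,
            # so the most recent sample is not lost on power-off.
            log.flush()
            os.fsync(log.fileno())
            count += 1
            time.sleep(interval)
```

Start log_memory() in a separate terminal or ssh session before launching the TensorFlow script, then look at the last lines of the log after the board comes back up.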
Perhaps running out of RAM could be mitigated by using swap files.
The default swapfile seems to be 8 GB, which can be resized or monitored, e.g. with System Monitor or
You should also know that operations which require physical memory will not be helped by swap. CUDA operations using the GPU would run out of RAM and fail in the same way even if you add swap. So it is good to determine first if this is a case of not enough RAM.
If this is a case of not enough RAM, then some aspect of the program might be changed to use less RAM (fewer concurrent kernels, for example).
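One way to make the program use less RAM, assuming a TensorFlow build that has tf.compat.v1 (1.13 or later), is to cap how much memory TensorFlow reserves for the GPU. On Jetson the GPU shares physical RAM with the CPU, so a smaller cap leaves more for the rest of the system. This is only a sketch: the 4096 MB budget and the 16384 MB total are assumed numbers that you would need to replace with values for your board, and the helper names are mine.

```python
def gpu_memory_fraction(budget_mb, total_mb):
    """Convert a memory budget in MB into a fraction of total memory (0.0 to 1.0)."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return max(0.0, min(1.0, float(budget_mb) / total_mb))


def make_session(budget_mb=4096, total_mb=16384):
    """Create a TF 1.x-style session with a capped, on-demand GPU allocation.

    budget_mb and total_mb are hypothetical defaults, not measured values.
    """
    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow as tf

    gpu_options = tf.compat.v1.GPUOptions(
        # Allocate GPU memory as needed instead of grabbing it all up front.
        allow_growth=True,
        # Upper bound on the share of memory TensorFlow may take.
        per_process_gpu_memory_fraction=gpu_memory_fraction(budget_mb, total_mb),
    )
    config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)
    return tf.compat.v1.Session(config=config)
```

If the shutdowns stop with a lower cap, that would point toward memory exhaustion rather than a power or thermal problem.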
I remember that at a very early stage, TensorFlow wouldn't even install on the Jetson TX if there was no swap file or if it was too small. The board would power off or reset.
Another possible issue is GPU memory limitations; RAM limitations might be a separate issue. @krikun.daniel are you using the tensorflow_cpu or the tensorflow_gpu installation?