Nano occasional crash - strange output pattern

Occasionally I see the Nano crash with this output pattern. Does the pattern give a clue as to why the device crashed? It happens once every few weeks on one of my devices. I have no idea of the cause; it occurs at seemingly random intervals. Hope you can help!

I should add that the overall image looks grey; this is a close-up of the pixels on the screen. The image represents around 10% of the width of a 1080p output.

You see this on the HDMI screen without any reason?

Were you running any application on your Jetson Nano when you saw this?

I am running our OpenGL-based application, which normally runs just fine, but occasionally this happens. Sometimes it happens soon after starting the device (and our application); other times it will run for days and days without this happening at all.

It seems like a very specific pattern, something that I am sure our application is not generating.

I can still access the device via SSH when this occurs, but I cannot use the device as the screen is filled with this pattern.

A power cycle gets everything running again just fine, but I would really like to know what causes this pattern.

From ssh (or scp, the ssh version of cp) or serial console you will probably want to save a copy of your “/var/log/Xorg.0.log” and post that here. The log should give more details. You could also save a copy of “dmesg” via:
dmesg 2>&1 | tee log_dmesg.txt
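
If it helps, both logs can be gathered in one go. This is only a sketch: the function name collect_logs and the destination directory are made up for illustration, and it assumes the X log is in the default location “/var/log/Xorg.0.log”.

```shell
# Hypothetical helper: copy the X server log and the kernel log into one directory.
# Usage: collect_logs DEST_DIR [XORG_LOG]
collect_logs() {
    dest="$1"
    xorg_log="${2:-/var/log/Xorg.0.log}"   # default X log location (assumed)
    mkdir -p "$dest" || return 1
    # Kernel ring buffer, with stdout and stderr both captured
    dmesg > "$dest/log_dmesg.txt" 2>&1 || echo "warning: dmesg failed" >&2
    # X server log; warn instead of aborting if it is missing
    cp "$xorg_log" "$dest/" || echo "warning: could not copy $xorg_log" >&2
}
```

For example, run “collect_logs ~/crash_logs” on the device while the pattern is showing, then fetch the directory with scp.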

Xorg.0.log (188.5 KB)

Not sure if the messages in here are timestamped, but I saw this pattern on the device around 6:53 am, which is around 90 minutes after I would have switched the device on today.

Regarding dmesg, I tried your command over SSH:

sudo mesg 2>&1 | tee log_dmesg.txt

and I get

mesg: ttyname failed: Inappropriate ioctl for device

?

Another important piece of information: when this happens, if I connect through SSH I can close our application and restart it, but the grey pattern persists, as if it overlays everything on the desktop.

FYI, I have seen this happen on both:
NVIDIA Nano developer kit booting from microSD
NVIDIA Xavier NX plus Auvidea JNX30 carrier board booting from internal eMMC with additional M.2 SSD

After trying a few things I discovered that:

sudo pkill X

does get rid of the grey pattern and returns me to the login screen on the device.

This doesn’t really solve my problem, but I guess it gives us a hint as to where the issue is coming from?

The command is “dmesg”, not “mesg”; I’m hoping that was just a typo. There is a separate command named “mesg”, but it is not the one you want here. Also, there is no need for sudo.

I see this in the Xorg log:

[     7.137] (WW) NVIDIA(0): BMD Blackmagic (DFP-0) does not have an EDID, or its EDID does
[     7.137] (WW) NVIDIA(0):     not contain a maximum image size; cannot compute DPI from
[     7.137] (WW) NVIDIA(0):     BMD Blackmagic (DFP-0)'s EDID.

…and an EDID is mandatory. I don’t know about all of that message, but at least this display will fail to auto configure for lack of EDID. Is this an actual HDMI monitor without adapters?

Later the log is just an endless repetition of these messages, roughly every two seconds:

[  2638.038] (--) NVIDIA(GPU-0): BMD Blackmagic (DFP-0): External TMDS
[  2640.053] (--) NVIDIA(GPU-0): BMD Blackmagic (DFP-0): connected
[  2640.053] (--) NVIDIA(GPU-0): BMD Blackmagic (DFP-0): External TMDS
[  2642.057] (--) NVIDIA(GPU-0): BMD Blackmagic (DFP-0): connected
[  2642.057] (--) NVIDIA(GPU-0): BMD Blackmagic (DFP-0): External TMDS

So everything in error seems to be related to the “BMD Blackmagic” on DFP-0. The start of the X server seemed ordinary until it got to this display.
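
To see how regular that loop is, the “connected” lines can be counted and the gaps between their timestamps printed. This is a rough sketch, assuming the log lines look exactly like the ones quoted above; the function names are just for illustration.

```shell
# Count "connected" reports for one display in an Xorg log.
# Usage: count_connected LOGFILE DISPLAY_NAME
count_connected() {
    grep -cF "$2: connected" "$1"
}

# Print the time in seconds between consecutive "connected" reports.
# Assumes each line starts with a "[  1234.567]" style timestamp.
connected_intervals() {
    grep -F "$2: connected" "$1" \
        | sed 's/^\[ *\([0-9.]*\)].*/\1/' \
        | awk 'NR > 1 { printf "%.3f\n", $1 - prev } { prev = $1 }'
}
```

For example: connected_intervals /var/log/Xorg.0.log "BMD Blackmagic (DFP-0)". A steady interval would suggest the driver is re-detecting the display over and over rather than reacting to a one-off event.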