The Xorg log looks normal at the beginning, and it even loads the keyboard and many Xorg extension modules. The early log lines noting a monitor are more or less normal, but at that point Xorg hasn’t actually tried to use the monitor or read its EDID. Some of the errors or warnings up to this point don’t matter and can be ignored. It is at the end of the log file that it gets really interesting. The log should not have stopped here (the first two lines of this excerpt just show it working normally):
[ 29.143] (II) event5 - NVIDIA Jetson AGX Xavier APE Headset Jack: is tagged by udev as: Keyboard Switch
[ 29.143] (II) event5 - NVIDIA Jetson AGX Xavier APE Headset Jack: device is a keyboard
[ 30.459] (--) NVIDIA(GPU-0): NVIDIA (DFP-0): connected
[ 30.459] (--) NVIDIA(GPU-0): NVIDIA (DFP-0): External TMDS
No EDID detection is attempted; the log just stops. The dmesg log shows an EDID attempt, but it is Xorg which should be causing that query and noting the failure. So I wonder why Xorg itself does not even mention EDID. Here is something from dmesg:
[ 7.094439] tegradc 15200000.display: dc_poll_register 0x41: timeout
[ 7.094593] tegradc 15200000.display: timeout waiting for win assignments to promote
[ 7.094751] tegradc 15200000.display: tegra_nvdisp_head_enable, failed head enable
[ 7.326476] tegra_cec 3960000.tegra_cec: Can't find physical address.
[ 7.326631] tegra_cec 3960000.tegra_cec: tegra_cec_init Done.
[ 7.613277] tegradc 15200000.display: hdmi: edid read failed
[ 7.613494] tegradc 15200000.display: hdmi: using fallback edid
[ 7.613642] tegradc 15200000.display: blank - powerdown
[ 7.623765] tegradc 15200000.display: unblank
[ 7.678431] tegradc 15200000.display: dc_poll_register 0x41: timeout
[ 7.678435] tegradc 15200000.display: dc timeout waiting for DC to stop
[ 7.730431] tegradc 15200000.display: dc_poll_register 0x41: timeout
[ 7.730435] tegradc 15200000.display: dc timeout waiting for DC to stop
[ 7.782438] tegradc 15200000.display: dc_poll_register 0x41: timeout
[ 7.782443] tegradc 15200000.display: timeout waiting for postcomp init state to promote
[ 7.834435] tegradc 15200000.display: dc_poll_register 0x41: timeout
[ 7.834439] tegradc 15200000.display: timeout waiting for win assignments to promote
[ 7.834442] tegradc 15200000.display: tegra_nvdisp_head_enable, failed head enable
[ 7.834460] tegradc 15200000.display: update windows ret = -14
[ 7.834477] tegradc 15200000.display: sync windows ret = -14
[ 7.836938] extcon-disp-state external-connection:disp-state: cable 53 state 1
[ 7.837143] Extcon HDMI: HPD enabled
[ 7.838139] tegradc 15200000.display: hdmi: plugged
The following filesystem recovery messages would have been caused by something going wrong in a previous boot, or else by disk failure:
[ 8.729441] Found dev node: /dev/mmcblk0p1
[ 8.879074] EXT4-fs (mmcblk0p1): 3 orphan inodes deleted
[ 8.879910] EXT4-fs (mmcblk0p1): recovery complete
[ 8.886515] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
[ 8.893789] Rootfs mounted over mmcblk0p1
Something must have gone really wrong with the i2c circuit which runs the EDID query over HDMI/DisplayPort. I say “really” wrong because Xorg does not show a failure; the server just locks up, and it neither fails out nor continues.
I suspect that the hardware is ok, but that the filesystem damage which resulted in orphan inodes has hit something important. Maybe a power failure or removing power without a proper shutdown caused this; even a lockup forcing a manual power off could cause it. Normally such an inode recovery does not result in complete failure, but I’m guessing you got unlucky.
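As a rough way to see whether a given boot had to clean up after an unclean shutdown, the dmesg text can be scanned for the ext4 orphan inode message. This is only a minimal sketch: the function name `orphan_recoveries` is made up for illustration, and the sample text is copied from the dmesg excerpt above. In practice you would feed it the saved output of dmesg from the boot in question.

```python
import re

# Matches the ext4 message seen above, e.g.
# "EXT4-fs (mmcblk0p1): 3 orphan inodes deleted"
ORPHAN_RE = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): (?P<count>\d+) orphan inodes? deleted"
)

def orphan_recoveries(dmesg_text):
    """Return a (device, count) tuple for each orphan inode cleanup found."""
    return [
        (m.group("dev"), int(m.group("count")))
        for m in ORPHAN_RE.finditer(dmesg_text)
    ]

# Sample lines taken from the dmesg excerpt in this post.
sample = """\
[ 8.729441] Found dev node: /dev/mmcblk0p1
[ 8.879074] EXT4-fs (mmcblk0p1): 3 orphan inodes deleted
[ 8.879910] EXT4-fs (mmcblk0p1): recovery complete
[ 8.893789] Rootfs mounted over mmcblk0p1
"""

recoveries = orphan_recoveries(sample)
for dev, count in recoveries:
    print(f"{dev}: {count} orphan inode(s) cleaned up on this boot")
```

An empty result only means this particular boot did not need to clean up orphan inodes; it says nothing about damage done during earlier boots.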
If a flash did not solve this, then normally I would think that the wrong device tree was used (one intended for a different carrier board). You said this is a dev kit, so flashing with NVIDIA’s JetPack/SDK Manager should have installed the correct device tree. If the carrier board is not from NVIDIA, then it would be the wrong tree, and this could cause a failure of EDID.
This is a very important question: the earlier posted log from dmesg (or serial console boot) contains an EDID log message. Is the Xorg log, the one which just “stops” and doesn’t continue, from the same flash as the dmesg with the EDID message? If the dmesg or serial console boot log that goes with the stopped Xorg log did not actually contain any hint of EDID, then my debugging would be completely wrong. I’m trying to establish that an EDID query was attempted and the OS knew about it, but that Xorg locked up before logging it.
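One way to check this is to save dmesg and the Xorg log from the same boot and compare which of them mention EDID. Here is a minimal sketch of that comparison; the helper names `edid_lines` and `compare_logs` are hypothetical, and the sample text is just a few lines copied from the excerpts above. On a real system you would read the saved dmesg output and the Xorg log (usually something like /var/log/Xorg.0.log, but that path is an assumption on my part) instead of the inline samples.

```python
import re

# Case-insensitive, since dmesg prints "edid" and Xorg normally prints "EDID".
EDID_RE = re.compile(r"edid", re.IGNORECASE)

def edid_lines(log_text):
    """Return every line of log_text that mentions EDID."""
    return [line for line in log_text.splitlines() if EDID_RE.search(line)]

def compare_logs(dmesg_text, xorg_text):
    """Collect the EDID-related lines from each log, keyed by log name."""
    return {"dmesg": edid_lines(dmesg_text), "xorg": edid_lines(xorg_text)}

# Sample lines copied from the excerpts in this post.
dmesg_sample = """\
[ 7.613277] tegradc 15200000.display: hdmi: edid read failed
[ 7.613494] tegradc 15200000.display: hdmi: using fallback edid
[ 7.613642] tegradc 15200000.display: blank - powerdown
"""
xorg_sample = """\
[ 30.459] (--) NVIDIA(GPU-0): NVIDIA (DFP-0): connected
[ 30.459] (--) NVIDIA(GPU-0): NVIDIA (DFP-0): External TMDS
"""

result = compare_logs(dmesg_sample, xorg_sample)
for name, lines in result.items():
    print(f"{name}: {len(lines)} line(s) mentioning EDID")
```

If dmesg has EDID lines and the Xorg log from that same boot has none, that fits the idea that the kernel saw the query while Xorg locked up before it could log anything. If neither log mentions EDID, then the two logs probably came from different boots or flashes and the comparison tells us nothing.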
I am assuming that, under the circumstances, you may have had to cycle power when you couldn’t get in, and so the orphan inode recovery may be unrelated to the original failure. It could still be an issue. Any time you can get in with ssh or serial console, I suggest shutting down with “sudo shutdown -h now”. If you can’t, don’t worry about it too much.