Board Occasionally Boots Up with a Dark Screen on JetPack 5.1.5

We have a custom carrier board with one of the two DisplayPort interfaces wired to a USB-C connector. The display resolution is 1024x600.

Previously, we were running JetPack 5.1.2 and saw the dark screen issue (no backlight at all) roughly once a quarter. Since upgrading to JetPack 5.1.5, we see it much more often, usually right after a UEFI capsule update and/or a kernel payload update. The frequency has increased to about once a week or more.

We were careful with the JetPack upgrade and made sure the device tree sections for the display were unchanged (for example, we removed disable-seamless from the 5.1.5 device tree to match 5.1.2).

We have identified the intermediate cause of the dark screen as a DisplayPort link training failure. The training consists of clock recovery (CR) and channel equalization (CE). When the display fails, most of the time the log shows CE not done. In particular,

kernel: dp lt: switching from state 3 (channel equalization) to state 3 (channel equalization)
kernel: dp lt: state 3 (channel equalization), pending_lt_evt 0
kernel: dp lt: CE not done
kernel: dp lt: CE retry limit 5 reached
kernel: dp lt: switching from state 3 (channel equalization) to state 6 (reduce link rate)
kernel: dp lt: state 6 (reduce link rate), pending_lt_evt 0
kernel: dp lt: retry CR, lanes: 2, link rate: 0x6
kernel: dp lt: switching from state 6 (reduce link rate) to state 2 (clock recovery)
kernel: dp lt: state 2 (clock recovery), pending_lt_evt 0
kernel: dp lt: config: lane 0: vs level: 0, pe level: 0, pc2 level: 0
kernel: dp lt: config: lane 1: vs level: 0, pe level: 0, pc2 level: 0
kernel: dp lt: tx_pu: 0x20
kernel: dp lt: CR done
kernel: dp lt: switching from state 2 (clock recovery) to state 3 (channel equalization)
kernel: dp lt: state 3 (channel equalization), pending_lt_evt 0
kernel: dp lt: CE not done
kernel: dp lt: new config: lane 0: vs level: 1, pe level: 0, pc2 level: 0
kernel: dp lt: new config: lane 1: vs level: 1, pe level: 0, pc2 level: 0
kernel: dp lt: config: lane 0: vs level: 1, pe level: 0, pc2 level: 0
kernel: dp lt: config: lane 1: vs level: 1, pe level: 0, pc2 level: 0
kernel: dp lt: tx_pu: 0x30
kernel: dp lt: CE retry
...
kernel: dp lt: CE not done
kernel: dp lt: CE retry limit 5 reached
kernel: dp lt: switching from state 3 (channel equalization) to state 7 (reduce lane count)
kernel: dp lt: state 7 (reduce lane count), pending_lt_evt 0
kernel: dp lt: retry CR, lanes: 1, link rate: 0x6
kernel: dp lt: switching from state 7 (reduce lane count) to state 2 (clock recovery)
...
kernel: dp lt: switching from state 3 (channel equalization) to state 3 (channel equalization)
kernel: dp lt: state 3 (channel equalization), pending_lt_evt 0
kernel: dp lt: CE not done
kernel: dp lt: CE retry limit 5 reached
kernel: dp lt: switching from state 3 (channel equalization) to state 4 (link training fail/disable)
kernel: dp lt: state 4 (link training fail/disable), pending_lt_evt 0
kernel: dp lt: NULL state handler in state 4
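For anyone comparing many boots, here is a minimal Python sketch that summarizes a log like the one above: how often CR or CE was not done, which lane count and link rate each fallback retried with, and whether the last transition ended in state 4. The function name summarize and the returned dictionary layout are made up for illustration. The regular expressions only assume the message formats shown in this post, and the rate table is the standard DPCD LINK_BW_SET encoding (code times 0.27 Gbps per lane).

```python
import re
from collections import Counter

# DPCD LINK_BW_SET codes from the DisplayPort standard (Gbps per lane).
LINK_RATE_GBPS = {0x06: 1.62, 0x0A: 2.7, 0x14: 5.4, 0x1E: 8.1}

# Patterns match the "dp lt:" message formats quoted in this thread.
TRANSITION = re.compile(
    r"dp lt: switching from state (\d+) \(([^)]*)\) to state (\d+) \(([^)]*)\)"
)
RETRY_CR = re.compile(r"dp lt: retry CR, lanes: (\d+), link rate: (0x[0-9a-fA-F]+)")
NOT_DONE = re.compile(r"dp lt: (CR|CE) not done")


def summarize(log_text):
    """Summarize DP link-training messages found in kernel log text."""
    not_done = Counter()   # how many times CR / CE reported "not done"
    fallbacks = []         # (lanes, Gbps per lane) for each "retry CR" line
    final_state = None     # destination name of the last state transition

    for line in log_text.splitlines():
        m = NOT_DONE.search(line)
        if m:
            not_done[m.group(1)] += 1
            continue
        m = RETRY_CR.search(line)
        if m:
            lanes = int(m.group(1))
            code = int(m.group(2), 16)
            # Unknown codes map to None rather than a guessed rate.
            fallbacks.append((lanes, LINK_RATE_GBPS.get(code)))
            continue
        m = TRANSITION.search(line)
        if m:
            final_state = m.group(4)

    return {
        "not_done": dict(not_done),
        "fallbacks": fallbacks,
        "final_state": final_state,
        "failed": final_state is not None and "fail" in final_state,
    }
```

Calling summarize() on text saved from dmesg or journalctl -k would show, for the log above, fallbacks at link rate 0x6, which is 1.62 Gbps per lane (RBR, the lowest DisplayPort rate), first with 2 lanes and then with 1 lane before the failure.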

The rest of the time, we get CR not done, and training fails faster. I don't have logs for that case because it is much rarer.

==========================================================================

So, with this information, our suspicion is a timing change due to the kernel upgrade. The tegra-dc portion of the kernel appears unchanged to us, so it has to be something else. Does anyone have suggestions for resolving the CR not done and CE not done issues? And no, upgrading JetPack is not an option for us at the moment.

Thank you very much!

To isolate whether the problem is related to seamless, please test only the hotplug case after boot.

If a hotplug after boot still triggers the error, then it has nothing to do with seamless.

Hey, thank you for the reply! The issue is not with seamless. We have tried with seamless turned on and off, and it didn't make a difference either way.

Moreover, restarting the Xorg server (we use Xorg) usually lights up the display. Restarting Xorg causes link training to happen again, and it seems that once the boot phase is over, the link training success rate is higher.
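As a stopgap only (it does not address the root cause), the restart could be automated. The sketch below checks whether the most recent dp lt state transition in the kernel log ended in state 4 and, if so, restarts the display manager. The function names are hypothetical, the unit name display-manager.service is an assumption that must be replaced with whatever actually starts Xorg on the board, and the failure marker is copied from the log earlier in this thread.

```python
import subprocess

# Marker text taken from the failing log quoted earlier in this thread.
FAIL_MARKER = "to state 4 (link training fail/disable)"


def last_training_failed(kernel_log):
    """Return True if the most recent dp lt state transition ended in state 4."""
    last = None
    for line in kernel_log.splitlines():
        if "dp lt: switching from state" in line:
            last = line
    return last is not None and FAIL_MARKER in last


def retrain_if_dark(unit="display-manager.service", dry_run=True):
    """Restart the display manager if the last link training failed.

    'unit' is an assumed systemd unit name; adjust it for the actual setup.
    """
    log = subprocess.run(
        ["journalctl", "-k", "-b", "--no-pager"],  # kernel messages, current boot
        capture_output=True, text=True, check=True,
    ).stdout
    if not last_training_failed(log):
        return False
    cmd = ["systemctl", "restart", unit]
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return True
```

Running retrain_if_dark() with dry_run=True first only prints the command it would run, which makes it easy to confirm the detection works before letting it restart anything.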

Any other ideas? Thank you.