Hello,
I have been running into an issue with the Jetson TX2. Here are the steps to reproduce it.
- The Jetson TX2 is off but physically connected to a monitor via HDMI and it is connected to my network via Ethernet.
- I power up the Jetson and log in via SSH. The monitor is still connected, but I have not powered it on.
- I power on the monitor (or switch it to the HDMI input connected to the Jetson).
- My SSH session freezes. The Jetson no longer responds to pings and appears to be locked up.
- I hit the reset button on the Jetson and it comes back up and starts responding normally again (this time connected to my monitor)
The boot log shows the following:
qurrent pts/1 10.0.1.156 Thu Feb 20 22:21 - crash (00:02)
runlevel (to lvl 5) 4.4.38-tegra Thu Feb 20 22:19 - 22:25 (00:06)
nvidia ttyS0 Thu Feb 20 01:15 - crash (21:09)
nvidia tty7 :0 Thu Feb 20 01:15 - crash (21:09)
Other info
I have reproduced this issue on 2 separate Jetson TX2s with very different hardware connections, and they both exhibit exactly the same behavior.
Ubuntu 18.04 does not reproduce the behavior. I am running Ubuntu 16.04 on the Jetsons, so it is possible that this is a bug in 16.04.
An SSH session is not required to cause the issue. If I am NOT connected via SSH and am only sending pings, I see the pings start timing out.
I have looked all over Google for similar issues and have found very little information about this. I suspect that the way we are using the Jetson is not common and therefore untested. One post I found indicated that it could be related to Ubuntu booting in low graphics mode.
Any help on this would be great.
Thanks,
Jonathan L Clark
Software Architect
Qurrent
Is there anything unusual about the monitor? For example, adapters or a KVM?
FYI, a monitor is queried for its specs via an i2c protocol. The i2c itself is powered through the HDMI on the Jetson, and so regardless of having the monitor on or off this should succeed. I am wondering if the i2c/DDC wire is doing something odd at power on (which in turn something like a KVM could affect).
No, and I have done this with 2 totally different monitors. One is a standard Dell HDMI monitor and the other is a large weatherproof outdoor TV. Again, totally different monitors but the same behavior. I could try a few other monitors?
When ping still works, what is the output of this:
sudo -s
grep -H '.*' `find /sys -name 'edid'`
exit
Knowing if EDID is present or not would at least determine if the monitor query succeeded.
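If it is easier to script, the same lookup can be sketched in Python. This is only a rough equivalent of the find/grep pair above; the function names find_edid_files and read_text_or_none are made up for this illustration, and it still needs to run as root for the debugfs entries to be readable.

```python
import os


def find_edid_files(root="/sys"):
    """Return every path under root whose file name is exactly 'edid'.

    Roughly what `find /sys -name 'edid'` does for regular files.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "edid" in filenames:
            matches.append(os.path.join(dirpath, "edid"))
    return sorted(matches)


def read_text_or_none(path):
    """Return the file content as text, or None if it cannot be read."""
    try:
        with open(path, "r", errors="replace") as handle:
            return handle.read()
    except OSError:
        return None
```

Calling find_edid_files() and printing read_text_or_none(path) for each result gives the same information as the grep: an empty or missing file would suggest the monitor query did not succeed.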
/sys/kernel/debug/tegradc.0/edid: 00 ff ff ff ff ff ff 00 10 ac 3c 41 4c 47 36 43
/sys/kernel/debug/tegradc.0/edid: 14 1c 01 03 80 37 1f 78 ee ee 95 a3 54 4c 99 26
/sys/kernel/debug/tegradc.0/edid: 0f 50 54 a5 4b 00 71 4f a9 40 81 80 d1 c0 01 01
/sys/kernel/debug/tegradc.0/edid: 01 01 01 01 01 01 56 5e 00 a0 a0 a0 29 50 30 20
/sys/kernel/debug/tegradc.0/edid: 35 00 29 37 21 00 00 1a 00 00 00 ff 00 30 57 47
/sys/kernel/debug/tegradc.0/edid: 32 4a 38 35 48 43 36 47 4c 0a 00 00 00 fc 00 44
/sys/kernel/debug/tegradc.0/edid: 45 4c 4c 20 55 32 35 31 38 44 0a 20 00 00 00 fd
/sys/kernel/debug/tegradc.0/edid: 00 38 4c 1e 5a 19 00 0a 20 20 20 20 20 20 01 f8
/sys/kernel/debug/tegradc.0/edid: 02 03 24 f1 4f 90 05 04 03 02 07 16 01 06 11 12
/sys/kernel/debug/tegradc.0/edid: 15 13 14 1f 23 09 1f 07 83 01 00 00 67 03 0c 00
/sys/kernel/debug/tegradc.0/edid: 10 00 00 32 02 3a 80 18 71 38 2d 40 58 2c 45 00
/sys/kernel/debug/tegradc.0/edid: 29 37 21 00 00 1e 7e 39 00 a0 80 38 1f 40 30 20
/sys/kernel/debug/tegradc.0/edid: 3a 00 29 37 21 00 00 1a 01 1d 00 72 51 d0 1e 20
/sys/kernel/debug/tegradc.0/edid: 6e 28 55 00 29 37 21 00 00 1e bf 16 00 a0 80 38
/sys/kernel/debug/tegradc.0/edid: 13 40 30 20 3a 00 29 37 21 00 00 1a 00 00 00 00
/sys/kernel/debug/tegradc.0/edid: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c6
For reference here is the hex EDID which can be input at http://www.edidreader.com:
00 ff ff ff ff ff ff 00 10 ac 3c 41 4c 47 36 43
14 1c 01 03 80 37 1f 78 ee ee 95 a3 54 4c 99 26
0f 50 54 a5 4b 00 71 4f a9 40 81 80 d1 c0 01 01
01 01 01 01 01 01 56 5e 00 a0 a0 a0 29 50 30 20
35 00 29 37 21 00 00 1a 00 00 00 ff 00 30 57 47
32 4a 38 35 48 43 36 47 4c 0a 00 00 00 fc 00 44
45 4c 4c 20 55 32 35 31 38 44 0a 20 00 00 00 fd
00 38 4c 1e 5a 19 00 0a 20 20 20 20 20 20 01 f8
02 03 24 f1 4f 90 05 04 03 02 07 16 01 06 11 12
15 13 14 1f 23 09 1f 07 83 01 00 00 67 03 0c 00
10 00 00 32 02 3a 80 18 71 38 2d 40 58 2c 45 00
29 37 21 00 00 1e 7e 39 00 a0 80 38 1f 40 30 20
3a 00 29 37 21 00 00 1a 01 1d 00 72 51 d0 1e 20
6e 28 55 00 29 37 21 00 00 1e bf 16 00 a0 80 38
13 40 30 20 3a 00 29 37 21 00 00 1a 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c6
The checksum is valid, so the i2c query must have succeeded. For that monitor, 1920x1080@60Hz is one of the modes the monitor accepts, and I know the display driver also accepts this, so the initial connection of the monitor is validated as working as it should. Whatever is going on depends on something else.
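For anyone who wants to repeat the checksum test without the web page, here is a minimal Python sketch. It relies on the EDID rule that the bytes of each 128-byte block sum to 0 modulo 256, and it reads the manufacturer ID and monitor name from their standard positions in the base block. The helper names (parse_hex_dump, block_checksums_ok, manufacturer_id, monitor_name) are invented for this example and are not part of any NVIDIA tool.

```python
EDID_HEX = """
00 ff ff ff ff ff ff 00 10 ac 3c 41 4c 47 36 43
14 1c 01 03 80 37 1f 78 ee ee 95 a3 54 4c 99 26
0f 50 54 a5 4b 00 71 4f a9 40 81 80 d1 c0 01 01
01 01 01 01 01 01 56 5e 00 a0 a0 a0 29 50 30 20
35 00 29 37 21 00 00 1a 00 00 00 ff 00 30 57 47
32 4a 38 35 48 43 36 47 4c 0a 00 00 00 fc 00 44
45 4c 4c 20 55 32 35 31 38 44 0a 20 00 00 00 fd
00 38 4c 1e 5a 19 00 0a 20 20 20 20 20 20 01 f8
02 03 24 f1 4f 90 05 04 03 02 07 16 01 06 11 12
15 13 14 1f 23 09 1f 07 83 01 00 00 67 03 0c 00
10 00 00 32 02 3a 80 18 71 38 2d 40 58 2c 45 00
29 37 21 00 00 1e 7e 39 00 a0 80 38 1f 40 30 20
3a 00 29 37 21 00 00 1a 01 1d 00 72 51 d0 1e 20
6e 28 55 00 29 37 21 00 00 1e bf 16 00 a0 80 38
13 40 30 20 3a 00 29 37 21 00 00 1a 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c6
"""


def parse_hex_dump(text):
    """Turn a hex dump into bytes, dropping any 'path: ' prefix from grep -H."""
    data = bytearray()
    for line in text.splitlines():
        _path, _sep, hex_part = line.rpartition(": ")
        data += bytes.fromhex(hex_part)
    return bytes(data)


def block_checksums_ok(edid):
    """One boolean per 128-byte block: True when the byte sum is 0 mod 256."""
    return [sum(edid[i:i + 128]) % 256 == 0 for i in range(0, len(edid), 128)]


def manufacturer_id(edid):
    """Decode the three packed 5-bit letters stored in bytes 8 and 9."""
    word = (edid[8] << 8) | edid[9]
    return "".join(chr(((word >> shift) & 0x1F) + ord("A") - 1)
                   for shift in (10, 5, 0))


def monitor_name(edid):
    """Return the text of the 0xFC (monitor name) descriptor, if present."""
    for offset in (54, 72, 90, 108):
        desc = edid[offset:offset + 18]
        if desc[0:3] == b"\x00\x00\x00" and desc[3] == 0xFC:
            return desc[5:18].split(b"\n")[0].decode("ascii").strip()
    return None


edid = parse_hex_dump(EDID_HEX)
print(len(edid), block_checksums_ok(edid), manufacturer_id(edid), monitor_name(edid))
```

The same parse_hex_dump call also accepts the raw grep output with the /sys/kernel/debug/tegradc.0/edid: prefix left on each line.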
Do you have serial console available? See:
http://www.jetsonhacks.com/2017/03/24/serial-console-nvidia-jetson-tx2/
Serial console would allow you to log more of what is going on during the failure. Serial console typically survives failure conditions which ethernet and local video drivers cannot survive (plus it makes logging convenient). If we can get an error message during the conditions causing the failure, it might provide a good clue.
Also, do you see all “ok” from this?
sha1sum -c /etc/nv_tegra_release
/usr/lib/aarch64-linux-gnu/tegra/libnvosd.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_image.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvomx.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmedia.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_utils.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libglx.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libscf.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvexif.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtx_helper.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvfnetstorehdfx.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm_parser.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvrm.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm_contentpipe.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvos.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtnr.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvimp.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvfnet.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvavp.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_video.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvfnetstoredefog.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvodm_imager.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvjpeg.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtvmr.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvidia-egl-wayland.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvdc.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libtegrav4l2.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvapputil.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcameratools.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcam_imageencoder.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnveglstream_camconsumer.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm_utils.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvomxilclient.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libargus.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvwinsys.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libargus_socketclient.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnveglstreamproducer.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvll.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libargus_socketserver.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcolorutil.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvddk_2d_v2.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtestresults.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcamerautils.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcamlog.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvparser.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvddk_vic.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite.so: OK
/usr/lib/aarch64-linux-gnu/libv4l/plugins/libv4l2_nvvidconv.so: OK
/usr/lib/aarch64-linux-gnu/libv4l/plugins/libv4l2_nvvideocodec.so: OK
/usr/lib/xorg/modules/drivers/nvidia_drv.so: OK
/usr/lib/xorg/modules/extensions/libglx.so: OK
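With a listing this long it is easy to miss a single bad entry, so a short script can do the scanning. The following is a minimal sketch, assuming the output of sha1sum -c has been captured as text and that each result line has the form "path: STATUS"; the function name failed_checks is made up for this example.

```python
def failed_checks(output):
    """Return (path, status) pairs for every entry whose status is not OK."""
    failures = []
    for line in output.splitlines():
        # Skip summary warnings that sha1sum itself prints, such as
        # "sha1sum: WARNING: 1 computed checksum did NOT match".
        if line.startswith("sha1sum:"):
            continue
        path, sep, status = line.rpartition(": ")
        if sep and status.strip() != "OK":
            failures.append((path, status.strip()))
    return failures


sample = """\
/usr/lib/aarch64-linux-gnu/tegra/libnvosd.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libglx.so: FAILED
/usr/lib/xorg/modules/drivers/nvidia_drv.so: OK
sha1sum: WARNING: 1 computed checksum did NOT match
"""
print(failed_checks(sample))
```

The sample text here is invented to show a failure; for the real listing above the function would return an empty list, since every entry reports OK.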
They are all okay. I will take a look at the serial console option. I believe I have all the hardware here for it; I just need to get it put together.
A few new pieces of information today.
If I produce the problem by switching my monitor to the Jetson’s HDMI input, I can actually get the Jetson to start responding again if I switch the monitor back to its previous input quickly enough.
I tried disconnecting all the components from the I/O ports on my Jetson but this had no effect.
I tried unplugging a USB hub from my Jetson and this had no effect.
I pulled up the Kernel log just after the issue and I noticed something interesting.
Feb 24 16:15:25 tegra-ubuntu kernel: [ 391.955106] tegradc 15210000.nvdisplay: blank - powerdown
Feb 24 16:15:25 tegra-ubuntu kernel: [ 392.021229] PD DISP2 index4 DOWN
Feb 24 16:15:25 tegra-ubuntu kernel: [ 392.021362] PD DISP1 index3 DOWN
Feb 24 16:15:25 tegra-ubuntu kernel: [ 392.021463] PD DISP0 index2 DOWN
I also noticed this in my Xorg.0.log
[ 72.900] (II) XINPUT: Adding extended input device "gpio-keys" (type: KEYBOARD, id 6)
[ 72.900] (**) Option "xkb_rules" "evdev"
[ 72.900] (**) Option "xkb_model" "pc105"
[ 72.900] (**) Option "xkb_layout" "us"
[ 77.576] (--) NVIDIA(GPU-0): DELL U2518D (DFP-0): connected
[ 77.576] (--) NVIDIA(GPU-0): DELL U2518D (DFP-0): External TMDS
[ 78.098] (II) NVIDIA(0): Setting mode "HDMI-0: nvidia-auto-select @1920x1080 +0+0 {ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0}"
[ 79.061] (--) NVIDIA(GPU-0): DELL U2518D (DFP-0): connected
[ 79.062] (--) NVIDIA(GPU-0): DELL U2518D (DFP-0): External TMDS
[ 79.949] (--) NVIDIA(GPU-0): DELL U2518D (DFP-0): connected
[ 79.950] (--) NVIDIA(GPU-0): DELL U2518D (DFP-0): External TMDS
I am going to do some research on this myself. Does this mean anything?
It kind of looks like a power saving mode (typical of inactivity), but I am not positive.
HDMI itself is hot plug, and so sometimes things will reset and start working if the HDMI is disconnected and reconnected. However, the problem of HDMI failing is quite different than the entire system locking up. It isn’t unheard of that some sort of power saving mode would cause issues, but ssh failing at the same moment tends to say this is a true lockup. Are you now using serial console, and does serial console also lock up? Does unplugging and replugging the HDMI allow the system to keep going?
Unplugging and plugging in the HDMI causes similar behavior.
I had continual pings going from my PC to the Jetson as well as an active SSH session.
When I unplugged the HDMI from the Jetson, nothing happened.
When I plugged it back in, I lost the SSH session and my pings timed out.
It never recovered.
Right now I don’t have a serial terminal. I thought I had the hardware, but I am not sure. I have a couple of USB to serial cables (DB9) as well as breakout boards for the DB9 cables. But the video I watched that shows how to connect to the Jetson’s serial port had them using a different cable. I am not sure which pins on the DB9 go to the Jetson. I am not even sure if I have the correct cable.
The DB9 version probably uses the wrong signal level. The serial UART pins on the board require 3.3V TTL logic level, and if the DB9 uses higher voltages it could actually damage the hardware. It is worth getting the serial cable when developing since it stays up and provides useful information when many other parts of the system go down. Having more specific information from the moment of failure would be very useful, and it seems ssh dies before you get to see the error.