I am currently using a TX2 board and plan to set it up with a Framos adapter and a few cameras for a project. I have been running into an issue ever since I got this board, though. While the board is connected to my company’s network, it crashes multiple times a day, seemingly at random: anywhere from 2 minutes to several hours after the previous crash.
Using minicom over the serial interface, I was able to capture a snapshot of the crash right after it happened. The crashes stopped as soon as I removed the Ethernet cable.
Crash.xcf (701.6 KB)
My setup is just a TX2 board, with a framos FPA-4.A adapter, along with a framos fsm-imx530 sensor module, and finally with a Schneider Kreuznach lens. Usually connected to it are an HDMI Display, a USB hub for a keyboard and mouse, an ethernet connection, and the power adapter.
The board was previously flashed manually with an older version of Jetpack, but has recently been flashed to latest with the Nvidia SDK manager in hopes that the crashes would be fixed. It’s worth mentioning that the TX2i that we have been testing with does not experience the same issues.
Rather than a screenshot you should probably provide a full serial console log from boot and up to (and including) the actual crash. Serial console can keep a log on a host PC so this will not be harmed by the crash and will show not only the crash, but what leads up to the crash. See:
http://www.jetsonhacks.com/2017/03/24/serial-console-nvidia-jetson-tx2/
You might also add what the output of “lsmod” is prior to the crash.
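As a rough illustration of keeping a host-side log that survives the crash, here is a minimal Python sketch that prefixes each console line with the host's wall-clock time, which makes it easier to see what led up to a crash. It assumes the console text is already available as an iterable of lines (for example, standard input fed from the serial device); the function names `stamp` and `log_lines` are made up for this example and are not part of minicom or any Jetson tool.

```python
import sys
from datetime import datetime
from typing import Callable, Iterable, TextIO


def stamp(line: str, now: datetime) -> str:
    """Prefix one console line with an ISO-8601 host timestamp."""
    return f"{now.isoformat(timespec='seconds')} {line.rstrip(chr(13) + chr(10))}"


def log_lines(
    lines: Iterable[str],
    log: TextIO,
    clock: Callable[[], datetime] = datetime.now,
) -> None:
    """Copy console lines to `log`, one timestamped line at a time."""
    for line in lines:
        log.write(stamp(line, clock()) + "\n")
        # Flush after every line so the file is current when the board dies.
        log.flush()


# Hypothetical usage: pipe the console into this script and call
#   log_lines(sys.stdin, open("serial.log", "a"))
# where "serial.log" is an arbitrary file name chosen for this sketch.
```

The log is written on the host, so a crash on the Jetson side cannot truncate it.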
Sorry for the delay, covid issues arose and I couldn’t be in the office to test.
As I was putting together this reply, a second crash happened, so the second part of the attached log should not contain any scripts run by me.
minicom.cap (221.3 KB)
Output of lsmod:
Module Size Used by
bnep 18950 2
xt_conntrack 3979 1
ipt_MASQUERADE 2570 1
nf_nat_masquerade_ipv4 3993 1 ipt_MASQUERADE
nf_conntrack_netlink 33032 0
nfnetlink 9716 2 nf_conntrack_netlink
xt_addrtype 3915 2
iptable_filter 3008 1
iptable_nat 3423 1
nf_conntrack_ipv4 14158 2
nf_defrag_ipv4 2129 1 nf_conntrack_ipv4
nf_nat_ipv4 8176 1 iptable_nat
nf_nat 25020 2 nf_nat_masquerade_ipv4,nf_nat_ipv4
nf_conntrack 131705 6 nf_conntrack_ipv4,nf_conntrack_netlink,nf_nat_masquerade_ipv4,xt_conntrack,nf_nat_ipv4,nf_nat
br_netfilter 17460 0
zram 29369 4
overlay 52649 0
bcmdhd 979535 0
cfg80211 697380 1 bcmdhd
spidev 14571 0
userspace_alert 6697 0
nvgpu 1720761 20
bluedroid_pm 16123 0
ip_tables 21475 2 iptable_filter,iptable_nat
x_tables 38016 5 ip_tables,iptable_filter,ipt_MASQUERADE,xt_addrtype,xt_conntrack
Hi,
Is the crash always preceded by these i2c errors? If you check the error log you just pasted, you will see lots of i2c errors there.
The i2c errors are unrelated to the crash, those are only there due to the EEPROM read/writes that I am doing as I am troubleshooting another issue, specifically about my project.
I think we may need you to run more tests to find the scenario that triggers this error.
First, please provide the following information:
- Is this a TX2 devkit or a custom board?
- Which JetPack release are you using?
- Is it possible to test the TX2 in an environment that is not your company’s network?
- Do all of the TX2 boards you have suffer from this error?
This is indeed a TX2 devkit. On the camera connector, there is a framos FPA-4.A module, along with a framos IMX530 Image Sensor Module, and then followed by the lens we use. Even after removing those, the crashes continue, and they are necessary for our project.
I am using Jetpack 4.5.1, the latest one through the Nvidia SDK Manager.
I have tried leaving the board overnight in two scenarios: with no Ethernet/WiFi connection, and with an Ethernet connection to my work computer (the one used to flash it). In neither case did it crash.
No. I have not tested with any other TX2s personally, but my coworker with a TX2i is not running into any issues regarding crashes.
I think you haven’t tried a setup with a switch or hub here, right? It looks like you only have a direct connection to your work computer.
No. I have not tested with any other TX2’s personally,
If you have another TX2, please try it as well. Thanks.
It is connected to a network switch. We needed more Ethernet ports than just the one on the wall. Do you need information about the switch as well?
I wanted to first see if it could be solved on its own, before using another board. Due to other issues as well, I think it will be necessary anyway. Can you at least answer the first question here before I do that?
I think the most important thing I want to know is whether this issue happens on other TX2 boards or not.
If you search this forum for your error log, you will find almost no cases like yours.
Thus, if every other TX2 has this issue in your company’s Ethernet environment, then we may need to dump packets/traffic or add some prints to the kernel and have you debug, since you are the only one who can reproduce this issue.
I will attempt this now, then, but will likely only be able to reply on the matter in a few hours or tomorrow.
I am curious: since this is related to Ethernet buffer issues, and because the MAC address might be related to the EEPROM, can you see the MAC address in “ifconfig” before a crash occurs?
Some notes for what linuxdev is talking about.
The MAC address for the native Ethernet interface is read from the EEPROM. If the board is somehow unable to read the EEPROM over i2c, the MAC address will be missing and the driver will just give you a random one.
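To make the point about a fallback address concrete, here is a minimal Python sketch for telling the two cases apart. It assumes the random fallback is generated the way Linux's usual random-MAC helper does it, with the "locally administered" bit (0x02 in the first octet) set; whether the Jetson Ethernet driver does exactly this is an assumption, not something confirmed in this thread. The function names are made up for illustration, and the sample addresses are taken from the ifconfig listing posted in this thread.

```python
def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its six raw octets."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def is_locally_administered(mac: str) -> bool:
    """True if the locally administered bit of the first octet is set."""
    return bool(parse_mac(mac)[0] & 0x02)


# eth0 from the listing: bit clear, consistent with a factory-assigned address.
print(is_locally_administered("00:04:4b:f8:47:9c"))  # False
# rndis0 from the listing: bit set, as expected for a generated address.
print(is_locally_administered("b6:3e:24:7c:4a:d1"))  # True
```

If eth0 ever shows an address with that bit set, or one that changes between boots, that would point toward the EEPROM read failing.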
As I was writing this message, I had just observed the TX2i also crash.
I was unable to get serial console output throughout the night on my coworker’s TX2i, but what seems worrying to me is the output of “last reboot”, which is:
thanh@thanh-desktop:~$ last reboot
reboot system boot 4.9.201-tegra Tue Aug 3 10:14 still running
reboot system boot 4.9.201-tegra Tue Aug 3 10:05 still running
reboot system boot 4.9.201-tegra Tue Aug 3 09:59 still running
reboot system boot 4.9.201-tegra Tue Aug 3 09:23 still running
reboot system boot 4.9.201-tegra Tue Aug 3 00:06 still running
reboot system boot 4.9.201-tegra Mon Aug 2 17:04 still running
wtmp begins Mon Aug 2 17:04:05 2021
I will be monitoring serial output and update whenever possible.
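To put numbers on how far apart those boots were, here is a small Python sketch that parses the boot entries above and prints the gap between consecutive boots in minutes. It assumes the line layout shown in the listing, takes the year from the "wtmp begins" line as a parameter, and relies on English month abbreviations. The function names are hypothetical, and a boot entry is not necessarily a crash: manual reboots show up the same way.

```python
import re
from datetime import datetime
from typing import List

# Matches lines such as:
#   reboot system boot 4.9.201-tegra Tue Aug 3 10:14 still running
BOOT_RE = re.compile(
    r"^reboot\s+system\s+boot\s+\S+\s+\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2})"
)

SAMPLE = """\
reboot system boot 4.9.201-tegra Tue Aug 3 10:14 still running
reboot system boot 4.9.201-tegra Tue Aug 3 10:05 still running
reboot system boot 4.9.201-tegra Tue Aug 3 09:59 still running
reboot system boot 4.9.201-tegra Tue Aug 3 09:23 still running
reboot system boot 4.9.201-tegra Tue Aug 3 00:06 still running
reboot system boot 4.9.201-tegra Mon Aug 2 17:04 still running
wtmp begins Mon Aug 2 17:04:05 2021
"""


def boot_times(last_output: str, year: int) -> List[datetime]:
    """Extract boot timestamps, oldest first. The year must be supplied."""
    times = []
    for line in last_output.splitlines():
        match = BOOT_RE.match(line.strip())
        if match:
            month, day, clock = match.groups()
            times.append(
                datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M")
            )
    return sorted(times)


def minutes_between_boots(times: List[datetime]) -> List[int]:
    """Whole minutes between each boot and the next one."""
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


print(minutes_between_boots(boot_times(SAMPLE, 2021)))  # [422, 557, 36, 6, 9]
```

For the listing above, the gaps shrink from several hours overnight to under ten minutes in the morning.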
ifconfig output:
thanh@thanh-desktop:~$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:26:d9:e7:f3 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1280
inet 172.16.9.128 netmask 255.255.252.0 broadcast 172.16.11.255
inet6 fe80::b545:700:ad56:fdc0 prefixlen 64 scopeid 0x20<link>
ether 00:04:4b:f8:47:9c txqueuelen 1000 (Ethernet)
RX packets 22573 bytes 21125537 (21.1 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 11597 bytes 1297350 (1.2 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 41
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1 (Local Loopback)
RX packets 404 bytes 31314 (31.3 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 404 bytes 31314 (31.3 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
rndis0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether b6:3e:24:7c:4a:d1 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
usb0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether b6:3e:24:7c:4a:d3 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
wlan0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 00:04:4b:f8:47:9a txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
tx2i.cap (41.5 KB)
Attached is the TX2i serial log. Good timing.
Was any application using Ethernet when the error happened? For example, streaming.
Or can the board hit this issue just sitting idle?
Not that I am aware of. The board was idle throughout the crash, and throughout the entire night as well. I had only just used chrome briefly to check what the command to see the last few reboots is.
I am a little bit confused by the test you tried yesterday. You said “it is connected to a switch”. Do you mean a local network that has only the switch and is not connected to the Ethernet port in your office?
host <-> switch <-> TX2
Our current setup is:
internal network <-> switch <-> TX2, work laptop, another coworker’s computer
Removing the switch from the equation would be difficult for me and the others.
That “internal network” means your office network environment, right?
Can you bring your TX2 to another environment, such as your home, and use the switch there to see whether this issue also happens? I would guess it will not. I just want to confirm that this is really related to the office network environment.