Ethernet communication problem on boot-up with TX1

Hi

I am having a problem with a Jetson TX1 running Ubuntu 16.04 (JetPack 3.3, kernel 4.4.38). I have two devices (cameras) connected: one is directly connected to PCI bridge: NVIDIA Corporation Device 0fae (rev a1), and the other to Ethernet controller: Intel Corporation 82574L Gigabit Network Connection. Often when the board boots up, the connected devices do not show up and I cannot ping them. Restarting networking or running ifdown/ifup does not help.

-> Running ip addr shows eth0 and eth1 as UP
-> Running ethtool on eth0 and eth1 shows Link detected: yes

The only fix is to reboot. After rebooting, the devices show up in the ARP table and I am able to ping them.

The syslogs can be found here:

syslog1: https://drive.google.com/file/d/1i2ktuAKzBOYKfTrZo_kQJOrfYOxvflFT/view?usp=sharing
syslog2: https://drive.google.com/file/d/1iXL2xsSu9-fqOFH_EQQakfmoEbkgaHGv/view?usp=sharing

Which release are you using? See “head -n 1 /etc/nv_tegra_release”. Also, can you verify that everything shows OK with “sha1sum -c /etc/nv_tegra_release”? Is this the standard dev kit carrier board?

I see “grabserial” and “rsyslog” errors. I am guessing rsyslog is set to log to a remote system and is a victim of the failure, but I’m wondering if grabserial is similar…is it simply showing an error because it logs to a remote machine?

Can you post the output from “route” and “ifconfig” prior to a failure, and then again after a failure (which I suppose isn’t possible if ssh is your only access)?

The outputs are as follows:

$ sha1sum -c /etc/nv_tegra_release
/usr/lib/xorg/modules/drivers/nvidia_drv.so: OK
/usr/lib/xorg/modules/extensions/libglx.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libargus_socketserver.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmedia.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvfnetstoredefog.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvdc.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libtegrav4l2.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_video.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm_contentpipe.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnveglstreamproducer.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libargus_socketclient.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtvmr.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libglx.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm_parser.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvfnet.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvwinsys.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvomxilclient.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvddk_vic.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_utils.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvodm_imager.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcolorutil.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libscf.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm_utils.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcamerautils.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvparser.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvimp.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvos.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvddk_2d_v2.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_image.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnveglstream_camconsumer.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvfnetstorehdfx.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtx_helper.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvexif.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcamlog.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtnr.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvrm.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvosd.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvjpeg.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtestresults.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvavp.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libargus.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcam_imageencoder.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvapputil.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvll.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcameratools.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvomx.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvidia-egl-wayland.so: OK
/usr/lib/aarch64-linux-gnu/libv4l/plugins/libv4l2_nvvideocodec.so: OK
/usr/lib/aarch64-linux-gnu/libv4l/plugins/libv4l2_nvvidconv.so: OK
$ head -n 1 /etc/nv_tegra_release
# R28 (release), REVISION: 2.0, GCID: 10567845, BOARD: t210ref, EABI: aarch64, DATE: Fri Mar  2 04:58:16 UTC 2018

I am using the Astro Carrier for NVIDIA Jetson TX1/TX2 board:
http://connecttech.com/product/astro-carrier-for-nvidia-jetson-tx2-tx1/

Success case:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         *               0.0.0.0         UG    0      0        0 wlan0
10.1.0.0        *               255.255.252.0   U     0      0        0 wlan0
10.3.0.0        *               255.255.0.0     U     0      0        0 eth0
10.3.0.0        *               255.255.0.0     U     0      0        0 eth1
10.3.1.91       *               255.255.255.255 UH    0      0        0 eth0
10.3.2.92       *               255.255.255.255 UH    0      0        0 eth1
link-local      *               255.255.0.0     U     1000   0        0 eth0
172.17.0.0      *               255.255.0.0     U     0      0        0 docker0
192.168.55.0    *               255.255.255.0   U     0      0        0 l4tbr0
$ ifconfig 
docker0   Link encap:Ethernet  HWaddr 02:42:1c:a3:b5:98  
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 00:0c:8b:90:09:fe  
          inet addr:10.3.1.1  Bcast:10.3.255.255  Mask:255.255.0.0
          inet6 addr: fe80::20c:8bff:fe90:9fe/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:935720 errors:0 dropped:0 overruns:0 frame:0
          TX packets:85165 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1299212257 (1.2 GB)  TX bytes:8161043 (8.1 MB)
          Interrupt:112 Memory:13000000-13020000 

eth1      Link encap:Ethernet  HWaddr 00:04:4b:c0:5b:2e  
          inet addr:10.3.2.1  Bcast:10.3.255.255  Mask:255.255.0.0
          inet6 addr: fe80::204:4bff:fec0:5b2e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:112732544 errors:0 dropped:0 overruns:0 frame:0
          TX packets:414299 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:167177151704 (167.1 GB)  TX bytes:26540157 (26.5 MB)

Failure case:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         *               0.0.0.0         UG    0      0        0 wlan0
10.1.0.0        *               255.255.252.0   U     0      0        0 wlan0
10.3.0.0        *               255.255.0.0     U     0      0        0 eth1
10.3.0.0        *               255.255.0.0     U     0      0        0 eth0
10.3.1.91       *               255.255.255.255 UH    0      0        0 eth0
10.3.2.92       *               255.255.255.255 UH    0      0        0 eth1
link-local      *               255.255.0.0     U     1000   0        0 l4tbr0
172.17.0.0      *               255.255.0.0     U     0      0        0 docker0
192.168.55.0    *               255.255.255.0   U     0      0        0 l4tbr0

$ ifconfig
docker0   Link encap:Ethernet  HWaddr 02:42:e6:b1:ff:d8  
          inet addr:172.17.0.1  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 00:04:4b:c0:5b:2e  
          inet addr:10.3.1.1  Bcast:10.3.255.255  Mask:255.255.0.0
          inet6 addr: fe80::204:4bff:fec0:5b2e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1505 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:448252 (448.2 KB)  TX bytes:6710 (6.7 KB)

eth1      Link encap:Ethernet  HWaddr 00:0c:8b:90:09:fe  
          inet addr:10.3.2.1  Bcast:10.3.255.255  Mask:255.255.0.0
          inet6 addr: fe80::20c:8bff:fe90:9fe/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7422 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4596 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1000117 (1.0 MB)  TX bytes:306807 (306.8 KB)
          Interrupt:112 Memory:13000000-13020000

grabserial is a script which reads data from a serial port and saves it locally. It should not be related to this issue.

While the routing table shows the IPs for the devices, the ARP table does not show either device.

First, is your “uname -r” “4.4.38”? Or is it “4.4.38-tegra”? If not the latter, then the kernel has been changed and the modules need a new location (any module-based feature would be missing if you didn’t rebuild all modules).
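A minimal sketch of that check, written in Python (an assumption; the thread itself only uses shell commands). `platform.release()` returns the same string as `uname -r`, and loadable modules for the running kernel are expected under `/lib/modules/<release>`. The helper names `is_stock_tegra_release` and `modules_dir` are hypothetical, and the `-tegra` suffix test simply encodes the stock naming described above.

```python
import platform
from pathlib import Path


def is_stock_tegra_release(release: str) -> bool:
    """True if the kernel release carries the stock L4T '-tegra' suffix."""
    return release.endswith("-tegra")


def modules_dir(release: str, root: str = "/lib/modules") -> Path:
    """Directory where modules for this kernel release are expected to live."""
    return Path(root) / release


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r`
    print("kernel release:", release)
    print("stock -tegra suffix:", is_stock_tegra_release(release))
    mdir = modules_dir(release)
    print(f"{mdir} exists:", mdir.is_dir())
```

If the suffix is missing or the directory does not exist, module-based features would be expected to fail to load, as described above.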

Second, alternate carrier boards require a different board support package (mainly a device tree difference).

Next, if you used a clone of any unit for flash we’ll need to know. In part because a clone used during flash must come from the exact same L4T release as it was originally created for (so for example you can’t flash an R28.1 clone into an R28.2 release). In some cases a clone may also hard code network setup related to the MAC address of the original unit, and thus a clone which does this (most clones don’t) would need the MAC address edited in “/etc”. Was any cloning involved?

Btw, it will be easier to read logs if you edit the post (the “pencil” icon shows up for editing if you hover your mouse over the quote icon in the upper right), highlight the log or output, then click on the “code” icon (looks like “</>”).

I ask about grabserial because some people mistakenly use the serial console port for a free UART.

About Current Settings…

This is not valid in route:

10.3.0.0        *               255.255.0.0     U     0      0        0 eth0
10.3.0.0        *               255.255.0.0     U     0      0        0 eth1

You have two network cards with the same subnet and same netmask. The two cards and their drivers will fight for which one should be used. I would expect that somewhere in the middle of any working connection that the drivers might suddenly get mixed up and traffic might get split across two NICs in such a way that it locks up.

Looking at ifconfig, this conflict is verified, and just gets worse:

eth0      Link encap:Ethernet  HWaddr 00:04:4b:c0:5b:2e  
          inet addr:10.3.1.1  Bcast:10.3.255.255  Mask:255.255.0.0
...

eth1      Link encap:Ethernet  HWaddr 00:0c:8b:90:09:fe  
          inet addr:10.3.2.1  Bcast:10.3.255.255  Mask:255.255.0.0

Not only do their subnets and netmasks collide, the two are the same IP address.

The two NICs on a single host require separate addresses (and in fact no two NICs on any network should have the same address), and their combination of subnet and netmask must be unique (route won’t behave correctly when the two overlap). Until traffic is generated such that both drivers try to deal with the traffic you won’t see a failure…as soon as there is traffic with a conflict the system will fail.
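The uniqueness rule above can be checked offline. A minimal Python sketch (Python is an assumption; the thread only uses shell tools) built on the standard `ipaddress` module: it derives each interface's connected network from its address and netmask and flags any pair that overlaps. `overlapping_interfaces` is a hypothetical helper; the inputs are the eth0/eth1 addresses and netmasks from the ifconfig output above, plus the suggested /24 alternative.

```python
from __future__ import annotations

import ipaddress
from itertools import combinations


def overlapping_interfaces(ifaces: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of interface names whose connected networks overlap.

    `ifaces` maps an interface name to "address/netmask",
    e.g. {"eth0": "10.3.1.1/255.255.0.0"}.
    """
    networks = {
        name: ipaddress.ip_interface(spec).network for name, spec in ifaces.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Addresses/netmasks from the ifconfig output in this thread.
original = {"eth0": "10.3.1.1/255.255.0.0", "eth1": "10.3.2.1/255.255.0.0"}
fixed = {"eth0": "10.3.1.1/255.255.255.0", "eth1": "10.3.2.1/255.255.255.0"}
print(overlapping_interfaces(original))  # [('eth0', 'eth1')]
print(overlapping_interfaces(fixed))     # []
```

Note that the two addresses differ (10.3.1.1 and 10.3.2.1); the conflict comes from both being masked into the same 10.3.0.0/16 network.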

The kernel version is

$ uname -r 
4.4.38-tegra

I am using JetPack 3.3 with L4T release R28.2. I have also flashed the same image onto a different board and verified that the assigned MAC addresses are different.

Next, the IP addresses assigned to eth0 (10.3.1.1) and eth1 (10.3.2.1) are different. As you suggested, I changed the netmask for eth1 to avoid any collision. The new configuration outputs are as follows:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         *               0.0.0.0         UG    0      0        0 wlan0
10.1.0.0        *               255.255.252.0   U     0      0        0 wlan0
10.3.0.0        *               255.255.0.0     U     0      0        0 eth0
10.3.1.91       *               255.255.255.255 UH    0      0        0 eth0
10.3.2.0        *               255.255.255.0   U     0      0        0 eth1
10.3.2.92       *               255.255.255.255 UH    0      0        0 eth1
link-local      *               255.255.0.0     U     1000   0        0 l4tbr0
172.17.0.0      *               255.255.0.0     U     0      0        0 docker0
192.168.55.0    *               255.255.255.0   U     0      0        0 l4tbr0
$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:04:4b:c0:5b:2e  
          inet addr:10.3.1.1  Bcast:10.3.255.255  Mask:255.255.0.0
          inet6 addr: fe80::204:4bff:fec0:5b2e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:764 errors:0 dropped:0 overruns:0 frame:0
          TX packets:69 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:227434 (227.4 KB)  TX bytes:7276 (7.2 KB)

eth1      Link encap:Ethernet  HWaddr 00:0c:8b:90:09:fe  
          inet addr:10.3.2.1  Bcast:10.3.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:8bff:fe90:9fe/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2908 errors:0 dropped:0 overruns:0 frame:0
          TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:367468 (367.4 KB)  TX bytes:6516 (6.5 KB)
          Interrupt:112 Memory:13000000-13020000

Now both the IPs and the netmasks differ in the routing table, but I still get the same problem on boot-up. In the failure case the routing table and ifconfig outputs are the same as above, but I am not able to ping the connected devices.

I am curious, what do you get from “ip route” (should essentially be the same output as “route”, but from a newer application)? Route shows two lines for each interface…is this host set up as a DNS server? What is the specific use of addresses 10.3.1.91 and 10.3.2.92?

What are the camera IP addresses? Is each NIC going through a separate switch, or through the same switch? How are the camera IP addresses set?

Unless you have some special purpose, I would also wonder about this assignment overlap:

10.3.0.0/255.255.0.0   (10.3.0.0/16)
# This subnet is inside of the previous subnet...which one is supposed to respond when in the "/24"?
10.3.2.0/255.255.255.0 (10.3.2.0/24)

Could you change the netmasks to non-overlapping subnets? Or else make the smaller subnet a higher priority (lower metric)? I’m not sure how the two networks are supposed to be used, so it is hard to say anything useful. If the two are to overlap, then I would think one needs a higher metric (they are both 0 and thus both the same priority…which doesn’t work well…the operating system won’t know which interface to use within that overlap).
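For reference, Linux selects an IPv4 route by longest prefix match first and only uses the metric to break ties between routes of the same prefix length. The following is a simplified Python model of that lookup (it ignores policy rules, route types and scopes) using the routes from the `ip route` output in this thread. `lookup` is a hypothetical helper, not a kernel API.

```python
from __future__ import annotations

import ipaddress


def lookup(routes: list[tuple[str, int, str]], dest: str) -> str | None:
    """Pick the outgoing device for `dest`.

    Each route is (prefix, metric, device). The longest matching prefix wins;
    among equally long prefixes, the lowest metric wins. A full tie is resolved
    arbitrarily here (by device name); that case is not modelled faithfully.
    """
    addr = ipaddress.ip_address(dest)
    candidates = []
    for prefix, metric, dev in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            # Sort key: longer prefix first (negated), then lower metric.
            candidates.append((-net.prefixlen, metric, dev))
    return min(candidates)[2] if candidates else None


# Routes from the `ip route` output above (metric 0 where none is shown).
routes = [
    ("10.1.0.0/22", 0, "wlan0"),
    ("10.3.0.0/16", 0, "eth0"),
    ("10.3.1.91/32", 0, "eth0"),
    ("10.3.2.0/24", 0, "eth1"),
    ("10.3.2.92/32", 0, "eth1"),
]
print(lookup(routes, "10.3.2.5"))  # eth1: the /24 beats the /16
print(lookup(routes, "10.3.5.7"))  # eth0: only the /16 matches
print(lookup(routes, "8.8.8.8"))   # None: no default route in this list
```

Under this model, addresses inside 10.3.2.0/24 already prefer eth1 even with equal metrics; the overlap mainly matters for the rest of 10.3.0.0/16, which would all be sent out eth0.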

Following is the output of ip route:

$ ip route
10.1.0.0/22 dev wlan0  proto kernel  scope link  src 10.1.0.224 
10.3.0.0/16 dev eth0  proto kernel  scope link  src 10.3.1.1 
10.3.1.91 dev eth0  scope link 
10.3.2.0/24 dev eth1  proto kernel  scope link  src 10.3.2.1 
10.3.2.92 dev eth1  scope link 
169.254.0.0/16 dev l4tbr0  scope link  metric 1000 linkdown 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown 
192.168.55.0/24 dev l4tbr0  proto kernel  scope link  src 192.168.55.1 linkdown

Here is my /etc/network/interfaces file, which sets up the routing table; it should answer most of your questions.

source-directory /etc/network/interfaces.d
 
# The loopback network interface
auto lo
iface lo inet loopback
 
auto eth0
iface eth0 inet static
address 10.3.1.1
netmask 255.255.0.0
post-up ip route add 10.3.1.91 dev $IFACE
 
auto eth1
iface eth1 inet static
address 10.3.2.1
netmask 255.255.255.0
post-up ip route add 10.3.2.92 dev $IFACE
 
auto wlan0
iface wlan0 inet dhcp
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

10.3.1.91 and 10.3.2.92 are the IPs of the cameras on eth0 and eth1 respectively.
One camera is connected directly to PCI bridge: NVIDIA Corporation Device 0fae (rev a1), which is eth0,
and the other is connected through a switch to Ethernet controller: Intel Corporation 82574L Gigabit Network Connection, which is eth1.

These network/netmask combinations overlap:

10.3.0.0/16
10.3.2.0/24

Would it be possible for you to turn the “10.3.0.0/16” into a “10.3.0.0/24” (or into “10.3.1.0/24”, which is probably a better edit) in order to avoid overlap? Depending on the situation this is just trouble looking for a place to happen (you might actually want an overlapping network/netmask if doing some sort of redundant failover scheme, but then the metrics would differ and the two would cover the exact same network). This would mean the camera at “10.3.1.91” would need its address changed to somewhere inside of “10.3.0.0/24” (however, if you change only the netmask to “255.255.255.0”, as explained below, then the camera could continue to use this address).

I see something which sticks out as odd:

auto eth0
iface eth0 inet static
address 10.3.1.1
netmask 255.255.0.0

…but route is:

10.3.0.0/16

…what you’ve assigned as “10.3.1.1/16” shows up as the network “10.3.0.0/16”, because the route is derived by applying the netmask to the address. That /16 network spans 10.3.0.0 through 10.3.255.255, so it also covers eth1’s “10.3.2.0/24”. If you changed the netmask to “255.255.255.0”, the network would become “10.3.1.0/24” and would no longer overlap eth1.
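The masking step can be reproduced with Python's standard `ipaddress` module (Python is an assumption; the thread itself is shell-based). `IPv4Interface` accepts a dotted netmask, and its `.network` attribute is the address ANDed with the netmask, which is what appears as the route destination. `network_of` is a hypothetical helper.

```python
import ipaddress


def network_of(spec: str) -> ipaddress.IPv4Network:
    """Network (route destination) derived from an "address/netmask" string."""
    return ipaddress.IPv4Interface(spec).network


# eth0's address with the original netmask and with the suggested one.
for spec in ("10.3.1.1/255.255.0.0", "10.3.1.1/255.255.255.0"):
    net = network_of(spec)
    print(spec, "->", net, "broadcast", net.broadcast_address)
```

The first line prints network 10.3.0.0/16 with broadcast 10.3.255.255 (matching the Bcast shown by ifconfig), and the second prints 10.3.1.0/24 with broadcast 10.3.1.255.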

By what method does the camera get its IP address assigned? Is there a DHCP server? Are the cameras statically assigned? I’m only curious because I’m wondering if there is a router involved. I’m also curious if anything exists in the “10.0.0.0/24” address range…if not, then you are good to go just by editing the netmask of eth0.

As you suggested, I changed the netmask for eth0 to 255.255.255.0, but I still get the problem.
The effect of the changes is shown below:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         *               0.0.0.0         UG    0      0        0 wlan0
10.1.0.0        *               255.255.252.0   U     0      0        0 wlan0
10.3.1.0        *               255.255.255.0   U     0      0        0 eth0
10.3.1.91       *               255.255.255.255 UH    0      0        0 eth0
10.3.2.0        *               255.255.255.0   U     0      0        0 eth1
10.3.2.92       *               255.255.255.255 UH    0      0        0 eth1
link-local      *               255.255.0.0     U     1000   0        0 l4tbr0
172.17.0.0      *               255.255.0.0     U     0      0        0 docker0
192.168.55.0    *               255.255.255.0   U     0      0        0 l4tbr0
$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:04:4b:c0:5b:2e  
          inet addr:10.3.1.1  Bcast:10.3.1.255  Mask:255.255.255.0
          inet6 addr: fe80::204:4bff:fec0:5b2e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:297 errors:0 dropped:0 overruns:0 frame:0
          TX packets:59 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:88268 (88.2 KB)  TX bytes:6488 (6.4 KB)

eth1      Link encap:Ethernet  HWaddr 00:0c:8b:90:09:fe  
          inet addr:10.3.2.1  Bcast:10.3.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:8bff:fe90:9fe/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1275 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:151572 (151.5 KB)  TX bytes:6052 (6.0 KB)
          Interrupt:112 Memory:13000000-13020000
$ ip route
10.1.0.0/22 dev wlan0  proto kernel  scope link  src 10.1.0.224 
10.3.1.0/24 dev eth0  proto kernel  scope link  src 10.3.1.1 
10.3.1.91 dev eth0  scope link 
10.3.2.0/24 dev eth1  proto kernel  scope link  src 10.3.2.1 
10.3.2.92 dev eth1  scope link 
169.254.0.0/16 dev l4tbr0  scope link  metric 1000 linkdown 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown 
192.168.55.0/24 dev l4tbr0  proto kernel  scope link  src 192.168.55.1 linkdown

Further, the cameras are statically assigned. Below is the configuration of the cameras after the netmask was changed as you suggested (previously the netmask was 255.255.0.0).

+--------+----------+----------++------------------+-----------------+-----------++----------------+
| Device | DeviceID | CameraID ||    IP address    |   Subnet mask   | Interface ||   Serial No.   |
+========+==========+==========++==================+=================+===========++================+
|   1    |   1001   |      2   ||        10.3.1.91 |  255.255.255.0  |    eth0   ||    4103334036  |
+--------+----------+----------++------------------+-----------------+-----------++----------------+
|   2    |   1002   |      1   ||        10.3.2.92 |  255.255.255.0  |    eth1   ||    4103290972  |
+--------+----------+----------++------------------+-----------------+-----------++----------------+

There is no router involved. As I mentioned, one camera is connected directly to PCI bridge: NVIDIA Corporation Device 0fae (rev a1), which is eth0, and the second one goes through a switch to Ethernet controller: Intel Corporation 82574L Gigabit Network Connection, which is eth1.

Your route now looks valid (no overlaps with equal metric).

About the PCIe device…the actual card in the PCIe slot would be an ethernet card (ifconfig says so since it lists eth1), but correct me if I’m wrong. The camera on the PCIe add-on card would just be an ethernet camera in this case, and the other camera would be on the integrated ethernet (let me know if I am wrong).

How often does boot work and the device show up, versus how often do they fail?

How do the ethernet cameras get power, and is the power up before the Jetson boots?

How often is the add-on PCIe ethernet the failure, versus how often is the integrated ethernet the issue? Or both at once?

What software (or command) tells you first that the camera is missing?

If you run “lspci” when things work, verify which entry is the PCIe ethernet card. If the networking disappears for that device, verify if the network is otherwise present and only the camera is missing, versus the lspci entry missing.
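A minimal Python sketch for the lspci part of this check, assuming the standard `lspci` line format (`slot description`) shown later in this thread. On Linux, `/sys/class/net/<iface>/device` is a symlink into the device's sysfs directory, whose name for a PCI NIC is the domain-qualified PCI address (lspci omits the leading `0000:` domain by default), so comparing the two shows which ethX is currently the PCIe card. Both helper names are hypothetical.

```python
from __future__ import annotations

import os


def ethernet_controllers(lspci_text: str) -> list[str]:
    """Return the PCI address of every 'Ethernet controller' line in lspci output."""
    slots = []
    for line in lspci_text.splitlines():
        slot, _, desc = line.partition(" ")
        if desc.startswith("Ethernet controller"):
            slots.append(slot)
    return slots


def pci_address_of(iface: str) -> str:
    """Name of the sysfs device directory behind /sys/class/net/<iface>/device.

    For a PCI NIC this is the full PCI address, e.g. '0000:01:00.0'.
    """
    return os.path.basename(os.path.realpath(f"/sys/class/net/{iface}/device"))


# lspci output as posted in this thread.
sample = """00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1)
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection"""
print(ethernet_controllers(sample))  # ['01:00.0']
```

On the board, an interface for which `pci_address_of(...)` ends in `01:00.0` would be the 82574L.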

One camera is connected to a hub which is connected directly to the PCIe Ethernet card, while the second one is on the integrated ethernet.

  1. It is quite random actually; I can’t really give success/failure numbers.
  2. The cameras are powered from the same power supply as the Jetson. When the TX1 boots up, the camera driver initialises the cameras.
  3. Both of them fail together.
  4. The camera driver utility has a command to list available cameras. Also, the ARP table shows the camera IPs in the success case, while in the failure case they are missing.
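The ARP check in item 4 can be scripted against `/proc/net/arp`, whose columns are IP address, HW type, Flags, HW address, Mask and Device; completed entries carry the `ATF_COM` flag (0x2 in `<net/if_arp.h>`), while unresolved ones do not. A minimal Python sketch; the camera-to-interface mapping comes from the table above, and the helper names are hypothetical.

```python
from __future__ import annotations

ATF_COM = 0x02  # "entry completed" flag from <net/if_arp.h>


def resolved_neighbours(arp_text: str) -> dict[str, tuple[str, str]]:
    """Map IP -> (MAC, device) for completed entries in /proc/net/arp text."""
    result = {}
    for line in arp_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hwtype, flags, mac, _mask, dev = fields[:6]
        if int(flags, 16) & ATF_COM:
            result[ip] = (mac, dev)
    return result


def missing_cameras(arp_text: str, cameras: dict[str, str]) -> list[str]:
    """Camera IPs with no completed ARP entry on their expected device.

    `cameras` maps camera IP -> expected interface name.
    """
    seen = resolved_neighbours(arp_text)
    return [ip for ip, dev in cameras.items() if seen.get(ip, (None, None))[1] != dev]


def read_arp(path: str = "/proc/net/arp") -> str:
    """Read the kernel's IPv4 neighbour table (Linux only)."""
    with open(path) as f:
        return f.read()
```

Usage on the board: `missing_cameras(read_arp(), {"10.3.1.91": "eth0", "10.3.2.92": "eth1"})` returns an empty list when both cameras have resolved on their expected interfaces.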

The output of lspci is the same in both cases:

$ lspci
00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1)
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

Does the TX1 have connection to the outside world? If you were to find the cameras missing, would it be possible to see if ping to some outside host also fails?

When it fails see what "traceroute " shows (test to see traceroute works when the cameras do work because not all devices support ping/ICMP and it might not be a valid test).

Regarding the timing of the driver checking for the camera, is this running from “/etc/rc.local”? Or is it being run manually? Is it possible for you to delay running it by about 30 seconds?
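Rather than a fixed delay, a boot script could poll for a condition with a timeout before starting the driver. A minimal Python sketch, assuming the driver launch can be gated this way: `operstate` reads `/sys/class/net/<iface>/operstate`, a standard Linux sysfs attribute, and `wait_until` is a hypothetical generic poller. In this thread both interfaces already reported UP with link detected during failures, so a gate like this can only rule out a start-up race; it would not notice eth0 and eth1 being attached to the opposite NICs.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Callable


def operstate(iface: str, sysfs: str = "/sys/class/net") -> str:
    """Kernel operational state of an interface ("up", "down", ...), or "missing"."""
    try:
        return (Path(sysfs) / iface / "operstate").read_text().strip()
    except OSError:
        return "missing"


def wait_until(check: Callable[[], bool], timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical use in the boot script, before starting the camera driver:
# ready = wait_until(lambda: all(operstate(i) == "up" for i in ("eth0", "eth1")))
```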

When it fails, I tried connecting an external device (a laptop with static IP 10.3.2.5/255.255.255.0), but pinging it also fails, while in the success case I am able to ping the same device as well as the cameras.

I have tried both ping and traceroute: both get replies in the success case, and both report host unreachable in the failure case.

There’s a shell script which runs the camera driver at boot-up. I can try adding a delay before running the driver and post an update.

Do addresses on both the PCIe and integrated NICs fail, or does just one fail (I’m guessing both, but you would have had to have configured your laptop with a static address for one subnet, and then for another subnet…I don’t know if you tried two addresses)? Can you carefully compare the output of “route” and “ifconfig” in the failed case and working case and see if there is something subtly different? Obviously it isn’t the camera itself which is at fault, but the network failure seems to be more of a software issue and not so much a hardware issue.
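This comparison can be automated. In the success and failure ifconfig captures posted earlier, the MAC addresses behind eth0 and eth1 are swapped (00:0c:8b:90:09:fe is eth0 in the success case and eth1 in the failure case), while the IP configuration stays attached to the interface names. A minimal Python sketch, assuming the net-tools `ifconfig` output format shown in this thread; `iface_macs` and `renamed` are hypothetical helpers.

```python
from __future__ import annotations

import re

# First line of each interface block in net-tools ifconfig output, e.g.
# "eth0      Link encap:Ethernet  HWaddr 00:04:4b:c0:5b:2e".
IFACE_RE = re.compile(r"^(\S+)\s+Link encap:\S+\s+HWaddr\s+([0-9a-fA-F:]{17})", re.M)


def iface_macs(ifconfig_text: str) -> dict[str, str]:
    """Map interface name -> lowercase MAC address from `ifconfig` output."""
    return {name: mac.lower() for name, mac in IFACE_RE.findall(ifconfig_text)}


def renamed(working: str, failed: str) -> dict[str, tuple[str, str]]:
    """Interfaces whose MAC differs between two captures: name -> (working, failed)."""
    a, b = iface_macs(working), iface_macs(failed)
    return {n: (a[n], b[n]) for n in sorted(a.keys() & b.keys()) if a[n] != b[n]}
```

Feeding it the two ifconfig captures from this thread reports both eth0 and eth1 as changed, which points at interface naming rather than IP configuration.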

Hi,

So after a lot of testing I found that the OS randomly swaps the eth0/eth1 interface names during boot-up. Looking further, I found that /etc/udev/rules.d/70-persistent-net.rules was missing, as was /lib/udev/write_net_rules, which auto-generates that file. I manually created the 70-persistent-net.rules file, and the issue no longer occurs.
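The thread does not show the contents of the file that was created. A minimal sketch, assuming the common udev rule form that pins a name to a MAC address via `ATTR{address}` and `NAME`. The MAC-to-name mapping below is taken from the success-case ifconfig output earlier in the thread and should be confirmed on your own board; `persistent_net_rules` is a hypothetical helper, and the output would be installed (as root) at `/etc/udev/rules.d/70-persistent-net.rules`, taking effect on the next boot.

```python
from __future__ import annotations


def persistent_net_rules(mapping: dict[str, str]) -> str:
    """Build udev rules that pin each interface name to a MAC address.

    `mapping` is interface name -> MAC. MACs are lowercased because udev
    matches ATTR{address} case-sensitively and sysfs reports lowercase hex.
    """
    lines = ["# Interface names pinned by MAC address (generated)."]
    for name, mac in sorted(mapping.items()):
        lines.append(
            'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", NAME="%s"'
            % (mac.lower(), name)
        )
    return "\n".join(lines) + "\n"


# Success-case assignment from this thread's ifconfig output (verify on your board).
print(persistent_net_rules({"eth0": "00:0c:8b:90:09:fe", "eth1": "00:04:4b:c0:5b:2e"}), end="")
```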