Ethernet connection missing after CUDA installation failure.

Ubuntu 16.04 LTS, Jetson TX2, JetPack 3.
The computer and the Jetson are connected to a hub, which is connected to the internet.

After downloading JetPack 3, I tried to flash the Jetson and install CUDA 8.
I tried the installation from two different computers (wifi turned off).
I wasn’t able to install CUDA; I got an error.
On both computers, the Ethernet connection stopped working after the installation failed.
I checked /etc/network/interfaces and /etc/NetworkManager/NetworkManager.conf and compared them with the same files on another computer.

I tried these commands to figure out what was wrong.


$ ifconfig eth0

eth0      Link encap:Ethernet  HWaddr 28:f1:0e:48:52:d2
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:241328 errors:0 dropped:885 overruns:0 frame:0
          TX packets:4512 errors:34 dropped:0 overruns:0 carrier:34
          collisions:541 txqueuelen:1000
          RX bytes:24081526 (24.0 MB)  TX bytes:1106325 (1.1 MB)
          Interrupt:16 Memory:dd400000-dd420000

$ cat /etc/network/interfaces

# interfaces(5) file used by ifup(8) and ifdown(8)

auto lo
iface lo inet loopback

$ cat /etc/NetworkManager/NetworkManager.conf

[main]
plugins=ifupdown,keyfile,ofono
dns=dnsmasq

[ifupdown]
managed=false

$ journalctl -f

Jul 07 18:35:05 toromachine dhclient[22025]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4 (xid=0x6a052979)
Jul 07 18:35:09 toromachine dhclient[22025]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 10 (xid=0x6a052979)
Jul 07 18:35:19 toromachine dhclient[22025]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 18 (xid=0x6a052979)
Jul 07 18:35:37 toromachine dhclient[22025]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 12 (xid=0x6a052979)
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.7752] dhcp4 (eth0): request timed out
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.7753] dhcp4 (eth0): state changed unknown -> timeout
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8079] dhcp4 (eth0): canceled DHCP transaction, DHCP client pid 22025
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8079] dhcp4 (eth0): state changed timeout -> done
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8086] device (eth0): state change: ip-config -> failed (reason ‘ip-config-unavailable’) [70 120 5]
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8091] manager: NetworkManager state is now DISCONNECTED
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8098] device (eth0): Activation: failed for connection ‘eth0’
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8113] device (eth0): state change: failed -> disconnected (reason ‘none’) [120 30 0]
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8151] policy: auto-activating connection ‘eth0’
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8166] device (eth0): Activation: starting connection ‘eth0’ (e7361020-1796-4583-aab9-9d479aac94fe)
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8167] device (eth0): state change: disconnected -> prepare (reason ‘none’) [30 40 0]
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8168] manager: NetworkManager state is now CONNECTING
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8170] device (eth0): state change: prepare -> config (reason ‘none’) [40 50 0]
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8180] device (eth0): state change: config -> ip-config (reason ‘none’) [50 70 0]
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8182] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.8206] dhcp4 (eth0): dhclient started with pid 22043
Jul 07 18:35:47 toromachine dhclient[22043]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3 (xid=0x15238503)
Jul 07 18:35:50 toromachine dhclient[22043]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5 (xid=0x15238503)
Jul 07 18:35:55 toromachine dhclient[22043]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 9 (xid=0x15238503)
Jul 07 18:36:04 toromachine dhclient[22043]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 11 (xid=0x15238503)
Jul 07 18:36:15 toromachine dhclient[22043]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 20 (xid=0x15238503)
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7707] dhcp4 (eth0): request timed out
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7708] dhcp4 (eth0): state changed unknown -> timeout
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7874] dhcp4 (eth0): canceled DHCP transaction, DHCP client pid 22043
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7875] dhcp4 (eth0): state changed timeout -> done
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7881] device (eth0): state change: ip-config -> failed (reason ‘ip-config-unavailable’) [70 120 5]
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7885] manager: NetworkManager state is now DISCONNECTED
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7891] device (eth0): Activation: failed for connection ‘eth0’
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7902] device (eth0): state change: failed -> disconnected (reason ‘none’) [120 30 0]
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7937] policy: auto-activating connection ‘eth0’
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7945] device (eth0): Activation: starting connection ‘eth0’ (e7361020-1796-4583-aab9-9d479aac94fe)
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7954] device (eth0): state change: disconnected -> prepare (reason ‘none’) [30 40 0]
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7955] manager: NetworkManager state is now CONNECTING
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7958] device (eth0): state change: prepare -> config (reason ‘none’) [40 50 0]
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7969] device (eth0): state change: config -> ip-config (reason ‘none’) [50 70 0]
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7971] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Jul 07 18:36:32 toromachine NetworkManager[19050]: [1499477792.7984] dhcp4 (eth0): dhclient started with pid 22057
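
As a rough aid for reading a log like the one above, the sketch below counts the DHCP message types that dhclient reports. The function name dhcp_message_counts is made up for illustration, and the sample lines are copied from the journal excerpt above. It assumes dhclient logs the message type as a separate word, as it does there.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): count the DHCP message
# types that appear as separate words in log text read from stdin.
dhcp_message_counts() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^DHCP(DISCOVER|OFFER|REQUEST|ACK|NAK)$/)
                    n[$i]++
        }
        END {
            printf "DISCOVER=%d OFFER=%d ACK=%d\n",
                   n["DHCPDISCOVER"], n["DHCPOFFER"], n["DHCPACK"]
        }
    '
}

# Sample: five lines copied from the journal excerpt above.
dhcp_message_counts <<'EOF'
Jul 07 18:35:05 toromachine dhclient[22025]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4 (xid=0x6a052979)
Jul 07 18:35:09 toromachine dhclient[22025]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 10 (xid=0x6a052979)
Jul 07 18:35:19 toromachine dhclient[22025]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 18 (xid=0x6a052979)
Jul 07 18:35:37 toromachine dhclient[22025]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 12 (xid=0x6a052979)
Jul 07 18:35:47 toromachine NetworkManager[19050]: [1499477747.7752] dhcp4 (eth0): request timed out
EOF
# prints: DISCOVER=4 OFFER=0 ACK=0
```

On a live system the same function could be fed from the journal, for example with journalctl -b | dhcp_message_counts. Many DISCOVER messages with no OFFER means that no DHCP server answered on that link.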

$ cat /var/lib/NetworkManager/NetworkManager.state

[main]
NetworkingEnabled=true
WirelessEnabled=false
WWANEnabled=true
WimaxEnabled=true

$ sudo dhcpdump -i eth0

TIME: 2017-07-07 18:45:11.570
IP: 0.0.0.0 (28:f1:e:48:52:d2) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
OP: 1 (BOOTPREQUEST)
HTYPE: 1 (Ethernet)
HLEN: 6
HOPS: 0
XID: 04f7d201
SECS: 39
FLAGS: 0
CIADDR: 0.0.0.0
YIADDR: 0.0.0.0
SIADDR: 0.0.0.0
GIADDR: 0.0.0.0
CHADDR: 28:f1:0e:48:52:d2:00:00:00:00:00:00:00:00:00:00
SNAME: .
FNAME: .
OPTION: 53 ( 1) DHCP message type 1 (DHCPDISCOVER)
OPTION: 12 ( 11) Host name toromachine
OPTION: 55 ( 18) Parameter Request List 1 (Subnet mask)
                                        28 (Broadcast address)
                                        2 (Time offset)
                                        3 (Routers)
                                        15 (Domainname)
                                        6 (DNS server)
                                        119 (Domain Search)
                                        12 (Host name)
                                        44 (NetBIOS name server)
                                        47 (NetBIOS scope)
                                        26 (Interface MTU)
                                        121 (Classless Static Route)
                                        42 (NTP servers)
                                        121 (Classless Static Route)
                                        249 (MSFT - Classless route)
                                        33 (Static route)
                                        252 (MSFT - WinSock Proxy Auto Detect)
                                        42 (NTP servers)

I couldn’t pin down the source of this, but there is a TX error sticking out:

errors:34

There are also collisions:

collisions:541

This can mean that two interfaces were configured to use the same address and are fighting over who responds to traffic (collisions can also come from the link itself, for example a half-duplex hub or a duplex mismatch). As a contrived example (though it could actually happen), the router and the PC could both be trying to act as router. Suppose you had originally told the router to do DHCP serving for the Jetson, then you missed a setting and the PC was told to act as router instead. The two would fight for this traffic unless the router and the PC are set up so that one and only one responds to DHCP. Traffic involving those two interfaces would probably stop everything on the interface from working correctly.
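
To watch just those two counters, a small filter like the sketch below can help. The function name link_counters is made up for illustration, and it assumes the older net-tools ifconfig layout shown in this thread (newer ifconfig versions print these fields differently) and a single interface on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print the TX error and collision counters from
# old-style (net-tools) ifconfig output for one interface, read on stdin.
link_counters() {
    awk '
        /TX packets/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^errors:/) { split($i, a, ":"); tx = a[2] }
        }
        /collisions:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^collisions:/) { split($i, a, ":"); col = a[2] }
        }
        END { printf "tx_errors=%s collisions=%s\n", tx, col }
    '
}

# Sample: the eth0 output posted at the start of this thread.
link_counters <<'EOF'
eth0 Link encap:Ethernet HWaddr 28:f1:0e:48:52:d2
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:241328 errors:0 dropped:885 overruns:0 frame:0
TX packets:4512 errors:34 dropped:0 overruns:0 carrier:34
collisions:541 txqueuelen:1000
RX bytes:24081526 (24.0 MB) TX bytes:1106325 (1.1 MB)
EOF
# prints: tx_errors=34 collisions=541
```

Running ifconfig eth0 | link_counters twice, a minute or so apart, shows whether the counters are still climbing or were left over from an earlier event.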

So if I understood correctly, my computer is currently set up as a server.
I need to set it up as a client, so I need to remove the DHCP server from my computer.
I have no idea how I’m supposed to do that.
Which files do I need to modify, or which commands should I run?

DHCP setup is specific to each Linux distribution, and I’m on Fedora, so I don’t know about Ubuntu. However, if your PC is set up as a DHCP server (a router), and you have another router, then you would be correct about this being the cause of the error.

Since I use Fedora I don’t really have access to JetPack, but I do know that during install it has an option to indicate that you have a router, or to use the PC as the router. Perhaps you have a real router and the checkbox to use the PC as router was enabled; this would result in the above network errors.

In some cases systemd may be managing dhcpd as a service. If so, then it’ll show up via this:

sudo systemctl status dhcpd.service

If the result does not show disabled, then this would temporarily stop it for testing:

sudo systemctl stop dhcpd.service

And the following would disable dhcpd.service across reboots (you may still need to stop a running instance, not sure):

sudo systemctl disable dhcpd.service

I tried:

sudo systemctl status dhcpd.service
● dhcpd.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

So that means I don’t have a DHCP server on my computer, right?

The DHCP server service may have a different name on your distribution, but it may also be correct that you don’t have dhcpd. What do you get from:

sudo which dhcpd
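
To look for a DHCP server running under another name, a sketch like the following filters a systemd unit listing for anything that mentions dhcp. The function name list_dhcp_units is made up for illustration. Ubuntu typically packages the ISC server as isc-dhcp-server rather than dhcpd, so that is the name to watch for there.

```shell
#!/bin/sh
# Hypothetical helper: from `systemctl list-unit-files` output on stdin,
# print the unit name and state of every entry whose name mentions "dhcp".
list_dhcp_units() {
    awk 'tolower($1) ~ /dhcp/ { print $1, $2 }'
}

# Intended use on a live system (skipped when systemctl is not installed).
if command -v systemctl >/dev/null 2>&1; then
    systemctl list-unit-files --type=service --no-legend 2>/dev/null | list_dhcp_units
fi
```

If which finds a binary, dpkg -S on that path shows which package owns it on Debian-based systems.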

The presence of “collisions” in ifconfig is a serious sign. One likely trigger is two devices thinking they should respond to the same thing (it could be two network cards on one computer, or two separate computers); DHCP requests would be the most obvious candidate if the network is otherwise configured correctly. Two devices with the same IP address would also be an issue, but if the devices use DHCP rather than static address assignment, it is harder to end up with a duplicate address in the first place.

Do you know the address of every device on your local network? Can you verify that they are all different and that no two devices share an address?

I’m connected to the university network, so I can’t know every IP address.
My other computers get their addresses from the university DHCP.

For dhcpd, I get:

sudo which dhcpd
/usr/sbin/dhcpd

When I type this command on my other computer, nothing is displayed.

Something is odd: I can connect my computer to the internet over wifi through DHCP, but not over Ethernet. So I guess the problem comes from the setup of the Ethernet interface.

To verify, is it correct that there is a single network switch/hub, that both the PC host and Jetson connect to that switch, and it is the switch which connects to your university network? This is my assumption.

One thing I do not know about, and which might be important, is how your university handles the wired Ethernet connection your switch is plugged into. If that port allows multiple computers (which is often the case), then both the Jetson and the PC would get their addresses from the university’s DHCP server. The lack of a dhcpd service on your PC would imply the Jetson cannot use the PC for routing, so this makes more sense, and would be the proper way to handle things. If the port the switch uses to connect to the university network assumes only a single computer, then the PC and the Jetson would probably fight for an address. I can’t answer that; you may need to verify with your university IT department that multiple computers can connect through that wire without an additional router.

What the collisions do suggest is that two or more devices may be using the same address. In a university environment you can’t control this on their network (I’d suggest a second network card on the host, then running a private network to the host; this would be a big boost in security as well as simplifying life). Normally devices are assigned unique addresses based on the MAC address which is built into the hardware. However, this is not always the case. I remember a university TK1 user from several years back who was connected to a university network, and for some reason another student’s TK1 was also responding to the “tegra-ubuntu” naming convention, so there was a conflict (I am not certain, but perhaps the Jetsons all had the same MAC address). Just as a test you might want to edit “/etc/hostname”. Change “tegra-ubuntu” to “tegra-ubuntu-tx2” (or anything you like), then reboot and test again. Check “ifconfig” and, after several network commands, see whether “collisions” still shows a count other than “0”.

For now I’m only trying to recover the Ethernet connection on my computer, so the Jetson is not in use.

So my computer is connected to a hub, which is connected to the university network.
This hub is used to connect several computers to this network. I don’t have any issues with the other computers.

I’ll contact the IT department of the University but I don’t think they will be able to help me.

It would be interesting to see if “ifconfig” on any other computer connected to this port shows any sign of errors, or especially collisions (this would initially be without the Jetson connected). If those other computers do not show collisions, then connect the Jetson and get the collisions to show up on the Jetson…then check each of the other computers again to see if collisions start to occur. If one shows collisions which did not show collisions before, then compare the MAC address and other ifconfig data between the two computers.
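
For that comparison, a sketch like the one below reduces each machine’s ifconfig output to one line per interface. The function name iface_summary is made up for illustration, and it assumes the older net-tools ifconfig layout seen in this thread.

```shell
#!/bin/sh
# Hypothetical helper: from old-style ifconfig output on stdin, print one
# line per interface with its name, MAC address ("-" if none is shown)
# and collision count.
iface_summary() {
    awk '
        /Link encap:/ {
            name = $1; mac = "-"
            for (i = 1; i < NF; i++)
                if ($i == "HWaddr") mac = $(i + 1)
        }
        /collisions:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^collisions:/) {
                    split($i, a, ":")
                    print name, mac, "collisions=" a[2]
                }
        }
    '
}

# Sample: the first eth0 output posted in this thread.
iface_summary <<'EOF'
eth0 Link encap:Ethernet HWaddr 28:f1:0e:48:52:d2
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:241328 errors:0 dropped:885 overruns:0 frame:0
TX packets:4512 errors:34 dropped:0 overruns:0 carrier:34
collisions:541 txqueuelen:1000
EOF
# prints: eth0 28:f1:0e:48:52:d2 collisions=541
```

Running ifconfig | iface_summary on each machine and putting the lines side by side makes a repeated MAC address or a non-zero collision count easy to spot.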

This is the ifconfig output from my other computer, which can reach the internet through the same hub.
The wireless is disconnected and the Ethernet cable is plugged into the hub.

lap@lapPc:~$ ifconfig
eth0      Link encap:Ethernet  HWaddr d4:be:d9:47:19:45  
          inet addr:169.237.89.143  Bcast:169.237.89.255  Mask:255.255.255.0
          inet6 addr: fe80::d6be:d9ff:fe47:1945/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:245 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5761 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:36538 (36.5 KB)  TX bytes:408305 (408.3 KB)
          Interrupt:20 Memory:e6e00000-e6e20000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:24395 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24395 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:2834309 (2.8 MB)  TX bytes:2834309 (2.8 MB)

wlan0     Link encap:Ethernet  HWaddr 8c:70:5a:b8:ca:80  
          inet6 addr: fe80::8e70:5aff:feb8:ca80/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:234678 errors:0 dropped:0 overruns:0 frame:0
          TX packets:119983 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:294249620 (294.2 MB)  TX bytes:27591390 (27.5 MB)

So I don’t have collisions with this computer.
I contacted the university IT service to find out whether something is wrong with the university network.
I’ll keep you informed.

Hi,

The problem is solved.
It was an issue with the university DHCP.
My computer was no longer allowed to connect to it.

Thank you very much for your help.