Hi all,
I am having a problem with my board because of network dropouts. I recently bought a Gbit switch and set up a private network for the nodes. Everything works fine by default, but when I switch to jumbo frames and maximum CPU, all nodes become unstable after a while and Ethernet goes down with nothing in the kernel log. (It mostly happens if I don't make a call for a while, which makes me feel like something is going idle permanently.)
So I found out that it is a common issue with the Realtek 8169 driver under high load:
Thanks @linuxdev, I saw this thread and the solution. I was hoping there was a better solution out there that doesn't reduce the speed. Anyway, I will give it a shot and test the bandwidth.
OK, I did a bit more reading about this driver; maybe it will save someone time. I recompiled the 21.1 kernel and changed r8169 from built-in (*) to module (M) so that I could try other drivers.
First I tried the existing r8169 driver with the parameters below; the network went down on the client after high load:
options r8169 use_dac=1 debug=8
I compiled and loaded r8168-8.039.00 from the Realtek website. eth0 worked, but the result under high load was the same; it just survived a bit longer than the original driver.
The r8169-6.0.014.00 driver from the Realtek website didn't recognize eth0.
Disabling ASPM in the U-Boot config didn't help with any driver; Ethernet still cut out after high load:
pcie_aspm=off
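To confirm that an argument set in U-Boot actually reached the kernel, /proc/cmdline can be checked after boot. This is only a sketch: `has_boot_arg` is a made-up helper name, the optional second argument exists only so the check can be pointed at a different file, and `pcie_aspm=off` is the kernel's documented spelling of the ASPM switch (an argument the kernel does not recognize is not applied).

```shell
#!/bin/sh
# has_boot_arg ARG [FILE]
# Hypothetical helper: succeeds if ARG appears as a whole word on the
# kernel command line (FILE defaults to /proc/cmdline).
has_boot_arg() {
    file=${2:-/proc/cmdline}
    # One argument per line, then an exact, fixed-string, whole-line match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$1"
}

if has_boot_arg "pcie_aspm=off"; then
    echo "pcie_aspm=off is on the kernel command line"
else
    echo "pcie_aspm=off is NOT on the kernel command line"
fi
```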
Each driver has different parameters; you can check them with:
modinfo -p r816x
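For comparing the drivers side by side, something like the following could list the parameters of each candidate module in one go. This is only a sketch: r8169 and r8168 are the module names discussed in this thread, `list_params` is a made-up helper name, and a module that modinfo cannot find is simply reported as not available.

```shell
#!/bin/sh
# list_params MODULE...
# Hypothetical helper: prints the parameters of each named module via
# `modinfo -p`, or a note when modinfo cannot find the module.
list_params() {
    for m in "$@"; do
        if out=$(modinfo -p "$m" 2>/dev/null); then
            echo "== $m =="
            echo "$out"
        else
            echo "$m: not available"
        fi
    done
}

list_params r8169 r8168
```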
I tried different combinations of kernel parameters, which didn't help either. Some people in particular got lucky with ‘autoneg off’:
ethtool -s eth0 autoneg off
mii-tool's watch option didn't produce any output while the network was down:
mii-tool -w eth0
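Since mii-tool stays silent, another option might be to poll the interface state from sysfs and timestamp every change, so the moment of failure can at least be lined up with the load. A minimal sketch, assuming the interface is eth0 and that operstate and carrier under /sys/class/net/eth0 are readable; `link_state` and `watch_link` are hypothetical helper names, and the optional directory argument exists only so the function can be tested against a fake tree.

```shell
#!/bin/sh
# link_state IFACE [SYSROOT]
# Prints "<operstate> carrier=<0|1|?>" for the interface, read from sysfs.
link_state() {
    dir="${2:-/sys/class/net}/$1"
    oper=$(cat "$dir/operstate" 2>/dev/null || echo unknown)
    # Reading carrier fails while the interface is administratively down.
    carrier=$(cat "$dir/carrier" 2>/dev/null || echo '?')
    echo "$oper carrier=$carrier"
}

# watch_link IFACE [INTERVAL_SECONDS]
# Logs a timestamped line whenever the state changes. Runs until interrupted.
watch_link() {
    last=""
    while :; do
        now=$(link_state "$1")
        if [ "$now" != "$last" ]; then
            echo "$(date '+%F %T') $1: $now"
            last=$now
        fi
        sleep "${2:-1}"
    done
}

# Example (run on the board itself, not over the failing link):
#   watch_link eth0 1 >> /tmp/eth0-link.log
```

If the driver stalls without the link itself dropping, this may show nothing either, which would at least narrow things down.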
There is no kernel log on any failure, even though I set the log level to 7 in menuconfig.
I tried all of these mostly remotely.
Some people also wrote that they got lucky after disabling IOMMU in the kernel config/BIOS, but I gave up :) I can get 968 Mbit/s with jumbo frames, but when I set the MTU to 1500 (the standard Ethernet MTU) it's not even 50 Mbit/s. I am working on MPI, so some parts of my code rely on the network, which is why I just bought a gigabit switch :( Is anyone using their devices on a gigabit network remotely without any problems?
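As a sanity check on those numbers, the framing overhead alone can be estimated. A rough sketch, assuming plain TCP over IPv4 with no options (40 bytes of headers inside the MTU) and 38 bytes of per-frame Ethernet overhead on the wire (14 header, 4 FCS, 8 preamble, 12 inter-frame gap); `tcp_goodput_mbit` is a hypothetical helper name and the result is a theoretical ceiling, not a measurement.

```shell
#!/bin/sh
# tcp_goodput_mbit MTU [LINE_RATE_MBIT]
# Hypothetical helper: upper bound on TCP payload throughput in Mbit/s,
# rounded down. Assumes 40 bytes of IPv4+TCP headers per packet and
# 38 bytes of Ethernet framing overhead per frame.
tcp_goodput_mbit() {
    mtu=$1
    rate=${2:-1000}
    echo $(( (mtu - 40) * rate / (mtu + 38) ))
}

echo "MTU 1500: $(tcp_goodput_mbit 1500) Mbit/s"   # prints 949
echo "MTU 9000: $(tcp_goodput_mbit 9000) Mbit/s"   # prints 991
```

By this estimate MTU 1500 should only cost a few percent on a gigabit link, so a drop to under 50 Mbit/s points at something other than framing overhead.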
Same problem here. I recompiled 3.10.40 without r8169 and installed r8168, but the network still fails with moderate traffic (wget something…). Too bad :-(
No idea what to try, but after poking around my conclusion is that the r8169 driver is not actually at fault…at least not all by itself, so fixing the driver probably will not fix the issue. A wild guess is that the full story involves the scheduler interacting with the driver, and that changing the scheduler would be the real fix. The older r8168 likely just interacts differently by good luck (luck is of course a great component…I keep trying to find it in object-oriented design books, but they forgot to add it).
Figuring out how the timing and interaction works together would be an extraordinary feat. Somewhere a short distance into the future from 3.10.40 this interaction issue just disappeared.
I’m using my Jetson just as a headless server and would love to use a mainline kernel. I tried 3.18.1 with the def_tegra config, but the kernel does not boot. In fact I see nothing via HDMI or via the network (no MAC address seen on the network). :-(
Still waiting for my serial/USB adapter to arrive to see if I can get anything from U-Boot.
The mainline kernel is still missing some features, but it definitely should boot. With the upstream kernel, though, you unfortunately also need to use upstream U-Boot. Did you test with that?
Yes, but no luck. I compiled mainline U-Boot and tried to install it with tegra-uboot-scripts. With this I can see the ethernetaddr and ip set via --env in the environment, so U-Boot is working, but the kernel never boots after this. I also tried flash.sh -L with mainline U-Boot, but with this I never saw anything on my network. Do I need to change anything in mainline U-Boot for it to work with flash.sh? (I tried several configs for network, preboot and netconsole and applied a patch for PCIe to work, but nothing changed.)
Maybe it's some dumb mistake somewhere, but without a serial console I’m stuck with the default stack…
Yeah, sometimes I’m a little bit masochistic :). Today my USB-serial cable arrived and it works perfectly. BUT: with mainline U-Boot there is nothing on the serial console either…
@coderchris, how did you install Docker on the TK1? Do I need to use the Grinch kernel? I tried sudo apt-get install docker.io, but running sudo docker -d gives me an error.
Here’s the output with the error:
ubuntu@tegra-ubuntu:~$ sudo docker -d
[sudo] password for ubuntu:
2015/04/18 13:35:57 WARNING: The docker runtime currently only officially supports amd64 (not arm). THIS BUILD IS NOT OFFICIAL AND WILL NOT BE SUPPORTED BY DOCKER UPSTREAM.
2015/04/18 13:35:57 docker daemon: 1.0.1 990021a; execdriver: native; graphdriver:
[8ec66997] +job serveapi(unix:///var/run/docker.sock)
[8ec66997] +job initserver()
[8ec66997.initserver()] Creating server
2015/04/18 13:35:57 Listening for HTTP on unix (/var/run/docker.sock)
Error running DeviceCreate (createPool) dm_task_run failed
[8ec66997] -job initserver() = ERR (1)
2015/04/18 13:35:58 Error running DeviceCreate (createPool) dm_task_run failed