Even if your traces are that short and balanced, impedance is well matched, and there is no hardware failure, I would guess you still need to profile. Only one CPU core can handle hardware IRQs: CPU0. When other drivers compete for time on CPU0, your driver cannot run (or the other driver cannot run) and speeds drop. It seems you’ve only looked for hardware causes, but software can produce what you are seeing when drivers are not allowed to run without delay.
If there were noise, I could imagine speed changing as the noise source changes…but those are short traces and it sounds like they are matched. Such extreme swings, without ever reaching full speed, imply either that the noise is severe and never completely goes away, or that noise is not the problem. Noise seems the less likely cause given what you’ve said about trace layout and impedance matching.
Note that you can set the Jetson to performance mode to keep the clocks from throttling back; this would be one way to get CPU0 to max performance. See:
http://elinux.org/Jetson/TX1_Controlling_Performance
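Before and after changing performance settings, it may help to confirm what CPU0 is actually doing. This is only a sketch: it assumes your kernel exposes the standard Linux cpufreq sysfs files under /sys/devices/system/cpu/cpu0/cpufreq, and the function name cpu0_freq_status is made up for illustration (it is not from the linked page).

```shell
# Report CPU0's cpufreq governor and its current vs. maximum frequency (kHz).
# Assumption: the standard cpufreq sysfs layout exists on this kernel.
# The optional first argument overrides the sysfs directory.
cpu0_freq_status() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
    gov=$(cat "$dir/scaling_governor") || return 1
    cur=$(cat "$dir/scaling_cur_freq") || return 1
    max=$(cat "$dir/scaling_max_freq") || return 1
    echo "governor=$gov cur_khz=$cur max_khz=$max"
}
```

Calling cpu0_freq_status while your network test runs should show whether the current frequency is sitting at the maximum or has been throttled back.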
I can’t tell you exactly how it is happening, but your issue looks like IRQ starvation, not physical/electrical layout. You could artificially change the load on CPU0 (perhaps by reading a device out to “/dev/null”) and see if things get even worse. Example:
# install htop, monitor "htop -uroot"...
sudo -s
dd if=/dev/mmcblk0 of=/dev/null bs=512
exit
Keep in mind that you can run the same dd test case multiple times simultaneously. Check whether networking slows down without your custom board as load goes up on CPU0, or whether the effects of your card get worse with CPU0 artificially loaded down even more. You can look at “/proc/interrupts” to verify whether the overall CPU0 interrupt rate goes up (indicating hardware IRQ use). You can also run the load with “nice” to raise its priority (I wouldn’t go beyond a nice value of -2 for testing). Example:
sudo -s
nice -n -2 dd if=/dev/mmcblk0 of=/dev/null bs=512
exit
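Running several dd readers at once could be sketched like this. The function name run_dd_load and its arguments are my own invention for illustration; the default source is the same eMMC device as above, and the block count just bounds how long each reader runs.

```shell
# Start N background dd readers against a source and wait for all of them.
# Arguments (all optional): reader count, source device, 512-byte blocks
# per reader. Returns non-zero if any reader fails.
run_dd_load() {
    readers="${1:-2}"
    src="${2:-/dev/mmcblk0}"
    blocks="${3:-1000000}"
    pids=""
    i=0
    while [ "$i" -lt "$readers" ]; do
        dd if="$src" of=/dev/null bs=512 count="$blocks" 2>/dev/null &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Collect each reader's exit status in turn.
    for pid in $pids; do
        wait "$pid" || return 1
    done
}
```

From a root shell (sudo -s), something like "run_dd_load 4" would start four readers; watch htop and your ethernet throughput while they run.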
You might get a feel for how ethernet speeds change if you change the priority of processes you think are related to the issue. htop can renice to a higher priority like -2 or a lower priority like +2 fairly easily: move the cursor up or down to the process you are interested in, then see the hot key menu at the bottom. You have to run htop itself with sudo to increase priority.
For reference, if you check “/proc/interrupts”, notice that some listed interrupt sources occur on any CPU, but others occur in large numbers only on CPU0. This command filters out every line whose CPU0 count is zero, leaving just the interrupt sources that have fired on CPU0:
grep -Ev '[:][ ]+0 ' /proc/interrupts
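To put a number on whether the CPU0 interrupt rate goes up under load, you could total the CPU0 column twice and take the difference. This is a rough sketch: the function names are hypothetical, it assumes the usual “/proc/interrupts” layout where the first count after the “NN:” label belongs to CPU0, and the interval has to be a whole number of seconds.

```shell
# Sum the CPU0 column of an interrupts table (default: /proc/interrupts).
# Field 1 is the IRQ label ending in ":", field 2 is the CPU0 count.
# The header row ("CPU0 CPU1 ...") has no ":" label and is skipped.
cpu0_irq_total() {
    awk '$1 ~ /:$/ && $2 ~ /^[0-9]+$/ { sum += $2 }
         END { printf "%.0f\n", sum }' "${1:-/proc/interrupts}"
}

# Print CPU0 interrupts per second, averaged over an interval in whole
# seconds (default 5).
cpu0_irq_rate() {
    interval="${1:-5}"
    before=$(cpu0_irq_total)
    sleep "$interval"
    after=$(cpu0_irq_total)
    echo $(( (after - before) / interval ))
}
```

Run cpu0_irq_rate once with the system idle and once with the dd load running; a clearly higher number under load suggests the load really is landing on CPU0 as hardware interrupts.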
I’m not sure if reading eMMC is actually the best way to produce hardware interrupt load, but it is an example. Perhaps reading an SD card ("/dev/mmcblk1") instead would be a better test, or using a dd block size of 1 instead of 512. The goal is to load CPU0 while more or less leaving the other CPUs alone. If networking changes follow CPU0 load, then you have probably shown that the issue is not trace or hardware layout.