Network connection loss when TX ring full

damien.lefevre · September 23, 2020, 8:54am

Hi,

I recently applied the preempt-rt patches to my kernel and am doing some optimization.

With a streaming-intensive (~950 Mbit/s) application on eth0, I get this error:

eqos 2490000.ether_qos eth0: eqos_start_xmit(): TX ring full for queue 0

At this point I lose the connection to the device. From the terminal I can see that the interface is alive:

eth0      Link encap:Ethernet  HWaddr 00:04:4B:E5:92:98  
          inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::204:4bff:fee5:9298/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:1322586 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6788940 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:68915744 (65.7 MiB)  TX bytes:78902510607 (73.4 GiB)
          Interrupt:40

but I cannot ping my host PC at 192.168.1.2 and vice versa.

The only way to recover the connection is to restart the network manager. And I see these kernel messages:

[30001.363523] gpio tegra-gpio wake20 for gpio=52(G:4)
[30001.374531] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[30004.303672] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[30004.304901] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Is there a way to gracefully recover from this error?

linuxdev · September 23, 2020, 3:18pm

I cannot answer what you want to know, but it is somewhat interesting to see TX packet count go up so high, not have errors, dropped, overruns, nor carrier errors, and yet it fails. Prior to this failing are you able to ping the PC from the Jetson? Can you verify that the eth0 ifconfig above is from the Jetson side?

Can you also show the ifconfig for the other computer, along with route output for both Jetson and other computer?

thesyght · October 26, 2020, 2:03am

@damien.lefevre

Getting the same issue, any insights?

For me, I can’t ping anyone on the network from within the Xavier, and nmap on an external device can’t see the Xavier on the network.
As for ifconfig, the interface doesn’t even know that it has dropped off:

linuxdev · October 26, 2020, 3:25pm

Your ifconfig shows address 192.168.1.7 was configured. What is your output from the “route” command? The two work together. Same question on the host PC side…what is the output from “ifconfig” and “route”?

thesyght · October 27, 2020, 6:10am

Note: I had to blank some stuff out due to privacy/security.
Route gave the following at the time:

Normally it gives:

On the host side it is:

ifconfig of note is the wifi interface:

What exactly are you looking for here? Can you please elaborate?

linuxdev · October 27, 2020, 7:00pm

I was looking to see if the subnets matched (they do), and if the network interface on either side indicated a network setup issue (such as errors, drops, overruns, framing, collision…none seen).

At the external network level it appears that all is good. The error is probably internal to the Jetson (you already knew that, but worth mentioning), but it isn’t the network PHY itself at issue. My thought is that the application consuming data from the network simply is not fast enough (not necessarily the application’s fault; could for example be IRQ starvation or a lack of priority).

Note that if this is on the Jetson side, then your application is always sending data when the error hits:

The TX did not note any drops, and so this is quite curious. The thought here is that either the ifconfig was run prior to an error (confirm if the ifconfig was prior to or after the error), or else the data was dropped before making its way into the TX queue.

After the error occurs, if you use “less /proc/interrupts”, at the end of the file, is the “Err:” line still 0 (it should be)?

thesyght · October 27, 2020, 11:07pm

Hey @linuxdev,

Yeah, it’s interesting. When you say application, do you mean a single application or the “application layer”? An nmap scan from an external device does not show the Xavier on the network, and from within the Xavier it cannot even ping the gateway (by the way, unplugging and replugging the Ethernet cable does not help), so I think this may be an OS-layer issue.

This issue is happening about once a week. I have a few tests to run next time it happens (unless the new salt lamp I put next to it fixes it) and will repost here. Is there anything else you think is worth testing (besides the “less” command)?

damien.lefevre · October 28, 2020, 7:57am

Hi @thesyght

I haven’t bumped into the issue recently, but I’ve been tweaking a lot of parameters.

One thing to note with PREEMPT-RT is that each IRQ has its own kernel thread and the default priority is 50. So if your application/thread has a higher priority, it’ll preempt the IRQ thread.

In case of the devkit NIC, there are 3 IRQs

watch -n 1 -d grep -e Err -e IPI -e eth -e CPU -e arch /proc/interrupts

Surprisingly enough, all IRQs are scheduled on CPU 0, although the default smp affinity allows scheduling on all cores

~#cat /proc/irq/default_smp_affinity
ff

I use a yocto based distro and I haven’t taken the time to flash the stock jetpack to check the behavior there. Maybe someone can confirm it is the same or not =). But it’s the same kernel.

You can manually change the affinity

echo 1 > /proc/irq/40/smp_affinity_list
echo 2 > /proc/irq/42/smp_affinity_list
echo 3 > /proc/irq/43/smp_affinity_list

To give affinity

ether_qos.common_irq - core 1
2490000.ether_qos.rx0 - core 2
2490000.ether_qos.tx0 - core 3

Note that the setting is volatile and needs to be re-set after each boot. You’d need to add a start-up script.
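
A start-up script for this could look roughly like the sketch below. It looks the IRQ number up by name instead of hard-coding it, since the numbers are not the same on every board. The function name pin_irq and the optional third argument (only there so the function can be tried against a copy of /proc) are made up for illustration and are not part of any existing tool.

```shell
# pin_irq NAME CPU [PROC_ROOT]  (hypothetical helper)
# Finds the IRQ whose last column in PROC_ROOT/interrupts equals NAME and
# writes CPU to PROC_ROOT/irq/<n>/smp_affinity_list.
# Needs root when PROC_ROOT is the real /proc.
pin_irq() {
    local name="$1" cpu="$2" root="${3:-/proc}" irq
    # First column looks like "40:"; strip the colon to get the IRQ number.
    irq=$(awk -v n="$name" '$NF == n { sub(/:$/, "", $1); print $1; exit }' "$root/interrupts")
    if [ -z "$irq" ]; then
        echo "pin_irq: no IRQ named $name" >&2
        return 1
    fi
    echo "$cpu" > "$root/irq/$irq/smp_affinity_list"
}

# Intended use at boot, with the names shown in /proc/interrupts:
#   pin_irq ether_qos.common_irq  1
#   pin_irq 2490000.ether_qos.rx0 2
#   pin_irq 2490000.ether_qos.tx0 3
```

Matching on the name keeps the script valid if the IRQ numbering changes between boards or kernel builds.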

The other thing that has an effect is the priority. Let’s say you cranked the niceness for your process to -20 and set the priority to 99 on one or multiple threads. If the scheduler happens to schedule that work on CPU 0, where all the IRQs appear to be handled, then the IRQ threads won’t get CPU time, and this could explain the queue filling up.

In your code it’s easy to change the priority and CPU affinity for your threads, and the niceness.

For other processes / IRQs you can use the chrt utility.

You need to find the PID for the IRQ. In htop display options, make sure not to hide kernel threads and show custom thread names.

In /proc/interrupts, we saw that 2490000.ether_qos.tx0 is on IRQ 42, so if you search for irq/42 in htop, you get the corresponding PID (4532 for example).
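
If you’d rather not go through htop, the same lookup can be done from the shell. This is a minimal sketch, assuming the threaded IRQ handlers show up with names of the form irq/<number>-<name>, as they do in ps on a PREEMPT-RT kernel. irq_thread_pids is just a name picked for the example.

```shell
# irq_thread_pids IRQ  (hypothetical helper)
# Reads "PID COMM" lines on stdin and prints the PID of every thread whose
# name starts with "irq/<IRQ>-". The trailing dash keeps IRQ 42 from also
# matching IRQ 420.
irq_thread_pids() {
    local irq="$1"
    awk -v prefix="irq/${irq}-" 'index($2, prefix) == 1 { print $1 }'
}

# On a live system (the "=" after each column name suppresses the header):
#   ps -eo pid=,comm= | irq_thread_pids 42
```

An empty result means there is no dedicated kernel thread for that IRQ.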

To get the policy

# chrt -p 4532
pid 4532's current scheduling policy: SCHED_FIFO
pid 4532's current scheduling priority: 50

To set the RT (99) priority

chrt -f -p 99 4532

The policy option affects the jitter quite a bit. SCHED_OTHER is the default for non-RT threads (priority is ignored). Priorities are applied when using SCHED_FIFO and SCHED_RR.

Note that round robin gives up CPU time every 25ms by default

# cat /proc/sys/kernel/sched_rr_timeslice_ms
25

You can change that too.
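
For example, as root, by writing a new value to that file. The sketch below wraps the write in a small function. The name set_rr_timeslice and the optional file argument are made up for the example, and the value 10 is only an illustration, not a recommendation.

```shell
# set_rr_timeslice MS [FILE]  (hypothetical helper)
# Writes the SCHED_RR timeslice in milliseconds to the kernel tunable.
# FILE defaults to the real procfs entry; pass another path to try it safely.
set_rr_timeslice() {
    local ms="$1" file="${2:-/proc/sys/kernel/sched_rr_timeslice_ms}"
    case "$ms" in
        ''|*[!0-9]*)
            echo "set_rr_timeslice: '$ms' is not a whole number" >&2
            return 1
            ;;
    esac
    echo "$ms" > "$file"
}

# As root:
#   set_rr_timeslice 10
# Equivalent one-liner:
#   sysctl -w kernel.sched_rr_timeslice_ms=10
```

Like the IRQ affinity, this does not survive a reboot unless it is also put in /etc/sysctl.conf.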

So it’s quite a trade-off on what to schedule where, and it all depends on your application. I’ll be running more tests to see if I hit this issue again. I would still not expect the network driver to “crash” because of a full send queue.

Hopefully it was just a side effect of too aggressive scheduling.

linuxdev · October 28, 2020, 7:20pm

I mean any application at the RX end which consumes the data and results in the buffer clearing. This could be a driver, but since it has no errors or drops or overruns, all I can say is it is “something else other than the actual network activity”. I am kind of stabbing in the dark, but I can tell you that the network itself has functioned flawlessly, and something related to the network is failing.

The TX is on the Jetson side, the RX on the other end. Note that you will see a packet count via ifconfig as data is sent or received. Some of the traffic will be for DHCP setup, or perhaps some sort of route setup, but if you are not using ssh to log in, then you can be guaranteed that most of the RX and TX packet count is from your application running. To that extent, without using ssh (is that possible?), at each end you could run ifconfig on the particular interface and see if during normal operation the count goes up approximately the same on both ends, and at the moment of failure, see if the RX end failed to read due to missing packets, or if instead RX read the packets and TX just stopped. Example:
watch -n 1 ifconfig eth0
(then watch TX packets at the Jetson, and RX packets at the other end)
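
The same counters are also exposed as plain numbers under /sys/class/net/<iface>/statistics/, which makes it easier to log a rate than to read it off the ifconfig output. A rough sketch, where packet_delta is a made-up helper name and the interface on the other machine is a placeholder:

```shell
# packet_delta COUNTER_FILE [SECONDS]  (hypothetical helper)
# Reads a packet counter twice, SECONDS apart (default 1), and prints
# how much it increased in between.
packet_delta() {
    local file="$1" secs="${2:-1}" before after
    before=$(cat "$file") || return 1
    sleep "$secs"
    after=$(cat "$file") || return 1
    echo $((after - before))
}

# On the Jetson:
#   packet_delta /sys/class/net/eth0/statistics/tx_packets 1
# On the other machine:
#   packet_delta /sys/class/net/<iface>/statistics/rx_packets 1
```

If the TX delta on the Jetson drops to 0 while the application is still sending, the stall is on the sending side.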

This is where we can run into problems when the hardware IRQ aggregator sends everything to CPU0. Some hardware has access to all cores (e.g., memory controller and timers), other hardware can only interrupt CPU0. You can try to force affinity to a core without hardware IRQ access, but it would end up being rescheduled to CPU0 for such a case (I wish I knew more about the IRQ aggregator design).

I do not know what differences you will see when using Yocto, but it is kind of a “wild card”.

The scheduling experiments are a good idea, but be careful to leave CPU0 alone since it is the only core which handles some hardware.

thesyght · October 29, 2020, 11:25pm

Wow, okay, thank you both @linuxdev and @damien.lefevre I have a lot to google and brush up on after reading those pieces.

@damien.lefevre It sounds like you’re really getting into the nuts and bolts of the OS there. As it is, I’m really not doing anything nearly as intrusive or, better yet, impressive. ROS is running and large amounts of image data are sent over the network using it, and maybe it’s doing something funky, but it would be odd for me to be able to produce the same errors as someone playing around with thread priority and scheduling. I will try to increase the priority of the IRQ.
When you say “jitter”, are you referring to the delay between receive and start of a task? Why would a priority increase produce a delay?

@linuxdev The issue happens only every couple of days of it being left on, so setting up a test and waiting is going to be a pain and might take a few attempts, but I will do my best.

11151 · November 17, 2020, 11:49pm

having the same issue.

using golang to proxy files (in GBs, two concurrent)

Nov 17 15:47:52 XXX kernel: eqos 2490000.ether_qos eth0: eqos_start_xmit(): TX ring full for queue 0

nothing else has been printed out

MarkusHess · November 24, 2020, 6:29pm

I have the same issue on a Jetson Xavier NX devkit while streaming images over traefik to another device. Is there any solution?

damien.lefevre · November 25, 2020, 12:47pm

for me the solution was to optimize the thread/irq priorities and CPU affinity.

Maybe you could try to tweak the buffer sizes in /etc/sysctl.conf. I have

net.core.wmem_max=8000000
net.core.wmem_default=8000000

MarkusHess · November 25, 2020, 6:15pm

Unfortunately, tweaking the buffer sizes did not help.

I wanted to try to change the priority of the 2490000.ether_qos.tx0 IRQ. From /proc/interrupts it should be irq/40 on the Xavier NX, but I cannot find the corresponding process. Any ideas?

This is the output of cat /proc/interrupts:

            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       
   2:          0          0          0          0          0          0     GICv2   29 Level     trusty
   3:    5264356    7537410    8085336    8180124    7851080    8069824     GICv2   30 Level     arch_timer
   6:    6698394          0          0          0          0          0     GICv2  208 Level     hsp
   7:          0          0          0          0          0          0     GICv2  202 Level     arm-smmu global fault
   8:          0          0          0          0          0          0     GICv2  203 Level     arm-smmu global fault
   9:          0          0          0          0          0          0     GICv2  264 Level     arm-smmu global fault
  10:          0          0          0          0          0          0     GICv2  265 Level     arm-smmu global fault
  11:          0          0          0          0          0          0     GICv2  272 Level     arm-smmu global fault
  12:          0          0          0          0          0          0     GICv2  273 Level     arm-smmu global fault
  13:          0          0          0          0          0          0     GICv2  368 Level     tegra-p2u-intr
  14:          0          0          0          0          0          0     GICv2  369 Level     tegra-p2u-intr
  15:          0          0          0          0          0          0     GICv2  370 Level     tegra-p2u-intr
  16:          0          0          0          0          0          0     GICv2  371 Level     tegra-p2u-intr
  17:          0          0          0          0          0          0     GICv2  372 Level     tegra-p2u-intr
  18:          0          0          0          0          0          0     GICv2  373 Level     tegra-p2u-intr
  19:          0          0          0          0          0          0     GICv2  374 Level     tegra-p2u-intr
  20:          0          0          0          0          0          0     GICv2  375 Level     tegra-p2u-intr
  21:          0          0          0          0          0          0     GICv2  376 Level     tegra-p2u-intr
  22:          0          0          0          0          0          0     GICv2  377 Level     tegra-p2u-intr
  23:          0          0          0          0          0          0     GICv2  253 Level     tegra-p2u-intr
  24:          0          0          0          0          0          0     GICv2  254 Level     tegra-p2u-intr
  25:          0          0          0          0          0          0     GICv2  378 Level     tegra-p2u-intr
  26:          0          0          0          0          0          0     GICv2  379 Level     tegra-p2u-intr
  27:          0          0          0          0          0          0     GICv2  380 Level     tegra-p2u-intr
  28:          0          0          0          0          0          0     GICv2  381 Level     tegra-p2u-intr
  29:          0          0          0          0          0          0     GICv2  382 Level     tegra-p2u-intr
  30:          0          0          0          0          0          0     GICv2  383 Level     tegra-p2u-intr
  31:          0          0          0          0          0          0     GICv2  235 Level     tegra-p2u-intr
  32:          0          0          0          0          0          0     GICv2  252 Level     tegra-p2u-intr
  33:          1          0          0          0          0          0     GICv2   83 Level     tegra-pcie-intr, PCIe PME, aerdrv
  34:      16081          0          0          0          0          0     GICv2   84 Level     tegra-pcie-msi
  35:          1          0          0          0          0          0     GICv2   85 Level     tegra-pcie-intr, PCIe PME, aerdrv
  36:         79          0          0          0          0          0     GICv2   86 Level     tegra-pcie-msi
  37:          1          0          0          0          0          0     GICv2  226 Level     ether_qos.common_irq
  39:   12821285          0          0          0          0          0     GICv2  222 Level     2490000.ether_qos.rx0
  40:    5845421          0          0          0          0          0     GICv2  218 Level     2490000.ether_qos.tx0
  47:         16          0          0          0          0          0     GICv2  144 Level     3100000.serial
  50:          0          0          0          0          0          0     GICv2  152 Level     combined_uart rx
  51:      32602          0          0          0          0          0     GICv2   94 Level     mmc0
  52:          0          0          0          0          0          0     GICv2   68 Level     3210000.spi
  53:          0          0          0          0          0          0     GICv2   70 Level     3230000.spi
  54:       4827          0          0          0          0          0     GICv2   67 Level     3270000.spi
  55:          0          0          0          0          0          0     GICv2   57 Level     3160000.i2c
  56:          0          0          0          0          0          0     GICv2   58 Level     c240000.i2c
  57:      11044          0          0          0          0          0     GICv2   59 Level     3180000.i2c
  58:          0          0          0          0          0          0     GICv2   60 Level     3190000.i2c
  59:          0          0          0          0          0          0     GICv2   62 Level     31b0000.i2c
  60:          0          0          0          0          0          0     GICv2   63 Level     31c0000.i2c
  61:       1225          0          0          0          0          0     GICv2   64 Level     c250000.i2c
  62:          0          0          0          0          0          0     GICv2   65 Level     31e0000.i2c
  64:        116          0          0          0          0          0     GICv2  193 Level     snd_hda_tegra
  65:          0          0          0          0          0          0     GICv2   51 Level     bc00000.rtcpu
  66:      92025          0          0          0          0          0     GICv2  242 Level     d230000.actmon
  67:    3179772          0          0          0          0          0     GICv2  297 Level     host_syncpt
  68:          2          0          0          0          0          0     GICv2  295 Level     host_status
  69:          0          0          0          0          0          0     GICv2  238 Level     vic
  70:          0          0          0          0          0          0     GICv2  268 Level     nvdla0
  71:          0          0          0          0          0          0     GICv2  269 Level     nvdla1
  72:          0          0          0          0          0          0     GICv2  185 Level     15200000.nvdisplay
  73:          0          0          0          0          0          0     GICv2  186 Level     15210000.nvdisplay
  74:          0          0          0          0          0          0     GICv2  191 Level     tegra_dp
  77:          0          0          0          0          0          0     GICv2  194 Level     cec_irq
  79:          0          0          0          0          0          0     GICv2  266 Level     pva-isr
  80:          0          0          0          0          0          0     GICv2  267 Level     pva-isr
  87:          0          0          0          0          0          0     GICv2  397 Level     carmel-pmu
  88:          0          0          0          0          0          0     GICv2  270 Level     noc_nonsecure_irq
  89:          0          0          0          0          0          0     GICv2  271 Level     noc_secure_irq
  90:          0          0          0          0          0          0        PM   42 Level     tegra_rtc
  91:          0          0          0          0          0          0     GICv2  255 Level     mc_status
  93:          4          0          0          0          0          0     GICv2  165 Level     c150000.tegra-hsp
 104:    1422348          0          0          0          0          0     GICv2  214 Level     b950000.tegra-hsp, b950000.tegra-hsp, b950000.tegra-hsp
 108:          0          0          0          0          0          0     GICv2  315 Level     3ad0000.se_elp
 110:          0          0          0          0          0          0     GICv2  108 Level     gpcdma.0
 111:          0          0          0          0          0          0     GICv2  109 Level     gpcdma.1
 112:          0          0          0          0          0          0     GICv2  110 Level     gpcdma.2
 113:          0          0          0          0          0          0     GICv2  111 Level     gpcdma.3
 114:         90          0          0          0          0          0     GICv2  112 Level     gpcdma.4
 115:          0          0          0          0          0          0     GICv2  113 Level     gpcdma.5
 116:          0          0          0          0          0          0     GICv2  114 Level     gpcdma.6
 117:          0          0          0          0          0          0     GICv2  115 Level     gpcdma.7
 118:          0          0          0          0          0          0     GICv2  116 Level     gpcdma.8
 119:          0          0          0          0          0          0     GICv2  117 Level     gpcdma.9
 120:          0          0          0          0          0          0     GICv2  118 Level     gpcdma.10
 121:          0          0          0          0          0          0     GICv2  119 Level     gpcdma.11
 122:          0          0          0          0          0          0     GICv2  120 Level     gpcdma.12
 123:          0          0          0          0          0          0     GICv2  121 Level     gpcdma.13
 124:          0          0          0          0          0          0     GICv2  122 Level     gpcdma.14
 125:          0          0          0          0          0          0     GICv2  123 Level     gpcdma.15
 126:          0          0          0          0          0          0     GICv2  124 Level     gpcdma.16
 127:          0          0          0          0          0          0     GICv2  125 Level     gpcdma.17
 128:          0          0          0          0          0          0     GICv2  126 Level     gpcdma.18
 129:          0          0          0          0          0          0     GICv2  127 Level     gpcdma.19
 130:          0          0          0          0          0          0     GICv2  128 Level     gpcdma.20
 131:          0          0          0          0          0          0     GICv2  129 Level     gpcdma.21
 132:          0          0          0          0          0          0     GICv2  130 Level     gpcdma.22
 133:          0          0          0          0          0          0     GICv2  131 Level     gpcdma.23
 134:          0          0          0          0          0          0     GICv2  132 Level     gpcdma.24
 135:          0          0          0          0          0          0     GICv2  133 Level     gpcdma.25
 136:          0          0          0          0          0          0     GICv2  134 Level     gpcdma.26
 137:          0          0          0          0          0          0     GICv2  135 Level     gpcdma.27
 138:          0          0          0          0          0          0     GICv2  136 Level     gpcdma.28
 139:          0          0          0          0          0          0     GICv2  137 Level     gpcdma.29
 140:          0          0          0          0          0          0     GICv2  138 Level     gpcdma.30
 238:          0          0          0          0          0          0  tegra-gpio   48 Edge      force-recovery
 242:          2          0          0          0          0          0  tegra-gpio   52 Level     phy_interrupt
 245:          0          0          0          0          0          0  tegra-gpio   55 Edge      3400000.sdhci cd
 287:          0          0          0          0          0          0  tegra-gpio   97 Edge      15200000.nvdisplay
 391:          0          0          0          0          0          0  tegra-gpio  201 Edge      external-connection:extcon@1
 454:          0          0          0          0          0          0  tegra-gpio-aon   36 Edge      power-key
 458:        618          0          0          0          0          0     GICv2   39 Level     30c0000.watchdog
 462:      19321          0          0          0          0          0     GICv2  198 Level     3550000.xudc
 463:        394          0          0          0          0          0        PM  195 Level     xhci-hcd:usb1
 464:          1          0          0          0          0          0        PM  196 Level     3610000.xhci
 465:          0          0          0          0          0          0        PM  199 Level     3610000.xhci
 466:        192          0          0          0          0          0     GICv2  102 Level     gk20a_stall
 467:          0          0          0          0          0          0     GICv2  103 Level     gk20a_nonstall
 468:          0          0          0          0          0          0     GICv2  424 Level     ras-fhi
 469:          0          0          0          0          0          0     GICv2  425 Level     ras-fhi
 470:          0          0          0          0          0          0     GICv2  426 Level     ras-fhi
 471:          0          0          0          0          0          0     GICv2  427 Level     ras-fhi
 472:          0          0          0          0          0          0     GICv2  428 Level     ras-fhi
 473:          0          0          0          0          0          0     GICv2  429 Level     ras-fhi
 476:          0          0          0          0          0          0     GICv2  262 Level     noc_nonsecure_irq
 477:          0          0          0          0          0          0     GICv2  263 Level     noc_secure_irq
 478:          0          0          0          0          0          0     GICv2  292 Level     noc_nonsecure_irq
 479:          0          0          0          0          0          0     GICv2  204 Level     noc_secure_irq
 480:          0          0          0          0          0          0     GICv2  294 Level     noc_nonsecure_irq
 481:          0          0          0          0          0          0     GICv2  206 Level     noc_secure_irq
 482:          0          0          0          0          0          0     GICv2  291 Level     noc_nonsecure_irq
 483:          0          0          0          0          0          0     GICv2  207 Level     noc_secure_irq
 484:          0          0          0          0          0          0     GICv2  293 Level     noc_nonsecure_irq
 485:          0          0          0          0          0          0     GICv2  205 Level     noc_secure_irq
 487:          0          0          0          0          0          0        PM  241 Edge      max77620-top
 491:          0          0          0          0          0          0  max77620-top    3 Edge      max77620-gpio
 492:          0          0          0          0          0          0  max77620-top    4 Edge      max77686-rtc
 496:          0          0          0          0          0          0  max77620-top    8 Edge      max77620-thermal
 497:          0          0          0          0          0          0  max77620-top    9 Edge      max77620-thermal
 552:          0          0          0          0          0          0  max77686-rtc    1 Edge      rtc-alarm1
 554:      16081          0          0          0          0          0   PCI-MSI    0 Edge      rtl88x2ce
 810:         18          0          0          0          0          0   PCI-MSI    0 Edge      nvme0q0, nvme0q1
 811:          2          0          0          0          0          0   PCI-MSI    1 Edge      nvme0q2
 812:          2          0          0          0          0          0   PCI-MSI    2 Edge      nvme0q3
 813:         46          0          0          0          0          0   PCI-MSI    3 Edge      nvme0q4
 814:          9          0          0          0          0          0   PCI-MSI    4 Edge      nvme0q5
 815:          2          0          0          0          0          0   PCI-MSI    5 Edge      nvme0q6
IPI0:    5965162    9709599   10066059    9732289    9774661    9519383       Rescheduling interrupts
IPI1:    3831036    4014493    3585342    3956985    3731902    3943269       Function call interrupts
IPI2:          0          0          0          0          0          0       CPU stop interrupts
IPI3:          0          0          0          0          0          0       Timer broadcast interrupts
IPI4:     960023     886124     793893     704548     674663     592605       IRQ work interrupts
IPI5:          0          0          0          0          0          0       CPU wake-up interrupts
 Err:          0

And here are the processes having irq in their name:

root         3  0.1  0.0      0     0 ?        S    13:02   0:31 [ksoftirqd/0]
root        17  0.0  0.0      0     0 ?        S    13:02   0:00 [ksoftirqd/1]
root        23  0.0  0.0      0     0 ?        S    13:02   0:00 [ksoftirqd/2]
root        29  0.0  0.0      0     0 ?        S    13:02   0:00 [ksoftirqd/3]
root        35  0.0  0.0      0     0 ?        S    13:02   0:00 [ksoftirqd/4]
root        41  0.0  0.0      0     0 ?        S    13:02   0:00 [ksoftirqd/5]
root       220  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/91-mc_statu]
root       732  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/487-max7762]
root       769  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/458-30c0000]
root       956  1.0  0.0      0     0 ?        S    13:02   3:07 [irq/67-host_syn]
root       957  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/68-host_sta]
root      1221  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/79-pva-isr]
root      1224  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/80-pva-isr]
root      1264  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/72-15200000]
root      1291  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/287-1520000]
root      1304  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/74-tegra_dp]
root      1357  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/73-15210000]
root      1482  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/52-3210000.]
root      1487  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/53-3230000.]
root      1494  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/54-3270000.]
root      1594  0.0  0.0      0     0 ?        S<   13:02   0:00 [vfio-irqfd-clea]
root      1643  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/90-tegra_rt]
root      1708  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/51-mmc0]
root      1817  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/93-c150000.]
root      1827  0.0  0.0      0     0 ?        S    13:02   0:09 [irq/66-d230000.]
root      1898  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/245-3400000]
root      2154  0.1  0.0      0     0 ?        S    13:02   0:26 [irq/104-b950000]
root      2155  0.1  0.0      0     0 ?        S    13:02   0:26 [irq/104-b950000]
root      2159  0.2  0.0      0     0 ?        S    13:02   0:45 [irq/104-b950000]
root      2167  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/65-bc00000.]
root      2338  0.0  0.0      0     0 ?        S    13:02   0:00 [irq/466-gk20a_s]
root      4380  0.0  0.0      0     0 ?        S    13:03   0:00 [irq/464-3610000]
root      4383  0.0  0.0      0     0 ?        S    13:03   0:00 [irq/465-3610000]
root     18521  0.0  0.0   4456   620 pts/0    S+   18:14   0:00 grep --color=auto irq

damien.lefevre · November 26, 2020, 9:08am

I’ve made a little script

change_irq_priority

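#!/bin/bash

# Print the PID of the kernel thread whose name matches irq/<arg>.
function get_pid {
    # The grep process itself also matches the pattern, so one IRQ thread
    # shows up as two PIDs in this list.
    pids=$(ps aux | grep -i "irq/$1" | awk '{print $2}')
    arrPids=($pids)
    length=${#arrPids[@]}

    # Exactly two matches: the IRQ thread plus the grep process.
    if [[ $length == 2 ]]; then
        echo "${arrPids[0]}"
    fi
}

pid=$(get_pid "$1")
if [ -n "$pid" ]; then
    # Switch the IRQ thread to SCHED_FIFO with the requested priority.
    chrt -f -p "$2" "$pid"
    echo "Done"
else
    echo "$1 not found"
fi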

Usage

change_irq_priority [irq name] [priority]

But that will only work if you’ve applied the preempt RT kernel patch, since only then do IRQs get their own kernel threads. With the standard kernel you cannot control the priority.
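
A quick way to check which kind of kernel is running is to look at the version string. This sketch assumes that an RT-patched kernel advertises itself with “PREEMPT RT” (or “PREEMPT_RT” on newer kernels) in the output of uname -v; is_rt_version is a name made up for the example.

```shell
# is_rt_version STRING  (hypothetical helper)
# Succeeds if a `uname -v` style string advertises an RT-patched kernel.
# Assumption: RT kernels put "PREEMPT RT" or "PREEMPT_RT" in that string.
is_rt_version() {
    case "$1" in
        *"PREEMPT RT"*|*"PREEMPT_RT"*) return 0 ;;
        *) return 1 ;;
    esac
}

# On the target:
#   if is_rt_version "$(uname -v)"; then echo "RT kernel"; else echo "not RT"; fi
```

If it reports “not RT”, there will be no irq/<n> thread for the Ethernet IRQs to apply chrt to.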

linuxdev · November 26, 2020, 10:17pm

Drivers can have user space components with a PID, but this is running in the kernel itself (it is a kernel driver) and a user space PID won’t exist.

MarkusHess · December 9, 2020, 8:25am

OK, we applied the preempt RT patches and increased the priority of the IRQ. Unfortunately this also did not help. Then we changed TX_BUF_SIZE and EQOS_MAX_DATA_PER_TX_BUF in the driver sources to 8KB, which seems to help. See this thread in the Jetson Xavier NX forum: Eqos 2490000.ether_qos eth0: eqos_start_xmit(): TX ring full for queue 0
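
To locate the two macros in a kernel source tree, a grep over the sources is usually enough. A minimal sketch, assuming they are defined with plain #define lines; find_tx_macros and the path in the usage comment are placeholders.

```shell
# find_tx_macros SRC_DIR  (hypothetical helper)
# Prints file:line:text for every #define of the two macros under SRC_DIR.
find_tx_macros() {
    grep -rnE '#[[:space:]]*define[[:space:]]+(TX_BUF_SIZE|EQOS_MAX_DATA_PER_TX_BUF)([[:space:]]|$)' "$1"
}

# Example (replace the path with wherever your kernel sources live):
#   find_tx_macros /path/to/kernel/sources
```

After changing the values, the driver (or the whole kernel, if the driver is built in) has to be rebuilt and installed for the change to take effect.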

Jaime_M · July 9, 2021, 3:48pm

Do you have more information on how to increase those buffers? What file are they in? What is the process for compiling and updating those drivers? I am running into a similar issue and want to try this out.
