We have an application that receives data at around 30% of the full network bandwidth on Jetson TX2 boards. Occasionally we see the following errors in journalctl:
kernel: Failed to allocate skb
kernel: Failed to re allocate skb
These don’t seem to cause any issues by themselves, but later the application fails as the kernel uses 100% CPU, accompanied by the following errors:
[ 6254.096123] prx_desc[00 ffffff800bf25bb0 187 RECEIVED FROM DEVICE] = 0x0:0x0:0x0:0x30208000
[ 6254.097323] prx_desc[00 ffffff800bf25fc0 252 RECEIVED FROM DEVICE] = 0x0:0x0:0x0:0x30208000
[ 6254.118568] prx_desc[00 ffffff800bf253d0 061 RECEIVED FROM DEVICE] = 0x0:0x0:0x0:0x30208000
The 100% CPU usage is usually attributed to irq/xx-gk20a, kworker, or ksoftirqd.
At this point the only way to recover is to run sudo ifconfig eth1 down and then sudo ifconfig eth1 up.
We are using a Connect Tech Spacely carrier board, so the Jetson’s onboard Ethernet is eth1.
The problem seems to happen when the system is under load, but it’s difficult to tell whether the relevant ‘load’ is CPU usage or memory allocation pressure.
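One way we could try to separate the two is to log free memory and the load average at regular intervals and compare them against the times the errors appear. Below is a rough Python sketch of that idea. The function names (parse_meminfo, snapshot) are made up for illustration, and it assumes the standard Linux /proc/meminfo and /proc/loadavg formats.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only lines of the form "Name:   <number> [kB]".
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def snapshot(meminfo_path="/proc/meminfo", loadavg_path="/proc/loadavg"):
    """Return (MemFree in kB, MemAvailable in kB, 1-minute load average)."""
    with open(meminfo_path) as f:
        mem = parse_meminfo(f.read())
    with open(loadavg_path) as f:
        load1 = float(f.read().split()[0])
    # MemAvailable may be missing on older kernels, hence .get().
    return mem.get("MemFree"), mem.get("MemAvailable"), load1
```

Calling snapshot() every second or so and writing the result to a log with a timestamp should show whether the skb allocation failures line up with low free memory, with high load, or with both.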
If we run sudo ethtool -S eth1, we see counters including:
q_re_alloc_rx_buf_failed: 246
rx_buf_unavailable_irq_n: 14897
I believe the first counter corresponds to the Failed to allocate skb messages.
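To check that guess, the counters could be polled and their changes compared against the kernel log timestamps. Here is a minimal Python sketch, assuming ethtool -S prints one "name: value" pair per line. The helper names (parse_ethtool_stats, counter_deltas, read_stats) are hypothetical, and eth1 is just our interface name.

```python
import subprocess


def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into {counter_name: int}."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            # Skips the "NIC statistics:" header and any non-numeric lines.
            continue
    return stats


def counter_deltas(before, after):
    """Return only the counters whose value changed between two samples."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def read_stats(iface="eth1"):
    """Run ethtool for one interface (assumes ethtool is installed)."""
    result = subprocess.run(
        ["ethtool", "-S", iface],
        capture_output=True, text=True, check=True,
    )
    return parse_ethtool_stats(result.stdout)
```

Taking read_stats() once a second and printing counter_deltas(previous, current) with a timestamp would show whether q_re_alloc_rx_buf_failed increases at the same moments the Failed to allocate skb lines are logged.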
Could anyone help shed some light on this issue?