TX2 Ethernet port can’t handle high incoming traffic: GigE stream issue

Hello,
I am using a GigE camera with my Jetson TX2. The camera is connected via the Ethernet port, which is configured for 1000 Mb/s. The video stream format is raw Bayer at 14 fps.

I have observed frame losses for frame rates above 12 fps. This is because the RX Ethernet buffers cannot handle the incoming traffic in time. The following network parameters are set to their maximums; this has improved the capacity, but not yet to the required level.

  net.core.netdev_max_backlog
  net.core.rmem_max
  net.core.rmem_default
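For reference, the three parameters above can be raised and read back from a shell. The values below are illustrative starting points only, not validated recommendations for this camera:

```shell
# Raising the limits requires root (values are illustrative):
#   sysctl -w net.core.rmem_max=26214400
#   sysctl -w net.core.rmem_default=26214400
#   sysctl -w net.core.netdev_max_backlog=2000
# Reading the effective values works without root:
for key in rmem_max rmem_default netdev_max_backlog; do
  printf 'net.core.%s = %s\n' "$key" "$(cat /proc/sys/net/core/$key)"
done
```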

I don’t know if it is important, but I could see that the Denver processor is in CPU-idle mode. Is that somehow connected to the buffer loss issue?

How can I achieve the full capacity and avoid RX losses?

I’m using a JetPack 4.2.1-based Yocto image for flashing.

Updates
###############

While investigating this issue, I set different nvpmodel modes and ran the camera again. Following are the observations:

  1. The camera ran without packet losses in nvpmodel modes 1 and 3.
  2. The remaining modes failed to handle the incoming data from the camera.
  3. In modes 1 and 3 the Denver CPUs were inactive. My conclusion is that there are RX buffer losses when both the Denver and A57 CPUs are activated (modes 0 and 2).

hello anishmonachan7,

It’s surprising that you found this issue is related to the Denver cores being enabled.
May I know who your sensor vendor is?
Please also refer to Jetson Partner Supported Cameras: are you working with cameras supported by Jetson Camera Partners on the Jetson platform?
thanks

Hi JerryChang,

Thanks for your reply.

I’m using IDS uEye camera sensors. These are not in NVIDIA’s list of supported cameras.

I have already been using these sensors with older versions of JetPack, e.g. JetPack 3.2.1, and had no buffer losses with that version.

Now I’m testing on the new JetPack: Linux4Tegra R32.2 / JetPack 4.2.1.

Hello JerryChang,
I’ve been waiting to receive some updates on this issue.
Have you been able to have a look again?

Hi anishmonachan7,

please see this post

regards
Bibek

Hey Bibek,

Thanks for your reply. I have followed all the methods given in the posts.
I’ve set processor affinity, and I have even tried pinning the task to a single processor and to single threads.
I don’t think this solution will solve my RX buffer loss issue.

A single processor or thread might not be able to handle such gigabit incoming traffic without core switching/sharing. The incoming traffic handling got even worse when I pinned the task to the Denver cores, and the performance stayed the same when I pinned it to the A57 cores. When I set affinity to all CPUs, the issue arises again.

The issue in my case is very clear: my task requires both the Denver and A57 clusters, all 6 cores, to handle the incoming traffic on the Ethernet interface, yet there is RX buffer loss whenever the Denver cores are activated.
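For reference, the pinning described above can be done with `taskset`. This is only a sketch: the camera daemon’s PID is hypothetical, and the CPU numbering (cpu1/cpu2 = Denver, cpu0 and cpu3–5 = A57) is the stock TX2 layout:

```shell
# Pin a process to the A57 cluster only. On a stock TX2, cpu1 and cpu2 are
# the Denver cores, so the list 0,3-5 keeps the task on the A57s.
# Demo: restrict the current shell to cpu0, then read the affinity back.
taskset -cp 0 $$
taskset -cp $$
# For the camera daemon (hypothetical PID):
#   taskset -cp 0,3-5 <pid>
```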

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

How have you concluded that you need 6 cores and not 4 or 8?
How about increasing the payload size and reducing the interrupt count?

Hello,
This issue still exists and is not solved.

how have you concluded that you need 6 cores and not 4 or 8?
I have two different Jetson TX2 boards. One runs JetPack 3.2.1 and the other a JetPack 4.2.1-based Yocto image. I tested the GigE stream with different nvpmodel modes on these two Jetsons.

The Jetson with JetPack 3.2.1 handled the 14 fps input stream in both RGB and raw Bayer format in nvpmodel mode 0, but the second Jetson couldn’t handle such a high incoming data rate and showed frame drops.

Observations during the comparison:
1. I didn’t see any difference between nvpmodel mode 3 and mode 0, i.e. I had the same behavior: frame loss occurs beyond 12 fps.
2. I ran both Jetsons in nvpmodel mode 4, where Denver is activated. The second Jetson couldn’t handle even 1 fps of incoming data.

And I came to the conclusion that the Denver cores aren’t working properly in nvpmodel modes 4 and 0, and that both the A57 and Denver cores are required to handle the high incoming data rate.

How about increasing the payload size and reducing the interrupt count?
I haven’t done this. Can you tell me how to do it? If you mean minimizing the load on the CPUs, that has already been tested, with no progress.

By increasing the payload, I was asking whether you can reduce the burst of interrupts by using bigger packets. I will check internally with the Connectivity experts whether we can be more explicit.
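One way to try the bigger-packets idea is jumbo frames. This is a sketch under stated assumptions: it needs root, a switch and camera that support ~9000-byte frames, and a matching GigE Vision packet-size setting in the camera SDK (the exact parameter name varies by vendor):

```shell
# Fewer, larger packets per video frame means fewer RX interrupts. Needs root:
#   ip link set eth0 mtu 9000
#   ethtool -C eth0 rx-usecs 100   # interrupt coalescing, if the NIC supports it
# The current MTU of an interface can be read back without root:
cat /sys/class/net/lo/mtu
```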

From what you have written, I understand that you should not use the Denver cores for your interrupt processing; when the Denver cores are disabled, performance is better for your task. Note that for short burst tasks it is advised not to use the Denver cores.
Also, are you suggesting that your use case is not satisfied with the A57 cores alone and that you need more A57 cores?

Here are two suggestions:

  1. The driver is not fetching data from the RX DMA buffers fast enough while the MAC is receiving and handing the data over to the network stack. Because of this, the MAC MTL queues will overflow. We can read the ethtool counters to confirm this:
  • `ethtool -S eth0` will give the details. If this is the issue, we can try increasing the CPU/MC clocks.
  2. Another case is that no issue is observed at the HW/driver level, but there is packet drop at the stack level; it can also be a re-assembly issue at the network stack level. We need to tune the network stack parameters to avoid this kind of issue. If it is a re-assembly issue, we can try the parameters below.

echo 2140004608 > /proc/sys/net/ipv4/ipfrag_high_thresh
sysctl -w net.core.rmem_max=26214400
sysctl net.core.rmem_max
sysctl -w net.core.netdev_max_backlog=2000
sysctl -w net.core.netdev_budget=600
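If `ethtool` is not included in the Yocto image, the kernel’s own per-interface counters give a quick first check of where receive drops are being counted. A sketch; the awk field positions follow the standard `/proc/net/dev` layout (the 4th number after the interface name is the RX drop count):

```shell
# Print the rx_dropped counter for every interface from /proc/net/dev.
awk -F'[: ]+' 'NR>2 {print $2, "rx_dropped:", $6}' /proc/net/dev
```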

Bibek,

Talking about your suggestion 2:
This doesn’t solve my issue; I still have packet losses. In fact, the camera driver daemon script already does some network stack tuning at startup.

Suggestion 1:
I observed that I have RX queue drops. I made two modifications in nvpmodel.conf, with which I was able to stream without packet losses.

The modifications I made to the MAXN mode were:

1. `CPU_DENVER MAX_FREQ -1` **to** `CPU_DENVER MAX_FREQ 990000`

2. `EMC MAX_FREQ 0` **to** `EMC MAX_FREQ -1`

Still, I don’t fully understand whether this will affect my system performance.
My nvpmodel mode 0 config now looks like:

# MAXN is the NONE power model to release all constraints

< POWER_MODEL ID=0 NAME=MAXN >
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 1
CPU_ONLINE CORE_5 1
CPU_A57 MIN_FREQ 0
CPU_A57 MAX_FREQ -1
CPU_DENVER MIN_FREQ 0
CPU_DENVER MAX_FREQ 990000
GPU MIN_FREQ 0
GPU MAX_FREQ -1
EMC MAX_FREQ -1

@anishmonachan7

Effect of your change in nvpmodel:
CPU_DENVER MAX_FREQ 990000 -> Reduces Denver Core Fmax from 2035MHz to 990MHz
EMC MAX_FREQ -1 -> same as writing 0, EMC Fmax is 1866MHz.

Instead of reducing the Denver core Fmax, you can try disabling those cores completely. There is a known latency issue with the Denver cores on TX2. Please refer to section 5.16 in our latest release notes: https://docs.nvidia.com/jetson/l4t/pdf/Jetson_Linux_Driver_Package_Release_Notes_R32.4.3_GA.pdf

@rkasirajan Thanks for your answer. Just to make it clear, can you confirm with which version of JetPack/L4T this Denver latency issue appears? Does switching back to another version solve it?

It should be observed with all recent Linux4Tegra “R32” releases due to kernel security fixes.

Have you tried disabling CPU idle states for both Denver and A57 cores? That can also affect performance.

To disable all CPU idle states on TX2:
echo 0 > /sys/kernel/debug/tegra_cpufreq/B_CLUSTER/cc3/enable
echo 0 > /sys/kernel/debug/tegra_cpufreq/M_CLUSTER/cc3/enable
for cpu in /sys/devices/system/cpu/cpu[0-9]/cpuidle/state[0-9]/disable;do cat “$cpu” ; done

I get the following error for the third command:

root@j140-tx2:~# for cpu in /sys/devices/system/cpu/cpu[0-9]/cpuidle/state[0-9]/disable;do cat “$cpu” ; done
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu0/cpuidle/state0/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu0/cpuidle/state1/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu1/cpuidle/state0/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu1/cpuidle/state1/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu1/cpuidle/state2/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu2/cpuidle/state0/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu2/cpuidle/state1/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu2/cpuidle/state2/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu3/cpuidle/state0/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu3/cpuidle/state1/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu4/cpuidle/state0/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu4/cpuidle/state1/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu5/cpuidle/state0/disable'’\342\200\235’: No such file or directory
cat: ‘’'\342\200\234''/sys/devices/system/cpu/cpu5/cpuidle/state1/disable'’\342\200\235’: No such file or directory

Finally, after several investigations, we have realized that this issue comes from JetPack version 4.2.2. Earlier we were using JetPack 3.2.1, with which we had no frame drop issues when streaming in nvpmodel 0.

So there is a regression in the latest JetPack versions with respect to nvpmodel. Setting task affinity for the camera is the workaround.

The command is not correct. Please try the command line below to disable the CPU idle states:
for cpu in /sys/devices/system/cpu/cpu[0-9]/cpuidle/state[0-9]/disable; do echo 1 > $cpu; done
echo 0 > /sys/kernel/debug/tegra_cpufreq/B_CLUSTER/cc3/enable
echo 0 > /sys/kernel/debug/tegra_cpufreq/M_CLUSTER/cc3/enable
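After running the corrected loop, the settings can be verified by reading the same sysfs files back. A sketch, guarded so it also exits cleanly on systems without cpuidle:

```shell
# Read back every cpuidle 'disable' knob; after the loop above each one
# should report 1 (i.e. the idle state is disabled).
for f in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do
  [ -e "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
done
echo "cpuidle check done"
```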

@anishmonachan7
The nvpmodel change to disable the Denver cores is not a regression; it was done intentionally to fix TX2 CPU performance issues. Our R32.4.3 release notes already contain this information: https://docs.nvidia.com/jetson/l4t/pdf/Jetson_Linux_Driver_Package_Release_Notes_R32.4.3_GA.pdf. Please follow the instructions provided in the document to enable and schedule jobs on the Denver cores.