Long delays between USB3 SuperSpeed transactions

Hello there!

The issue is described as below.
Please let me know if any additional information is missing.

There’s a USB3 channel between Tegra and my controller.
I am able to read the data from the controller via BULK EP using maximal packets as allowed in burst mode.
Since I was not getting a good throughput I decided to analyze the transmission and here’s what I found:
Each transaction completes as expected in SuperSpeed rate, however, there’s a big delay between each transaction.
This delay is about 500us which decreases the overall throughput to ~30MB/sec

According to logs it looks like the link is in idle state, which I believe can lead to a bug in Tegra USB controller driver. Please see a screenshot below
https://imgur.com/a/h8Ut0

I’d be glad to provide any additional data/logs as required.
Same flow in Windows results in a short delay of ~50us between transactions.
Using JetPack 3.1 w/ Linux kernel 4.4

Regards,
Leffe

Is this R28.1? Have you started out in max performance mode (e.g., “sudo ~ubuntu/jetson_clocks.sh”)?

Unfortunately this really needs profiling and some of the standard profiling tools are not available for this kernel on this architecture.

However, as background, every piece of hardware is run by its driver after a hardware interrupt. On this architecture only CPU0 can service a hardware IRQ. This means that USB transfers run when CPU0 is available to run the driver. The rest of the time is either idle or servicing something else unless some specialized wiring exists (e.g., the memory controller can be reached by all cores). To know specifically what is going on you would have to know if the delay could be avoided, versus if some other driver is currently running another driver.

As a demonstration you might want to create this file…I named mine “showCPU0irq.sh”…and set it to executable:

#!/bin/bash

cat "/proc/interrupts" | egrep -v '^....          0 '

Then you can run “watch -n 1 showCPU0irq.sh” and see as irq calls build up.

I’ll see if I can attach the files once this is posted, but what I did was run some irq info before starting, ran “cat /dev/sda1 > /dev/null” (sda1 is a USB3 drive 36GB partition), followed by gathering information again:

sudo -s
cat /proc/stat | tee irqlog1.txt
showCPU0irq.sh | tee -a irqlog1.txt
cat /dev/sda1 > /dev/null
cat /proc/stat | tee irqlog2.txt
showCPU0irq.sh | tee -a irqlog2.txt
exit

There is exactly one line which corresponds to xhci-hcd-usb1, so this is the irq of interest. Everything else is something competing for CPU0 (software interrupts can also run on CPU0, but unless they are forced to do so they can migrate to other cores as determined by the scheduler). This case shows a massive increase in the usb irq compared to others.

To actually make this complete one would have to know how much time is spent to service each IRQ in order to find out if one is hogging any indivdual interrupt call…it isn’t bad if there are a lot of calls and they are quickly serviced…but a single call which hangs for a long time implies a laggy resonse in the system even if averages are good. The next question would if there was time which CPU0 spent idle when it could instead service an interrupt…I doubt this is the case, most likely it is just interrupt starvation and one interrupt waiting on another to complete.
irqlog1.txt (4.86 KB)
irqlog2.txt (4.87 KB)