TX2 USB3 controller bug

gotenkscha, the USB3 hub in question is a D-Link DUB-1340. Probably any USB3 hub will do, but I haven’t checked that.

DaneLLL, to clarify regarding your question “All XIMEA cameras with 2 bulk endpoints do not work and all with 1 bulk endpoint work on TX2?”: all XIMEA cameras have 2 bulk endpoints in their USB configuration, but image acquisition works fine on them as long as there is no communication on the control channel (second endpoint) while data is streaming on the data channel (first endpoint). So if only one endpoint is used at any given time, the camera works with the TX2.

If you have a protocol analyzer which can show a bulk transfer pausing on the control channel, can you check whether the working case shows uninterrupted transfers while the failing case shows the control transfer pausing and then restarting? Bulk transfer does not guarantee that the control signal will go through in a single transfer, although in most cases it probably does. I am very curious whether a partial transfer followed by a restart occurs during a failure, and especially whether the control signal is buffered sufficiently that nothing is dropped during a pause and restart.

linuxdev, I assume you mean NRDY and ERDY packets? As seen in the screenshot I attached to the first post, no NRDY appears around the time of failure on the second (control) endpoint. Or do you think pausing of the first (data) endpoint can affect the issue?

I think any pausing can cause an error if pausing is not handled correctly. The sender of the data and the receiver of the data may pause at different moments from a single event (sender and receiver have different stream positions and limited buffers), in which case they have to track what the restart point is at each end when transmission continues. Imagine a hard drive sending data to the host, and the host sends a command causing pause because something else needs the USB controller…it may be a moment before the drive actually pauses, and the host would have started discarding data immediately…the drive would need the ability to restart sending at the point where the host itself started ignoring the drive, not at the moment the drive received the message to stop sending.

Taking this a step further, since I don’t know how your large commands are used: does a command ever start doing something before the entire command has been received? Is an entire command atomic?

If the control is not atomic, and if a command were to pause, do the host and camera have sufficient buffer that nothing is truncated and that restart occurs at the correct stream position?

I don’t know that any command pausing and restarting actually has occurred, and if it hasn’t, then the question has no point. If a command is interrupted it is possible that this would cause an issue, and that this would only occur on a system which halts the command and restarts it, which in turn isn’t really predictable. It just seems it is something which needs to be checked when using bulk for controlling a device and it works on some systems but not others.

I don’t think the problem you described exists in USB protocol…

Hi,
We shall have it fixed in the next release (after r28.2 DP).

Any more details on the fix?

Is it something (e.g. a driver) that we can update on deployed hardware where flashing an entire image isn’t feasible?

Hi AaronL, if you use XIMEA cameras, could you contact parafin to get help?

We do use a Ximea camera, but I’m confused, as you said this would be fixed after r28.2, implying it was an Nvidia change that needed to happen?

@AaronL
Hi, Parafin, who is from Ximea, gave me a file from Nvidia to use which he says “alleviates the problem.”
I tried his firmware patch, but it still produces the same error after 20 minutes. If you write down your email, I could send this patch to you as well.

@DaneLLL
The Nvidia firmware patch you guys sent to Parafin does not work. Please make sure that the post-r28.2 DP update has a functioning USB driver for the Ximea cameras.

Nvidia engineers provided me with updated firmware for the TX2 USB controller. It will be included in the next L4T release, or you can get it now by creating a support ticket on the XIMEA website or by writing to me. The firmware can be replaced without reflashing. In our internal testing this update seems to fix the original problem; we haven’t been able to reproduce it anymore.
We are looking into gotenkscha’s issue; as of now it’s too early to tell what’s causing it.

I just want to point out that the recently released L4T R28.2 contains the new firmware, so you can now extract it yourself from https://developer.nvidia.com/embedded/dlc/l4t-jetson-tx2-driver-package-28-2-ga

tar -xf Tegra186_Linux_R28.2.0_aarch64.tbz2
tar -xf Linux_for_Tegra/nv_tegra/nvidia_drivers.tbz2 lib/firmware/tegra18x_xusb_firmware

The extracted lib/firmware/tegra18x_xusb_firmware file then has to be placed in the /lib/firmware/ directory on the TX2 filesystem, overwriting the older firmware. Reboot the Jetson for the change to take effect.
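
For anyone who wants to keep the old firmware around in case they need to roll back, the copy step could look something like the sketch below. The function name install_xusb_fw and the ".orig" backup suffix are my own invention, not anything from Nvidia or XIMEA; the only things taken from the steps above are the firmware file name and the /lib/firmware destination.

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA or XIMEA tool): copy a firmware file
# into a firmware directory, keeping a one-time backup of the file it replaces.
# Usage: install_xusb_fw SRC [DEST_DIR]   (DEST_DIR defaults to /lib/firmware)
install_xusb_fw() {
    src=$1
    dest_dir=${2:-/lib/firmware}
    dest="$dest_dir/$(basename "$src")"

    # Refuse to continue if the extracted firmware file is not there.
    if [ ! -f "$src" ]; then
        echo "firmware file not found: $src" >&2
        return 1
    fi

    # Back up the currently installed firmware once, so a later re-run
    # does not overwrite the original backup with the new file.
    if [ -f "$dest" ] && [ ! -f "$dest.orig" ]; then
        cp -p "$dest" "$dest.orig" || return 1
    fi

    cp "$src" "$dest" || return 1

    # Succeed only if the installed copy is byte-identical to the source.
    cmp -s "$src" "$dest"
}

# Example, run as root on the TX2 from the directory where the tar
# commands above were run, then reboot:
#   install_xusb_fw lib/firmware/tegra18x_xusb_firmware
```

To roll back, copy /lib/firmware/tegra18x_xusb_firmware.orig over /lib/firmware/tegra18x_xusb_firmware and reboot again.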