Jetson - intermittent USB protocol error (71)

Hello,

I’m having an issue that makes me want to blame hardware, but before I do I am hoping someone here may be able to shed some light.

I have an Oculus DK1 headset plugged into the main USB port on my Jetson. I read the headset telemetry directly and bypass the Oculus software… which is all fine, but… when I plug the headset in, the kernel sometimes logs:

“usb 2-1: device descriptor read/64, error -71” (and more)

to rsyslogd. If I plug and unplug the headset repeatedly, it works occasionally (i.e., the kernel recognizes it and subsequent I/O works fine). I have not been able to deduce a pattern in repetition, power, or timing: fail, fail, fail, fail, work, fail, fail, fail, fail, fail, work, fail, work, work, etc.

If I plug a D-Link USB hub into the Jetson instead, then plug the Oculus into the hub, it is correctly recognized on every plug/unplug.

I’m guessing perhaps a capacitance or resistance issue? Or could this be software related in some way?

I should add that the Oculus works fine on any of several Ubuntu desktops, and that a second Oculus DK1 exhibits the same problem on the Jetson.

Any light on the issue would be appreciated,

thanks

FYI, I’m very interested in Oculus with Jetson. This could be a whole new era for embedded with the kind of computing power the Jetson has… sadly, I’ve avoided picking one up so far for practical reasons.

The first thing I’d suspect in a case like this is something I’ve seen cured many times by the unexpected trick of using a POWERED USB HUB. Without a powered HUB you are pulling current from the Jetson itself. Many, many devices out there unknowingly violate power consumption or power provision requirements, even if only for a moment. I don’t think the Jetson’s USB has had extensive testing with higher-power devices; a typical user only connects a keyboard and mouse, which don’t draw a lot of power (though a laser mouse draws more than people might expect). Try using a powered HUB to isolate the VR headset from the Jetson’s power bus.

Hope you post on how Oculus works through Jetson.

Thanks for the suggestion.

We have some custom hardware we put together being driven by the Jetson, so we are pretty aware of power management issues. In this case, the Oculus does work when plugged into a hub, and that is true whether the hub is powered or not. It exhibits the apparent non-deterministic behavior only when it is plugged directly into the Jetson. In both cases the Oculus is running from its own power supply, separate from the battery pack we use to run the Jetson.

To your other point, we first ported the Oculus library, but then jettisoned it due to the overbearing nature of what it has become, and due to their licensing restrictions. We rolled our own lightweight C interface using libusb, returning the Oculus to the role of a display device where we feel it belongs. The DK2 has some EDID issue(s) making it unusable so far - it sends an extension block which causes issues with the kernel on the Jetson, so until that is solved we have reverted to the DK1.
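
For anyone curious, the libusb side of that can stay very small. Here is a rough sketch (not our production code) of opening the DK1 tracker and polling one interrupt report; the 0x2833:0x0001 VID/PID, interface 0, and endpoint 0x81 below are assumptions you should confirm against “lsusb -v” for your own unit:

#include <libusb-1.0/libusb.h>
#include <stdio.h>

int main(void)
{
    libusb_context *ctx = NULL;
    if (libusb_init(&ctx) != 0)
        return 1;

    /* 0x2833:0x0001 is assumed here for the DK1 tracker; check lsusb. */
    libusb_device_handle *h = libusb_open_device_with_vid_pid(ctx, 0x2833, 0x0001);
    if (!h) {
        fprintf(stderr, "tracker not found\n");
        libusb_exit(ctx);
        return 1;
    }

    /* Let libusb detach the kernel HID driver while we hold the interface. */
    libusb_set_auto_detach_kernel_driver(h, 1);
    if (libusb_claim_interface(h, 0) != 0) {
        fprintf(stderr, "could not claim interface 0\n");
        libusb_close(h);
        libusb_exit(ctx);
        return 1;
    }

    unsigned char buf[64];
    int got = 0;
    /* Endpoint 0x81 (interrupt IN) is an assumption; confirm with lsusb -v. */
    int rc = libusb_interrupt_transfer(h, 0x81, buf, sizeof buf, &got, 1000);
    if (rc == 0)
        printf("got %d-byte report, first byte 0x%02x\n", got, buf[0]);
    else
        fprintf(stderr, "transfer failed: %s\n", libusb_error_name(rc));

    libusb_release_interface(h, 0);
    libusb_close(h);
    libusb_exit(ctx);
    return 0;
}

It builds with gcc and pkg-config --cflags --libs libusb-1.0; decoding the report bytes into gyro/accelerometer values is the part that is specific to the headset.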

We are mobile, with about 1.5 hours of runtime while walking around inside a 1 million+ vertex scene, texture mapped and lit. We gather telemetry from accelerometers, gyros, a GPS, a camera, and input devices, and in the most graphically dense regions of the landscape we still maintain > 60 fps. So, after solving a zillion issues and getting the power, heat, and size under control, it works pretty well.

About the EDID… there was a case not too long ago where a monitor provided EDID data but it was not parsed correctly… I believe it was just an extension block of some sort. The guy with that monitor saw a note when “get-edid | parse-edid” could not parse it correctly… the maintainer of the EDID software was emailed, he updated the tool, and that monitor now works. I suppose dual monitors for stereo change things. What is the output from:

get-edid | edid-decode

and

get-edid | parse-edid

?

Also, is it possible to see the output of “lsusb -v” for the Oculus?

One stray thought about USB working via a HUB but having some form of failure when directly connected: the resets may have the same effect with the HUB but the HUB may not pass that down to the root HUB… it might be that both are doing the same thing but one hides it from the other.

We already emailed the same guy. He burned his name into the get-edid source. We haven’t heard back though.

I read the get-edid source, compiled and ran it, and was just getting to understand the issue when I had to put it away. We have a schedule, and someone waiting for the gadget, so I could not spend any more time on it. It could only deal with a certain type of extension block, and the “email me” comment was triggered when it saw a different type.
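
In case it helps anyone else hitting this, the layout itself is simple: the base EDID block is 128 bytes, byte 126 holds the number of 128-byte extension blocks that follow, and the first byte of each extension is a tag (0x02 is the common CEA-861 block). A rough sketch of inspecting a raw dump, assuming you have the EDID bytes saved to a file (the file name here is just a placeholder):

#include <stdint.h>
#include <stdio.h>

/* List the extension blocks in a raw EDID dump: byte 126 of the base
 * block is the extension count, and byte 0 of each 128-byte extension
 * is its tag (0x02 = CEA-861; anything else may trip up older parsers). */
static void edid_list_extensions(const uint8_t *edid, size_t len)
{
    if (len < 128) {
        fprintf(stderr, "short EDID (%zu bytes)\n", len);
        return;
    }

    unsigned count = edid[126];
    printf("base block advertises %u extension block(s)\n", count);

    for (unsigned i = 0; i < count; i++) {
        size_t off = 128u * (i + 1);
        if (off + 128 > len) {
            fprintf(stderr, "dump truncated at extension %u\n", i + 1);
            return;
        }
        printf("extension %u: tag 0x%02x%s\n", i + 1, edid[off],
               edid[off] == 0x02 ? " (CEA-861)" : "");
    }
}

int main(int argc, char **argv)
{
    uint8_t buf[4096];
    FILE *f = fopen(argc > 1 ? argv[1] : "edid.bin", "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }
    size_t len = fread(buf, 1, sizeof buf, f);
    fclose(f);
    edid_list_extensions(buf, len);
    return 0;
}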

I’m away from the lab and so don’t have the exact EDID block contents, but it all made sense, ditto the lsusb info. I tried xrandr with the EDID data as delivered, but the DK2 headset goes into an error state after it sends the EDID block. I don’t know the full details of the kernel/monitor conversation though, so I might be missing something.

Could you elaborate on this?

the resets may have the same effect with the HUB but the HUB may not pass that down to the root HUB

The kernel reports a read error (71; I think strerror() says “protocol error”). How would/could the hub hide this?
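
For reference, 71 here is EPROTO (the kernel prints it negated in the log); a trivial check confirms the message text I half-remembered:

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* On Linux EPROTO is 71; the kernel logs it as -71 in dmesg. */
    printf("EPROTO = %d -> %s\n", EPROTO, strerror(EPROTO));
    return 0;
}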

(It’s a single monitor, with a barrier in the middle, and lenses to let each eye see 1/2 of it. You undo the lens distortion in software in the fragment shader.)

It was just a wild guess (emphasis on wild guess, because I have no access to the Oculus, nor do I have a USB analyzer), but you already know that the HUB changes the error (or at least the message). Normally there is a root HUB, and the USB signalling during a reset event would be directly controlled by the timing from the root HUB (to see this you would probably need a USB analyzer or a modified root HUB driver). When using a HUB, the devices are no longer truly connected to the root HUB except via the other HUB’s hardware and firmware (which controls some aspects of timing and may have some limited ability to deal with errors and marginal signals). A big example of how a HUB can change things is whether it has multiple transaction translators (TTs) or a single TT… with multiple TTs, timings may shift while waiting for other devices, but with a single TT, speed can be forced back significantly by a single slower device (which means using a HUB or not changes things, and using multiple devices on the HUB might change things even more). It probably won’t change the issue, but it would be important to know whether the HUB which works around the issue is single- or multi-TT (lsusb -v). Also, the root HUB itself may or may not be multi-TT, although I’d hope all modern motherboards are.
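
If digging through “lsusb -v” output gets tedious, here is a rough libusb sketch that reports the TT arrangement straight from each hub’s device descriptor (for high-speed hubs, bDeviceProtocol 1 means a single TT and 2 means multiple TTs; this only looks at the default descriptor, so treat it as a first approximation):

#include <libusb-1.0/libusb.h>
#include <stdio.h>
#include <sys/types.h>

/* Walk the bus and report single-TT vs multi-TT for every hub found.
 * For high-speed hubs the device descriptor's bDeviceProtocol is
 * 1 for a single TT and 2 for multiple TTs (0 is a full-speed hub). */
int main(void)
{
    libusb_context *ctx = NULL;
    libusb_device **list = NULL;

    if (libusb_init(&ctx) != 0)
        return 1;

    ssize_t n = libusb_get_device_list(ctx, &list);
    for (ssize_t i = 0; i < n; i++) {
        struct libusb_device_descriptor d;
        if (libusb_get_device_descriptor(list[i], &d) != 0)
            continue;
        if (d.bDeviceClass != LIBUSB_CLASS_HUB)
            continue;

        const char *tt = d.bDeviceProtocol == 2 ? "multi-TT" :
                         d.bDeviceProtocol == 1 ? "single-TT" :
                                                  "full-speed (no TT)";
        printf("hub %04x:%04x bus %u addr %u: %s\n",
               d.idVendor, d.idProduct,
               (unsigned)libusb_get_bus_number(list[i]),
               (unsigned)libusb_get_device_address(list[i]), tt);
    }

    libusb_free_device_list(list, 1);
    libusb_exit(ctx);
    return 0;
}

The same value shows up in the bDeviceProtocol line that “lsusb -v” prints for the hub.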

When you have a non-root HUB connected to a root HUB, the reset is actually relayed by firmware in the non-root HUB, so timings differ (controlled mostly by a single dedicated ASIC with no programmable options, but it is still a program). The non-root HUB has a driver of sorts (even if it is in a bare-metal environment), and two types of errors are of interest: logic errors which violate standards but get passed through unmodified, or timing and level errors which are marginal. In the former case it would be obvious, and the non-root HUB would pass the information on, either by sending a copy of the same bad logic or by sending its own interpretation of that logic (in which case the error would be the same, or slightly altered but still an error). In the latter level/timing case, the output seen by the Jetson would actually come from the ASIC of the non-root HUB, so the levels and timings would be those of the HUB rather than of the Oculus (perhaps the Oculus does not respond as fast to resets, but the HUB acts as a sort of buffer on timings)… if something is lost or gained via timings before that point, it might be reported as a protocol error even though it was really a signal issue.

What would be interesting is to use a debug version of the Linux kernel, put a breakpoint on the error, and see what leads up to it; see if data is missing that is not missing under a HUB… indicating perhaps timing and level issues fixed by the HUB buffering timings and levels.