PCIe WiFi card

I am trying to install a WiFi/BT card into the mini PCI-e slot. I have two cards available to me:

Intel 6250, and Atheros AR9462.

If I understand correctly, I need to compile the kernel module for the AR9462, since it is not included in the stock kernel that comes with the TK1 board.

Since there is already a kernel module for the Intel WiFi card, I wanted to try that one instead. However, the TK1 board refused to boot after I inserted the Intel card into the mini PCIe slot. I understand that the Intel card is not listed as a tested device on the wiki page. Does anyone have experience with this, or know why the TK1 board refuses to boot?

Thank you very much for your time.

So long as it is electrically ok, this should not stop boot. Do you have a serial console log during boot failure?

The error that I see is:

PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
device [10de:0e12] error status/mask=00004000/00000000
[14] Completion Timeout   (First)

These are the last lines on the screen, and it stayed stuck there for a good 15 minutes or so until I powered the board down.
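
For anyone reading the status/mask pair in that log: it can be decoded by hand, since each set bit in the status word names one error type. Below is a minimal Python sketch of that decoding. The bit names follow the PCIe AER uncorrectable error status layout as I understand it; the function name decode_aer_status is made up for illustration and is not part of any kernel tool.

```python
# Bit positions of the PCIe AER "uncorrectable error status" register.
# This table is an assumption based on the PCIe AER register layout;
# check it against the spec or the kernel source before relying on it.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def decode_aer_status(status_hex, mask_hex="00000000"):
    """Return (bit, name) pairs for every unmasked error bit that is set.

    decode_aer_status is a hypothetical helper name, not a kernel API.
    """
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    active = status & ~mask  # a set mask bit suppresses reporting of that error
    return [
        (bit, AER_UNCORRECTABLE_BITS.get(bit, "Unknown/reserved"))
        for bit in range(32)
        if active & (1 << bit)
    ]


# Values taken from the log line "error status/mask=00004000/00000000".
print(decode_aer_status("00004000", "00000000"))
```

With the values from the log, only bit 14 is set, which matches the "[14] Completion Timeout" line the kernel printed.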

It sounds like the card in question has an electrical fault. Can you test on any other device with mini-PCIe?

Sometimes re-seating a card helps, sometimes the mounting screws (#2-56) hit something (and if not the correct thread, a nearly invisible metal shaving can touch something it shouldn’t).

I’m also seeing this error.

We’ve got several TK1s running Kinects through mini-PCIe USB 3.0 cards. This error happens about 10% of the time a few seconds after the Kinect starts, and makes the TK1 completely unresponsive.

I don’t think it is caused by any defect in the PCIe card (unless they all have the same problem).

Does anyone have any tips on trying to resolve an issue like this?

I don’t know that it’s connected, but when the Kinect is running I’m also getting a lot of:

mc-err: [mcerr] (vde) csr_vdebsevr: EMEM decode error on PDE or PTE entry
mc-err: [mcerr]   status = 0x60000022; addr = 0x801d2000
mc-err: [mcerr]   secure: no, access-type: read, SMMU fault: nr-nw-s

This error does not seem to affect the performance.

If it helps, here is the full output (serial console crashes while printing):

[ 5572.722225] irq event 643: bogus return value ffffff94
[ 5572.727771] handlers:
[ 5572.730050] [<c04fc088>] xhci_msi_irq
[ 5572.734382] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 5572.747874] pcieport 0000:00:00.0:   device [10de:0e12] error status/mask=00004000/00000000
[ 5572.757555] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[ 5573.663979] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Request

EDIT: I was typing this in before the xhci message showed up. What follows would still be useful, but it does show the error is PCI despite being detected while running USB code. I couldn’t say why, but there is a failure detected in the PCI data.

Does any kind of OOPS message get generated in dmesg, or perhaps in a log available after reboot?

Unless the lockup is rather invasive, magic sysrq keys should be available and might be able to add information. One method of triggering sysrq is by a directly connected keyboard, a second method is to echo through serial console to “/proc/sysrq-trigger”. See:
https://en.wikipedia.org/wiki/Magic_SysRq_key

An example you can run without harm on a working system is to type at a keyboard directly connected to the Jetson: “alt-sysrq-l” (where “sysrq” is the shifted function of the printscreen key on most keyboards).

An alternate way with no direct keyboard is to go into the serial console and run “echo l | sudo tee /proc/sysrq-trigger”. (A plain “sudo echo l > /proc/sysrq-trigger” fails for a non-root user, because the redirection is done by the unprivileged shell and not by sudo.) Note that the letter ‘l’ shows a backtrace of where the active CPUs are. If the system is sufficiently alive, you’ll either be able to see the backtrace from the serial console via dmesg, or else a log will survive and upon reboot the “prior” log (since logs rotate) will have the information.
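
If it is easier to script this, the same write to the trigger file can be sketched in Python. This is only an illustration: trigger_sysrq is a made-up helper name, it has to run as root on the real board, and sysrq has to be enabled in the kernel for anything to happen.

```python
def trigger_sysrq(key, path="/proc/sysrq-trigger"):
    """Write a single SysRq command character to the trigger file.

    trigger_sysrq is a hypothetical helper. The path parameter exists only
    so the function can be pointed at an ordinary file for a dry run.
    """
    if len(key) != 1:
        raise ValueError("SysRq commands are a single character")
    with open(path, "w") as trigger:
        trigger.write(key)


# On the board, as root, the backtrace request would be:
#     trigger_sysrq("l")
# and the result should then appear in dmesg or on the serial console.
```

Pointing path at a scratch file is a harmless way to check the script before running it against the real /proc entry.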

So far as mini-PCIe cards go, there is both access to PCI and to a USB data line. To know which this card uses, does the mPCIe card show up on lspci, or does it show up on lsusb? Do you have a link to the card’s technical specs anywhere? I suspect it has to use PCIe since it is USB3 (it would be ridiculous to build a card using a USB2 lane for USB3 connectivity). I do not have a Kinect, but can it be confirmed the Kinect has its own power supply?
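
Besides eyeballing lspci, the PCI side can be checked from sysfs: each PCI device directory has a "class" file, and USB host controllers use class 0x0c03 with the last byte giving the controller type. A rough Python sketch, assuming the usual /sys/bus/pci/devices layout (find_pci_usb_controllers is a made-up name):

```python
import os

# Programming-interface byte for PCI class 0x0c03 (USB host controllers).
USB_PROG_IF = {
    0x00: "UHCI (USB 1.x)",
    0x10: "OHCI (USB 1.x)",
    0x20: "EHCI (USB 2.0)",
    0x30: "xHCI (USB 3.x)",
}


def find_pci_usb_controllers(sysfs_root="/sys/bus/pci/devices"):
    """List (pci_address, controller_type) for USB host controllers on PCI.

    Hypothetical helper; sysfs_root is a parameter so the scan can be tried
    against a copy of the directory tree.
    """
    found = []
    for addr in sorted(os.listdir(sysfs_root)):
        class_path = os.path.join(sysfs_root, addr, "class")
        try:
            with open(class_path) as class_file:
                class_code = int(class_file.read().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries without a readable class file
        # The upper 16 bits hold base class 0x0c (serial bus) and subclass 0x03 (USB).
        if class_code >> 8 == 0x0C03:
            prog_if = class_code & 0xFF
            found.append((addr, USB_PROG_IF.get(prog_if, "unknown USB controller")))
    return found
```

If the mPCIe card shows up here as an xHCI controller, it is using the PCIe lane. If it only appears in lsusb, it is hanging off the USB data line instead.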

Apologies for not including the full error message in the first post. I nonetheless really appreciate the thorough response!

You’re right that the card uses the PCI line. I’m not able to find a specification for it, though. It is the Syba SD-MPE20142. Their website only has dead links for product info :/

The Kinect does have its own power supply. The mPCIe card plugs into the molex on the board, but I don’t think it’s pulling power through it for the Kinect.

Thanks for the info about magic sysrq keys. I didn’t know that!

Looking a bit more through the logs, the only other thing I can find that might be related is the following warning, of which there are a lot preceding a crash:

[ 5528.807360] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENG?
[ 5528.807373] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 3 ep 2 with no TDs queued?
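
To put a number on "a lot", a saved serial or dmesg log can be scanned for xhci_hcd WARN lines that occur before the first PCIe Bus Error. The Python below is just a sketch: it assumes the "[ seconds.micros] message" timestamp format seen in this thread, and summarize_before_first_aer is a made-up name.

```python
import re
from collections import Counter

# Matches kernel log lines such as "[ 5528.807360] xhci_hcd 0000:01:00.0: ..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize_before_first_aer(log_text):
    """Count xhci_hcd WARN messages seen before the first PCIe Bus Error.

    Returns (timestamp_of_first_error_or_None, Counter of warning texts).
    Hypothetical helper for illustration only.
    """
    warnings = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore lines without a kernel timestamp
        timestamp = float(match.group(1))
        message = match.group(2)
        if "PCIe Bus Error" in message:
            return timestamp, warnings
        if message.startswith("xhci_hcd") and "WARN" in message:
            # Key on the text after "WARN" so identical warnings group together.
            warnings[message.split("WARN", 1)[1].strip()] += 1
    return None, warnings
```

Running this over logs from a few crashes would show whether the warning count climbs right before each Completion Timeout, or whether the warnings are there all the time.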

It really sounds like the PCIe lane is losing data, or data is corrupted. I think it is just reported by xhci and not caused by xhci. It’s hard to say for sure without actually sticking a protocol analyzer on it, or at least a quality scope to see if the eye pattern is bad in USB data or PCIe data.

Is it possible you could try with the Kinect attached to the full-size USB connector (you’d have to be sure it is in USB3 mode since default is only for USB2)? Or the reverse, some other USB3 device which runs a lot of data connected to the mPCIe USB3 card? I’m very suspicious this will follow mPCIe and not other USB3 connectors.

Also, double-check that the mPCIe card is properly seated and that any mounting screws are not loose or over-tightened (if this were a commercial assembly line under IPC the screws would probably be specified at 2 to 4 inch-pounds).

The mPCIe card is full length, so it hangs off the side of the board. I think we have them mounted pretty well, but I will take another look.

I’ll test some other configurations like you suggest and post if I figure anything out. Thanks again!

Out of curiosity, does the board provide mounting screw positions at the right location for a full-length card, or only for the half-length slot? I suspect the card may need grounding through its mounting screws.