Jetson AGX Xavier Standoffs - Connection Problem

Dear NVIDIA Team,

According to the newest OEM Design Guide, the preferred standoff height is now 4.5mm and not 4.0mm as before. Why did you change it?
We tested our own Jetson AGX carrier board in a climate chamber at temperatures between -25°C and +65°C. On multiple systems, some of the interfaces stopped working at negative temperatures. We saw this in particular for a PCIe device (I210) connected to UPHY8: the device was not recognized at all, and the message “PCIe link down” was reported. All of these systems used 4.5mm standoffs to the Xavier module. When we changed to shorter standoffs (3.5mm) so that the Molex connector 2034560003 was well seated, the behavior disappeared. Do you know of any contact problems with 4.5mm standoffs?
According to Molex, there should be standoffs within 30.00mm of each corner of the connector. This is not the case in our design, and we suspect that it causes a problem on our side because our PCB is less stiff than the Xavier module itself. Do you have any suggestions for how to solve this?
Thank you for your help.

Kind regards

The recommended standoff height has been 4.5mm in the latest and the last several versions of the OEM DG. As it says there, “The platform designer can determine if a different height would be more appropriate but should consider both the PCB warpage (standoff height too short) and sweep range to find the best balance.” The real standoff range is 4.37 - 4.67 mm for connector 2034560003.
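To make the comparison concrete, here is a minimal sketch that checks the standoff heights mentioned in this thread (3.5mm, 4.0mm, 4.5mm) against the 4.37 to 4.67 mm range quoted above. Treating that range as the acceptable window is an interpretation of the reply, not something taken from the Molex datasheet, and the function name is purely illustrative.

```python
# Range quoted in this thread for Molex connector 2034560003.
# Assumption: it is interpreted here as the acceptable standoff window.
MIN_HEIGHT_MM = 4.37
MAX_HEIGHT_MM = 4.67


def classify_standoff(height_mm: float) -> str:
    """Return 'too short', 'in range' or 'too tall' for a standoff height."""
    if height_mm < MIN_HEIGHT_MM:
        return "too short"
    if height_mm > MAX_HEIGHT_MM:
        return "too tall"
    return "in range"


if __name__ == "__main__":
    # Heights that appear in this thread.
    for height in (3.5, 4.0, 4.5):
        print(f"{height:.1f} mm: {classify_standoff(height)}")
```

Under this reading, only the 4.5mm standoff falls inside the window; 3.5mm and 4.0mm are both below it.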

Hi Trumany

Thank you for the answer. What can you say about the 30mm distance from the Molex Connector which can be found in chapter 2.6 of the document:

On our carrier board, we are not able to fulfill this Molex criterion because the mounting holes are missing.

Is it a coincidence that chapter 3.4 “Module Installation and Removal” was added in the newest OEM Design Guide, or is it due to known cases where an inappropriate installation of the Xavier led to problems?

The app note you attached recommends standoffs within 30mm of every corner of the connector, but two sentences down, it says “The number of standoffs can be reduced when the board-to-board system is deemed stable enough.” Since the Xavier module has the rigid TTP and bottom plate, it may meet this requirement.

And no, the addition of more information in the OEM DG related to mounting and removal was not due to issues, but partly due to requests from customers for this information and partly to ensure the standoffs were not too tall or too short which could cause PCB warpage when the mounting screws were tightened.

Hi Trumany
We did further testing with our custom carrier board. When we put some pressure on our PCB above the Molex connector, we get PCIe bus errors and link loss for the PCIe device and also for a USB device. We think this is the same effect that we see during thermal cycling at negative temperatures. It seems to be a contact problem of the Molex connector, or the contacts react to movement of the carrier board.
To verify that the problem is not our carrier board, we tried the same with the NVIDIA Developer Kit carrier board and a Xavier module with only 4mm standoffs. Pressing on the carrier board around the NVIDIA logo leads to PCIe bus errors like:

[ 77.610240] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 77.610287] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 77.610530] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
[ 77.610890] pcieport 0001:00:00.0: [ 0] Receiver Error (First)
[ 77.611036] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 77.611060] pcieport 0001:00:00.0: can’t find device of ID0000
[ 77.614714] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 77.614809] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 77.615049] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000081/0000e000
[ 77.615220] pcieport 0001:00:00.0: [ 0] Receiver Error (First)
[ 77.615358] pcieport 0001:00:00.0: [ 7] Bad DLLP
[ 77.615491] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 77.666722] pcieport 0001:00:00.0: can’t find device of ID0000
[ 77.666736] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 77.666756] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Transmitter ID)
[ 77.667042] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00009000/0000e000
[ 77.667209] pcieport 0001:00:00.0: [12] Replay Timer Timeout
[ 77.667345] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 77.667359] pcieport 0001:00:00.0: can’t find device of ID0000
[ 77.667366] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 77.667379] pcieport 0001:00:00.0: can’t find device of ID0000
[ 77.667385] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 77.667398] pcieport 0001:00:00.0: can’t find device of ID0000
[ 77.667405] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 77.667418] pcieport 0001:00:00.0: can’t find device of ID0000
[ 77.667425] pcieport 0001:00:00.0: AER: Multiple Uncorrected (Fatal) error received: id=0000
[ 77.667451] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 77.667666] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00004020/00400000
[ 77.667845] pcieport 0001:00:00.0: [ 5] Surprise Down Error (First)
[ 77.667984] pcieport 0001:00:00.0: [14] Completion Timeout
[ 77.668117] pcieport 0001:00:00.0: broadcast error_detected message
[ 77.668128] ahci 0001:01:00.0: device has no AER-aware driver
[ 78.706678] pcieport 0001:00:00.0: Root Port link has been reset
[ 78.706712] pcieport 0001:00:00.0: AER: Device recovery failed
[ 174.236729] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 174.236759] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 174.237062] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
[ 174.237221] pcieport 0001:00:00.0: [ 0] Receiver Error
[ 174.240683] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 174.240777] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 174.241002] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
[ 174.241190] pcieport 0001:00:00.0: [ 0] Receiver Error
[ 174.241353] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 174.241391] pcieport 0001:00:00.0: can’t find device of ID0000
[ 174.241398] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 174.241419] pcieport 0001:00:00.0: can’t find device of ID0000
[ 174.243450] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 174.243484] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 174.243814] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
[ 174.243976] pcieport 0001:00:00.0: [ 0] Receiver Error
[ 174.244160] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 174.244198] pcieport 0001:00:00.0: can’t find device of ID0000
[ 174.244961] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 174.245069] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 174.245310] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
[ 174.245542] pcieport 0001:00:00.0: [ 0] Receiver Error
[ 174.245698] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 174.245734] pcieport 0001:00:00.0: can’t find device of ID0000
[ 174.245797] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 174.245812] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 174.254491] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
[ 174.262889] pcieport 0001:00:00.0: [ 0] Receiver Error
[ 174.268998] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 174.269038] pcieport 0001:00:00.0: can’t find device of ID0000
[ 174.269047] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 174.269075] pcieport 0001:00:00.0: can’t find device of ID0000
[ 174.269082] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 174.269109] pcieport 0001:00:00.0: can’t find device of ID0000
[ 174.269115] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 174.269144] pcieport 0001:00:00.0: can’t find device of ID0000
[ 174.269150] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 174.269176] pcieport 0001:00:00.0: can’t find device of ID0000
[ 174.269185] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 174.269234] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.526789] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 247.526834] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 247.527164] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
[ 247.527342] pcieport 0001:00:00.0: [ 0] Receiver Error
[ 247.527481] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.527521] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.527528] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.527542] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 247.527726] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
[ 247.527877] pcieport 0001:00:00.0: [ 0] Receiver Error
[ 247.527999] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.528022] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.528205] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.528239] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 247.528436] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
[ 247.528595] pcieport 0001:00:00.0: [ 0] Receiver Error
[ 247.528718] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.528739] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.528746] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 247.528782] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.528842] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.528857] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 247.529273] pcieport 0001:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
[ 247.529916] pcieport 0001:00:00.0: [ 0] Receiver Error
[ 247.551590] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 247.551641] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.551650] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 247.551672] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.551680] pcieport 0001:00:00.0: AER: Multiple Corrected error received: id=0000
[ 247.551700] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.551708] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.551728] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.551735] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.551755] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.551763] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.551783] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.551790] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.551810] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.551817] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.551837] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.551844] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.551866] pcieport 0001:00:00.0: can’t find device of ID0000
[ 247.551873] pcieport 0001:00:00.0: AER: Corrected error received: id=0000
[ 247.551892] pcieport 0001:00:00.0: can’t find device of ID0000
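For readers who want to digest a log like the one above, the following is a rough sketch of how the `error status/mask` words can be decoded and the `PCIe Bus Error` lines tallied. The bit names follow the AER Correctable and Uncorrectable Error Status register layout from the PCIe specification; the function names and the idea of feeding in a saved dmesg file are illustrative assumptions, not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Bit positions in the AER Correctable Error Status register.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

# Bit positions in the AER Uncorrectable Error Status register.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}

SEVERITY_RE = re.compile(r"PCIe Bus Error: severity=(.+?), type=(.+?), id=")


def decode_status(status_hex, mask_hex, corrected=True):
    """List the names of the unmasked error bits set in a status word."""
    table = CORRECTABLE_BITS if corrected else UNCORRECTABLE_BITS
    active = int(status_hex, 16) & ~int(mask_hex, 16)
    return [name for bit, name in sorted(table.items()) if (active >> bit) & 1]


def summarize(lines):
    """Count 'PCIe Bus Error' lines by (severity, layer)."""
    counts = Counter()
    for line in lines:
        match = SEVERITY_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Applied to values from the log, `decode_status("00000081", "0000e000")` gives `['Receiver Error', 'Bad DLLP']`, and `decode_status("00004020", "00400000", corrected=False)` gives `['Surprise Down Error', 'Completion Timeout']`, which matches the bit lines the kernel printed. Passing the lines of a saved log to `summarize` shows how many events were physical-layer receiver errors versus data-link or transaction-layer errors.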

We will put the devkit into the temperature chamber over the weekend to check whether we see the same behavior as on some of our carrier boards, as mentioned in the first post. Can you please check on your side whether you can reproduce the issue?
Do you have a contact at Molex to whom we can forward our problem?
Thank you.
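For an unattended chamber run like the one described above, a small presence logger can help correlate temperature with the moment a device drops off the bus. The following is only a sketch: it relies on the standard Linux sysfs layout (`/sys/bus/pci/devices/<address>/vendor`), uses the Intel vendor ID 0x8086 as an example because the I210 is an Intel part, and the function names, sample count and interval are hypothetical choices.

```python
import os
import time


def find_pci_devices(vendor_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the sorted bus addresses of PCI functions with the given vendor ID."""
    found = []
    if not os.path.isdir(sysfs_root):
        return found
    for addr in sorted(os.listdir(sysfs_root)):
        vendor_path = os.path.join(sysfs_root, addr, "vendor")
        try:
            with open(vendor_path) as f:
                # The sysfs file holds a hex string such as "0x8086".
                vendor = int(f.read().strip(), 16)
        except (OSError, ValueError):
            continue
        if vendor == vendor_id:
            found.append(addr)
    return found


def poll(vendor_id, samples, interval_s=10.0, sysfs_root="/sys/bus/pci/devices"):
    """Sample device presence `samples` times; return (timestamp, addresses) pairs."""
    history = []
    for i in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        history.append((stamp, find_pci_devices(vendor_id, sysfs_root)))
        if i + 1 < samples:
            time.sleep(interval_s)
    return history
```

Calling `poll(0x8086, samples=6, interval_s=10.0)` on the target would record one minute of presence data; an empty address list in any sample means no Intel PCIe function was enumerated at that moment. Devices that disappear are only removed from sysfs once the kernel has handled the link loss, so the timestamps are approximate.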

Did you test with 4.5mm standoffs? 4.0mm is not recommended as you know.

Can you share a photo of the whole test environment and the detailed test setup, such as the amount of pressure, the point of pressing, etc.?

We saw the behavior with both 4.0mm and 4.5mm standoffs. With shorter standoffs, more pressure is needed to trigger the error. We think this is because the connector is held more firmly with shorter standoffs.
Here are the pictures:
IMG_4413 IMG_4414

IMG_4415

We cannot give a value for the pressure because we apply it by hand.
Thank you.

Hand pressure is not a standard test method, even if the result is similar on your board and the devkit. We hope to see your temperature test results on the devkit with the default standoffs. As you can see in the module datasheet, the Xavier module has passed the “-20°C, 24 hours, operational, low temperature endurance test”.

We know that hand pressure is not a standard test; we just think that it lets us reproduce the issue we get on our carrier board during temperature cycles. The Jetson AGX Xavier is specified down to -25°C. Did you also run tests at this temperature? As soon as we have the results of our test, we will send them to you.

Hi Trumany
In the temperature cycling tests with the Dev Kit, we did not see any PCIe bus errors, and no PCIe device failed to be recognized. Could you reproduce the error messages by applying pressure to the Developer Kit? We think that even if it is not a standard test, mechanical pressure should not lead to this behavior, and the connector should not react to small movements.

Hand pressure might cause not only a contact problem but also a body capacitance/electrical problem, so it should not be taken as evidence. Since the devkit passes your test, we can only suggest improving your design with reference to the devkit.