Dram alias check failure on D00 revision TX2 SOM and custom carrier board

danwalkes1 · October 31, 2020, 10:47pm

Hi,

We have developed a custom carrier board for Jetson TX2 and have used it for multiple iterations of TX2 SOMs. On recent batches of SOMs we’ve ordered we’ve noticed very specific and unusual behavior, and have reproduced it on multiple SOMs.

The devices initially program without issue on our carrier board.
After programming, the devices crash with a kernel oops when attempting to access the camera sensor.
Attempting to reprogram in our carrier board fails with the same message discussed in the thread “MTS error (2) : dram alias check failure on boot - #4 by WayneWWW”:

[0001.811] C> MTS error (2) : dram alias check failure
[0001.816] C> cpu waypoint 0.5 failed
[0001.820] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6

Attempting to reprogram in a TX2 carrier board works without issue.
Taking the device out of the TX2 carrier board after reprogramming successfully and placing it in our mainboard results in the same CBoot error message, and the part refuses to boot:

[0001.811] C> MTS error (2) : dram alias check failure
[0001.816] C> cpu waypoint 0.5 failed
[0001.820] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6

The SOM boots fine when plugged into a TX2 carrier board.

So the problem appears to be related to our custom carrier board.

One thing we suspected was the auto power-on notice about the delay in asserting CHARGER_PRSNT low, as discussed in Power-on Autostart - #4 by Trumany. We don’t have this delay implemented in our hardware; we are simply shorting CHARGER_PRSNT low. However, the content of that post makes it seem like the issues would occur on shutdown, which would not explain the observations here. We’ve also tried shorting CHARGER_PRSNT low on the development board and we can’t reproduce the boot issues in that case.

We are wondering if you can share any detail about changes in the SOM in recent revisions which may explain this behavior, especially related to revision 699-83310-1000-D00 M, or where we should look on the carrier board design to explain the dram failure above.

WayneWWW · November 1, 2020, 6:18am

Hi,

What is your release?

And what do you mean by the board initially working after being “programmed”? What is programmed here?

danwalkes1 · November 1, 2020, 12:51pm

Hi @WayneWWW
Thanks for the response

What is programmed here?

Tegraflashed

What is your release?

The first tegraflash was actually L4T 28.2.1 given the way our production process is set up. All subsequent tegraflash reprogram attempts were L4T 32.4.3, including completed and failed attempts across our carrier board and the TX2 dev kit hardware.

I believe we’ve also attempted unsuccessfully to take a SOM which was successfully tegraflashed with L4T 32.4.3 on a TX2 development board and re-run tegraflash with our carrier board to the same L4T 32.4.3 release. I can re-verify this.

WayneWWW · November 1, 2020, 4:32pm

Hi,

May I get a clearer test result here? It looks like we have rel-28/rel-32, devkit/custom carrier board, and D00/non-D00 modules.

Could you show me a table that marks the test result (pass/fail) for every combination you’ve tested? I am not very sure about them just from reading the description so far.

danwalkes1 · November 1, 2020, 4:43pm

Hi @WayneWWW
Here’s the table:

WayneWWW · November 1, 2020, 4:49pm

Hi,

But the custom carrier board is working fine with older revisions of the SOM, right?

danwalkes1 · November 1, 2020, 6:47pm

Correct. We have noticed B rev SOMs which work fine on the same carrier board in at least one instance.

WayneWWW · November 2, 2020, 3:29am

Hello,

May I get your full log from UART?

danwalkes1 · November 2, 2020, 4:03am

@WayneWWW here’s a log from tegraflash:

[0000.631] I> Loading SCE-FW ...
[0000.634] W> No valid slot number is found in scratch register
[0000.640] W> Return default slot: _a
[0000.643] I> A/B: bin_type (12) slot 0
[0000.647] I> Loading partition sce-fw at 0xd7300000
[0000.652] I> Reading two headers - addr:0xd7300000 blocks:1
[0000.657] I> Addr: 0xd7300000, start-block: 5904752, num_blocks: 1
[0000.666] I> Binary(12) of size 76592 is loaded @ 0xd7300000
[0000.672] I> Init SCE
[0000.674] I> Copy BTCM section
[0000.677] W> No valid slot number is found in scratch register
[0000.683] W> Return default slot: _a
[0000.686] I> A/B: bin_type (13) slot 0
[0000.690] I> Loading partition cpu-bootloader at 0x96000000
[0000.695] I> Reading two headers - addr:0x96000000 blocks:1
[0000.701] I> Addr: 0x96000000, start-block: 5879856, num_blocks: 1
[0000.713] I> Binary(13) of size 282736 is loaded @ 0x96000000
[0000.719] W> No valid slot number is found in scratch register
[0000.725] W> Return default slot: _a
[0000.728] I> A/B: bin_type (20) slot 0
[0000.732] I> Loading partition bootloader-dtb at 0x8520f400
[0000.737] I> Reading two headers - addr:0x8520f400 blocks:1
[0000.743] I> Addr: 0x8520f400, start-block: 5881904, num_blocks: 1
[0101.875] E> Waypoint-0.5 ACK pending: 0x8
[0101.879] C> MTS error (2) : dram alias check failure
[0101.884] C> cpu waypoint 0.5 failed
[0101.887] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6
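
One detail in the log above is easy to miss: the timestamps jump from 0000.743 to 0101.875 just before the failure, which suggests the bootloader waited a long time before giving up. The sketch below finds the largest such jump in a pasted log. It assumes the bracketed prefix is elapsed seconds since the bootloader started (an interpretation, not something confirmed in this thread); `largest_gap` is a hypothetical helper name.

```python
import re

# Assumption: the "[SSSS.mmm]" prefix is elapsed seconds since bootloader start.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\s*\]")


def largest_gap(log_text):
    """Return (gap_seconds, line_before, line_after) for the biggest timestamp jump.

    Lines without a timestamp prefix (such as the wrapped "Aux Info" line) are skipped.
    Returns None if fewer than two timestamped lines are present.
    """
    stamped = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = STAMP.match(line)
        if match:
            stamped.append((float(match.group(1)), line))

    best = None
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


# Lines copied from the tegraflash log in this post.
SAMPLE = """\
[0000.737] I> Reading two headers - addr:0x8520f400 blocks:1
[0000.743] I> Addr: 0x8520f400, start-block: 5881904, num_blocks: 1
[0101.875] E> Waypoint-0.5 ACK pending: 0x8
[0101.879] C> MTS error (2) : dram alias check failure
[0101.884] C> cpu waypoint 0.5 failed
"""

gap, before, after = largest_gap(SAMPLE)
print(f"{gap:.3f} s stall between:\n  {before}\n  {after}")
```

On the sample lines this reports a stall of roughly 101 seconds ending at the "Waypoint-0.5 ACK pending" line.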

Here’s a log from boot:

[0001.811] C> MTS error (2) : dram alias check failure
[0001.816] C> cpu waypoint 0.5 failed
[0001.820] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6

WayneWWW · November 3, 2020, 3:27am

Hello,

Sorry for one more request. Could you post the whole flash log from host side?

danwalkes1 · November 3, 2020, 4:36am

I don’t have the full host side log but I can get it tomorrow. Here’s where it hangs:

[   6.8878 ] Sending bootloader and pre-requisite binaries
[   6.8892 ] tegrarcm_v2 --download blob blob.bin
[   6.8903 ] Applet version 01.00.0000
[   6.9113 ] Sending blob
[   6.9113 ] [................................................] 100%
[   7.3700 ] 
[   7.3732 ] tegrarcm_v2 --boot recovery
[   7.3756 ] Applet version 01.00.0000
[   7.3801 ] 
[   8.3838 ] tegrarcm_v2 --isapplet
[   9.1352 ] 
[   9.1390 ] tegradevflash_v2 --iscpubl
[   9.1401 ] Cannot Open USB
[   9.6000 ] 
[  10.6039 ] tegrarcm_v2 --isapplet

I should mention we are using meta-tegra, so this is invoked from tegra186-flash-helper.sh. We are planning to compare the behavior with stock NVIDIA L4T programming tomorrow.

One other interesting observation from today: we are able to tegraflash from a non-booting R32 configuration back to R28 on our carrier board, so it appears to be something about the combination of the tegraflash step for R32.4.3 and our carrier board setup.

WayneWWW · November 3, 2020, 4:53am

Hi,

Please just use pure flash.sh. Do not use something like meta-tegra. They are not our official tool so we cannot guarantee their functionality.

danwalkes1 · November 3, 2020, 9:45pm

I’ve reproduced the same thing with the NVIDIA SDK Manager and release 32.4.3, so it’s not meta-tegra specific.

Serial logs:

E> Waypoint-0.5 ACK pending: 0x8
[0247.902] C> MTS error (2) : dram alias check failure
[0247.907] C> cpu waypoint 0.5 failed
[0247.911] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6

Host logs:

I’ve also noticed a similar issue here which appears to describe what I’m seeing. Unfortunately, I wasn’t able to boot even after disconnecting the USB cable.

danwalkes1 · November 3, 2020, 10:43pm

I see the same behavior as described in JetPack 4.4 will not flash on custom board - #24 by chadiris regarding JetPack 4.3: if I use the SDK Manager with JetPack 4.3 instead of 4.4, everything works. So the problem appears to be related to changes in the flash update tools for JetPack 4.4, combined with some hardware difference between our carrier board and the dev board.

Back to my earlier question:

We are wondering if you can share any detail about changes in the SOM in recent revisions which may explain this behavior, especially related to revision 699-83310-1000-D00 M, or where we should look on the carrier board design to explain the dram failure above.

Any suggestions for us as to where to look? Just trying to narrow down the pin scope to something less than 400. My first suspicion was power-related pins; however, I’m not seeing anything that jumps out at me as a difference compared to the dev board, and I’ve removed our power supply from the equation by powering from a benchtop supply.

WayneWWW · November 4, 2020, 2:49am

Hello danwalkes1,

Here is my summary so far. Please confirm if it is right.

D00 module + devkit + JP 4.4 → Good
D00 module + custom carrier + JP 4.4 → NG
Non-D00 module + custom carrier + JP 4.4 → Good
D00 module + custom carrier + JP 4.3 → Good
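
As a rough aid for reading the summary above, the sketch below encodes the four rows and, for each failing row, lists the passing rows that differ from it in exactly one factor. The `Result` type and `single_factor_fixes` helper are hypothetical names introduced only for this illustration; the data is just the four rows from the summary.

```python
from collections import namedtuple

Result = namedtuple("Result", "module carrier release ok")
FACTORS = ("module", "carrier", "release")

# Rows copied from the summary above; "NG" is recorded as ok=False.
RESULTS = [
    Result("D00", "devkit", "JP 4.4", True),
    Result("D00", "custom", "JP 4.4", False),
    Result("non-D00", "custom", "JP 4.4", True),
    Result("D00", "custom", "JP 4.3", True),
]


def single_factor_fixes(results):
    """Map each failing row to the passing rows that differ in exactly one factor."""
    fixes = {}
    for bad in (r for r in results if not r.ok):
        fixes[bad] = [
            good
            for good in results
            if good.ok and sum(a != b for a, b in zip(bad[:3], good[:3])) == 1
        ]
    return fixes


for bad, neighbours in single_factor_fixes(RESULTS).items():
    print("fails:", " + ".join(bad[:3]))
    for good in neighbours:
        changed = [f for f in FACTORS if getattr(bad, f) != getattr(good, f)][0]
        print(f"  passes when only {changed} changes to {getattr(good, changed)}")
```

Read this way, changing any single factor (module revision, carrier board, or release) turns the failing case into a passing one, so the failure seems to need all three conditions at once.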

Are you able to switch some binaries/scripts from rel-32.3.1 to rel-32.4.3 and see if it can make it work? For example, flash.sh or nvtboot_recovery.bin.

danwalkes1 · November 4, 2020, 3:50am

Yes, that’s right. For “Non- D00 module” the B rev SOM is the only one we’ve tried so far.

Are you able to switch some binaries/scripts from rel-32.3.1 to rel-32.4.3 and see if it can make it work? For example, flash.sh or nvtboot_recovery.bin.

Thanks for the suggestion. I’ve prepared a branch at GitHub - sighthoundinc/meta-tegra at flashtools-32.3.1-hacks which I intend to test for this, which rolls back tegra186-flashtools-native to 32.3.1.

danwalkes1 · November 4, 2020, 5:20am

Are you able to switch some binaries/scripts from rel-32.3.1 to rel-32.4.3 and see if it can make it work? For example, flash.sh or nvtboot_recovery.bin.

No change when rolling both of these back to 32.3.1

Also no change when rolling any of the tegra186-flashtools-native back to 32.3.1

Since the error mentions dram I looked for dram references and noticed this difference between JP 4.4 and JP 4.3:

dan@yocto:/build/nvidia/nvidia_sdk/JetPack_4.4_Linux_JETSON_TX2/Linux_for_Tegra$ sudo find . -name "dram-*"
./bootloader/8755/dram-ecc_sigheader.bin.hash
./bootloader/8755/dram-ecc_sigheader.bin.encrypt
./bootloader/8755/dram-ecc.bin
./bootloader/8755/dram-ecc_sigheader.bin
./bootloader/dram-ecc.bin
dan@yocto:/build/nvidia/nvidia_sdk/JetPack_4.4_Linux_JETSON_TX2/Linux_for_Tegra$ sudo find ../../JetPack_4.3_Linux_JETSON_TX2/ -name "dram-*"
../../JetPack_4.3_Linux_JETSON_TX2/Linux_for_Tegra/bootloader/dram-ecc.bin

Copying dram-ecc.bin from JP 4.3 doesn’t help either.
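
The `find` comparison above only shows which file names exist in each release. A minimal sketch of a fuller comparison follows: it hashes every `dram-*` file under two trees and reports files present on only one side or with differing contents. `dram_files` and `compare_trees` are hypothetical helper names, and the demo at the bottom uses tiny stand-in files with placeholder contents rather than real L4T binaries.

```python
import hashlib
import tempfile
from pathlib import Path


def dram_files(root):
    """Map each file matching dram-* under root (relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("dram-*"))
        if path.is_file()
    }


def compare_trees(old_root, new_root):
    """Report dram-* files that exist on one side only or whose contents differ."""
    old, new = dram_files(old_root), dram_files(new_root)
    return {
        "only_in_old": sorted(set(old) - set(new)),
        "only_in_new": sorted(set(new) - set(old)),
        "differing": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


if __name__ == "__main__":
    # Stand-in trees so the sketch runs anywhere; file contents are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        old, new = Path(tmp, "old"), Path(tmp, "new")
        (old / "bootloader").mkdir(parents=True)
        (new / "bootloader" / "8755").mkdir(parents=True)
        (old / "bootloader" / "dram-ecc.bin").write_bytes(b"placeholder-old")
        (new / "bootloader" / "dram-ecc.bin").write_bytes(b"placeholder-new")
        (new / "bootloader" / "8755" / "dram-ecc.bin").write_bytes(b"placeholder-new")
        for key, names in compare_trees(old, new).items():
            print(key, names)
```

To use it on real trees, the two `Linux_for_Tegra` directories from the JetPack 4.3 and JetPack 4.4 installs shown in the shell session above would be passed as `old_root` and `new_root`.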

In all cases above I still get the same

[0070.255] E> Waypoint-0.5 ACK pending: 0x8
[0070.259] C> MTS error (2) : dram alias check failure
[0070.264] C> cpu waypoint 0.5 failed
[0070.267] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6

If I understand the cboot source correctly, in particular line 116 of bootloader/partner/common/include/tegrabl_error.h, Reason 0x6 is #define TEGRABL_ERR_TIMEOUT 0x06U and module 0x32 refers to #define TEGRABL_ERR_CPUINIT 0x32U.
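
The decoding described above can be sketched as a small script that pulls the hex fields out of the ERROR line and looks them up. Only the two codes identified in this post are included; the real tegrabl_error.h defines more, and the other values are deliberately not guessed here. `decode_error` is a hypothetical helper name.

```python
import re

# Only the two codes named in this thread; anything else is reported as unknown.
MODULES = {0x32: "TEGRABL_ERR_CPUINIT"}
REASONS = {0x06: "TEGRABL_ERR_TIMEOUT"}

FIELD = re.compile(
    r"(Highest Layer Module|Lowest Layer Module|Aux Info|Reason)\s*=\s*(0x[0-9a-fA-F]+)"
)


def _name(table, code):
    return table.get(code, f"unknown (0x{code:x})")


def decode_error(log_text):
    """Decode the fields of a CBoot 'ERROR:' message, which may wrap over two lines."""
    fields = {name: int(value, 16) for name, value in FIELD.findall(log_text)}
    return {
        "highest_module": _name(MODULES, fields["Highest Layer Module"]),
        "lowest_module": _name(MODULES, fields["Lowest Layer Module"]),
        "aux_info": fields["Aux Info"],
        "reason": _name(REASONS, fields["Reason"]),
    }


# Error text copied from the serial log in this thread.
SAMPLE = """\
[0070.267] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6
"""

for key, value in decode_error(SAMPLE).items():
    print(f"{key}: {value}")
```

If that reading of the header is right, the message amounts to a timeout reported by the CPU init module, which fits the "Waypoint-0.5 ACK pending" line printed just before it.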

WayneWWW · November 4, 2020, 6:14am

Hi,

Just as a test:

Is your board +D00 module able to be flashed on rel-28.2.1 and rel-28.4?

Bibek · November 4, 2020, 12:29pm

Have you made any changes to the mem BCT and BPMP DTB files?
The DRAM failure could point to a different DRAM being used in D00 which you have not accounted for.
Random kernel oopses could also be due to unstable memory.
What is the voltage you are supplying, 19v?

danwalkes1 · November 4, 2020, 2:55pm

Is your board +D00 module able to be flashed on rel-28.2.1 and rel-28.4?

We’ve only tried 28.2.1, 32.3.1 and 32.4.3. It works on 28.2.1 and 32.3.1, and fails on 32.4.3.

No, I have not changed the mem BCT or BPMP DTB files, and in all of the SDK Manager cases I’m not making any changes to the stock L4T flashing binaries or sequence (other than what I’ve listed above as troubleshooting steps).

What is the voltage you are supplying, 19v?

We are supplying 12 V on VDD_IN.
