Dram alias check failure on D00 revision TX2 SOM and custom carrier board

@WayneWWW here’s a log from tegraflash:

[0000.631] I> Loading SCE-FW ...
[0000.634] W> No valid slot number is found in scratch register
[0000.640] W> Return default slot: _a
[0000.643] I> A/B: bin_type (12) slot 0
[0000.647] I> Loading partition sce-fw at 0xd7300000
[0000.652] I> Reading two headers - addr:0xd7300000 blocks:1
[0000.657] I> Addr: 0xd7300000, start-block: 5904752, num_blocks: 1
[0000.666] I> Binary(12) of size 76592 is loaded @ 0xd7300000
[0000.672] I> Init SCE
[0000.674] I> Copy BTCM section
[0000.677] W> No valid slot number is found in scratch register
[0000.683] W> Return default slot: _a
[0000.686] I> A/B: bin_type (13) slot 0
[0000.690] I> Loading partition cpu-bootloader at 0x96000000
[0000.695] I> Reading two headers - addr:0x96000000 blocks:1
[0000.701] I> Addr: 0x96000000, start-block: 5879856, num_blocks: 1
[0000.713] I> Binary(13) of size 282736 is loaded @ 0x96000000
[0000.719] W> No valid slot number is found in scratch register
[0000.725] W> Return default slot: _a
[0000.728] I> A/B: bin_type (20) slot 0
[0000.732] I> Loading partition bootloader-dtb at 0x8520f400
[0000.737] I> Reading two headers - addr:0x8520f400 blocks:1
[0000.743] I> Addr: 0x8520f400, start-block: 5881904, num_blocks: 1
[0101.875] E> Waypoint-0.5 ACK pending: 0x8
[0101.879] C> MTS error (2) : dram alias check failure
[0101.884] C> cpu waypoint 0.5 failed
[0101.887] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6

Here’s a log from boot:

[0001.811] C> MTS error (2) : dram alias check failure
[0001.816] C> cpu waypoint 0.5 failed
[0001.820] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6

Hello,

Sorry for one more request. Could you post the whole flash log from host side?

I don’t have the full host side log but I can get it tomorrow. Here’s where it hangs:

[   6.8878 ] Sending bootloader and pre-requisite binaries
[   6.8892 ] tegrarcm_v2 --download blob blob.bin
[   6.8903 ] Applet version 01.00.0000
[   6.9113 ] Sending blob
[   6.9113 ] [................................................] 100%
[   7.3700 ] 
[   7.3732 ] tegrarcm_v2 --boot recovery
[   7.3756 ] Applet version 01.00.0000
[   7.3801 ] 
[   8.3838 ] tegrarcm_v2 --isapplet
[   9.1352 ] 
[   9.1390 ] tegradevflash_v2 --iscpubl
[   9.1401 ] Cannot Open USB
[   9.6000 ] 
[  10.6039 ] tegrarcm_v2 --isapplet

I should mention we are using meta-tegra so this is invoked from tegra186-flash-helper.sh. We are planning to compare the behavior with stock nvidia L4T programming tomorrow.

One other interesting observation from today: we are able to tegraflash from a non-booting R32 configuration back to R28 on our carrier board, so the problem appears to be specific to the combination of the tegraflash step for R32.4.3 and our carrier board setup.

Hi,

Please just use pure flash.sh. Do not use something like meta-tegra. It is not our official tool, so we cannot guarantee its functionality.

I’ve reproduced the same thing with the NVIDIA SDK Manager and release 32.4.3, so it’s not meta-tegra specific.

Serial logs:

E> Waypoint-0.5 ACK pending: 0x8
[0247.902] C> MTS error (2) : dram alias check failure
[0247.907] C> cpu waypoint 0.5 failed
[0247.911] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6

Host logs:

I’ve also noticed a similar issue here that seems to describe what I’m seeing. Unfortunately, I wasn’t able to boot even after disconnecting the USB cable.

This is the same observation as in “JetPack 4.4 will not flash on custom board - #24 by chadiris” regarding JetPack 4.3: if I use the SDK Manager with JetPack 4.3 instead of 4.4, everything works. So the problem appears to be related to changes in the flash tools for JetPack 4.4, combined with some hardware difference between our carrier board and the dev board.

Back to my earlier question:

We are wondering if you can share any detail about changes in the SOM in recent revisions which may explain this behavior, especially related to revision 699-83310-1000-D00 M, or where we should look on the carrier board design to explain the dram failure above.

Any suggestions for us as to where to look? I’m just trying to narrow down the pin scope to something less than 400. My first suspicion was the power-related pins, but nothing jumps out at me as a difference compared to the dev board, and I’ve removed our power supply from the equation by powering from a benchtop supply.

Hello danwalkes1,

Here is my summary so far. Please confirm if it is right.

  1. D00 module + devkit + jp4.4 → Good
  2. D00 module + custom carrier + jp4.4 → NG
  3. Non-D00 module + custom carrier + jp4.4 → Good
  4. D00 module + custom carrier + jp4.3 → Good

Are you able to switch some binaries/scripts from rel-32.3.1 to rel-32.4.3 and see if it can make it work? For example, flash.sh or nvtboot_recovery.bin.

Yes, that’s right. For “Non-D00 module”, the B rev SOM is the only one we’ve tried so far.

Are you able to switch some binaries/scripts from rel-32.3.1 to rel-32.4.3 and see if it can make it work? For example, flash.sh or nvtboot_recovery.bin.

Thanks for the suggestion. I’ve prepared a branch (flashtools-32.3.1-hacks in sighthoundinc/meta-tegra on GitHub) that rolls back tegra186-flashtools-native to 32.3.1, which I intend to test for this.

Are you able to switch some binaries/scripts from rel-32.3.1 to rel-32.4.3 and see if it can make it work? For example, flash.sh or nvtboot_recovery.bin.

No change when rolling both of these back to 32.3.1

Also no change when rolling any of the tegra186-flashtools-native back to 32.3.1

Since the error mentions dram I looked for dram references and noticed this difference between JP 4.4 and JP 4.3:

dan@yocto:/build/nvidia/nvidia_sdk/JetPack_4.4_Linux_JETSON_TX2/Linux_for_Tegra$ sudo find . -name "dram-*"
./bootloader/8755/dram-ecc_sigheader.bin.hash
./bootloader/8755/dram-ecc_sigheader.bin.encrypt
./bootloader/8755/dram-ecc.bin
./bootloader/8755/dram-ecc_sigheader.bin
./bootloader/dram-ecc.bin
dan@yocto:/build/nvidia/nvidia_sdk/JetPack_4.4_Linux_JETSON_TX2/Linux_for_Tegra$ sudo find ../../JetPack_4.3_Linux_JETSON_TX2/ -name "dram-*"
../../JetPack_4.3_Linux_JETSON_TX2/Linux_for_Tegra/bootloader/dram-ecc.bin

Copying dram-ecc.bin from JP 4.3 doesn’t help either.
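The two find listings above can also be compared mechanically. This is only a minimal sketch: compare_file_lists is a hypothetical helper name, and it assumes both Linux_for_Tegra trees are unpacked locally and that comparing relative paths is enough (it does not look at file contents).

```shell
# Hypothetical helper: list files matching a name pattern that exist
# (by relative path) in only one of two Linux_for_Tegra trees.
compare_file_lists() {
    tree_a=$1
    tree_b=$2
    pattern=$3
    list_a=$(mktemp)
    list_b=$(mktemp)
    # Collect relative paths from each tree; comm needs sorted input.
    (cd "$tree_a" && find . -name "$pattern" | LC_ALL=C sort) > "$list_a"
    (cd "$tree_b" && find . -name "$pattern" | LC_ALL=C sort) > "$list_b"
    # comm -3 hides paths present in both trees:
    # unindented lines exist only in tree_a, tab-indented lines only in tree_b.
    LC_ALL=C comm -3 "$list_a" "$list_b"
    rm -f "$list_a" "$list_b"
}
```

It would be called with the JP 4.4 tree, the JP 4.3 tree, and 'dram-*' as the pattern; for the listings above, the files under ./bootloader/8755/ are the ones that exist only on the JP 4.4 side.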

In all cases above I still get the same error:

[0070.255] E> Waypoint-0.5 ACK pending: 0x8
[0070.259] C> MTS error (2) : dram alias check failure
[0070.264] C> cpu waypoint 0.5 failed
[0070.267] C> ERROR: Highest Layer Module = 0x32, Lowest Layer Module = 0x32,
Aux Info = 0x1, Reason = 0x6

If I understand the cboot source correctly (line 116 of bootloader/partner/common/include/tegrabl_error.h), Reason 0x6 is #define TEGRABL_ERR_TIMEOUT 0x06U and the 0x32 module refers to #define TEGRABL_ERR_CPUINIT 0x32U, so this looks like a timeout reported by the CPU init module.
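To make that reading repeatable across serial logs, here is a minimal sketch that pulls the module and reason codes out of the ERROR line and names them. The function names are hypothetical, and the lookup only knows the two values quoted from tegrabl_error.h in this thread; any other code prints "unknown" rather than a guess.

```shell
# Hypothetical lookup tables: only the two codes quoted in this thread.
tegrabl_module_name() {
    case "$1" in
        0x32) echo "TEGRABL_ERR_CPUINIT" ;;
        *)    echo "unknown" ;;
    esac
}

tegrabl_reason_name() {
    case "$1" in
        0x6|0x06) echo "TEGRABL_ERR_TIMEOUT" ;;
        *)        echo "unknown" ;;
    esac
}

# Read a serial log on stdin and report the lowest-layer module and reason.
decode_tegrabl_error() {
    # The ERROR message is split over two lines, so join the input first.
    text=$(tr '\n' ' ')
    module=$(printf '%s\n' "$text" |
        sed -n 's/.*Lowest Layer Module = \(0x[0-9a-fA-F]*\).*/\1/p')
    reason=$(printf '%s\n' "$text" |
        sed -n 's/.*Reason = \(0x[0-9a-fA-F]*\).*/\1/p')
    echo "module $module ($(tegrabl_module_name "$module")), reason $reason ($(tegrabl_reason_name "$reason"))"
}
```

With the serial output saved to a file (serial.log is a placeholder name), this would be run as decode_tegrabl_error < serial.log. If a log contains several ERROR lines, the sketch reports only the last one.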

Hi,

Just as a test:

Is your board +D00 module able to be flashed on rel-28.2.1 and rel-28.4?

Have you made any changes to the mem BCT and BPMP DTB files?
A DRAM failure could point to a different DRAM part used in D00 which you have not accounted for.
Random kernel oopses could also be due to unstable memory.
What is the voltage you are supplying, 19v?

Is your board +D00 module able to be flashed on rel-28.2.1 and rel-28.4?

We’ve only tried 28.2.1, 32.3.1 and 32.4.3. It works on 28.2.1 and 32.3.1, and fails on 32.4.3.

No, I haven’t changed the mem BCT or BPMP DTB files, and in all of the SDK Manager cases I’m not making any changes to the stock L4T flashing binaries or sequence (other than what I’ve listed above as troubleshooting steps).

What is the voltage you are supplying, 19v?

We are supplying 12V on VDD_IN

I should clarify we needed to patch the flash script to try with 28.2.1 (see Flashing error - ERR: unsupported board revision: D01 - #8 by damien.lefevre)

Today I’ve found if I completely replace the 4.4 bootloader directory with 4.3:

cd JetPack_4.4_Linux_JETSON_TX2/Linux_for_Tegra
mv bootloader bootloader-old
cp -rp ../../JetPack_4.3_Linux_JETSON_TX2/Linux_for_Tegra/bootloader/ .

Then fix the error about a missing file by copying the existing, similarly named P3310_A00_8GB_Samsung_8GB_lpddr4_204Mhz_A02_l4t.cfg to P3310_A00_8GB_lpddr4_A02_l4t.cfg:

cp bootloader/t186ref/BCT/P3310_A00_8GB_Samsung_8GB_lpddr4_204Mhz_A02_l4t.cfg bootloader/t186ref/BCT/P3310_A00_8GB_lpddr4_A02_l4t.cfg

Then run

sudo ./flash.sh jetson-tx2-devkit mmcblk0p1

I can successfully flash a JP 4.4 image with JP 4.3 bootloader files. I have no idea what this means or whether it is really a valid workaround, and would greatly appreciate suggestions about how to proceed.

That is why I asked you to try rel-28.2.1 and rel-28.4. You may see rel-28.4 fail to flash as well. But there is no need to test that anymore.

Your issue seems to have something to do with PCN206440.
https://developer.nvidia.com/jetson-tx2-pcn-206440-dramemmc-public

This PCN makes some changes to the DRAM config you just pasted. However, the new releases (rel-32.4.3/rel-28.4) should only add new configs and not remove the old ones from the cfg. You should be able to see that by diffing those cfg files between rel-32.4.3 and rel-32.3.1.
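The comparison suggested here could be sketched as follows. The helper name diff_bct_cfgs is hypothetical, and the bootloader/t186ref/BCT location is taken from the cp command earlier in this thread; it assumes both releases are unpacked side by side.

```shell
# Hypothetical helper: summarize which BCT cfg files were added, removed,
# or changed between two Linux_for_Tegra trees.
diff_bct_cfgs() {
    old_l4t=$1   # e.g. the rel-32.3.1 (JP 4.3) Linux_for_Tegra directory
    new_l4t=$2   # e.g. the rel-32.4.3 (JP 4.4) Linux_for_Tegra directory
    # -r: recurse into subdirectories
    # -q: only report that files differ or exist on one side, not the contents
    diff -rq "$old_l4t/bootloader/t186ref/BCT" "$new_l4t/bootloader/t186ref/BCT"
}
```

Files reported as "Only in" the newer tree would be configs added by the new release, while files reported as differing are the ones worth inspecting with a full diff -u. Note that diff exits non-zero when it finds differences, which matters if this is called from a script running under set -e.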

Thanks @WayneWWW

Modules with 699 level part number versions equal to or greater than D02 may be built with the new components described in this PCN, or with the older components

So if my revision is 699-83310-1000-D00 M will this apply?

It looks like PCN204840 (PCN204840-Jetson_TX2_Multiple_Changes.pdf) applies to D00:

Any Jetson TX2 module with 699 level part number version greater than or equal to D00 may be affected by this PCN. Each module has a label displaying the 699 level part number.

Linux for Tegra software R28.2 (or later) includes the required changes and can be obtained from the Jetson developer website at http://developer.nvidia.com/jetson. There are two download options: 1. Obtain “JetPack” version 3.2 from the download center, or 2. Obtain the BSP files directly: “L4T Jetson TX2 Driver Package” version 28.2 from the download center

So it looks like I should be covered with 28.2 when using the D00 SOM unless I’m misunderstanding something here.

However, since the change for https://developer.nvidia.com/jetson-tx2-pcn-206440-dramemmc-public came in starting with JP 4.4 (r32.4.3) and 28.4, I think what you are saying is that the changes to support D02 SOMs have likely broken something on our hardware when using D00 SOMs. These changes aren’t strictly needed to support D00 SOMs, only D02 SOMs. This means replacing the bootloader directory completely with JP 4.3 content is probably a solution for supporting D00 SOMs on JP 4.4, but we’ll probably have a problem with D02 SOMs in that case.

Am I understanding this correctly?

Can we make sure this issue is related to these PCNs?

Actually, PCN204840 has been in place since 2018/7, so rel-32.3.1 and rel-32.4.3 both have this change.

But rel-32.3.1 and rel-32.4.3 differ in PCN206440. Could you make sure the dram cfg is the cause of your issue?

P3310_A00_8GB_Samsung_8GB_lpddr4_204Mhz_A02_l4t.cfg

Have you tried replacing only this file (not the whole bootloader) in rel-32.4.3 to see if it works? You can also replace the bpmp-dtb file (tegra186-bpmp-quill-p3310-1000-a00-00.dtb), because PCN 206440 also changes this file.

You may see rel-28.4 fail to flash as well

I do see this fail to flash as well.

Actually, PCN204840 has been in place since 2018/7, so rel-32.3.1 and rel-32.4.3 both have this change.

Understood, I was trying to understand the difference between rev B and rev D SOM from a hardware perspective.

Have you tried replacing only this file (not the whole bootloader) in rel-32.4.3 to see if it works?

I just tried:

cd JetPack_4.4_Linux_JETSON_TX2/Linux_for_Tegra
cp ../../JetPack_4.3_Linux_JETSON_TX2/Linux_for_Tegra/bootloader/t186ref/BCT/P3310_A00_8GB_Samsung_8GB_lpddr4_204Mhz_A02_l4t.cfg bootloader/t186ref/BCT/P3310_A00_8GB_lpddr4_A02_l4t.cfg
sudo ./flash.sh jetson-tx2-devkit mmcblk0p1

It failed/hung up at the same place:

[   8.1919 ] tegrahost_v2 --chip 0x18 --generateblob blob.xml blob.bin
[   8.1939 ] number of images in blob are 9
[   8.1947 ] blobsize is 4335416
[   8.1950 ] Added binary blob_nvtboot_recovery_cpu_sigheader.bin.encrypt of size 221312
[   8.2006 ] Added binary blob_nvtboot_recovery_sigheader.bin.encrypt of size 90016
[   8.2020 ] Added binary blob_preboot_d15_prod_cr_sigheader.bin.encrypt of size 63104
[   8.2036 ] Added binary blob_mce_mts_d15_prod_cr_sigheader.bin.encrypt of size 2082144
[   8.2051 ] Added binary blob_bpmp_sigheader.bin.encrypt of size 533904
[   8.2068 ] Added binary blob_tegra186-a02-bpmp-quill-p3310-1000-c04-00-te770d-ucm2_sigheader.dtb.encrypt of size 605120
[   8.2092 ] Added binary blob_tos-trusty_sigheader.img.encrypt of size 366400
[   8.2106 ] Added binary blob_eks_sigheader.img.encrypt of size 1440
[   8.2116 ] Added binary blob_tegra186-quill-p3310-1000-c03-00-base_sigheader.dtb.encrypt of size 371824
[   8.2171 ]
[   8.2172 ] Sending bootloader and pre-requisite binaries
[   8.2197 ] tegrarcm_v2 --download blob blob.bin
[   8.2217 ] Applet version 01.00.0000
[   8.2424 ] Sending blob
[   8.2427 ] [................................................] 100%
[   8.7538 ]
[   8.7585 ] tegrarcm_v2 --boot recovery
[   8.7624 ] Applet version 01.00.0000
[   8.7846 ]
[   9.7949 ] tegrarcm_v2 --isapplet

However the serial port prints look different:

NOTICE:  BL31: v1.3(release):41d46a9cf
NOTICE:  BL31: Built : 21:14:44, Jun 25 2020
ipc-unittest-main: 1519: Welcome to IPC unittest!!!
ipc-unittest-main: 1531: waiting forever
ipc-unittest-srv: 329: Init unittest services!!!
hwkey-agent: 40: hwkey-agent is running!!
hwkey-agent: 182: key_mgnt_processing .......
hwkey-agent: 157: Init hweky-agent services!!
platform_bootstrap_epilog: trusty bootstrap complete

You can also replace the bpmp-dtb file (tegra186-bpmp-quill-p3310-1000-a00-00.dtb), because PCN 206440 also changes this file.

Starting with the bootloader directory above with modified P3310_A00_8GB_lpddr4_A02_l4t.cfg I tried:

cp ../../JetPack_4.3_Linux_JETSON_TX2/Linux_for_Tegra/bootloader/t186ref/tegra186-a02-bpmp-quill-p3310-1000-a00-00.dtb bootloader/t186ref/
sudo ./flash.sh jetson-tx2-devkit mmcblk0p1

This also fails at:

[   8.1852 ] Added binary blob_nvtboot_recovery_cpu_sigheader.bin.encrypt of size 221312
[   8.1914 ] Added binary blob_nvtboot_recovery_sigheader.bin.encrypt of size 90016
[   8.1930 ] Added binary blob_preboot_d15_prod_cr_sigheader.bin.encrypt of size 63104
[   8.1942 ] Added binary blob_mce_mts_d15_prod_cr_sigheader.bin.encrypt of size 2082144
[   8.1957 ] Added binary blob_bpmp_sigheader.bin.encrypt of size 533904
[   8.1973 ] Added binary blob_tegra186-a02-bpmp-quill-p3310-1000-c04-00-te770d-ucm2_sigheader.dtb.encrypt of size 605120
[   8.1996 ] Added binary blob_tos-trusty_sigheader.img.encrypt of size 366400
[   8.2007 ] Added binary blob_eks_sigheader.img.encrypt of size 1440
[   8.2017 ] Added binary blob_tegra186-quill-p3310-1000-c03-00-base_sigheader.dtb.encrypt of size 371824
[   8.2073 ]
[   8.2074 ] Sending bootloader and pre-requisite binaries
[   8.2099 ] tegrarcm_v2 --download blob blob.bin
[   8.2118 ] Applet version 01.00.0000
[   8.2312 ] Sending blob
[   8.2316 ] [................................................] 100%
[   8.7317 ]
[   8.7358 ] tegrarcm_v2 --boot recovery
[   8.7395 ] Applet version 01.00.0000
[   8.7598 ]
[   9.7704 ] tegrarcm_v2 --isapplet

with a serial port message similar to the one above:

[0142.160] I> Welcome to MB2(TBoot-BPMP) Recovery(version: 01.00.160913-t186-M-00.00-mobile-82dac681)
[0142.169] I> bit @ 0xd480000
[0142.172] I> Boot-device: eMMC
[0142.294] I> sdmmc DDR50 mode
[0142.299] I> sdmmc bdev is already initialized
[0142.304] I> pmic: reset reason (nverc)        : 0x50
[0142.310] I> Found 18 partitions in SDMMC_BOOT (instance 3)
[0142.318] I> Found 33 partitions in SDMMC_USER (instance 3)
[0142.326] I> Binary(16) of size 533504 is loaded @ 0xd7800000
[0142.335] I> Binary(17) of size 604720 is loaded @ 0xd796c5c0
[0142.565] I> Copy BTCM section
[0142.570] I> Binary(13) of size 220912 is loaded @ 0x96000000
[0142.577] I> Binary(20) of size 371424 is loaded @ 0x8520f400
[0142.585] I> Binary(14) of size 366000 is loaded @ 0x8530f600
[0142.593] I> TOS boot-params @ 0x85000000
[0142.596] I> TOS params prepared
[0142.600] I> Loading EKS ...
[0142.603] I> Binary(15) of size 1040 is loaded @ 0x8590f800
[0142.608] I> EKB detected (length: 0x400) @ 0x8590f800
[0142.613] I> Copied encrypted keys
[0142.617] I> boot profiler @ 0x275844000
[0142.621] I> boot profiler for TOS @ 0x275844000
[0142.626] I> Unhalting SCE
[0142.628] I> Primary Memory Start:80000000 Size:70000000
[0142.634] I> Extended Memory Start:f0110000 Size:1856f0000
[0142.640] I> MB2(TBoot-BPMP) Recovery done

NOTICE:  BL31: v1.3(release):41d46a9cf
NOTICE:  BL31: Built : 21:14:44, Jun 25 2020
ipc-unittest-main: 1519: Welcome to IPC unittest!!!
ipc-unittest-main: 1531: waiting forever
ipc-unittest-srv: 329: Init unittest services!!!
hwkey-agent: 40: hwkey-agent is running!!
hwkey-agent: 182: key_mgnt_processing .......
hwkey-agent: 157: Init hweky-agent services!!
platform_bootstrap_epilog: trusty bootstrap complete

Something changed: I now get the same serial port message as above (trusty bootstrap complete) and the same hang at tegrarcm_v2 --isapplet when trying to flash JetPack 4.4 with the JetPack 4.3 bootloader as well. The change happened after I successfully programmed rel-28.4 and then failed to program JP 4.4 with only P3310_A00_8GB_lpddr4_A02_l4t.cfg copied from JP 4.3.

I re-flashed JP 4.3 successfully and booted it once, then retried the JP 4.4 based upload script with both P3310_A00_8GB_lpddr4_A02_l4t.cfg and tegra186-a02-bpmp-quill-p3310-1000-a00-00.dtb copied from JP 4.3. I reproduced the same state again on the serial port:

[0187.534] I> Welcome to MB2(TBoot-BPMP) Recovery(version: 01.00.160913-t186-M-00.00-mobile-82dac681)
[0187.543] I> bit @ 0xd480000
[0187.546] I> Boot-device: eMMC
[0187.735] I> sdmmc DDR50 mode
[0187.740] I> sdmmc bdev is already initialized
[0187.744] I> pmic: reset reason (nverc)        : 0x80
[0187.751] I> Found 18 partitions in SDMMC_BOOT (instance 3)
[0187.758] I> Found 33 partitions in SDMMC_USER (instance 3)
[0187.767] I> Binary(16) of size 533504 is loaded @ 0xd7800000
[0187.776] I> Binary(17) of size 604720 is loaded @ 0xd796c5c0
[0188.006] I> Copy BTCM section
[0188.010] I> Binary(13) of size 220912 is loaded @ 0x96000000
[0188.018] I> Binary(20) of size 371424 is loaded @ 0x8520f400
[0188.025] I> Binary(14) of size 366000 is loaded @ 0x8530f600
[0188.033] I> TOS boot-params @ 0x85000000
[0188.037] I> TOS params prepared
[0188.040] I> Loading EKS ...
[0188.043] I> Binary(15) of size 1040 is loaded @ 0x8590f800
[0188.049] I> EKB detected (length: 0x400) @ 0x8590f800
[0188.054] I> Copied encrypted keys
[0188.057] I> boot profiler @ 0x275844000
[0188.061] I> boot profiler for TOS @ 0x275844000
[0188.066] I> Unhalting SCE
[0188.069] I> Primary Memory Start:80000000 Size:70000000
[0188.074] I> Extended Memory Start:f0110000 Size:1856f0000
[0188.081] I> MB2(TBoot-BPMP) Recovery done

NOTICE:  BL31: v1.3(release):41d46a9cf
NOTICE:  BL31: Built : 21:14:44, Jun 25 2020
ipc-unittest-main: 1519: Welcome to IPC unittest!!!
ipc-unittest-main: 1531: waiting forever
ipc-unittest-srv: 329: Init unittest services!!!
hwkey-agent: 40: hwkey-agent is running!!
hwkey-agent: 182: key_mgnt_processing .......
hwkey-agent: 157: Init hweky-agent services!!
platform_bootstrap_epilog: trusty bootstrap complete

So I still don’t have a solution other than replacing the entire bootloader directory. Even that stops working once I’ve failed to program the JP 4.4 release with only a smaller subset of files replaced, until I completely reprogram with JP 4.3.