Off-limits RAM regions

Hello,

I am running a non-Linux OS on the Jetson Nano. Everything seems fine so far, but I ran into a problem when initializing memory. Somewhere in the physical address range 0x80000000-0xffffffff there seems to be memory that should not be touched: writing 0 to it reads back as all 1s. So far I have narrowed it down to this: if I exclude the range 0xc8000000-0xcfffffff, the system boots and works, but I don’t see any entries in the FDT that correspond to such a range.

Any thoughts?

Thanks,
–Elad

The Tegra X1 TRM, in chapter 2, has an address map.
It claims that the entire range 0x80000000-0xffffffff is mapped to DRAM.
Thus, it’s likely that some other boot device sets up that area to be used for DMA or GPU command streams or similar.
How early in the boot process does your OS take over?

The boot sequence is:

(whatever the Nano firmware does) → u-boot → board-specific code (MMU and caches are off) → kernel (MMU and caches are on)

The failure occurs in the kernel when it bootstraps the virtual memory manager, well before any drivers are up. So unless u-boot is doing something funny, there shouldn’t be any DMA transactions at this point.

If no physical addresses within this range are expected to be off-limits, then I can run a memory test early on (soon after the u-boot hand-off) to see whether the problem is in the BSP code.

I ran a memory test very early on after the u-boot hand-off (MMU is still off), and got the following results:

MEM FAILURE 00000000ff300000 0000000080100fa0

MEM PASS 00000000ff540000
MEM FAILURE 00000000ff600000 0000000070006014

MEM CHECK DONE

The test simply writes 0 to the address and then reads back the current value, reporting an error if it is not 0.
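
A test of this kind can be sketched roughly as follows. This is only an illustration, not the code that was actually run: the names `mem_failure_t` and `mem_check_zero` are hypothetical, and it assumes 32-bit accesses to a region that is safe to overwrite.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the first word that failed to read back as 0. */
typedef struct {
    uintptr_t addr;   /* address of the failing word */
    uint32_t  value;  /* value that was read back */
} mem_failure_t;

/*
 * Write 0 to each 32-bit word in [base, base + words) and read it back.
 * Returns the number of words that did not read back as 0. If `first` is
 * non-NULL, it receives the first failing address and value.
 * `volatile` keeps the compiler from eliding the write/read pair.
 */
static size_t mem_check_zero(volatile uint32_t *base, size_t words,
                             mem_failure_t *first)
{
    size_t failures = 0;

    for (size_t i = 0; i < words; i++) {
        base[i] = 0;
        uint32_t v = base[i];
        if (v != 0) {
            if (failures == 0 && first != NULL) {
                first->addr = (uintptr_t)&base[i];
                first->value = v;
            }
            failures++;
        }
    }
    return failures;
}
```

On the target, `base` would be the physical address under test cast to a pointer (MMU off). If the data cache is enabled, the readback may be served from the cache rather than from DRAM, so the result only says something about memory when caches are off or flushed.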

So the offending ranges are 0xff300000-0xff53ffff and 0xff600000-0xffffffff. The previously suspected region turned out to be a red herring: excluding any range in the middle masks the problem only because of the way the bootstrap memory allocator works, so it never touched the problematic addresses.
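
As a stopgap, a bootstrap allocator could be kept away from these ranges with an explicit exclusion list. The sketch below is an illustration only, assuming the two ranges reported here are the complete set; `phys_range_t`, `bad_ranges`, and `range_is_usable` are hypothetical names, not part of u-boot or any BSP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive physical address range. */
typedef struct {
    uint64_t start;
    uint64_t end;
} phys_range_t;

/* Ranges that failed the early memory test reported in this thread. */
static const phys_range_t bad_ranges[] = {
    { 0xff300000ULL, 0xff53ffffULL },
    { 0xff600000ULL, 0xffffffffULL },
};

/*
 * Returns true if [start, start + size) avoids every bad range.
 * Assumes size > 0 and that start + size does not wrap around.
 */
static bool range_is_usable(uint64_t start, uint64_t size)
{
    uint64_t end = start + size - 1;  /* inclusive end of the request */

    for (size_t i = 0; i < sizeof bad_ranges / sizeof bad_ranges[0]; i++) {
        /* Two inclusive ranges overlap if each starts before the other ends. */
        if (start <= bad_ranges[i].end && end >= bad_ranges[i].start)
            return false;
    }
    return true;
}
```

With this check, the gap at 0xff540000-0xff5fffff (which passed the test) stays usable, while any request that reaches into either failing range is rejected.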

The bad values are highly suspicious (0x70006014 is the status register for UART A) but with the MMU off I’m not sure what can prevent the memory from being modified. My only other guess is that it has something to do with the caches.