M.2 NVMe SSD detection issue

Hi WayneWWW,
Please check the attached logs (the data was collected from an eMMC SOM on the devkit carrier).
dmesg (67.5 KB)
lspci (6.9 KB)

Thanks.
Wayne

Hi WayneWWW,
Any update?

Thanks.
Wayne

Hi WayneWWW,
Another week has passed.
Any update?

Thanks.
Wayne

Hi WayneWWW,
Any update for this issue?
Thanks.

Wayne

Hi WayneWWW,
We still have not received anything from NVIDIA.
Please reply to us about this issue.
Thanks.

Wayne

Hi,

This is strange; the file system plays no role in PCIe link-up.
Try the following and let me know how it goes:

  1. Disable NVMe in bootloader.
    a. cd /Linux_for_Tegra/bootloader/
    b. Remove nvme from “boot-order” in cbo.dts
    c. dtc -I dts -O dtb -o cbo.dtb cbo.dts
    d. Add “-k CPUBL-CFG” to the regular flash command

  2. Remove “nvidia,enable-power-down” from the pcie@141a0000 node and flash the DTB. You should now see the Tegra PCIe root port with domain=5. Then get the register dump below from a root shell.

/home/ubuntu/reg_dump -a 0x141a00d0
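
The edit in step 1b can be scripted. The sketch below assumes that boot-order sits on a single line as a comma-separated list of quoted strings; that layout, the sample line, and the helper name strip_nvme_boot_order are assumptions for illustration, so check your own cbo.dts before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read a cbo.dts on stdin and write it to stdout with the
# "nvme" entry dropped from the boot-order line. Assumes boot-order is on a
# single line as a comma-separated list of quoted strings.
strip_nvme_boot_order() {
    sed -E '/boot-order/ {
        s/"nvme"[[:space:]]*,[[:space:]]*//
        s/,[[:space:]]*"nvme"//
    }'
}

# Demo on a made-up sample line (not copied from a real cbo.dts).
printf '%s\n' 'boot-order = "sd", "usb", "nvme", "emmc", "net";' | strip_nvme_boot_order
```

On a real tree this would be used as strip_nvme_boot_order < cbo.dts > cbo.new.dts, after which the result is reviewed, copied over cbo.dts, and compiled with the dtc command from step 1c.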

Thanks,
Manikanta

Hi Manikanta,

Thanks for your response. There are two things:
First, I used the code base “jetson_linux_r32.6.1_aarch64” with a Xavier NX SOM (eMMC version) and followed your steps 1 and 2. After flashing the image and booting into the desktop, I couldn’t find the file “reg_dump” in any folder on the device.
So I used “busybox devmem” to dump the register instead:

busybox devmem 0x141a00d0 32

0x00000088

Second, I found that the SSD (Kingston OM8PDP3256B-AB1) can be detected in Ubuntu after disabling NVMe in the bootloader (step 1). Does this change (disabling NVMe in the bootloader) mean that we can’t boot from NVMe?

Thanks,

Kunyang

Hi,

The purpose here is just to dump the register, so you can use the devmem tool from busybox as well.

Hi,

  1. We need the register dump when the issue is observed with only step 2 applied.

  2. No, the system can still boot with the NVMe installed, but you can’t use the NVMe as a boot option.
    Can you continue your project without NVMe as a boot option, and use it after booting into Ubuntu?
    If yes, then you can continue with step 1 from comment #21.

Please provide the NVMe make and model; I will check whether we have the same NVMe internally to debug the link-up issue.

Thanks,
Manikanta

Hi Manikanta,

Thanks for the reminder.
For the first point, I dumped the register (0x141a00d0) in two cases:
Before “nvidia,enable-power-down” removal: I got a kernel panic when entering the command “busybox devmem 0x141a00d0 32”. The log is in the attachment:
kernel_panic_after_devdump.txt (5.1 KB)
After “nvidia,enable-power-down” removal: I got the output below, and the NVMe device was still not found.
busybox devmem 0x141a00d0 32
0x00000018
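
To compare this reading with the 0x00000088 posted earlier in the thread, the two values can be XORed to show which bits differ. This is only an arithmetic sketch; the helper name reg_diff is made up, and nothing here says what the individual bits of this register mean.

```shell
#!/bin/sh
# Print the bits that differ between two 32-bit register readings.
# Arguments are hex strings as printed by "busybox devmem".
reg_diff() {
    printf '0x%08x\n' $(( $1 ^ $2 ))
}

# The two readings posted in this thread.
reg_diff 0x00000088 0x00000018   # prints 0x00000090
```

The result 0x00000090 means the two readings differ in bits 4 and 7.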

For the second point, you are right. The interesting thing is that the NVMe device (Kingston OM8PDP3256B-AB1) can be detected whether or not “nvidia,enable-power-down” is removed. We can access it in Ubuntu.
Details of this NVMe are at https://www.harddrivebenchmark.net/hdd.php?hdd=KINGSTON%20OM8PDP3256B-AB1&id=29256

Thanks,

Kunyang

Hi,

I asked you to add “nvidia,enable-power-down” to access the register, without it kernel panic is expected.
The register dump in the failure case tells us that the link is stuck in compliance mode.

Yes, “nvidia,enable-power-down” will not impact the PCIe link status. You can continue with disabling NVMe in the bootloader as a workaround (WAR) for your work. I will continue to track this internally.

I am marking comment #21 as the solution.

Thanks,
Manikanta

Hi Manikanta,

Thanks for your reply. You said earlier: “I asked you to add “nvidia,enable-power-down” to access the register, without it kernel panic is expected.” The interesting thing is that the kernel panic happened when “nvidia,enable-power-down” was present and the NVMe (Kingston OM8PDP3256B-AB1) was attached. It seems that this NVMe made the PCIe link disappear.
The “nvidia,enable-power-down” setting is present by default in the “pcie@141a0000” node of the dtsi file. Once I removed it from that node, we could see the PCIe link and dump the register without a kernel panic, with the NVMe device (Kingston OM8PDP3256B-AB1) still attached. I am not sure whether removing the NVMe device from the boot options is a solution or a workaround.

Thanks,

Kunyang

Hi,

“nvidia,enable-power-down” has no impact on functionality. This option should be removed so that the controller stays powered up and the register can be dumped. You can ignore the kernel panic when “nvidia,enable-power-down” is present; this is expected.
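
Before reading the register, it may help to confirm on the running system whether the property is still present. A minimal sketch, assuming the live device tree is exposed under /proc/device-tree and that the node appears there as pcie@141a0000; both the path and the helper name dt_has_prop are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if property $2 exists in device-tree node dir $1.
dt_has_prop() {
    [ -e "$1/$2" ]
}

# Assumed location of the node in the live device tree.
node=/proc/device-tree/pcie@141a0000

if dt_has_prop "$node" "nvidia,enable-power-down"; then
    echo "nvidia,enable-power-down is present"
else
    echo "nvidia,enable-power-down is absent (or the node path differs)"
fi
```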

Are you seeing any issue after disabling NVMe in bootloader?

Thanks,
Manikanta

Hi Manikanta,

No, everything seems fine after disabling NVMe in the bootloader, and we can use the NVMe in the OS. But I guess the limitation remains that NVMe cannot be used as a boot option.

lspci

0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
0005:01:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 500d (rev 01)
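
The same check can be scripted for repeated boot tests. A minimal sketch that looks for an NVMe controller in a given PCI domain of lspci-style text; the helper name nvme_in_domain is made up, and the sample text is the two lines above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lspci-style text on stdin lists an NVMe
# controller (class "Non-Volatile memory controller") in PCI domain $1.
nvme_in_domain() {
    grep -q "^$1:.*Non-Volatile memory controller"
}

# Demo with the two lines posted above; on the device, pipe in `lspci -D`.
sample='0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
0005:01:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 500d (rev 01)'

if printf '%s\n' "$sample" | nvme_in_domain 0005; then
    echo "NVMe controller found in domain 0005"
else
    echo "no NVMe controller in domain 0005"
fi
```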

Thanks,

Kunyang

Hi Manikanta,

Thanks for your help. The NVMe device (Kingston OM8PDP3256B-AB1) works in Ubuntu after removing NVMe as a boot option. But we still hope the NVMe device can be a boot option, since the eMMC has limited storage capacity. Moreover, we have already purchased a large number of these devices, and we are not sure how many NVMe devices may have a similar interoperability problem in the bootloader. Could you help check the root cause of why this device (Kingston OM8PDP3256B-AB1) doesn’t work in the bootloader?

Thanks,

Kunyang

Hi~ Manikanta & Wayne,
Adding more information to the discussion of this issue.

Thanks
Ken

Hi,

  1. Are you saying that this issue only happened to emmc module? Have you done enough tests on multiple boards + modules?

  2. Are you sure these two modules are under the same test environment? For example, both have nvme in cboot enabled/disabled.

Hi~Wayne,

  1. Are you saying that this issue only happened to emmc module? Have you done enough tests on multiple boards + modules?
    Q1 → Yes, this issue only happened on the eMMC module, Xavier NX PN: 180-13668-DAAA-A03.
    Q2 → Please check wayne_liao’s reply on Jan 11.

  2. Are you sure these two modules are under the same test environment? For example, both have nvme in cboot enabled/disabled.
    Q1 & Q2 → Please check wayne_liao’s reply on Jan 11.

Thanks
Ken

Hi,

Since the bootloader is involved, could you share the UART log from both devices?

What you should do is

  1. (NTFS case) Disable nvme in cboot for both boards, share the uart log and dmesg.

  2. (NTFS case) Enable nvme in cboot for both boards, share the uart log and dmesg.

  3. (ext4 case) Enable nvme in cboot for both boards, share the uart log and dmesg.

Also, could you probe that de-assert signal for each case here?