NVMe is not detected on Xavier NX custom board with BSP version 35.4.1

Hi,

We are having trouble detecting the NVMe with lsblk on BSP version 35.4.1 on our custom board.
The same NVMe is detected by both lspci and lsblk with BSP version 32.6.1.

I have attached the dmesg logs and screenshots for both BSP versions.



dmesg_log_35_4_1.txt (76.3 KB)
dmesg_log_32_6_1.txt (79.8 KB)

Hi vbhm,

Which NVMe SSD are you using (brand and model)?

Is this NVMe detected on the Xavier NX devkit?

How did you update your board from R32.6.1 to R35.4.1?

Hi Kevin,

The NVMe brand is Samsung and the model is MZ-V8P2T0.

Yes, it is detected on the devkit.
Do you suggest we look into hardware or software? The same NVMe is detected on both the devkit and the custom board with BSP version 32.6.1.
We only face the problem with version 35.4.1.

Below are the steps I followed to update the BSP:
1. Downloaded BSP version 35.4.1 from NVIDIA.
2. Applied the patch with our changes (we haven't changed anything w.r.t. the M.2/NVMe interface).
3. Flashed with the command below:
sudo ./flash.sh jetson-xavier-nx-devkit-emmc mmcblk0p1
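After flashing, it can help to confirm which release the board is actually running before comparing logs. The sketch below assumes the first line of /etc/nv_tegra_release has the usual "# R35 (release), REVISION: 4.1, ..." form; the function name l4t_version is hypothetical.

```shell
# Hypothetical helper: reads the contents of /etc/nv_tegra_release on stdin
# and prints the release as MAJOR.REVISION (for example 35.4.1).
# Assumes the first line looks like "# R35 (release), REVISION: 4.1, ...".
l4t_version() {
    sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*/\1.\2/p'
}

# Usage on the target:
#   l4t_version < /etc/nv_tegra_release
```

If nothing is printed, the file format differs from the assumption above and the file should be read by hand.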

Could you check with your HW team whether there is any difference between the custom board design and the devkit? Please share that information so we can check further.

It seems the NVMe can be detected on your custom carrier board with the lspci command on R35.4.1 (recognized at 0005:05:00.0, but enabling).

Could you share the dmesg with this NVMe connected on the devkit?

Yes, the hardware team is looking into it.
But on the same carrier board, the NVMe is detected by both lspci and lsblk with BSP version R32.6.1.

Here is the dmesg log of the devkit with BSP version R35.4.1.
devkit_dmesg_log.txt (70.2 KB)

It seems to be a bug related to the Pericom switch.
Please refer to the following thread to see if it helps in your case.
Boot stuck while enumerating NVMe via PFX switch, seems to be PCIe driver issue - #8 by WayneWWW

Please check the output of lspci -vvv for the vendor ID used in PCI_DEVICE_ID_PERICOM_SWITCH_PORT.
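A quicker way to look for the switch ports is to filter the numeric IDs that lspci -nn prints as [vendor:device]. The sketch below assumes the switch reports the Pericom vendor ID 12d8; the function name filter_vendor is hypothetical, and the device ID to compare against is whatever the patch in the linked thread defines.

```shell
# Hypothetical helper: reads `lspci -nn` output on stdin and prints only the
# lines whose [vendor:device] pair carries the given 4-digit hex vendor ID.
# Class codes such as [0604] contain no colon, so they are not matched.
filter_vendor() {
    grep -i "\[$1:[0-9a-f]\{4\}]"
}

# Usage on the target (12d8 is assumed to be the Pericom vendor ID):
#   sudo lspci -nn | filter_vendor 12d8
```

lspci can also do this filtering itself with its -d option, for example lspci -nn -d 12d8: to list every device from that vendor.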

Hi Kevin,

I made the changes described in the thread linked above, but I still cannot see the NVMe in lsblk.

I am attaching the dmesg log from that attempt.
dmesg_log.txt (78.6 KB)

Before the patch
sudo dmesg | grep nvme
[sudo] password for nvidia:
[ 6.171208] nvme 0005:05:00.0: Adding to iommu group 7
[ 6.176563] nvme nvme0: pci function 0005:05:00.0
[ 66.530236] nvme nvme0: I/O 8 QID 0 timeout, disable controller
[ 66.530826] nvme nvme0: Identify Controller failed (-4)
[ 66.531004] nvme nvme0: Removing after probe failure status: -5

After applying the patch from the thread linked above
sudo dmesg | grep nvme
[ 6.504878] nvme 0005:05:00.0: Adding to iommu group 7
[ 6.510175] nvme nvme0: pci function 0005:05:00.0
[ 6.518468] nvme nvme0: Shutdown timeout set to 10 seconds
[ 6.523102] nvme nvme0: 6/0/0 default/read/poll queues
[ 6.528320] nvme0n1:
[ 41.952215] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[ 41.996731] blk_update_request: I/O error, dev nvme0n1, sector 3907028992 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 42.032206] nvme 0005:05:00.0: enabling device (0000 -> 0002)
[ 42.032298] nvme nvme0: Removing after probe failure status: -19
[ 42.048306] Buffer I/O error on dev nvme0n1, logical block 488378624, async page read
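The two excerpts fail at different points: before the patch the probe ends with status -5 (EIO) after an Identify Controller timeout, while after the patch the namespace nvme0n1 appears and the controller later drops with CSTS=0xffffffff and status -19 (ENODEV). An all-ones register read usually suggests the device is no longer reachable over the link, though the logs alone do not prove that. A minimal sketch for pulling these markers out of a saved log follows; both function names are hypothetical.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints the status
# code of every "Removing after probe failure" line, one per line.
nvme_probe_failures() {
    sed -n 's/.*nvme nvme[0-9]*: Removing after probe failure status: \(-*[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeeds (exit status 0) if the log on stdin contains
# a controller-down message where CSTS read back as all ones.
nvme_csts_all_ones() {
    grep -q 'controller is down; will reset: CSTS=0xffffffff'
}

# Usage on the target:
#   sudo dmesg | nvme_probe_failures
#   sudo dmesg | nvme_csts_all_ones && echo "CSTS read back as all ones"
```

Running both on the logs from each BSP version gives a compact way to compare where the probe stops.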

Could you share the sudo lspci -vvv output in this situation?

Hi Wayne,

Here is the log of lspci -vvv and lspci -xxx.
lspci_vvv_xxx_log.txt (44.5 KB)

Could you share the exact patch you are using? I want to double-check that it was applied correctly.

Hi Wayne,

Here is the patch file.
patch_file.txt (1.5 KB)

Have you tried any other kind of NVMe SSD here?

Hi Wayne,

I tried an NVMe from Marvell Technology, and it is detected.
Here is the log:
nvme_marvell_detection_log.txt (43.6 KB)

What is the problem with the Samsung Electronics NVMe? I tried two Samsung Electronics NVMe drives, and both failed.

The same Samsung Electronics NVMe is detected with BSP version R32.6.1. It fails only with version R35.4.1.

Please contact the PCIe switch vendor to check whether there is a firmware update that can be applied to the switch.
