NVMe sometimes lost on reboot - pcie_aspm=off influence

We added a debug message at the very beginning of the function __nvme_submit_sync_cmd and it is never printed, so we think the hang happens in the kmalloc call. Or how do you read it?

Didn't printk("nvme: after kmalloc") already get printed?

Yes, it gets printed, but nothing more after that.

Maybe you can just comment out nvme_submit_sync_cmd and see whether execution then reaches the end of nvme_identify_ctrl.

Then we get:

nvme nvme0: Identify Controller failed (-12)

However, execution does reach the end of the nvme_identify_ctrl function.

Then it looks like nvme_submit_sync_cmd is what hangs.

Honestly, since this is the upstream nvme driver, it would be more helpful to raise this issue with the Linux community than here…

Hi WayneWWW

We investigated this further and noticed that when the NVMe is detected correctly, the detection messages appear later in the boot process than in the failure case. We then added a 10 ms sleep to the nvme_probe function, and so far the NVMe has always been detected.

static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        struct nvme_dev *dev;
        int result = -ENOMEM;

        /* Added for testing: delay the probe by 10 ms. */
        usleep_range(10000, 10000);

Any idea why this helps?

As I keep saying in this thread, so far this issue is specific to the Apacer NVMe, so it could be a firmware issue on their drive. Maybe you could report it to them if they have a contact window…

Ok thank you.
We are already in contact with them.

Just to let you know, the delay does not resolve the problem; it only improves it. We are still working with Apacer on a solution.
Kind regards
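
For anyone who wants to put a number on "improves it", a minimal sketch of a per-boot check is shown here. It only records whether a device node exists; the function name, the device path and the log location are assumptions, not something taken from the setup in this thread, and the log has to live on storage that is still available when the NVMe is missing.

```shell
#!/bin/sh
# Hypothetical helper: record whether the NVMe device node exists on this boot.
# Arguments (both are assumptions, adjust to your setup):
#   $1  device node to look for, e.g. /dev/nvme0n1
#   $2  log file on storage that is available even when the NVMe is missing
record_nvme_state() {
    dev="$1"
    log="$2"
    if [ -e "$dev" ]; then
        state="detected"
    else
        state="missing"
    fi
    # One line per boot: UTC timestamp, device path, result.
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$dev" "$state" >> "$log"
    # Return success only when the device was found.
    [ "$state" = "detected" ]
}
```

Called once per boot, for example as record_nvme_state /dev/nvme0n1 /var/log/nvme-boot.log, the log can later be summarized with grep -c ' missing$' on the log file to count the failing boots.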

Hi WayneWWW

We found a workaround with which we have not seen a single failure in around 1000 reboots (still testing). We added a sleep to the init script:

if [[ "${rootdev}" == PARTUUID* ||  "${rootdev}" == nvme* || "${rootdev}" == sd* || "${rootdev}" == UUID* ]]; then
        if [[ "${version}" != *5\.10* ]]; then
                modprobe -v pcie-tegra194;
                modprobe -v phy-tegra194-p2u;
                sleep 0.5
        fi
        modprobe -v nvme;
        modprobe -v typec
        modprobe -v typec_ucsi
        modprobe -v ucsi_ccg
        modprobe -v tegra-xudc
fi

Is it correct that in JetPack 5 the nvme driver and the pcie-tegra194 driver were not modules but built into the kernel image?
Anyway, this seems to have an impact on the NVMe recognition. We are still in contact with the manufacturer.
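
One way to check the built-in question on a running system is to look the driver up in the kernel's modules.builtin list. The sketch below assumes the standard /lib/modules/<release>/modules.builtin file is present on the target; the helper name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a driver is built into the kernel image.
#   $1  module name as it appears in the .ko file name, e.g. nvme
#   $2  optional path to modules.builtin (defaults to the running kernel's)
is_builtin() {
    name="$1"
    list="${2:-/lib/modules/$(uname -r)/modules.builtin}"
    # modules.builtin holds one path per built-in driver, for example
    # kernel/drivers/nvme/host/nvme.ko
    grep -q "/${name}\.ko\$" "$list"
}
```

Running is_builtin nvme and is_builtin pcie-tegra194 on both releases would show whether the drivers moved from the kernel image to loadable modules. A non-zero return only means the name is not in the list, so the driver is either a loadable module or not present at all.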

How can we make sure that the initrd is not updated with “apt”? Is it sufficient to hold back the nvidia-bootloader package?
Thank you.

The package to hold should be this one:
-> nvidia-l4t-initrd
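
A minimal sketch of holding that package with apt-mark follows. The helper name is hypothetical, the default package name is simply the one from the reply above, and whether holding this single package is enough to keep the initrd untouched has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper: put a package on hold and list all held packages.
# Must be run as root. The default package name comes from the reply above.
hold_package() {
    pkg="${1:-nvidia-l4t-initrd}"
    apt-mark hold "$pkg" || return 1
    # Print the current holds so the result can be checked by eye.
    apt-mark showhold
}
```

After calling hold_package from a root shell, the package should appear in the apt-mark showhold output, and apt-mark unhold nvidia-l4t-initrd reverses the hold once a fixed initrd is wanted again.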
