Dear NVIDIA Team,
We have a custom Orin NX carrier board on which we see the following issue: sometimes, during execution of the initrd after a reboot, nvme0n1p1 is not found and the system drops into a bash shell. Here are the log files for the not-working and the working case:
dmesg_not_working.txt (41.3 KB)
dmesg_working.txt (66.6 KB)
We see the following errors in the not-working case:
[ 9.023576] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000000, fsynr=0x11, cbfrsynra=0x1013, cb=2
[ 9.023831] tegra-mc 2c00000.memory-controller: pcie4w: secure write @0x00000003ffffff00 : VPR violation ((null))
After removing the kernel argument “pcie_aspm=off”, which we had been using so far, the NVMe is always recognized.
We are testing JetPack 6.0 GA with the standard kernel.
Any idea why this kernel argument leads to this issue?
Thank you.
Kind regards
I suspect the NVMe drive itself is also related to this issue.
Could you also try another brand of NVMe SSD? And please share which one you are using now.
Also, does this issue happen on the NVIDIA devkit?
sevm89
June 10, 2024, 8:37am
3
Dear WayneWWW
The SSD is an Apacer PV920-M280. We will check on the DevKit and also different SSDs and come back to you.
Thank you.
kayccc
June 19, 2024, 5:10am
4
Is this still an issue that needs support? Are there any results you can share?
sevm89
June 19, 2024, 7:54am
5
Hi kayccc
Sorry for the late answer.
We just checked today with the same SSD on the DevKit and we can see the same issue.
Using a different brand of NVMe SSD, we do not see this behavior.
Best regards
sevm89
July 2, 2024, 10:13am
6
Hi kayccc
Do you have an update for us?
As additional information: with JetPack 5.1.2 we do not see this behaviour, and the same NVMe boots up fine.
Thank you.
Hi sevm89,
As we don’t have this NVMe, it is hard for us to check the issue.
If you are willing to help check things, we can share some tests for you to run.
If you are not able to do this, I would suggest avoiding this NVMe drive and using another kind instead.
sevm89
July 2, 2024, 11:20am
8
Hi WayneWWW
Please share the tests with us so we can investigate this.
Thank you.
You said you don’t have this issue on rel-35.4.1.
Could you cross-check by putting the rel-35.5 kernel into the rel-35.4 environment and see if you can still reproduce this issue?
And vice versa, could you put the rel-35.4.1 kernel into rel-35.5 and check whether you can still reproduce it?
sevm89
July 2, 2024, 11:36am
10
Hi WayneWWW
Do you mean porting the kernel module nvme.ko from rel-35.5 to rel-35.4 and vice versa?
Thank you.
I mean the whole kernel: the kernel Image, dtbs, and modules (.ko).
sevm89
July 3, 2024, 6:30am
12
As a JetPack 5.1.2 image will not even boot with the JetPack 6.0 bootloader, we don’t think it is possible to exchange kernels between these releases, but we will try it anyway.
Sorry, I just realized you are talking about rel-35.4.1 and rel-36, not rel-35.4.1 and rel-35.5?
sevm89
July 3, 2024, 6:38am
14
Yes, we are talking about rel-35.4.1 and rel-36.3.
Then there is nothing we can cross-check remotely.
Could you add some prints to the NVMe driver on rel-36.3 and compare what happens in the NG (not-working) case and the OK case?
Also, what is the purpose of disabling ASPM in this case? Does the behaviour change if you do not add that to the kernel cmdline?
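To illustrate what I mean by prints (just a rough sketch, not a specific patch; the exact placement inside drivers/nvme/host/pci.c and core.c is up to you), a simple marker before each step of interest is enough:

#include <linux/printk.h>

/* Hypothetical helper: print a marker before each step of interest, e.g.
 * NVME_DBG("nvme_pci_enable"); right before the corresponding call.
 * The last marker seen in the NG boot log then shows where probing stops. */
#define NVME_DBG(step)  pr_info("nvme: %s\n", (step))

Then compare where the last marker appears in the NG boot against the OK boot.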
sevm89
July 3, 2024, 1:52pm
17
Hi WayneWWW
We added debug prints to the nvme driver and got the following result:
Working case:
[ 8.983092] nvme 0004:01:00.0: Adding to iommu group 7
[ 8.983538] nvme_pci_alloc_dev
[ 8.983586] nvme_dev_map
[ 8.983594] nvme_setup_prp_pools
[ 8.983597] nvme_pci_alloc_iod_mempool
[ 8.983599] nvme nvme0: pci function 0004:01:00.0
[ 8.983609] nvme_reset_ctrl
[ 8.983783] nvme: nvme_sync_queues
[ 8.983785] nvme: mutex_lock
[ 8.983785] nvme: nvme_pci_enable
[ 8.983826] nvme 0004:01:00.0: enabling device (0000 -> 0002)
[ 8.983886] nvme: pci_set_master
[ 8.984074] nvme: pci_alloc_irq_vectors
[ 8.984080] nvme: lo_hi_readq
[ 8.984081] nvme: q_depth
[ 8.984081] nvme: ctrl.sqsize
[ 8.984082] nvme: db_stride
[ 8.984084] nvme: nvme_map_cmb
[ 8.984107] nvme: pci_enable_pcie_error_reporting
[ 8.984754] nvme: pci_save_state
[ 9.020892] nvme: nvme_pci_configure_admin_queue
[ 9.020944] nvme: nvme_alloc_admin_tags
[ 9.020946] nvme: dma_set_min_align_mask
[ 9.020947] nvme: ctrl.max_segments
[ 9.020947] nvme: dma_set_max_seg_size
[ 9.020948] nvme: mutex_unlock
[ 9.020948] nvme: nvme_change_ctrl_state
[ 9.020949] nvme: ctrl.max_integrity_segments
[ 9.020950] nvme: nvme_init_ctrl_finish function
[ 9.020952] nvme: reg_read32
[ 9.020952] nvme: min_t
[ 9.020952] nvme: NVME_CAP_NSSRC
[ 9.020953] nvme: before_nvme_init_identify
[ 9.020953] nvme_init_identify function)
[ 9.020954] nvme_identify_ctrl function
[ 9.020955] nvme: kmalloc
[ 9.020955] nvme: after kmalloc
[ 9.020956] nvme: __nvme_submit_sync_cmd
[ 9.020957] nvme: NVME_QID_ANY
[ 9.020960] nvme_alloc_request
[ 9.020961] nvme: buffer && bufferlen
[ 9.020963] nvme: blk_rq_map_kern
[ 9.042061] nvme_execute_rq
[ 9.042062] nvme: out
[ 9.042065] nvme_submit_sync_cmd
[ 9.042065] nvme_identify_ctrl
[ 9.042067] nvme: __nvme_submit_sync_cmd
[ 9.042067] nvme: NVME_QID_ANY
[ 9.042068] nvme_alloc_request
[ 9.042069] nvme: buffer && bufferlen
[ 9.042069] nvme: blk_rq_map_kern
[ 9.042374] nvme_execute_rq
[ 9.042377] nvme: out
[ 9.042382] nvme_get_effects_log
[ 9.042384] nvme: ctrl->identified
[ 9.042386] nvme: quirk_matches
[ 9.042390] nvme nvme0: missing or invalid SUBNQN field.
[ 9.042431] nvme_init_subsystem
[ 9.042432] nvme: memcpy
[ 9.042434] nvme_set_queue_limits
[ 9.042436] nvme: memcpy
[ 9.042437] nvme_mpath_init_identify
[ 9.042438] nvme: check apst_enabled
[ 9.042442] nvme: nvme_init_identify
[ 9.042444] nvme: __nvme_submit_sync_cmd
[ 9.042445] nvme: NVME_QID_ANY
[ 9.042449] nvme_alloc_request
[ 9.042450] nvme: buffer && bufferlen
[ 9.042453] nvme: blk_rq_map_kern
[ 9.042775] nvme_execute_rq
[ 9.042778] nvme: result >= 0
[ 9.042779] nvme: out
[ 9.042784] nvme_configure_apst
[ 9.042785] nvme_configure_timestamp
[ 9.042786] nvme_configure_directives
[ 9.042787] nvme_configure_acre
[ 9.042788] nvme_hwmon_init
[ 9.042788] nvme: ctrl->identified
[ 9.042790] nvme: nvme_init_ctrl_finish
[ 9.042791] nvme: free_opal_dev
[ 9.042792] nvme: nvme_setup_host_mem
[ 9.053197] nvme: nvme_alloc_host_mem
[ 9.053205] nvme nvme0: allocated 64 MiB host memory buffer.
[ 9.053211] nvme: __nvme_submit_sync_cmd
[ 9.053211] nvme: NVME_QID_ANY
[ 9.053217] nvme_alloc_request
[ 9.163259] nvme_execute_rq
[ 9.163263] nvme: out
[ 9.163269] nvme: nvme_setup_host_mem
[ 9.163272] nvme: __nvme_submit_sync_cmd
[ 9.163273] nvme: NVME_QID_ANY
[ 9.163277] nvme_alloc_request
[ 9.167308] nvme_execute_rq
[ 9.167312] nvme: result >= 0
[ 9.167313] nvme: out
[ 9.167889] nvme: __nvme_submit_sync_cmd
[ 9.167891] nvme: NVME_QID_ANY
[ 9.167896] nvme_alloc_request
[ 9.171380] nvme_execute_rq
[ 9.171380] nvme: out
[ 9.171382] nvme: __nvme_submit_sync_cmd
[ 9.171382] nvme: NVME_QID_ANY
[ 9.171383] nvme_alloc_request
[ 9.180809] nvme_execute_rq
[ 9.180811] nvme: out
[ 9.180842] nvme: __nvme_submit_sync_cmd
[ 9.180843] nvme: NVME_QID_ANY
[ 9.180847] nvme_alloc_request
[ 9.185051] nvme_execute_rq
[ 9.185054] nvme: out
[ 9.185059] nvme: __nvme_submit_sync_cmd
[ 9.185060] nvme: NVME_QID_ANY
[ 9.185065] nvme_alloc_request
[ 9.194316] nvme_execute_rq
[ 9.194319] nvme: out
[ 9.194353] nvme: __nvme_submit_sync_cmd
[ 9.194355] nvme: NVME_QID_ANY
[ 9.194359] nvme_alloc_request
[ 9.198556] nvme_execute_rq
[ 9.198559] nvme: out
[ 9.198564] nvme: __nvme_submit_sync_cmd
[ 9.198565] nvme: NVME_QID_ANY
[ 9.198570] nvme_alloc_request
[ 9.207820] nvme_execute_rq
[ 9.207824] nvme: out
[ 9.207863] nvme: __nvme_submit_sync_cmd
[ 9.207864] nvme: NVME_QID_ANY
[ 9.207870] nvme_alloc_request
[ 9.212060] nvme_execute_rq
[ 9.212063] nvme: out
[ 9.212068] nvme: __nvme_submit_sync_cmd
[ 9.212069] nvme: NVME_QID_ANY
[ 9.212074] nvme_alloc_request
[ 9.221325] nvme_execute_rq
[ 9.221329] nvme: out
[ 9.221362] nvme: __nvme_submit_sync_cmd
[ 9.221363] nvme: NVME_QID_ANY
[ 9.221368] nvme_alloc_request
[ 9.225565] nvme_execute_rq
[ 9.225569] nvme: out
[ 9.225574] nvme: __nvme_submit_sync_cmd
[ 9.225575] nvme: NVME_QID_ANY
[ 9.225579] nvme_alloc_request
[ 9.234830] nvme_execute_rq
[ 9.234833] nvme: out
[ 9.234867] nvme: __nvme_submit_sync_cmd
[ 9.234868] nvme: NVME_QID_ANY
[ 9.234873] nvme_alloc_request
[ 9.238900] nvme_execute_rq
[ 9.238901] nvme: out
[ 9.238902] nvme: __nvme_submit_sync_cmd
[ 9.238902] nvme: NVME_QID_ANY
[ 9.238903] nvme_alloc_request
[ 9.248332] nvme_execute_rq
[ 9.248336] nvme: out
[ 9.248373] nvme: __nvme_submit_sync_cmd
[ 9.248374] nvme: NVME_QID_ANY
[ 9.248379] nvme_alloc_request
[ 9.252574] nvme_execute_rq
[ 9.252577] nvme: out
[ 9.252582] nvme: __nvme_submit_sync_cmd
[ 9.252583] nvme: NVME_QID_ANY
[ 9.252587] nvme_alloc_request
[ 9.261838] nvme_execute_rq
[ 9.261842] nvme: out
[ 9.261879] nvme: __nvme_submit_sync_cmd
[ 9.261880] nvme: NVME_QID_ANY
[ 9.261886] nvme_alloc_request
[ 9.266079] nvme_execute_rq
[ 9.266082] nvme: out
[ 9.266087] nvme: __nvme_submit_sync_cmd
[ 9.266088] nvme: NVME_QID_ANY
[ 9.266093] nvme_alloc_request
[ 9.275342] nvme_execute_rq
[ 9.275346] nvme: out
[ 9.275385] nvme nvme0: 8/0/0 default/read/poll queues
[ 9.275389] nvme: nvme_setup_io_queues
[ 9.275390] nvme: online queues >2
[ 9.275544] nvme: nvme_start_ctrl
[ 9.275553] nvme: __nvme_submit_sync_cmd
[ 9.275554] nvme: NVME_QID_ANY
[ 9.275558] nvme_alloc_request
[ 9.275559] nvme: buffer && bufferlen
[ 9.275562] nvme: blk_rq_map_kern
[ 9.280203] nvme_execute_rq
[ 9.280206] nvme: out
[ 9.280213] nvme: __nvme_submit_sync_cmd
[ 9.280214] nvme: NVME_QID_ANY
[ 9.280218] nvme_alloc_request
[ 9.280219] nvme: buffer && bufferlen
[ 9.280223] nvme: blk_rq_map_kern
[ 9.280544] nvme_execute_rq
[ 9.280547] nvme: out
[ 9.280555] nvme: __nvme_submit_sync_cmd
[ 9.280556] nvme: NVME_QID_ANY
[ 9.280560] nvme_alloc_request
[ 9.280561] nvme: buffer && bufferlen
[ 9.280565] nvme: blk_rq_map_kern
[ 9.280887] nvme_execute_rq
[ 9.280890] nvme: out
[ 9.280899] nvme: __nvme_submit_sync_cmd
[ 9.280900] nvme: NVME_QID_ANY
[ 9.280904] nvme_alloc_request
[ 9.280905] nvme: buffer && bufferlen
[ 9.280909] nvme: blk_rq_map_kern
[ 9.281230] nvme_execute_rq
[ 9.281234] nvme: out
[ 9.285242] nvme0n1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15
Not working case:
[ 5.527326] nvme 0004:01:00.0: Adding to iommu group 6
[ 5.527756] nvme_pci_alloc_dev
[ 5.527796] nvme_dev_map
[ 5.527802] nvme_setup_prp_pools
[ 5.527805] nvme_pci_alloc_iod_mempool
[ 5.527807] nvme nvme0: pci function 0004:01:00.0
[ 5.527816] nvme_reset_ctrl
[ 5.527826] nvme: nvme_sync_queues
[ 5.527827] nvme: mutex_lock
[ 5.527828] nvme: nvme_pci_enable
[ 5.527870] nvme 0004:01:00.0: enabling device (0000 -> 0002)
[ 5.527931] nvme: pci_set_master
[ 5.528119] nvme: pci_alloc_irq_vectors
[ 5.528124] nvme: lo_hi_readq
[ 5.528125] nvme: q_depth
[ 5.528125] nvme: ctrl.sqsize
[ 5.528126] nvme: db_stride
[ 5.528128] nvme: nvme_map_cmb
[ 5.528151] nvme: pci_enable_pcie_error_reporting
[ 5.528795] nvme: pci_save_state
[ 5.568109] nvme: nvme_pci_configure_admin_queue
[ 5.568156] nvme: nvme_alloc_admin_tags
[ 5.568158] nvme: dma_set_min_align_mask
[ 5.568158] nvme: ctrl.max_segments
[ 5.568159] nvme: dma_set_max_seg_size
[ 5.568159] nvme: mutex_unlock
[ 5.568160] nvme: nvme_change_ctrl_state
[ 5.568160] nvme: ctrl.max_integrity_segments
[ 5.568161] nvme: nvme_init_ctrl_finish function
[ 5.568163] nvme: reg_read32
[ 5.568164] nvme: min_t
[ 5.568164] nvme: NVME_CAP_NSSRC
[ 5.568164] nvme: before_nvme_init_identify
[ 5.568165] nvme_init_identify function)
[ 5.568165] nvme_identify_ctrl function
[ 5.568168] nvme: kmalloc
[ 5.568168] nvme: after kmalloc
[ 15.443770] ERROR: nvme0n1p1 not found
It seems the failure happens in the function “nvme_identify_ctrl”, around the function call “*id = kmalloc(sizeof(struct nvme_id_ctrl), GFP_KERNEL);”.
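For reference, the function in question looks roughly like this with our markers in place (a reconstruction based on the mainline driver, not our exact patched source):

static int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id)
{
	struct nvme_command c = { };
	int error;

	c.identify.opcode = nvme_admin_identify;
	c.identify.cns = NVME_ID_CNS_CTRL;

	pr_info("nvme_identify_ctrl function\n");
	pr_info("nvme: kmalloc\n");
	*id = kmalloc(sizeof(struct nvme_id_ctrl), GFP_KERNEL);
	if (!*id)
		return -ENOMEM;
	pr_info("nvme: after kmalloc\n"); /* last marker printed in the not-working case */

	error = nvme_submit_sync_cmd(dev->admin_q, &c, *id,
				     sizeof(struct nvme_id_ctrl));
	/* in the working case the next markers come from __nvme_submit_sync_cmd() */
	if (error)
		kfree(*id);
	return error;
}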
Any idea what happens here?
We now also see the error without pcie_aspm=off.
Thank you.
Could you also share the driver code with the patch you added, as a reference for us to check?
sevm89
July 3, 2024, 2:11pm
19
Yes of course:
core.c.txt (128.8 KB)
pci.c.txt (93.9 KB)
Let us know if you need anything else.
sevm89:
nvme_submit_sync_cmd
Why does the error look more like a hang here, and not in the kmalloc as you said?