UEFI Does Not Recognize Micron NVMe as Bootable

Similar to these two issues (1 and 2), my custom carrier card flashed with JetPack r35.4.1 will not boot from our soldered-on NVMe drive. The flashing process finishes successfully. We had previously used this NVMe drive as storage on the Xavier NX, but we now need to use it as the primary boot drive on the Orin Nano.

The device shows up in the UEFI Shell when running pci. I enabled additional UEFI debug logging and found this error in the output:

NvmExpressDriverBindingStart: start
Cc.En: 0
Cc.Css: 0
Cc.Mps: 0
Cc.Ams: 0
Cc.Shn: 0
Cc.Iosqes: 0
Cc.Iocqes: 0
NVMe controller is disabled with status [Success].
Private->Buffer = [0000000047C96000]
Admin     Submission Queue size (Aqa.Asqs) = [00000001]
Admin     Completion Queue size (Aqa.Acqs) = [00000001]
Admin     Submission Queue (SqBuffer[0]) = [0000000047C96000]
Admin     Completion Queue (CqBuffer[0]) = [0000000047C97000]
Sync  I/O Submission Queue (SqBuffer[1]) = [0000000047C98000]
Sync  I/O Completion Queue (CqBuffer[1]) = [0000000047C99000]
Async I/O Submission Queue (SqBuffer[2]) = [0000000047C9A000]
Async I/O Completion Queue (CqBuffer[2]) = [0000000047C9B000]
Aqa.Asqs: 1
Aqa.Acqs: 1
Asq: 247C96000
Acq: 247C97000h
Cc.En: 1
Cc.Css: 0
Cc.Mps: 0
Cc.Ams: 0
Cc.Shn: 0
Cc.Iosqes: 6
Cc.Iocqes: 4
NVMe controller is enabled with status [Success].
== NVME IDENTIFY CONTROLLER DATA ==
    PCI VID   : 0x1344
    PCI SSVID : 0x1344
    SN        : 22303A0579DF
    MN        : MTFDHBL128TDP
    FR        : 0x3430554D
    TNVMCAP (high 8-byte) : 0x0
    TNVMCAP (low 8-byte)  : 0x1DCF856000
    RAB       : 0x6
    IEEE      : 0xA075
    AERL      : 0x8
    SQES      : 0x66
    CQES      : 0x44
    NN        : 0x4
NvmExpressDriverBindingStart: end with Device Error

Here are the rest of the boot logs:
screenlog.0.txt (460.5 KB)

Hi,

Just to make sure:
Does the error only happen on specific Micron drives?
Have you tried other NVMe drives and confirmed that they work?

Our NVMe drive is a soldered-on IC on our carrier card, so we do not have the ability to swap drives.

Hi,

Can you please boot the device from a USB drive, log in to the system, and check whether the NVMe drive is detected as /dev/nvme0n1?
Or verify the same NVMe drive with a DevKit as the carrier board.

I don’t think you are supposed to solder the drive on before validating its compatibility.

I can try to boot with a USB drive. We have been using this carrier card successfully with the TX2 NX and Xavier NX, but both booted from eMMC and used the NVMe IC for extra storage.

I do not have a USB drive where I am working remotely, but I have added some logging to the edk2-35.4.1 source code to try to narrow down the issue.

So far it seems to be an issue with the number of I/O completion queues being created:

NvmExpressDriverBindingStart: start
Cc.En: 0
Cc.Css: 0
Cc.Mps: 0
Cc.Ams: 0
Cc.Shn: 0
Cc.Iosqes: 0
Cc.Iocqes: 0
NVMe controller is disabled with status [Success].
Private->Buffer = [0000000047D4B000]
Admin     Submission Queue size (Aqa.Asqs) = [00000001]
Admin     Completion Queue size (Aqa.Acqs) = [00000001]
Admin     Submission Queue (SqBuffer[0]) = [0000000047D4B000]
Admin     Completion Queue (CqBuffer[0]) = [0000000047D4C000]
Sync  I/O Submission Queue (SqBuffer[1]) = [0000000047D4D000]
Sync  I/O Completion Queue (CqBuffer[1]) = [0000000047D4E000]
Async I/O Submission Queue (SqBuffer[2]) = [0000000047D4F000]
Async I/O Completion Queue (CqBuffer[2]) = [0000000047D50000]
Aqa.Asqs: 1
Aqa.Acqs: 1
Asq: 247D4B000
Acq: 247D4C000h
Cc.En: 1
Cc.Css: 0
Cc.Mps: 0
Cc.Ams: 0
Cc.Shn: 0
Cc.Iosqes: 6
Cc.Iocqes: 4
NVMe controller is enabled with status [Success].
== NVME IDENTIFY CONTROLLER DATA ==
    PCI VID   : 0x1344
    PCI SSVID : 0x1344
    SN        : 22303A0579DF
    MN        : MTFDHBL128TDP
    FR        : 0x3430554D
    TNVMCAP (high 8-byte) : 0x0
    TNVMCAP (low 8-byte)  : 0x1DCF856000
    RAB       : 0x6
    IEEE      : 0xA075
    AERL      : 0x8
    SQES      : 0x66
    CQES      : 0x44
    NN        : 0x4
NvmeCreateIoCompletionQueue: Failed to send command packet for queue index (2). (Device Error)
NvmeControllerInit: failed to create io completion queue
NvmExpressDriverBindingStart: Failed to Init Controller (Device Error)
NvmExpressDriverBindingStart: end with Device Error

Notably, the Admin section above reports one submission queue and one completion queue (Aqa.Asqs = Aqa.Acqs = 1), but the source code that creates the completion queues has a for loop from Index = 1 while Index < NVME_MAX_QUEUES (which is 3), so it iterates twice. On the second iteration, sending the command packet via the Passthru protocol fails.
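For context, here is an abbreviated sketch of the loop in question, paraphrased from NvmeCreateIoCompletionQueue in the NvmExpressDxe driver (NvmExpressHci.c); error handling and queue-size setup are trimmed, and identifiers follow the upstream edk2 source, so double-check against the actual edk2-35.4.1 tree before relying on this:

// Paraphrased sketch of NvmeCreateIoCompletionQueue (edk2 NvmExpressHci.c).
// NVME_MAX_QUEUES is 3, so this creates the sync I/O completion queue
// (QID 1) on the first pass and the async one (QID 2) on the second.
for (Index = 1; Index < NVME_MAX_QUEUES; Index++) {
  ZeroMem (&CommandPacket, sizeof (EFI_NVM_EXPRESS_PASS_THRU_COMMAND_PACKET));
  ZeroMem (&Command, sizeof (EFI_NVM_EXPRESS_COMMAND));
  ZeroMem (&Completion, sizeof (EFI_NVM_EXPRESS_COMPLETION));
  ZeroMem (&CrIoCq, sizeof (NVME_ADMIN_CRIOCQ));

  CommandPacket.NvmeCmd        = &Command;
  CommandPacket.NvmeCompletion = &Completion;
  Command.Cdw0.Opcode          = NVME_ADMIN_CRIOCQ_CMD;  // Create I/O Completion Queue
  CommandPacket.TransferBuffer = Private->CqBufferPciAddr[Index];
  CommandPacket.TransferLength = EFI_PAGE_SIZE;
  CommandPacket.CommandTimeout = NVME_GENERIC_TIMEOUT;
  CommandPacket.QueueType      = NVME_ADMIN_QUEUE;

  CrIoCq.Qid = Index;  // queue ID 1 on the first pass, 2 on the second
  CrIoCq.Pc  = 1;      // physically contiguous
  // ... Qsize setup trimmed ...
  CopyMem (&CommandPacket.NvmeCmd->Cdw10, &CrIoCq, sizeof (NVME_ADMIN_CRIOCQ));
  CommandPacket.NvmeCmd->Flags = CDW10_VALID | CDW11_VALID;

  // This is the call that fails on the second iteration (QID 2) with
  // this drive, producing the Device Error in the log above.
  Status = Private->Passthru.PassThru (
             &Private->Passthru,
             NVME_CONTROLLER_ID,
             &CommandPacket,
             NULL
             );
  if (EFI_ERROR (Status)) {
    return Status;
  }
}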

I was able to boot from the NVMe drive by modifying the edk2 source code. The core change was reducing the loop upper bound from NVME_MAX_QUEUES to 2 in both the NvmeCreateIoCompletionQueue and NvmeCreateIoSubmissionQueue functions; I also added a number of print statements while locating the issue. Use this patch at your own risk: it worked for me as a temporary workaround, but I have no idea what effect it will have on other NVMe drives.

queue-modifications.patch.txt (5.4 KB)
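The core of the change looks like this (a paraphrased sketch of what the attached patch does, not the patch itself; the same one-line edit is applied in both NvmeCreateIoCompletionQueue and NvmeCreateIoSubmissionQueue):

-  for (Index = 1; Index < NVME_MAX_QUEUES; Index++) {
+  // Workaround: only create the sync I/O queue pair (QID 1), since this
+  // drive rejects the create-queue command for the async pair (QID 2).
+  for (Index = 1; Index < 2; Index++) {

With the bound lowered to 2, the driver never issues the failing QID 2 command, at the cost of the async I/O queue pair it would normally create.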


I’ve also run into this same issue with this particular model of Micron NVMe drive. I was able to fix it by updating the firmware on the NVMe drive.

Do you know what firmware version you updated to? I found out just today that a firmware update is available, but its changelog didn’t appear to mention my issue.

I used the “MU05” version, specifically from a file named “2100AI-AT_FFU_to_MU05.zip”. I see Micron has recently released an MU05.1 version, but I haven’t tried that yet.
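As a side note, the FR value in the Identify dumps above (0x3430554D) is consistent with this: assuming the log prints the first four bytes of the 8-byte ASCII firmware-revision field as a little-endian integer, it decodes to “MU04”, i.e. the revision just before the MU05 update. A minimal standalone sketch of that decoding:

#include <stdint.h>
#include <stdio.h>

int main (void)
{
  // FR value from the UEFI Identify dump above. Assumption: the log
  // prints the first 4 bytes of the ASCII firmware-revision field as
  // a little-endian 32-bit integer.
  uint32_t Fr = 0x3430554D;
  char     Text[5];

  for (int i = 0; i < 4; i++) {
    Text[i] = (char)((Fr >> (8 * i)) & 0xFF);
  }
  Text[4] = '\0';

  printf ("FR = \"%s\"\n", Text);  // prints: FR = "MU04"
  return 0;
}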
