Sdkmanager Drive Orin Flash OS 6.0.6 fails

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)
other

SDK Manager Version
1.9.3.10904
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

Hello
I just tried to flash the new Drive Orin with sdkmanager and the flash fails.

I can connect with minicom to ttyACM0 and ttyACM1, and I made sure to set recovery off on the Tegra and do a reset, or these sites:

I am attaching the full logs and the driver installer log.

Any help will be greatly appreciated.
Allen

SDKM_logs_DRIVE_OS_6.0.6_SDK_Linux_for_DRIVE_AGX_Orin_DevKits_2023-08-07_16-52-01.zip (121.3 KB)
driveinstaller.log (260.8 KB)

Dear @servanti,
What is the DRIVE OS version on target before flashing DRIVE OS 6.0.6?
Is it a new Devkit and you are flashing first time?
Could you check flashing using docker container as well?
Do you see an additional NVIDIA device on the host when running lsusb after you put the target in recovery mode?

  1. Check lsusb on host
  2. Run tegrarecovery x1 on and tegrareset x1 on aurix console
  3. Check lsusb on host
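The before/after lsusb comparison described in these steps could be scripted roughly as follows. This is only a sketch: the vendor ID 0955 is taken from the lsusb output posted later in this thread, and the function names `nvidia_devices` and `added_nvidia_ids` are hypothetical helpers, not part of any NVIDIA tool.

```python
import re
import subprocess
from collections import Counter

NVIDIA_VENDOR_ID = "0955"  # assumption: vendor ID as it appears in this thread's lsusb output

LSUSB_LINE = re.compile(
    r"Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})"
)


def nvidia_devices(lsusb_text):
    """Return (bus, device, 'vendor:product') for each NVIDIA line in lsusb output."""
    found = []
    for line in lsusb_text.splitlines():
        m = LSUSB_LINE.search(line)
        if m and m.group(3).lower() == NVIDIA_VENDOR_ID:
            usb_id = f"{m.group(3)}:{m.group(4)}".lower()
            found.append((m.group(1), m.group(2), usb_id))
    return found


def added_nvidia_ids(before_text, after_text):
    """Return NVIDIA 'vendor:product' IDs that appear more often after than before."""
    before = Counter(d[2] for d in nvidia_devices(before_text))
    after = Counter(d[2] for d in nvidia_devices(after_text))
    return sorted((after - before).elements())


def snapshot():
    """Capture the current lsusb output on the host."""
    return subprocess.run(
        ["lsusb"], capture_output=True, text=True, check=True
    ).stdout
```

One possible use: call `snapshot()` once, run the tegrarecovery and tegrareset commands on the Aurix console, call `snapshot()` again, and pass both results to `added_nvidia_ids`. An empty list would suggest that no additional NVIDIA device enumerated on the host.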

Dear @SivaRamaKrishnaNV

Thank you for the prompt reply.

Yes, this is a brand new Drive Orin, “out of the box”. Please see the current OS installed

And lsusb on the host

The host is also a brand new installation of Ubuntu 20. I installed just enough dependencies to run sdkmanager. All host and target components were installed by sdkmanager. There were some errors during the host-side installation, but sdkmanager recovered and installed CUDA, cuDNN, etc. As a test, I compiled all the samples for the host and cross-compiled them for the target, and it all succeeded.

I did try tegrarecovery x1 on and tegrareset on the Aurix console. I have not tried the docker container, but I have never had to do that since PX2, Pegasus, etc. A native host system always worked. I will try it, but is there a reason why a fresh Ubuntu host should not work?

Dear SivaRamaKrishnaNV

I am trying the docker image but having some issues. I remember that a while ago we tried to use the docker containers for Pegasus and had a bunch of issues. We really prefer using native Ubuntu and sdkmanager if possible. Besides, this is what I see in the logs. Will the docker container be any different?

[2023-08-08 09:29:04,174 root DEBUG utilities.py 294 23862]
[2023-08-08 09:29:04,174 root DEBUG utilities.py 294 23862] [bootburn]: [exit(82)] : Exception in critical section :<class ‘OSError’>
[2023-08-08 09:29:04,174 root DEBUG utilities.py 294 23862]
[2023-08-08 09:29:04,174 root DEBUG utilities.py 294 23862] Exception caught in bootburn
[2023-08-08 09:29:04,174 root DEBUG utilities.py 294 23862] Traceback (most recent call last):
[2023-08-08 09:29:04,174 root DEBUG utilities.py 294 23862] File “/home/allen/nvidia/nvidia_sdk/DRIVE_OS_6.0.6_SDK_Linux_DRIVE_AGX_ORIN_DEVKITS/DRIVEOS/drive-foundation/tools/flashtools/bootburn/…/bootburn_t23x_py/bootburn.py”, line 286, in bootburn
[2023-08-08 09:29:04,174 root DEBUG utilities.py 294 23862] bootburnLib.CheckRecoveryTargets()
[2023-08-08 09:29:04,174 root DEBUG utilities.py 294 23862] File “/home/allen/nvidia/nvidia_sdk/DRIVE_OS_6.0.6_SDK_Linux_DRIVE_AGX_ORIN_DEVKITS/DRIVEOS/drive-foundation/tools/flashtools/bootburn/…/bootburn_t23x_py/bootburn_lib.py”, line 3587, in CheckRecoveryTargets
[2023-08-08 09:29:04,174 root DEBUG utilities.py 294 23862] self.aurix.GetTegrasAssocWithAurix(self.targetConfig.s_AurixPort)
[2023-08-08 09:29:04,175 root DEBUG utilities.py 294 23862] File “/home/allen/nvidia/nvidia_sdk/DRIVE_OS_6.0.6_SDK_Linux_DRIVE_AGX_ORIN_DEVKITS/DRIVEOS/drive-foundation/tools/flashtools/bootburn/…/bootburn_t23x_py/bootburn_aurix.py”, line 549, in GetTegrasAssocWithAurix
[2023-08-08 09:29:04,175 root DEBUG utilities.py 294 23862] AbnormalTermination(“Could not put {} in recovery”.format(name), nverror.NvError_ResourceError)
[2023-08-08 09:29:04,175 root DEBUG utilities.py 294 23862] File “/home/allen/nvidia/nvidia_sdk/DRIVE_OS_6.0.6_SDK_Linux_DRIVE_AGX_ORIN_DEVKITS/DRIVEOS/drive-foundation/tools/flashtools/bootburn/…/bootburn_t23x_py/flashtools_nverror.py”, line 249, in AbnormalTermination
[2023-08-08 09:29:04,175 root DEBUG utilities.py 294 23862] raise OSError(errorCode)
[2023-08-08 09:29:04,175 root DEBUG utilities.py 294 23862] OSError: 15
[2023-08-08 09:29:04,175 root INFO runner.py 37 23862] Failed to bind partitions!
[2023-08-08 09:29:04,175 root DEBUG runner.py 39 23862] Error on line 958
[2023-08-08 09:29:04,175 root DEBUG runner.py 40 23862] Exception info: Exception Type <class ‘module.errors.FailedToBindPartitionsError’>, Traceback <traceback object at 0x7f5b7c897848>
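For anyone searching a long SDK Manager log for the same failure, a small filter like the one below can pull out the key lines (the exit code, the "Could not put ... in recovery" reason, the OSError, and the "Failed to" message). It is a minimal sketch that assumes the log prefix layout seen in the excerpt above; `failure_lines` and its keyword list are hypothetical choices, not part of sdkmanager.

```python
import re

# Assumed prefix layout, based on the excerpt in this thread:
# [date time logger LEVEL file.py line pid] message
LOG_PREFIX = re.compile(
    r"^\[?\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+ \S+ \S+ \S+ \d+ \d+\]\s*"
)

# Hypothetical keyword list chosen to match the failure shown above.
DEFAULT_KEYWORDS = ("exit(", "Could not", "OSError:", "Failed to")


def failure_lines(log_text, keywords=DEFAULT_KEYWORDS):
    """Return log messages (prefix stripped) that contain any of the keywords."""
    hits = []
    for line in log_text.splitlines():
        message = LOG_PREFIX.sub("", line).strip()
        if message and any(k in message for k in keywords):
            hits.append(message)
    return hits
```

Applied to the excerpt above, this would surface the bootburn exit code, the "Could not put {} in recovery" call, "OSError: 15", and "Failed to bind partitions!", which together point at the recovery-mode step rather than at the flashing itself.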

Hi

The docker image also fails on the flash. I looked at this post

the issue has been solved.

Dear @servanti,
I see only one NVIDIA device in the attached picture of the lsusb output. As asked earlier, do you see an additional NVIDIA device in the lsusb output after setting the board in recovery mode? You need to run lsusb before and after setting the board in recovery mode to see the difference. This confirms whether the board goes into recovery mode.

Note that when using the docker container or sdkmanager for flashing, the board should be in normal mode. Put the board in normal mode using tegrarecovery x1 off and tegrareset x1 on the Aurix console.

As the board has DRIVE OS 6.0.4, please check the valid migration paths (Requirements for Your Development Environment | NVIDIA Docs). DRIVE OS 6.0.4 → 6.0.6 needs an additional setting in sdkmanager to force a wipe of partitions.

Also, we have improved the docker flashing experience. It avoids host system state issues while flashing.

We request that you share logs as text instead of pictures, so that the topic is searchable by others in the community with similar issues.

Dear SivaRamaKrishnaNV

So, here is lsusb with the Orin in normal mode.

And here is lsusb with the Orin in recovery


Sometimes I have to issue tegrarecovery twice to make sure the device is in recovery.

I have tried all the options in the link that you provided, including checking the force wipe option, and at least on my flash.py, --init_persistent_partition is an invalid command.

I will do a few more tests, but I have not been able to see multiple NVIDIA devices in lsusb. There may be an issue with the device going into recovery mode. I can force it into recovery from the minicom Aurix console, but as I said, sometimes I have to issue the command twice.

Update

I have done some more tests with multiple USB ports and USB hubs on two separate hosts, just using minicom and trying to force tegrarecovery, and many times it does not work. The Orin just sits there, logged in as the user, while Aurix reports that the command was executed. On tegrareset I also get some errors in the Aurix console. I am not sure whether this is important; I remember seeing errors on Pegasus as well, but FYI.

On tegrareset I also see these messages on the Aurix console, AND it appears that the Orin has stopped being in recovery mode.

NvShell>tegrareset x1
Info: Executing cmd: tegrareset, argc: 1, args: x1
NvShell>INFO: MCU_PLTFPWRMGR: Reseting
MCU_FOH: MCU FOH : Power state notification for Orin reset
INFO: BtChn_Cfg: No valid next bootchain loaded
INFO: NVMCU_ORINPWRCTRL: Tegra x1 Boot Chain: A
MCU_FOH: SPI : E2E_P05Check Status : 7 : 0
ERROR: MCU_ERRHANDLER: McuFoh E2E Frame Error
INFO: MCU_PLTFPWRMGR: Tegra reset trigger is complete !
Command Executed
MCU_FOH: MCU FOH : Power state notification for Orin Power On
MCU_FOH: SOC error pin is de-asserted
MCU_FOH: SOC error pin is asserted
MCU_FOH: Spi Transmit Started
MCU_FOH: SPI : E2E_P05Check Status : 7 : 0
MCU_FOH: ErrReport: ErrorCode-0x1012 ReporterId-0xe00e Error_Attribute-0x0 Timestamp-0x612e6c5
MCU_FOH: ErrReport: ErrorCode-0x89abcdef ReporterId-0x8013 Error_Attribute-0x0 Timestamp-0x612f955
MCU_FOH: Periodic Report: KeyOfSeed-0xffff
MCU_FOH: Periodic Report[0]:SystemFailureId-0xabcd, MaturationState-0xef, Failure_Attribute-0x22
MCU_FOH: Periodic Report[9]:SystemFailureId-0x1234, MaturationState-0x56, Failure_Attribute-0xee
MCU_FOH: ErrReport: ErrorCode-0x28c7 ReporterId-0xe04c Error_Attribute-0x0 Timestamp-0x94e6f4d
INFO : MCU_ISTMGR: IST Manager initialized to send/receive commands
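To make console output like the above easier to compare between runs, the ErrReport lines can be pulled into structured records. The sketch below only extracts the numeric fields exactly as printed; it assumes the field layout shown in this console excerpt and does not attempt to interpret what any error code means. The name `parse_err_reports` is a hypothetical helper.

```python
import re

# Assumed layout, copied from the Aurix console lines in this thread.
ERR_REPORT = re.compile(
    r"ErrReport: ErrorCode-(0x[0-9a-fA-F]+) "
    r"ReporterId-(0x[0-9a-fA-F]+) "
    r"Error_Attribute-(0x[0-9a-fA-F]+) "
    r"Timestamp-(0x[0-9a-fA-F]+)"
)


def parse_err_reports(console_text):
    """Return one dict of integer fields per ErrReport line in the console text."""
    reports = []
    for m in ERR_REPORT.finditer(console_text):
        reports.append(
            {
                "error_code": int(m.group(1), 16),
                "reporter_id": int(m.group(2), 16),
                "error_attribute": int(m.group(3), 16),
                "timestamp": int(m.group(4), 16),
            }
        )
    return reports
```

Collecting the same fields after each tegrareset would show whether the same error codes recur every time or only on the runs where recovery mode fails.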

Dear SivaRamaKrishnaNV

I have now run multiple tests and I do not see the Orin go into recovery mode. Issuing tegrarecovery x1 on on the Aurix console does not work: Aurix claims to execute the command, but the Orin remains up and running. I also tried echoing the command from the host with echo -e 'tegrarecovery x1 on' > /dev/ttyACM1; I saw the command arrive on Aurix and execute, but with no effect. I then tried sudo reboot --force force-recovery and saw some errors; the Orin did not reboot and the process was aborted. Here is the log from a minicom session on ttyACM0.
I see

Input Fault Address: 0x7ffffa0000
Access Type: Write

and

Fatal programming error:
MC Fault

recoveryreboot.log (17.8 KB)

Dear @servanti ,
I am assuming you don’t see an additional NVIDIA device when you run tegrarecovery x1 on followed by tegrareset x1 on the Aurix console.

Restarting the host and target fixed a similar problem for another customer; see Updating Drive OS via docker - #16 by 0xdeadbeef. Please check that discussion to see if it helps.

Dear @servanti,
Also, please check flashing Aurix from Orin (Flashing Basics | NVIDIA Docs) and update the status for further guidance.

Dear SivaRamaKrishnaNV
No, I have never seen multiple NVIDIA devices in lsusb after running the tegrarecovery and tegrareset commands. I don’t even see the Orin go into recovery mode after I issue tegrarecovery x1 on. I also see this error when the flash fails:
AbnormalTermination(“Could not put {} in recovery”.format(name), nverror.NvError_ResourceError)

Here is another odd error:
Info: Executing cmd: xxversion, argc: 0, args:
Error: Unknown command
Invalid Command

Is sdkmanager sending xxversion? That is an invalid command.

I tried this on the host while watching the Aurix console in minicom:
echo 'version' > /dev/ttyACM1
and this is what appeared on the console:

NvShell>version
Info: Executing cmd: version, argc: 0, args:
DRIVE-V6.0.4-P3710-AFW-Aurix-StepB-5.05.03
Command Executed
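The same check can be done from a script on the host. The sketch below mirrors the echo redirect used above by writing one command line to the serial device, and it pulls the DRIVE release number out of a version string shaped like the one printed on the console. Both helper names are hypothetical, the version pattern is inferred from this single example, and the sketch assumes the serial port has already been configured (for example by minicom).

```python
import re


def send_aurix_command(port_path, command):
    """Write one command line to the Aurix serial device, like `echo cmd > port`."""
    with open(port_path, "w") as port:
        port.write(command + "\n")


# Pattern inferred from the single version string shown in this thread.
AURIX_VERSION = re.compile(r"DRIVE-V(\d+\.\d+\.\d+)-\S*Aurix\S*")


def aurix_firmware_release(console_text):
    """Return the DRIVE release (e.g. '6.0.4') found in console output, or None."""
    m = AURIX_VERSION.search(console_text)
    return m.group(1) if m else None
```

For example, `send_aurix_command("/dev/ttyACM1", "version")` would send the same command as the echo line above, and passing the captured console text to `aurix_firmware_release` would give the release to compare against the DRIVE OS version being flashed.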

Also, per your suggestion, I ran
/etc/systemd/scripts/nv_aurix_check_fw.sh

Here are some tests based on the other thread that you suggested:
aurix : tegrareset x1 h
host : lsusb -d 0955:
Bus 003 Device 013: ID 0955:7045 NVIDIA Corp.
aurix: tegrarecovery x1 on
host: Bus 003 Device 013: ID 0955:7045 NVIDIA Corp.

Anyway, I am not sure why this device does not go into recovery mode and therefore cannot be flashed.

driveinstaller.log (70.8 KB)

Dear @servanti,
I notice you said “I can force it to recovery on minicom Aurix console but as I said sometime I have to issue the command twice”

Please confirm whether you see two NVIDIA devices in lsusb in this case.

Generally, if the board goes into recovery mode, we should see two NVIDIA devices in lsusb on the host.

If the board goes into recovery mode, we can try the manual flashing steps. If not, I will check internally on how to proceed with this case.

I am assuming the board was never flashed and you are trying to flash this new board for the first time. Please share the following.

  1. Is this board used in an office or in a car?

  2. Did you receive a cable to connect the host and target with the Devkit, and are you using that cable?

  3. Could you try changing the host-target connection cables?

  4. Could you share pictures of the board connections as well?

  5. Could you check and share the events on the host when running tegrarecovery x1 on and tegrareset x1? You can monitor the events using the sudo udevadm monitor command on the host.
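For the last item, the output of udevadm monitor can be captured to a file and filtered down to USB events. This is a rough sketch that assumes the usual monitor line layout (source, timestamp in brackets, action, device path, subsystem in parentheses); `usb_events` is a hypothetical helper name.

```python
import re

# Assumed line layout, e.g.:
# KERNEL[1234.567890] add      /devices/pci0000:00/.../usb3/3-2 (usb)
UDEV_LINE = re.compile(
    r"^(KERNEL|UDEV)\s*\[(\d+\.\d+)\]\s+(\S+)\s+(\S+)\s+\(([^)]+)\)"
)


def usb_events(monitor_text):
    """Return (source, action, devpath) for each event in the usb subsystem."""
    events = []
    for line in monitor_text.splitlines():
        m = UDEV_LINE.match(line)
        if m and m.group(5) == "usb":
            events.append((m.group(1), m.group(3), m.group(4)))
    return events
```

If no "add" event shows up in the usb subsystem after the tegrarecovery and tegrareset commands, the host never saw a new device enumerate, which points toward cabling or port selection rather than the flashing tools.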

Dear @servanti,
Could you check that your connections are the same as in DRIVE Installer failed with status 140 - #11 by gabriel.kiss?

Dear SivaRamaKrishnaNV

Just replied to your earlier comment. Here is what I have

It seems you haven’t connected to the LEFT USB Type-C port. Please refer to DRIVE Installer failed with status 140 - #9 by VickNV to resolve your problem.

Dear VickNV

Thank you so much. So the Orin needs two connections for flashing; this is a first :)

Yes, indeed. To ensure a successful flashing process, please refer to the specific section mentioned in the document.

Dear @servanti,
Is the issue resolved after making the right connection between host and target? Could you provide an update?

Dear SivaRamaKrishnaNV
Yes, thank you so much. I thought that I had replied on one of the posts. Yes, the issue is resolved. None of our previous NVIDIA devices (Pegasus, PX2, etc.) needed two USB connections.
