SDKmanager hangs while flashing (skip to comment 13 for status)

Hi,

I’ve been trying to flash my Jetson Nano via sdkmanager. However, after a failed flash, the device no longer shows up in sdkmanager, nor in “dmesg --follow” or “lsusb”, when it is in force recovery mode.

Can anyone give some insights?

I’m using Ubuntu installed directly on a Mac (no VM).

I connected a serial console, and it shows this log:

[0000.126] [TegraBoot] (version 00.00.2018.01-l4t-33e7fa82)
[0000.131] Processing in cold boot mode Bootloader 2
[0000.135] A02 Bootrom Patch rev = 1023
[0000.139] Power-up reason: pmc por
[0000.142] No Battery Present
[0000.145] pmic max77620 reset reason
[0000.148] pmic max77620 NVERC : 0x40
[0000.152] RamCode = 0
[0000.154] Platform has DDR4 type RAM
[0000.157] max77620 disabling SD1 Remote Sense
[0000.161] Setting DDR voltage to 1125mv
[0000.165] Serial Number of Pmic Max77663: 0x1032e2
[0000.173] Entering ramdump check
[0000.176] Get RamDumpCarveOut = 0x0
[0000.179] RamDumpCarveOut=0x0,  RamDumperFlag=0xe59ff3f8                       
[0000.184] Last reboot was clean, booting normally!                             
[0000.189] Sdram initialization is successful                                   
[0000.193] SecureOs Carveout Base=0x00000000ff800000 Size=0x00800000            
[0000.199] Lp0 Carveout Base=0x00000000ff780000 Size=0x00001000                 
[0000.205] BpmpFw Carveout Base=0x00000000ff700000 Size=0x00080000              
[0000.211] GSC1 Carveout Base=0x00000000ff600000 Size=0x00100000                
[0000.217] GSC2 Carveout Base=0x00000000ff500000 Size=0x00100000                
[0000.223] GSC4 Carveout Base=0x00000000ff400000 Size=0x00100000                
[0000.228] GSC5 Carveout Base=0x00000000ff300000 Size=0x00100000                
[0000.234] GSC3 Carveout Base=0x000000017f300000 Size=0x00d00000                
[0000.250] RamDump Carveout Base=0x00000000ff280000 Size=0x00080000             
[0000.256] Platform-DebugCarveout: 0                                            
[0000.260] Nck Carveout Base=0x00000000ff080000 Size=0x00200000                 
[0000.266] Non secure mode, and RB not enabled.                                 
[0000.272] Invalid GPT Partition                                                
[0000.287] Using BFS PT to query partitions                                     
[0000.291] failed to load NvTbootTbootCpu from (2:0)                            
[0000.296] re-load NvTbootTbootCpu from (4:0)                                   
[0000.300] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.310] Command complete wait failed with error 0x3 Interrupt 0x18001        
[0000.317] Number of retries left 4                                             
[0000.322] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.332] Command complete wait failed with error 0x3 Interrupt 0x18001        
[0000.338] Number of retries left 4                                             
[0000.342] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.352] Command complete wait failed with error 0x3 Interrupt 0x18000        
[0000.358] Number of retries left 3                                             
[0000.362] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.372] Command complete wait failed with error 0x3 Interrupt 0x18000        
[0000.378] Number of retries left 2                                             
[0000.381] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.392] Command complete wait failed with error 0x3 Interrupt 0x18001        
[0000.398] Number of retries left 1                                             
[0000.401] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.411] Command complete wait failed with error 0x3 Interrupt 0x18001        
[0000.418] Number of retries left 0                                             
[0000.421] Send command failed with 0x3                                         
[0000.425] CMD41 send failed with error 0x3 in SdIdentifyCard func at 1850 line 
[0000.432] Identify card failed with 0x3                                        
[0000.435] SdIdentifyCard has failed with error 0x3 in NvTbootSdmmcInit func at 
[0000.443] Sdmmc Init failed with 0x3 error                                     
[0000.454] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.464] Command complete wait failed with error 0x3 Interrupt 0x18000        
[0000.470] Number of retries left 4                                             
[0000.474] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.484] Command complete wait failed with error 0x3 Interrupt 0x18000        
[0000.490] Number of retries left 3                                             
[0000.493] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.504] Command complete wait failed with error 0x3 Interrupt 0x18001        
[0000.510] Number of retries left 2                                             
[0000.513] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.523] Command complete wait failed with error 0x3 Interrupt 0x18000        
[0000.530] Number of retries left 1                                             
[0000.533] Error mask set in wait for cmd complete with error 0x3 in HwSdmmcWai 
[0000.543] Command complete wait failed with error 0x3 Interrupt 0x18001        
[0000.550] Number of retries left 0                                             
[0000.553] Send command failed with 0x3                                         
[0000.556] CMD41 send failed with error 0x3 in SdIdentifyCard func at 1850 line 
[0000.563] Identify card failed with 0x3                                        
[0000.567] SdIdentifyCard has failed with error 0x3 in NvTbootSdmmcInit func at 
[0000.575] Sdmmc Init failed with 0x3 error                                     
[0000.579] Error is 3
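
For anyone skimming a long console dump like the one above, here is a minimal Python sketch that tallies the messages containing "failed" and counts how many times the retry counter reached 0. It is not an NVIDIA tool; the function name `summarize_boot_log` is made up for illustration, and it only assumes the "[seconds] message" line shape seen in the log.

```python
import re
from collections import Counter

# Matches the "[0000.126] message" shape used by the serial console log.
LINE_RE = re.compile(r"^\[(\d+\.\d+)\]\s+(.*\S)\s*$")


def summarize_boot_log(text):
    """Return (exhausted_retry_cycles, Counter of messages containing 'failed')."""
    exhausted = 0
    failures = Counter()
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip blank lines or anything that is not a log line
        message = match.group(2)
        if message == "Number of retries left 0":
            exhausted += 1
        elif "failed" in message.lower():
            failures[message] += 1
    return exhausted, failures
```

Fed the log above, this would count two exhausted retry cycles (the two "Number of retries left 0" lines). It only groups what the log already says and does not diagnose anything.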

Those are USB errors. I couldn’t tell you the source though, maybe it is some quirk of the Mac. Not using the original supplied cable would be high on the list since many third party cables are really just charger cables and not data cables. Sometimes a HUB or port might be an issue.

Hi,

Thank you for the insight.

Unfortunately, the Jetson Nano doesn’t come with an original supplied cable. Could you recommend a third-party cable that is suitable for data transfer?

The supplied cable is very reliable. I can’t actually suggest a third party cable…if the supplied cable fails, then it won’t be the cable’s fault. Even if you don’t flash from another computer, would you be able to put the Jetson in recovery mode and check the lsusb output on another Linux computer? I realize that may be problematic, but if you have a way to check this by swapping out the computer, then this would be by far the simplest next test. Right now it is either Nano failure or some quirk of interaction with that Mac. If it turns out to be a quirk with the Mac, then an RMA would be a huge waste of time for you (and you’d still be in the same situation).

Hi Linuxdev,

Thank you for the help. I really appreciate it.

As you suggested, I got myself a data cable (unfortunately it’s third party, but it’s from a reputable vendor: Anker), and my computer was finally able to detect the Jetson Nano!

However, while flashing JetPack 4.2.1 with sdkmanager, it’s been stuck at 99.9% for around 4-6 hours.

As you mentioned before, do you think it’s due to the MacBook’s USB port, or the third-party micro-USB cable?

You’d have to see more information on USB to see why it is hanging. One thing to do is run “dmesg --follow” on the host PC during the flash and see if any USB errors show up. There are some cases of USB hanging up, many of which are associated with a VM, and a much smaller percentage due to USB issues. If you can flash on command line and provide a log of that, then perhaps more information could be gathered.
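
If scrolling through “dmesg --follow” by eye is tedious, a small filter can help. The following is a rough Python sketch, not part of any NVIDIA tooling: the function name and both patterns are my own assumptions. The hung-task pattern follows the kernel’s “INFO: task NAME:PID blocked for more than N seconds” wording, and the USB pattern is a deliberately broad catch-all.

```python
import re

# Kernel hung-task warning, e.g.
# "INFO: task tegrarcm:28111 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)
# Broad catch-all for USB-related lines (an assumption, tune as needed).
USB_RE = re.compile(r"usb|xhci", re.IGNORECASE)


def interesting_lines(lines):
    """Yield (kind, line) for host log lines worth a look during a flash."""
    for line in lines:
        line = line.rstrip("\n")
        if HUNG_TASK_RE.search(line):
            yield "hung-task", line
        elif USB_RE.search(line):
            yield "usb", line
```

To use it live, one could iterate `interesting_lines(sys.stdin)` in a script and pipe “dmesg --follow” into that script on the host while the flash runs.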

If you’ve flashed with JetPack/SDKM, then you will have a “Linux_for_Tegra/” subdirectory within the flash. The sample rootfs would have already been set up. Start by making sure you have space (lots), and that it is type ext4 (maybe cd to that location and run “df -H -T .”). Then:

sudo ./flash.sh jetson-nano-qspi-sd mmcblk0p1 2>&1 | tee log_flash.txt
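
The space and filesystem check mentioned above (what “df -H -T .” reports) can also be scripted. This is a minimal Python sketch under my own assumptions: the helper names are hypothetical, it reads /proc/mounts on a Linux host, and it picks the longest mount point that contains the directory.

```python
import os
import shutil


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts, whose fields are:
    device, mount point, type, options, dump, pass.
    """
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Longest matching mount point wins (e.g. /mnt/ssd beats /).
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def check_flash_dir(path="."):
    """Return (filesystem type, free space in GB) for the given directory."""
    with open("/proc/mounts") as handle:
        fs_type = filesystem_type(path, handle.read())
    free_gb = shutil.disk_usage(path).free / 1e9
    return fs_type, free_gb
```

Running `check_flash_dir()` from inside “Linux_for_Tegra/” should return “ext4” as the first value if the directory sits on the filesystem type recommended above.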

If you hover your mouse over the quote icon of one of your existing posts, then other icons will show up. The paper clip icon is used for attaching a file. Or you can click the “code” icon (looks like “</>”), and paste between the code tags.

In the case of a USB error it isn’t usually possible to say if it is a cable or the host issue unless there are other clues to go by. A flash definitely shouldn’t take more than about an hour in most cases. FYI, very few parts dealers even know the difference between a cable intended for charging versus one for data. It is far more rare to actually see something in a cable advertisement which identifies intended use with data.

Hi,

I followed your instructions, and got some really strange errors.

Using “dmesg --follow”, there aren’t any errors about the USB connection to the Jetson Nano itself. However, while flashing the device, I’ve received a strange error which repeats every 2 minutes:

[12446.442860] INFO: task tegrarcm:28111 blocked for more than 120 seconds.
[12446.442864]       Tainted: P           OE     5.0.0-23-generic #24~18.04.1-Ubuntu
[12446.442865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12446.442867] tegrarcm        D    0 28111  28043 0x20020000
[12446.442870] Call Trace:
[12446.442878]  __schedule+0x2bd/0x850
[12446.442881]  schedule+0x2c/0x70
[12446.442883]  schedule_timeout+0x1db/0x360
[12446.442887]  ? __wake_up+0x13/0x20
[12446.442889]  wait_for_completion_timeout+0xb3/0x140
[12446.442892]  ? wake_up_q+0x80/0x80
[12446.442896]  usb_start_wait_urb+0x8c/0x180
[12446.442898]  usb_bulk_msg+0xb8/0x160
[12446.442901]  proc_bulk+0x24a/0x380
[12446.442904]  usbdev_do_ioctl+0xb8e/0x1180
[12446.442907]  ? call_rcu+0x10/0x20
[12446.442910]  ? __fput+0x14b/0x230
[12446.442913]  usbdev_compat_ioctl+0x10/0x20
[12446.442916]  __ia32_compat_sys_ioctl+0xd6/0x240
[12446.442919]  do_int80_syscall_32+0x5b/0x120
[12446.442922]  entry_INT80_compat+0x85/0x90
[12446.442924] RIP: 0023:0x80776a4
[12446.442930] Code: Bad RIP value.
[12446.442932] RSP: 002b:00000000ff933588 EFLAGS: 00000287 ORIG_RAX: 0000000000000036
[12446.442933] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000c0105502
[12446.442934] RDX: 00000000ff9335b0 RSI: 000000000000000c RDI: 0000000008e5e698
[12446.442935] RBP: 00000000ff933624 R08: 0000000000000000 R09: 0000000000000000
[12446.442936] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[12446.442937] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

For the Jetson Nano flash logs, I’ve attached log_flash.txt to this post. The log shows that the flash does not progress past this point:

[   0.7937 ] Applet version 00.01.0000

It’s been stuck like this while the ‘tegrarcm:28111 blocked’ error keeps repeating in “dmesg --follow”.

This tegrarcm problem is very odd. What do you make of this?
log_flash.txt (16.2 KB)

The flash log itself looks normal, and then it just stops. I just want to verify, is the dmesg from the host PC during the flash? Assuming this is the case, then you’re getting a USB error from the host itself. I notice wake is involved, I’m thinking perhaps your host has a bug going into (or not recovering from) a sleep (or similar) mode on USB. Host side autosuspend might be doing something wrong.

If this is the case that it is autosuspend from host side (which isn’t very likely, but it is still reasonably possible, especially if this is a laptop host), I’m going to guess the computer was sitting alone by itself for some time during flash. Once it got to a point where the host would try to put the flash USB to sleep (and it shouldn’t unless something is wrong) this would occur.

I’m going to suggest something simple, but something not normally suggested…connect your mouse and the Jetson to the same USB HUB and make sure to move the mouse once in a while during the flash. Or you could disable autosuspend by going into the GRUB command line and adding this to the kernel arguments (temporary if you only boot from GRUB command line edit):

usbcore.autosuspend=-1

(the Advanced Options from a GRUB menu offers to drop to command line and to customize an entry for one boot without it saving permanently…don’t hit the enter key, look at the shortcuts listed at the bottom and typically it is CTRL-x to boot with the edit in place)

I don’t know if that will do anything, and the easiest direction to take is to try from a different host, but I know that probably is less convenient. I’m hoping you’re just running into autosuspend (and a recovery mode device being flashed should never suspend during the flash).
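
To confirm whether the kernel argument actually took effect after boot, the current value can be read back from sysfs. Below is a small Python sketch under stated assumptions: the helper names are mine, and the path is the standard location of the usbcore module parameter on a Linux host. The value is a delay in seconds, and a negative value means autosuspend is disabled.

```python
from pathlib import Path

# Standard sysfs location of the usbcore.autosuspend module parameter.
AUTOSUSPEND_PARAM = Path("/sys/module/usbcore/parameters/autosuspend")


def describe_autosuspend(raw_value):
    """Interpret the usbcore.autosuspend value (seconds; negative disables)."""
    seconds = int(raw_value.strip())
    if seconds < 0:
        return "autosuspend disabled"
    return f"autosuspend after {seconds}s idle"


def current_autosuspend():
    """Read and describe the value currently in effect on this host."""
    return describe_autosuspend(AUTOSUSPEND_PARAM.read_text())
```

Note that this parameter is only the default for newly detected devices. An individual device can still be overridden through its own “power/control” file under /sys/bus/usb/devices/, which is where a power management tool may change things.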

Hi Linuxdev,

Thank you very much for the insight.

Just to confirm that the dmesg log is from the host pc, and the tegrarcm error shows up only when flashing.

Following your instructions, I’ve plugged both the Nano and the mouse into the same USB hub, and disabled USB autosuspend (not via GRUB, but by using a power management tool called TLP).

Unfortunately, the logs from the flash and from dmesg still show the same problems. I guess the last resort is to use another PC.

Btw, I’ve been booting Ubuntu from either an external HDD or an SD reader. Both options have produced the same flash results, but do you think this could be the problem?

Sorry for the late reply. What is the latest status? Are you able to flash now?

If the external HDD or SD reader consumes bandwidth from the same host root HUB, then possibly there could be an issue. Use “lsusb -t” on the host to find out if the Jetson and the external HDD/SD are on the same root HUB (“Class=root_hub” will show as the root of the branch…most systems have more than one root_hub, but external ports may all branch to the same root_hub).

Hi Wayne, thank you for the assistance.

Up until now, I’ve had no success flashing with sdkmanager.

As for the current status, flashing with sdkmanager hangs at this point in the terminal:

[   0.6255 ] Applet version 00.01.0000
[   0.6274 ] Sending ebt
[   0.6279 ] [................................................] 100%
[   0.7327 ] Sending rp1
[   0.7373 ] [................................................] 100%
[   0.7872 ] 
[   0.7895 ] tegrarcm --boot recovery
[   0.7937 ] Applet version 00.01.0000

When it reaches this point, the dmesg log on the host PC shows a blocked tegrarcm task waiting on USB, and repeats this output every 2 minutes:

[12446.442860] INFO: task tegrarcm:28111 blocked for more than 120 seconds.
[12446.442864]       Tainted: P           OE     5.0.0-23-generic #24~18.04.1-Ubuntu
[12446.442865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12446.442867] tegrarcm        D    0 28111  28043 0x20020000
[12446.442870] Call Trace:
[12446.442878]  __schedule+0x2bd/0x850
[12446.442881]  schedule+0x2c/0x70
[12446.442883]  schedule_timeout+0x1db/0x360
[12446.442887]  ? __wake_up+0x13/0x20
[12446.442889]  wait_for_completion_timeout+0xb3/0x140
[12446.442892]  ? wake_up_q+0x80/0x80
[12446.442896]  usb_start_wait_urb+0x8c/0x180
[12446.442898]  usb_bulk_msg+0xb8/0x160
[12446.442901]  proc_bulk+0x24a/0x380
[12446.442904]  usbdev_do_ioctl+0xb8e/0x1180
[12446.442907]  ? call_rcu+0x10/0x20
[12446.442910]  ? __fput+0x14b/0x230
[12446.442913]  usbdev_compat_ioctl+0x10/0x20
[12446.442916]  __ia32_compat_sys_ioctl+0xd6/0x240
[12446.442919]  do_int80_syscall_32+0x5b/0x120
[12446.442922]  entry_INT80_compat+0x85/0x90
[12446.442924] RIP: 0023:0x80776a4
[12446.442930] Code: Bad RIP value.
[12446.442932] RSP: 002b:00000000ff933588 EFLAGS: 00000287 ORIG_RAX: 0000000000000036
[12446.442933] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000c0105502
[12446.442934] RDX: 00000000ff9335b0 RSI: 000000000000000c RDI: 0000000008e5e698
[12446.442935] RBP: 00000000ff933624 R08: 0000000000000000 R09: 0000000000000000
[12446.442936] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[12446.442937] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Thank you for the reply. I’m not too sure what to make of this log. I have attached the output of both lsusb and lsusb -t below:

lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
    |__ Port 3: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/9p, 480M
    |__ Port 2: Dev 12, If 0, Class=Vendor Specific Class, Driver=, 480M
    |__ Port 3: Dev 3, If 0, Class=Hub, Driver=hub/3p, 12M
        |__ Port 3: Dev 10, If 2, Class=Vendor Specific Class, Driver=btusb, 12M
        |__ Port 3: Dev 10, If 0, Class=Vendor Specific Class, Driver=btusb, 12M
        |__ Port 3: Dev 10, If 3, Class=Application Specific Interface, Driver=, 12M
        |__ Port 3: Dev 10, If 1, Class=Wireless, Driver=btusb, 12M
    |__ Port 5: Dev 5, If 0, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 5: Dev 5, If 1, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 5: Dev 5, If 2, Class=Human Interface Device, Driver=bcm5974, 12M

lsusb
Bus 002 Device 003: ID 05ac:8406 Apple, Inc. 
Bus 002 Device 002: ID 059f:106b LaCie, Ltd 
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 005: ID 05ac:0259 Apple, Inc. 
Bus 001 Device 010: ID 05ac:8289 Apple, Inc. 
Bus 001 Device 003: ID 0a5c:4500 Broadcom Corp. BCM2046B1 USB 2.0 Hub (part of BCM2046 Bluetooth)
Bus 001 Device 012: ID 0955:7f21 NVidia Corp. 
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
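
The two listings can be cross-checked mechanically to answer the “same root HUB” question. Here is a rough Python sketch with hypothetical helper names of my own. It takes the vendor ID 0955 from the “NVidia Corp.” line of lsusb and the “Mass Storage” class from lsusb -t, and only assumes the output shapes shown above.

```python
import re

# "Bus 001 Device 012: ID 0955:7f21 NVidia Corp."
LSUSB_RE = re.compile(r"^Bus (\d+) Device (\d+): ID ([0-9a-f]{4}):([0-9a-f]{4})")
# "/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M"
ROOT_RE = re.compile(r"^/:\s+Bus (\d+)\.Port \d+: Dev \d+, Class=root_hub")
# "|__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M"
CLASS_RE = re.compile(r"\|__ .*Class=([^,]+),")


def buses_for_vendor(lsusb_text, vendor_id):
    """Bus numbers (from plain `lsusb`) that hold a device of this vendor."""
    buses = set()
    for line in lsusb_text.splitlines():
        match = LSUSB_RE.match(line.strip())
        if match and match.group(3) == vendor_id:
            buses.add(int(match.group(1)))
    return buses


def buses_with_class(tree_text, usb_class):
    """Bus numbers (from `lsusb -t`) whose branch has an interface of this class."""
    buses, current = set(), None
    for line in tree_text.splitlines():
        root = ROOT_RE.match(line.strip())
        if root:
            current = int(root.group(1))  # a new root_hub branch starts here
            continue
        match = CLASS_RE.search(line)
        if match and current is not None and match.group(1) == usb_class:
            buses.add(current)
    return buses
```

With the output above, the vendor 0955 device is on bus 1 while both Mass Storage entries are on bus 2, so by this measure the Jetson and the boot drives hang off different root hubs.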

Caldarie,

The flash problem may be due to hardware (a broken Nano) or to a host issue.

  1. If you have more than one Jetson Nano, please check whether only one of them has the problem.
  2. If you only have one Nano, please try a different host and see if it can flash.

If neither 1 nor 2 works -> file an RMA request for a new module.
https://developer.nvidia.com/embedded/faq