Production nano - stuck on nVidia bootloader screen

Hi guys,

We’re using a few hundred Jetson Nano production modules in our products, and we flash them all via USB with the same OS image. Our image is based on r32.5.2 and ubuntu-base-18.04.5-base-arm64.tar.gz and is flashed to the module’s eMMC.

We have had multiple modules that refuse to boot: they display the nVidia boot logo and don’t proceed past it.

The carrier board is a Leopard Imaging LI-NANO-CB.

When this occurs, we’ve just been replacing the Jetson Nano module with a different one and sending the module that fails to boot back under RMA.

Given that this seems to happen quite often - and another module with the same OS and carrier board will boot fine, we don’t believe it to be a problem with our OS image.

I’m wondering if there is a bootloader / firmware update available for the nano that could resolve the booting issue - and save us from doing an RMA on these units.

Yesterday, I built up 24 x Jetson nano + carrier board units - and one out of the 24 consistently fails to boot and only displays the nVidia boot logo. When this happens, we also see the ethernet link established - but only at 10 or 100Mbit - whereas when a module boots, we get the full 1Gbps link speed.

Hi steven.haigh,

It seems this is a custom board from Leopard Imaging Inc.

Have they released a package for the current Jetson Linux R32.7.3 for Jetson Nano?

You could check whether the issue still exists with the latest Jetson Linux (r32.7.3), or you could provide the serial console log for further checking.

Hi Kevin,

Thanks for the reply.

We don’t seem to get anything out of the DEBUG port. I can see that it comes up with /dev/ttyUSB0 on my laptop - and if I try a random assortment of modules, some will output a serial boot log - however the ones that fail to boot don’t seem to output anything.

If I hook up an HDMI monitor, I see the nVidia boot logo - but nothing further.

EDIT: When we write our OS Image to the nano, we use ./flash.sh jetson-nano-emmc mmcblk0p1 - is there another command that might update firmware / bootloader level stuff?

I’ve had a browse through the tools included with the BSP - but nothing stands out to me as an obvious tool for this.

For additional info, I managed to get another module today that sometimes fails to boot.

When it does boot, I can see a few things about an incompatible EEPROM - but nothing stands out as being weird. Reproduced the boot logs below:

[0000.229] [L4T TegraBoot] (version 00.00.2018.01-l4t-dd84d362)
[0000.235] Processing in cold boot mode Bootloader 2
[0000.239] A02 Bootrom Patch rev = 1023
[0000.243] Power-up reason: pmc por
[0000.246] No Battery Present
[0000.249] pmic max77620 reset reason
[0000.252] pmic max77620 NVERC : 0xd0
[0000.255] RamCode = 0
[0000.258] Platform has DDR4 type RAM
[0000.261] max77620 disabling SD1 Remote Sense
[0000.265] Setting DDR voltage to 1125mv
[0000.269] Serial Number of Pmic Max77663: 0x32d73
[0000.277] Entering ramdump check
[0000.280] Get RamDumpCarveOut = 0x0
[0000.283] RamDumpCarveOut=0x0,  RamDumperFlag=0xe59ff3f8
[0000.288] Last reboot was clean, booting normally!
[0000.293] Sdram initialization is successful
[0000.297] SecureOs Carveout Base=0x00000000ff800000 Size=0x00800000
[0000.303] Lp0 Carveout Base=0x00000000ff780000 Size=0x00001000
[0000.309] BpmpFw Carveout Base=0x00000000ff700000 Size=0x00080000
[0000.315] GSC1 Carveout Base=0x00000000ff600000 Size=0x00100000
[0000.321] GSC2 Carveout Base=0x00000000ff500000 Size=0x00100000
[0000.326] GSC4 Carveout Base=0x00000000ff400000 Size=0x00100000
[0000.332] GSC5 Carveout Base=0x00000000ff300000 Size=0x00100000
[0000.338] GSC3 Carveout Base=0x000000017f300000 Size=0x00d00000
[0000.354] RamDump Carveout Base=0x00000000ff280000 Size=0x00080000
[0000.360] Platform-DebugCarveout: 0
[0000.364] Nck Carveout Base=0x00000000ff080000 Size=0x00200000
[0000.369] Non secure mode, and RB not enabled.
[0000.386] Csd NumOfBlocks=0
[0000.582] *** Booting BFS0.
[0000.584] Read PT from (0:3)
[0000.590] Using BFS PT to query partitions
[0000.595] PT: Partition LNX NOT found !
[0001.195] *** Booting KFS0.
[0001.197] BoardID = 3448, SKU = 0x2
[0001.201] Not Nano-SD or !QSPI-ONLY, check GPT table first ...
[0001.206] Read GPT from (0:3)
[0001.214] Using GPT Primary to query partitions
[0001.219] Loading Tboot-CPU binary
[0001.226] Verifying TBC in OdmNonSecureSBK mode
[0001.236] Bootloader load address is 0xa0000000, entry address is 0xa0000258
[0001.243] Bootloader downloaded successfully.
[0001.247] Downloaded Tboot-CPU binary to 0xa0000258
[0001.252] MAX77620_GPIO5 configured
[0001.255] CPU power rail is up
[0001.258] CPU clock enabled
[0001.262] Performing RAM repair
[0001.265] Updating A64 Warmreset Address to 0xa00002e9
[0001.270] BoardID = 3448, SKU = 0x2
[0001.273] Not Nano-SD or !QSPI-ONLY, check GPT table first ...
[0001.279] Loading NvTbootBootloaderDTB
[0001.296] Verifying NvTbootBootloaderDTB in OdmNonSecureSBK mode
[0001.370] Bootloader DTB Load Address: 0x83000000
[0001.375] BoardID = 3448, SKU = 0x2
[0001.378] Not Nano-SD or !QSPI-ONLY, check GPT table first ...
[0001.384] Loading NvTbootKernelDTB
[0001.401] Verifying NvTbootKernelDTB in OdmNonSecureSBK mode
[0001.475] Kernel DTB Load Address: 0x83100000
[0001.479] BoardID = 3448, SKU = 0x2
[0001.482] Not Nano-SD or !QSPI-ONLY, check GPT table first ...
[0001.489] Loading cboot binary
[0001.500] Verifying EBT in OdmNonSecureSBK mode
[0001.542] Bootloader load address is 0x92c00000, entry address is 0x92c00258
[0001.549] Bootloader downloaded successfully.
[0001.553] BoardID = 3448, SKU = 0x2
[0001.556] Not Nano-SD or !QSPI-ONLY, check GPT table first ...
[0001.563] PT: Partition NCT NOT found !
[0001.566] Warning: Find Partition via PT Failed
[0001.571] Next binary entry address: 0x92c00258
[0001.575] BoardId: 3448
[0001.580] Overriding pmu board id with proc board id
[0001.584] Display board id is not available
[0001.589] BoardID = 3448, SKU = 0x2
[0001.592] Not Nano-SD or !QSPI-ONLY, check GPT table first ...
[0001.605] Verifying SC7EntryFw in OdmNonSecureSBK mode
[0001.662] /bpmp deleted
[0001.665] SC7EntryFw header found loaded at 0xff700000
[0001.873] OVR2 PMIC
[0001.875] Bpmp FW successfully loaded
[0001.879] BoardID = 3448, SKU = 0x2
[0001.882] Not Nano-SD or !QSPI-ONLY, check GPT table first ...
[0001.889] WB0 init successfully at 0xff780000
[0001.893] Set NvDecSticky Bits
[0001.897] GSC2 address ff53fffc value c0edbbcc
[0001.903] GSC MC Settings done
[0001.906] BoardID = 3448, SKU = 0x2
[0001.909] Not Nano-SD or !QSPI-ONLY, check GPT table first ...
[0001.916] TOS Image length 53680
[0001.919]  Monitor size 53680
[0001.922]  OS size 0
[0001.927] Secure Os AES-CMAC Verification Success!
[0001.932] TOS image cipher info: plaintext
[0001.936] Loading and Validation of Secure OS Successful
[0001.952] SC7 Entry Firmware - 0xff700000, 0x4000
[0001.956] NvTbootPackSdramParams: start.
[0001.961] NvTbootPackSdramParams: done.
[0001.965] Tegraboot started after 158967 us
[0001.969] Basic modules init took 1478076 us
[0001.973] Sec Bootdevice Read Time = 559 ms, Read Size = 14649 KB
[0001.979] Sec Bootdevice Write Time = 0 ms, Write Size = 0 KB
[0001.985] Next stage binary read took 7384 us
[0001.989] Carveout took -28692 us
[0001.992] CPU initialization took 396939 us
[0001.996] Total time taken by TegraBoot 1853707 us

[0002.001] Starting CPU & Halting co-processor

64NOTICE:  BL31: v1.3(release):5b49e7f80
NOTICE:  BL31: Built : 08:53:41, Jul  9 2021
ERROR:   Error initializing runtime service trusty_fast
[0002.123] RamCode = 0
[0002.128] LPDDR4 Training: Read DT: Number of tables = 2
[0002.133] EMC Training (SRC-freq: 204000; DST-freq: 1600000)
[0002.146] EMC Training Successful
[0002.149] 408000 not found in DVFS table
[0002.156] RamCode = 0
[0002.159] DT Write: emc-table@204000 succeeded
[0002.164] DT Write: emc-table@1600000 succeeded
[0002.169] LPDDR4 Training: Write DT: Number of tables = 2
[0002.230]
[0002.231] Debug Init done
[0002.233] Marked DTB cacheable
[0002.236] Bootloader DTB loaded at 0x83000000
[0002.241] Marked DTB cacheable
[0002.244] Kernel DTB loaded at 0x83100000
[0002.248] DeviceTree Init done
[0002.261] Pinmux applied successfully
[0002.265] gicd_base: 0x50041000
[0002.269] gicc_base: 0x50042000
[0002.272] Interrupts Init done
[0002.276] Using base:0x60005090 & irq:208 for tick-timer
[0002.281] Using base:0x60005098 for delay-timer
[0002.285] platform_init_timer: DONE
[0002.289] Timer(tick) Init done
[0002.293] osc freq = 38400 khz
[0002.297]
[0002.298] Welcome to L4T Cboot
[0002.301]
[0002.302] Cboot Version: 00.00.2018.01-t210-39562017
[0002.307] calling constructors
[0002.310] initializing heap
[0002.312] initializing threads
[0002.315] initializing timers
[0002.318] creating bootstrap completion thread
[0002.323] top of bootstrap2()
[0002.325] CPU: ARM Cortex A57
[0002.328] CPU: MIDR: 0x411FD071, MPIDR: 0x80000000
[0002.333] initializing platform
[0002.387] Config for emmc ddr50 mode completed
[0002.392] sdmmc bdev is already initialized
[0002.396] Enable APE clock
[0002.399] Un-powergate APE partition
[0002.402] of_register: registering tegra_udc to of_hal
[0002.407] of_register: registering inv20628-driver to of_hal
[0002.413] of_register: registering ads1015-driver to of_hal
[0002.418] of_register: registering lp8557-bl-driver to of_hal
[0002.424] of_register: registering bq2419x_charger to of_hal
[0002.430] of_register: registering bq27441_fuel_gauge to of_hal
[0002.442] gpio framework initialized
[0002.445] of_register: registering tca9539_gpio to of_hal
[0002.451] of_register: registering tca9539_gpio to of_hal
[0002.456] of_register: registering i2c_bus_driver to of_hal
[0002.461] of_register: registering i2c_bus_driver to of_hal
[0002.467] of_register: registering i2c_bus_driver to of_hal
[0002.473] pmic framework initialized
[0002.476] of_register: registering max77620_pmic to of_hal
[0002.482] regulator framework initialized
[0002.485] of_register: registering tps65132_bl_driver to of_hal
[0002.491] initializing target
[0002.497] gpio_driver_register: register 'tegra_gpio_driver' driver
[0002.508] fixed regulator driver initialized
[0002.527] initializing OF layer
[0002.530] NCK carveout not present
[0002.533] Skipping dts_overrides
[0002.537] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0002.555] I2C Bus Init done
[0002.557] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0002.568] I2C Bus Init done
[0002.571] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0002.581] I2C Bus Init done
[0002.584] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0002.594] I2C Bus Init done
[0002.597] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0002.608] I2C Bus Init done
[0002.610] of_children_init: Ops found for compatible string maxim,max77620
[0002.621] max77620_init using irq 118
[0002.626] register 'maxim,max77620' pmic
[0002.630] gpio_driver_register: register 'max77620-gpio' driver
[0002.636] of_children_init: Ops found for compatible string nvidia,tegra210-i2c
[0002.647] I2C Bus Init done
[0002.651] NCK carveout not present
[0002.661] Find /i2c@7000c000's alias i2c0
[0002.665] get eeprom at 1-a0, size 256, type 0
[0002.674] Find /i2c@7000c500's alias i2c2
[0002.678] get eeprom at 3-a0, size 256, type 0
[0002.683] get eeprom at 3-ae, size 256, type 0
[0002.687] pm_ids_update: Updating 1,a0, size 256, type 0
[0002.692] I2C slave not started
[0002.695] I2C write failed
[0002.698] Writing offset failed
[0002.701] eeprom_init: EEPROM read failed
[0002.705] pm_ids_update: eeprom init failed
[0002.709] pm_ids_update: Updating 3,a0, size 256, type 0
[0002.739] pm_ids_update: The pm board id is 3448-0002-401
[0002.746] Adding plugin-manager/ids/3448-0002-401=/i2c@7000c500:module@0x50
[0002.754] pm_ids_update: pm id update successful
[0002.759] pm_ids_update: Updating 3,ae, size 256, type 0
[0002.789] eeprom_init: EEPROM incompatible version found
[0002.794] pm_ids_update: eeprom init failed
[0002.824] eeprom_get_mac: EEPROM invalid MAC address (all 0xff)
[0002.830] shim_eeprom_update_mac:267: Failed to update 0 MAC address in DTB
[0002.838] eeprom_get_mac: EEPROM invalid MAC address (all 0xff)
[0002.844] shim_eeprom_update_mac:267: Failed to update 1 MAC address in DTB
[0002.852] updating /chosen/nvidia,ethernet-mac node 48:b0:2d:5b:19:72
[0002.859] Plugin Manager: Parse ODM data 0x000a4000
[0002.871] shim_cmdline_install: /chosen/bootargs: earlycon=uart8250,mmio32,0x70006000
[0002.886] Find /i2c@7000c000's alias i2c0
[0002.890] get eeprom at 1-a0, size 256, type 0
[0002.900] Find /i2c@7000c500's alias i2c2
[0002.904] get eeprom at 3-a0, size 256, type 0
[0002.908] get eeprom at 3-ae, size 256, type 0
[0002.912] pm_ids_update: Updating 1,a0, size 256, type 0
[0002.918] I2C slave not started
[0002.921] I2C write failed
[0002.923] Writing offset failed
[0002.926] eeprom_init: EEPROM read failed
[0002.930] pm_ids_update: eeprom init failed
[0002.934] pm_ids_update: Updating 3,a0, size 256, type 0
[0002.965] pm_ids_update: The pm board id is 3448-0002-401
[0002.971] Adding plugin-manager/ids/3448-0002-401=/i2c@7000c500:module@0x50
[0002.978] pm_ids_update: pm id update successful
[0002.982] pm_ids_update: Updating 3,ae, size 256, type 0
[0003.012] eeprom_init: EEPROM incompatible version found
[0003.018] pm_ids_update: eeprom init failed
[0003.048] Add serial number:1424521002087 as DT property
[0003.055] Applying platform configs
[0003.062] platform-init is not present. Skipping
[0003.067] calling apps_init()
[0003.088] Found 17 GPT partitions in "sdmmc3_user"
[0003.092] Proceeding to Cold Boot
[0003.096] starting app android_boot_app
[0003.099] Device state: unlocked
[0003.103] display console init
[0003.111] could not find regulator
[0003.134] hdmi cable not connected
[0003.137] is_hdmi_needed: HDMI not connected, returning false
[0003.143] hdmi is not connected
[0003.146] sor0 is notDT entry for leds-pwm not found
 [0003.155] supported
[0003.157] display_console_init: no valid display out_type
[0003.165] subnode volume_up is not found !
[0003.169] subnode back is not found !
[0003.173] subnode volume_down is not found !
[0003.177] subnode menu is not found !
[0003.180] Gpio keyboard init success
[0003.228] found decompressor handler: lz4-legacy
[0003.242] decompressing blob (type 1)...
[0003.308] display_resolution: No display init
[0003.312] Failed to retrieve display resolution
[0003.317] Could not load/initialize BMP blob...ignoring
[0003.367] decompressor handler not found
[0003.371] load_firmware_blob: Firmware blob loaded, entries=2
[0003.376] XUSB blob version 0 size 124416 @ 0x92cb328c
[0003.382] -------> se_aes_verify_sbk_clear: 747
[0003.387] se_aes_verify_sbk_clear: Error
[0003.391] SE operation failed
[0003.393] bl_battery_charging: connected to external power supply
[0003.403] display_console_ioctl: No display init
[0003.407] switch_backlight failed
[0003.417] device_query_partition_size: failed to open partition sdmmc3_user:MSC !
[0003.424] MSC Partition not found
[0003.434] device_query_partition_size: failed to open partition sdmmc3_user:USP !
[0003.441] USP partition read failed!
[0003.445] blob_init: blob-partition USP header read failed
[0003.450] android_boot Unable to update recovery partition
[0003.456] kfs_getpartname: name = LNX
[0003.459] Loading kernel from LNX
[0003.550] load kernel from storage
[0003.560] decompressor handler not found
[0003.594] Successfully loaded kernel and ramdisk images
[0003.599] board ID = D78, board SKU = 2
[0003.604] display_resolution: No display init
[0003.608] Failed to retrieve display resolution
[0003.613] bmp blob is not loaded and initialized
[0003.617] Failed to display boot-logo
[0003.621] NCK carveout not present
[0003.624] Skipping dts_overrides
[0003.628] NCK carveout not present
[0003.638] Find /i2c@7000c000's alias i2c0
[0003.642] get eeprom at 1-a0, size 256, type 0
[0003.651] Find /i2c@7000c500's alias i2c2
[0003.655] get eeprom at 3-a0, size 256, type 0
[0003.660] get eeprom at 3-ae, size 256, type 0
[0003.664] pm_ids_update: Updating 1,a0, size 256, type 0
[0003.669] I2C slave not started
[0003.672] I2C write failed
[0003.675] Writing offset failed
[0003.678] eeprom_init: EEPROM read failed
[0003.682] pm_ids_update: eeprom init failed
[0003.686] pm_ids_update: Updating 3,a0, size 256, type 0
[0003.716] pm_ids_update: The pm board id is 3448-0002-401
[0003.723] Adding plugin-manager/ids/3448-0002-401=/i2c@7000c500:module@0x50
[0003.731] pm_ids_update: pm id update successful
[0003.736] pm_ids_update: Updating 3,ae, size 256, type 0
[0003.766] eeprom_init: EEPROM incompatible version found
[0003.771] pm_ids_update: eeprom init failed
[0003.801] eeprom_get_mac: EEPROM invalid MAC address (all 0xff)
[0003.807] shim_eeprom_update_mac:267: Failed to update 0 MAC address in DTB
[0003.815] eeprom_get_mac: EEPROM invalid MAC address (all 0xff)
[0003.821] shim_eeprom_update_mac:267: Failed to update 1 MAC address in DTB
[0003.829] updating /chosen/nvidia,ethernet-mac node 48:b0:2d:5b:19:72
[0003.836] Plugin Manager: Parse ODM data 0x000a4000
[0003.848] shim_cmdline_install: /chosen/bootargs: earlycon=uart8250,mmio32,0x70006000
[0003.856] Add serial number:1424521002087 as DT property
[0003.865] "bpmp" doesn't exist, creating
[0003.871] Updated bpmp info to DTB
[0003.876] Updated initrd info to DTB
[0003.879] "proc-board" doesn't exist, creating
[0003.885] Updated board info to DTB
[0003.889] "pmu-board" doesn't exist, creating
[0003.895] Updated board info to DTB
[0003.898] "display-board" doesn't exist, creating
[0003.904] Updated board info to DTB
[0003.907] "reset" doesn't exist, creating
[0003.912] Updated reset info to DTB
[0003.915] display_console_ioctl: No display init
[0003.920] display_console_ioctl: No display init
[0003.924] display_console_ioctl: No display init
[0003.929] Cmdline: tegraid=21.1.2.0.0 ddr_die=4096M@2048M section=512M memtype=0 vpr_resize usb_port_owner_info=0 lane_owner_info=0 emc_max_dvfs=0 touch_id=0@63 video=tegrafb no_console_suspend=1 console=ttyS0,115200n8 debug_uartport=lsport,4 earlyprintk=uart8250-32bit,0x70006000 maxcpus=4 usbcore.old_scheme_first=1 lp0_vec=0x1000@0xff780000 core_edp_mv=1125 core_edp_ma=4000 gpt
[0003.963] DTB cmdline: earlycon=uart8250,mmio32,0x70006000
[0003.969] boot image cmdline: root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=tty0 fbcon=map:0 net.ifnames=0 sdhci_tegra.en_boot_part_access=1
[0003.983] Updated bootarg info to DTB
[0003.987] Adding uuid 00000001643228441c00000003ff85c0 to DT
[0003.993] Adding eks info 0 to DT
[0003.998] WARNING: Failed to pass NS DRAM ranges to TOS, err: -7
[0004.004] Updated memory info to DTB
[0004.013] set vdd_core voltage to 1125 mv
[0004.017] setting 'vdd-core' regulator to 1125000 micro volts
[0004.022] Found secure-pmc; disable BPMP


U-Boot 2020.04-g6b630d64fd (Jul 09 2021 - 08:53:46 -0700)

SoC: tegra210
Model: NVIDIA Jetson Nano Developer Kit
Board: NVIDIA P3450-0000
DRAM:  4 GiB
MMC:   sdhci@700b0000: 1, sdhci@700b0600: 0
Loading Environment from SPI Flash... *** Warning - spi_flash_probe_bus_cs() failed, using default environment

In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
Hit any key to stop autoboot:  0
Card did not respond to voltage select!
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
856 bytes read in 20 ms (41 KiB/s)
1:      primary kernel
Retrieving file: /boot/initrd
7159329 bytes read in 174 ms (39.2 MiB/s)
Retrieving file: /boot/Image
34404360 bytes read in 779 ms (42.1 MiB/s)
append: tegraid=21.1.2.0.0 ddr_die=4096M@2048M section=512M memtype=0 vpr_resize usb_port_owner_info=0 lane_owner_info=0 emc_max_dvfs=0 touch_id=0@63 video=tegrafb no_console_suspend=1 console=ttyS0,115200n8 debug_uartport=lsport,4 earlyprintk=uart8250-32bit,0x70006000 maxcpus=4 usbcore.old_scheme_first=1 lp0_vec=0x1000@0xff780000 core_edp_mv=1125 core_edp_ma=4000 gpt  earlycon=uart8250,mmio32,0x70006000  root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=tty0 fbcon=map:0 net.ifnames=0 sdhci_tegra.en_boot_part_access=1 quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=tty0 fbcon=map:0 net.ifnames=0 sdhci_tegra.en_boot_part_access=1
## Flattened Device Tree blob at 83100000
   Booting using the fdt blob at 0x83100000
ERROR: reserving fdt memory region failed (addr=0 size=0)
ERROR: reserving fdt memory region failed (addr=0 size=0)
   Using Device Tree in place at 0000000083100000, end 000000008317f4ed
copying carveout for /host1x@50000000/dc@54200000...
copying carveout for /host1x@50000000/dc@54240000...

Starting kernel ...

At this point, it successfully boots into our OS.

Again, when this same module doesn’t boot, I get nothing on the debug UART…
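For comparing captures across modules, it may help to record the furthest boot stage each one reaches on the debug UART. A minimal Python sketch, assuming the banner strings from the log above are reliable stage markers; the function name and stage labels are made up for illustration and are not NVIDIA terminology:

```python
# Hypothetical triage helper: report the furthest boot stage seen in a UART capture.
# Marker strings are copied from the boot log posted above; the labels are illustrative.
STAGES = [
    ("tegraboot", "[L4T TegraBoot]"),
    ("cboot", "Welcome to L4T Cboot"),
    ("u-boot", "U-Boot "),
    ("kernel-handoff", "Starting kernel ..."),
]


def furthest_stage(log_text):
    """Return the label of the last stage whose marker appears, or 'no-output'."""
    reached = "no-output"
    for label, marker in STAGES:
        if marker in log_text:
            reached = label
    return reached


if __name__ == "__main__":
    # A module that prints nothing at all on the debug UART.
    print(furthest_stage(""))  # prints "no-output"
```

A capture that classifies as "no-output" did not even print the TegraBoot banner, which is the situation described for the failing modules.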

Do you mean the module can sometimes boot into the desktop successfully, but at other times it can’t boot and there is no output at all on the UART?

Or are these different modules, and only one module cannot boot?

Even when it does boot, there are still many errors in the log.

I have had two modules now with similar faults - one completely refuses to boot with no output on the DEBUG UART.

The second one randomly decided not to boot - and when it didn’t boot, there was also no output on the DEBUG UART.

The boot log above was taken from this second module when it was able to boot. Interestingly, since I managed to get it to boot multiple times in a row - I now can’t cause it to fail to boot! I’ll continue to monitor this module for the next few weeks to see if it shows problems again.

We have around 350 of these modules deployed, and probably had close to 25-30 of them that we have had fail in this manner. Most have already been sent for RMA - but I’ve been grabbing the latest set of failures to see if we can do something about it and reduce our RMA quantities.

We’ve seen fewer failures in our latest batch of modules ordered - which is a good sign - but we still see these kinda regularly… These two modules I’ve grabbed are from the faults we’ve seen in the last week of building our devices.

EDIT: For clarity, we use our own rootfs which bootstraps our load process for our own docker based containers for our workloads.

For the problematic module, it would be hard to investigate the issue if there’s no serial console log.
It’s a custom board and not the devkit. I would suggest fixing every error/failure in the serial console log of a module that can boot up successfully.
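To get a list of candidates to work through, the error and failure messages can be pulled out of a saved console log and de-duplicated. A rough Python sketch, assuming the capture is saved as plain text with the `[ssss.mmm]` timestamps shown above; the keyword list and the function name are assumptions chosen to match messages in this log, and the script cannot tell which messages are benign:

```python
import re

# Keywords are an assumption picked to match messages in the log above; adjust as needed.
SUSPECT = re.compile(r"error|fail|not found|incompatible|invalid|warning", re.IGNORECASE)
# Leading "[0002.695] " style timestamp, stripped so repeated messages group together.
TIMESTAMP = re.compile(r"^\s*\[\d{4}\.\d{3}\]\s*")


def suspicious_lines(log_text):
    """Return {message: occurrence count} for lines matching the keyword list."""
    counts = {}
    for line in log_text.splitlines():
        message = TIMESTAMP.sub("", line).strip()
        if message and SUSPECT.search(message):
            counts[message] = counts.get(message, 0) + 1
    return counts


if __name__ == "__main__":
    sample = (
        "[0002.695] I2C write failed\n"
        "[0002.698] Writing offset failed\n"
        "[0002.921] I2C write failed\n"
    )
    for message, count in suspicious_lines(sample).items():
        print(count, message)
```

Grouping by message makes it easier to see that, for example, the EEPROM and I2C messages repeat at each stage rather than being separate faults.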

Do you happen to have any hints on this? As I understand, the kernel source / uboot etc is cloned directly from nVidia’s git - therefore the eeprom problems (if they are problems) are for an EEPROM existing on the nano module itself?

That being said, considering we get no DEBUG UART output when modules refuse to boot - yet we have ~320 or so modules that boot just fine with the same OS image, do we just continue to RMA ones that refuse to boot?

No, the EEPROM is on the carrier board. Some custom boards might not come with an EEPROM, so they might make some modifications in the device tree.

That is one option, but I suggest first debugging the errors/failures on the modules that can boot up successfully with serial debug output.

Are those “cannot dump uart log” modules tested on your custom board or on the devkit?

Are they still able to get flashed by sdkmanager?

These are all on the custom board. As a test, I’ve been swapping modules in and out of the same carrier board on my desk - the ones that don’t boot won’t have any debug output on the debug uart.

Swapping just the nano module in the same carrier board will give UART output for modules that boot fine.

I can still flash the nanos that won’t boot via the flash.sh script and holding the RECOVERY button, then tapping the RESET button. I haven’t actually tried via the nVidia SDK manager - although I believe it uses the same mechanisms…

Hi,

Could you put those problematic modules on a devkit and see if you can dump the log?
This is the SOP here.

The reason to use sdkmanager is to make sure it is a pure BSP. If you are sure the BSP is clean, then you can use flash.sh too.

Understood. I’m not sure if we have an official nVidia carrier board in stock - but I’ll take a look next week to see if we have one.

Ahh - I understand what you mean - flash a complete nVidia stock package to the module + nVidia carrier board to see if that fails in the same way.

I’m only going off the fact that we have ~350 nano modules with these carrier boards that work fine with the same OS image, which indicates that the problem isn’t with the image that we build. Maybe that’s a bad assumption - but it’s probably around 5% of modules that won’t boot - whereas the majority boot and function perfectly.
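The figures quoted across the thread can be turned into rough failure rates. A small Python sketch using only the numbers mentioned in the posts above (about 350 modules, 25-30 failures, and 1 failure in the batch of 24); these are estimates from the posts, not measured data:

```python
# Figures quoted in this thread; treat them as rough estimates.
deployed = 350                      # "around 350 of these modules deployed"
failed_low, failed_high = 25, 30    # "close to 25-30 of them"
batch_size, batch_failed = 24, 1    # the batch of 24 built in one day


def rate(failed, total):
    """Failure rate as a percentage, rounded to one decimal place."""
    return round(100.0 * failed / total, 1)


fleet_low = rate(failed_low, deployed)
fleet_high = rate(failed_high, deployed)
batch = rate(batch_failed, batch_size)
print(fleet_low, fleet_high, batch)  # prints "7.1 8.6 4.2"
```

So the quoted counts put the overall rate somewhere around 7-9%, with the most recent batch of 24 closer to 4%.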

Hello,

Is there any update on this?

Hi Wayne,

Still working on it - I got an nVidia dev kit board today and passed that module and the devkit carrier to our jetson guy to see if he could flash a basic image via the SDK Manager.

Our next day back in the office together is likely next Monday - so hopefully I’ll have some information around then as to how successful this has been.

Hi all, Just posting to keep this thread alive… I’ve been sent to the other side of Australia this week for an onsite testing session… Will get back to this next week (hopefully).

Are there any results you can share? Thanks

Hi Kayccc,

You wouldn’t believe it, but I managed to catch COVID on the trip so have been forced to isolate this week.

Sorry about the seemingly endless delays on this - you just can’t make this stuff up :(

Take care, hope everything goes well now.
