An unstable USB connection causes flashing to hang without exiting when AdbPush raises an error

Hello,

I am using L4T 38.4 (JetPack 7.1) for the Thor 5000, and I have noticed that the flashing script hangs whenever an AdbPush call raises an error. This can happen when the USB connection is lost because of a bad USB cable or faulty hardware.

['/media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/mfi_auvidea-p3834-0008-x242/unified_flash/out/bsp_images/tools/flashtools/bootburn/flash_bsp_images.py', '-b', 'jetson-t264', '--l4t', '-D', '-P', '/media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/mfi_auvidea-p3834-0008-x242/unified_flash/out/bsp_images/flash_workspace', '--l4t_boot_chain_select', 'A', '--usb-instance', '2-1.4']


Adb push failed -- /media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/mfi_auvidea-p3834-0008-x242/unified_flash/out/bsp_images/tools/flashtools/flash/adb -s 2U10U1118000007G6060 push die0_bctCopiesBlob.tmp  /tmp

Process Process-1:1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/mfi_auvidea-p3834-0008-x242/unified_flash/out/bsp_images/tools/flashtools/bootburn/../bootburn_t264_py/bootburn_adb.py", line 1257, in FlashUsingADB
    result = self.SendFileUsingADB(partitionInfo, SkipWriteZeroChunk, TempDir, queue, UpdatePartitions)
  File "/media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/mfi_auvidea-p3834-0008-x242/unified_flash/out/bsp_images/tools/flashtools/bootburn/../bootburn_t264_py/bootburn_adb.py", line 699, in SendFileUsingADB
    self.AdbPush(partitionInfo.FileName)
  File "/media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/mfi_auvidea-p3834-0008-x242/unified_flash/out/bsp_images/tools/flashtools/bootburn/../bootburn_t264_py/bootburn_adb.py", line 344, in AdbPush
    AbnormalTermination("Adb push failed -- " + adbCommand, nverror.NvError_Adb)
  File "/media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/mfi_auvidea-p3834-0008-x242/unified_flash/out/bsp_images/tools/flashtools/bootburn/../bootburn_t264_py/flashtools_nverror.py", line 260, in AbnormalTermination
    raise OSError(errorCode)
OSError: 53

I still need to evaluate the py-spy output below myself. I have included it because it might be a good starting point for your team to investigate this issue.

root@auvidea-HP-Z620-Workstation:/media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra# py-spy dump --pid 724656
Process 724656: python3 /media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/unified_flash/out/bsp_images/tools/flashtools/bootburn/flash_bsp_images.py -b jetson-t264 --l4t -D -P /media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/unified_flash/out/bsp_images/flash_workspace --l4t_boot_chain_select A --usb-instance 2-1.4
Python v3.8.10 (/usr/bin/python3.8)

Thread 724656 (idle): "MainThread"
    poll (multiprocessing/popen_fork.py:27)
    wait (multiprocessing/popen_fork.py:47)
    join (multiprocessing/process.py:149)
    _exit_function (multiprocessing/util.py:357)
    _bootstrap (multiprocessing/process.py:318)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    _Popen (multiprocessing/context.py:224)
    start (multiprocessing/process.py:121)
    ParallelFlashImages (bootburn_t264_py/bootburn_lib.py:2795)
    FlashImages (bootburn_t264_py/bootburn_lib.py:3236)
    flash_bsp_active (bootburn_t264_py/flash_bsp_images.py:119)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    _Popen (multiprocessing/context.py:224)
    start (multiprocessing/process.py:121)
    flash_bsp (bootburn_t264_py/flash_bsp_images.py:216)
    <module> (flash_bsp_images.py:40)
root@auvidea-HP-Z620-Workstation:/media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra# py-spy dump --pid 724066
Process 724066: python3 /media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/unified_flash/out/bsp_images/tools/flashtools/bootburn/flash_bsp_images.py -b jetson-t264 --l4t -D -P /media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/unified_flash/out/bsp_images/flash_workspace --l4t_boot_chain_select A --usb-instance 2-1.4
Python v3.8.10 (/usr/bin/python3.8)

Thread 724066 (idle): "MainThread"
    poll (multiprocessing/popen_fork.py:27)
    wait (multiprocessing/popen_fork.py:47)
    join (multiprocessing/process.py:149)
    _exit_function (multiprocessing/util.py:357)
    _bootstrap (multiprocessing/process.py:318)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    _Popen (multiprocessing/context.py:224)
    start (multiprocessing/process.py:121)
    flash_bsp (bootburn_t264_py/flash_bsp_images.py:216)
    <module> (flash_bsp_images.py:40)
root@auvidea-HP-Z620-Workstation:/media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra# py-spy dump --pid 724058
Process 724058: python3 /media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/unified_flash/out/bsp_images/tools/flashtools/bootburn/flash_bsp_images.py -b jetson-t264 --l4t -D -P /media/auvidea/images/nvidia/nvidia_sdk/38.4_MASSFLASH_TEST/Linux_for_Tegra/unified_flash/out/bsp_images/flash_workspace --l4t_boot_chain_select A --usb-instance 2-1.4
Python v3.8.10 (/usr/bin/python3.8)

Thread 724058 (idle): "MainThread"
    poll (multiprocessing/popen_fork.py:27)
    wait (multiprocessing/popen_fork.py:47)
    join (multiprocessing/process.py:149)
    flash_bsp (bootburn_t264_py/flash_bsp_images.py:220)
    <module> (flash_bsp_images.py:40)

This issue might break a lot of automation scripts, including the ones we use on our production line.

The issue can be reliably reproduced by unplugging the USB cable during one of the AdbPush transfers.

I am not sure where exactly the problem lies, but the script exits as expected once I add partitionWriterProcess.daemon = True in unified_flash/out/bsp_images/tools/flashtools/bootburn_t264_py/bootburn_adb.py:

            # Start Partition writer process
            partitionWriterProcess = Process(target=self.StartPartitiionWriterProcess, args=(adbTaskQueue, ))
            partitionWriterProcess.daemon = True
            partitionWriterProcess.start()
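
To illustrate why the daemon flag could make a difference, here is a minimal toy reproduction. It is not the NVIDIA code: `writer`, `flasher`, and `parent_exits` are hypothetical names, and it assumes a Linux host with the `fork` start method. A process that raises while it still has a non-daemon child blocked on a queue stays stuck in multiprocessing's exit-time `join`, which looks similar to the `join` / `_exit_function` frames in the py-spy dumps above. With `daemon = True` the child is terminated at exit instead.

```python
import multiprocessing as mp
import os
import signal

# Assumption: Linux host, so the "fork" start method is available.
ctx = mp.get_context("fork")


def writer(q):
    # Stand-in for a writer process: blocks forever waiting for work.
    q.get()


def flasher(daemon, writer_pid):
    # Stand-in for the flashing process that hits an adb error.
    q = ctx.Queue()
    w = ctx.Process(target=writer, args=(q,))
    w.daemon = daemon
    w.start()
    writer_pid.value = w.pid  # report the child's PID for cleanup
    raise OSError(53)  # simulates the error raised after a failed push


def parent_exits(daemon, timeout=2.0):
    """Return True if the failing process exits within `timeout` seconds."""
    writer_pid = ctx.Value("i", 0)
    p = ctx.Process(target=flasher, args=(daemon, writer_pid))
    p.start()
    p.join(timeout)
    exited = not p.is_alive()
    if not exited:
        # Clean up the toy processes: killing the blocked writer lets the
        # exit-time join in the failing process return.
        if writer_pid.value > 0:
            os.kill(writer_pid.value, signal.SIGKILL)
        p.join(timeout)
        if p.is_alive():
            p.kill()
            p.join()
    return exited
```

In this sketch, `parent_exits(False)` should report that the failing process is still alive after the timeout, while `parent_exits(True)` should report a prompt exit. Whether the real script hangs for exactly this reason is still an assumption on my side.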

I was able to test this much faster by using the following flashing commands:

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --no-flash <board_config> internal
sudo ./tools/kernel_flash/l4t_initrd_flash.sh --flash-only <board_config> internal

Have you had a chance to reproduce this issue yet?

FYI, this also causes the massflash script to get stuck in an infinite loop without exiting.
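
Until this is fixed, an automation script could guard itself with a watchdog around the flash command. The following is only a sketch of that idea: `run_with_deadline` is a hypothetical helper, the timeout value is up to the caller, and it assumes the wrapper runs with enough privileges (for example as root) to kill the flashing processes.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, timeout_s):
    """Run cmd in its own session and kill its whole process group on timeout.

    Returns the exit code, or None if the deadline was exceeded.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process group ID equals its PID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the command together with any helper processes it forked.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return None
```

A production script could then treat a `None` result as a failed flash and move on to the next device instead of waiting forever.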