ORIN 32GB USB3 hard drives disconnect

I have encountered hard drive disconnections on the ORIN 32 developer kit that I have never seen with the Xavier AGX 32 GB. Here is what happens.

I have a 2TB NVMe on the PCIe mini at the bottom of the kit that holds a large amount of data (let’s say 500GB), and I would like to move it off the NVMe to a USB3-attached IcyBox (or to sshfs-mounted network storage).

I have just upgraded to Linux meqneuropatlp35 5.10.65-tegra #1 SMP PREEMPT Mon May 16 20:58:07 PDT 2022 aarch64 aarch64 aarch64 GNU/Linux with no effect.

When I start copying, the speed stabilizes at about 200 MBytes/s. It continues like this for a couple of minutes until I reproducibly get this (dmesg output):

[ 4085.368540] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4085.397502] scsi host2: uas_eh_device_reset_handler success
[ 4086.469730] scsi host2: uas_eh_device_reset_handler start
[ 4086.469996] sd 2:0:0:0: [sdb] tag#5 uas_zap_pending 0 uas-tag 2 inflight: CMD 
[ 4086.470009] sd 2:0:0:0: [sdb] tag#5 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 0f 28 00 00 00 08 00 00
[ 4086.552979] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4086.582296] scsi host2: uas_eh_device_reset_handler success
[ 4089.135370] sd 2:0:0:0: [sdb] tag#5 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD OUT 
[ 4089.135399] sd 2:0:0:0: [sdb] tag#5 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4089.194940] scsi host2: uas_eh_device_reset_handler start
[ 4089.276280] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4089.305668] scsi host2: uas_eh_device_reset_handler success
[ 4090.375083] scsi host2: uas_eh_device_reset_handler start
[ 4090.375668] sd 2:0:0:0: [sdb] tag#20 uas_zap_pending 0 uas-tag 3 inflight: CMD 
[ 4090.375718] sd 2:0:0:0: [sdb] tag#20 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4090.461063] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4090.490716] scsi host2: uas_eh_device_reset_handler success
[ 4091.559753] scsi host2: uas_eh_device_reset_handler start
[ 4091.560280] sd 2:0:0:0: [sdb] tag#23 uas_zap_pending 0 uas-tag 2 inflight: CMD 
[ 4091.560331] sd 2:0:0:0: [sdb] tag#23 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4091.644592] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4091.673607] scsi host2: uas_eh_device_reset_handler success
[ 4092.750564] scsi host2: uas_eh_device_reset_handler start
[ 4092.751113] sd 2:0:0:0: [sdb] tag#22 uas_zap_pending 0 uas-tag 2 inflight: CMD 
[ 4092.751165] sd 2:0:0:0: [sdb] tag#22 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4092.835872] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4092.864980] scsi host2: uas_eh_device_reset_handler success
[ 4093.943950] scsi host2: uas_eh_device_reset_handler start
[ 4093.944699] sd 2:0:0:0: [sdb] tag#22 uas_zap_pending 0 uas-tag 3 inflight: CMD 
[ 4093.944774] sd 2:0:0:0: [sdb] tag#22 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4094.028725] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4094.057293] scsi host2: uas_eh_device_reset_handler success
[ 4095.107909] scsi host2: uas_eh_device_reset_handler start
[ 4095.108459] sd 2:0:0:0: [sdb] tag#23 uas_zap_pending 0 uas-tag 2 inflight: CMD 
[ 4095.108529] sd 2:0:0:0: [sdb] tag#23 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4095.192596] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4095.225897] scsi host2: uas_eh_device_reset_handler success
[ 4095.226468] sd 2:0:0:0: [sdb] tag#23 timing out command, waited 6s
[ 4095.226903] sd 2:0:0:0: [sdb] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x08 driverbyte=0x08 cmd_age=7s
[ 4095.226938] sd 2:0:0:0: [sdb] tag#23 Sense Key : 0x2 [current] 
[ 4095.226966] sd 2:0:0:0: [sdb] tag#23 ASC=0x4 ASCQ=0x1 
[ 4095.227001] sd 2:0:0:0: [sdb] tag#23 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4095.227042] blk_update_request: I/O error, dev sdb, sector 15624048640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[ 4095.227787] blk_update_request: I/O error, dev sdb, sector 15624048640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[ 4095.228328] Aborting journal on device sdb1-8.
[ 4095.500281] EXT4-fs error (device sdb1): ext4_journal_check_start:83: Detected aborted journal
[ 4095.500331] EXT4-fs error (device sdb1): ext4_journal_check_start:83: Detected aborted journal
[ 4095.500781] EXT4-fs (sdb1): Remounting filesystem read-only
[ 4095.516984] EXT4-fs error (device sdb1): ext4_journal_check_start:83: Detected aborted journal
[ 4095.517100] EXT4-fs error (device sdb1): ext4_journal_check_start:83: Detected aborted journal
[ 4095.517329] EXT4-fs (sdb1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 23527540, error -30)
[ 4095.517615] EXT4-fs (sdb1): ext4_writepages: jbd2_start: 26624 pages, ino 23527541; err -30
[ 4095.522547] EXT4-fs (sdb1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 23527540, error -30)
[ 4095.530139] EXT4-fs (sdb1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 23527540, error -30)
[ 4095.535771] EXT4-fs (sdb1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 23527540, error -30)
[ 4095.541035] EXT4-fs (sdb1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 23527541, error -30)

The drive is then re-mounted read-only, and the copy process is interrupted. When I copy the other way round (from the HDD/USB3 to the NVMe), everything works fine. When I use rsync, same thing: it’s fine. When I copy to an sshfs-mounted computer next to the ORIN (both on the same router), I get up to 100 MBytes/s (gigabit Ethernet, i.e., OK) and I also get disconnected from the sshfs mountpoint.
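
To quantify the reset loop in a log like the one above, the reset lines can be counted per USB device path. This is only a minimal sketch: `count_usb_resets` is a hypothetical helper name, and it assumes the message format shown in the dmesg output above.

```shell
# Count "reset ... USB device" messages per USB device path.
# Reads dmesg-style text from stdin and prints "<device-path> <count>".
count_usb_resets() {
    # Matches lines such as:
    #   usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
    sed -n 's/.*usb \([0-9.-]*\): reset .* USB device number.*/\1/p' \
        | sort | uniq -c | awk '{print $2, $1}'
}
```

On a live system this could be run as `dmesg | count_usb_resets` (reading dmesg may require root). A count that keeps growing during a copy points at the same device being reset repeatedly.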

I have swapped the IcyBoxes, also tried them on different Linux PCs, and changed the hard drives; the problem persists. I have also tried copying to a USB3 thumb drive, which is much slower than the destination 22TB HDD: the transfer is much slower (some 20MB/s) but more stable.

Is this a known problem? Is there anything I could do to avoid this? Would it make sense to connect a USB3 PCIe card to the PCIe connector? I read that the connector does not have power by default. I added such a card, and it does not show up in lsusb.

Any suggestions on how to attach a stack of 22TB hard drives to the ORIN 32GB? I urgently need this for automatically offloading data (being generated on the device from a nanopore sequencer).

BTW - I have a +/- replica setup with x86_64 mainboard and NVIDIA GPUs as the “pc platform”, same Icyboxes, hard drives etc. without any issues.

Thanks for your help.

Dear @juergen.hench ,
May I know which platform you are using? Is it DRIVE or Jetson?

P.S.: interestingly, when I use gnome-disk-utility and start HDD formatting with the erase option, I am getting a write rate of 85 MB/s which is much slower than what I get when copying.

Jetson ORIN 32GB developer kit.

Would it make sense to set a rate limit?

I just did the following experiment which works fine so far. 3 of the 4 disks have not been used yet, i.e. I can format them and write zeroes to them. I started this process in gnome-disk-utility for all three in parallel - all 3 stay stably at ~85 MB/s. dmesg does not show errors related to this process.

Lastly: is there a way to get mdadm to work on the ORIN 32GB dev kit? This would have the option to ratelimit.
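
Whether mdadm can be used depends on the running kernel providing the md RAID1 driver. The sketch below checks a kernel config for that option; `md_raid1_support` is a hypothetical helper, and it is an assumption that `/proc/config.gz` is available on this L4T kernel.

```shell
# Report whether a kernel config (read from stdin) enables md RAID1.
# Prints "builtin", "module", or "missing".
md_raid1_support() {
    line=$(grep -E '^CONFIG_MD_RAID1=' | head -n 1)
    case "$line" in
        CONFIG_MD_RAID1=y) echo builtin ;;
        CONFIG_MD_RAID1=m) echo module ;;
        *) echo missing ;;
    esac
}

# Example on a live system, assuming /proc/config.gz exists:
#   zcat /proc/config.gz | md_raid1_support
# If support is present, a two-disk mirror could be created like this
# (DESTROYS data on both devices; /dev/sdX and /dev/sdY are placeholders):
#   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX /dev/sdY
```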

Moving to the Jetson Orin forum (Jetson AGX Orin - NVIDIA Developer Forums).

Hi,
The latest JetPack releases for AGX Orin are 5.1.3 and 6.0GA. Please check which version you are using. If you are on an earlier release, we would suggest upgrading to the latest version.

And does the USB device have an external power supply? We would suggest connecting it to an external power supply if possible.

Thanks. I am using L4T 34.1.1 and uname output is 5.10.65-tegra #1 SMP PREEMPT Mon May 16 20:58:07 PDT 2022 aarch64 aarch64 aarch64 GNU/Linux. I can’t currently upgrade since I made quite a few modifications. I have been zeroing three hard drives in parallel on the Jetson without any hiccups since last night, with 1.5 days left to go. Is it possible to use mdadm? That would solve two issues at once; I had intended to use RAID1 for data security anyway. Any suggestion is highly appreciated.

P.S.: Yes, the USB3 device is an IcyBox with its own PSU. All attached USB 3 devices are externally powered, some through a powered USB3 hub.

I am not quite sure whether this actually has to do with the USB connection, since I can zero the drives through that very same connection. When I use rsync (which can also rate-limit the transfer), I also don’t get disconnections. This happens only when I copy or move files.
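
A rate-limited rsync transfer can be sketched as follows. The helper names and the example paths are hypothetical; the 85 MB/s default is simply the rate at which zeroing the disks ran stably, not a verified safe limit. rsync's `--bwlimit` takes KiB/s by default, hence the conversion.

```shell
# Convert MB/s (decimal megabytes) to rsync's default --bwlimit unit (KiB/s).
mbps_to_kibps() {
    echo $(( $1 * 1000 * 1000 / 1024 ))
}

# Copy SRC to DST with a bandwidth cap given in MB/s (default: 85).
# -a preserves metadata; --partial keeps partially transferred files for a retry.
rate_limited_copy() {
    src=$1
    dst=$2
    cap_mbps=${3:-85}
    rsync -a --partial --bwlimit="$(mbps_to_kibps "$cap_mbps")" "$src" "$dst"
}
```

Example call with placeholder paths: `rate_limited_copy /mnt/nvme/data/ /mnt/archive/data/ 85`.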

Is there a safe way to upgrade the OS without a fresh install?

Hi,
R34 (JetPack 5.0.1) is a developer preview, so please upgrade to either 5.1.3 or 6.0GA. Later versions are production releases.

Thanks. How can I upgrade without a fresh install?

I have added the sync option to fstab for the respective USB3 hard drives. That boils it down to the actual write speed. Furthermore, I adjusted the mount options for USB devices. I also tried disabling the HDD write cache with hdparm, but that did not solve the issue, so I re-activated it.

uname: 5.10.65-tegra #1 SMP PREEMPT Mon May 16 20:58:07 PDT 2022 aarch64 aarch64 aarch64 GNU/Linux
Jetpack 5.0.1 DP [34.1.1] - preinstalled on the device when purchased from NVIDIA.

Code used:

# run with root privileges
# disable write caching
hdparm -W 0 /dev/sda1 # disable write cache

# enable write caching
hdparm -W 1 /dev/sda1 # enable write cache

#!/bin/bash

# run with root privileges
# disable write caches for USB devices (tend to crash otherwise)
# see https://unix.stackexchange.com/questions/637598/how-to-truly-disable-the-write-cache-functionin-dmesg-in-ubuntu-for-an-externa

f=/etc/udev/rules.d/99-udisks2-usb_mount.rules
echo 'SUBSYSTEMS=="usb", SUBSYSTEM=="block", ENV{ID_FS_USAGE}=="filesystem", ENV{UDISKS_MOUNT_OPTIONS_DEFAULTS}+="sync", ENV{UDISKS_MOUNT_OPTIONS_ALLOW}+="sync"' > $f
udevadm control --reload-rules

This significantly lowers the transfer rate, and the errors / warnings seen in dmesg vanish. When reading from HDD to NVMe, speeds are about 10x as high as for writing, which is OK on an archival system.
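
To confirm that a drive is really mounted with `sync` after these changes, the active mount options can be inspected. This is a small sketch: `has_sync_option` is a hypothetical helper, and `/mnt/archive` and the UUID in the comments are placeholders.

```shell
# Succeeds if a comma-separated mount option string contains "sync".
has_sync_option() {
    case ",$1," in
        *,sync,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on a live system (placeholder mount point):
#   has_sync_option "$(findmnt -no OPTIONS /mnt/archive)" && echo "sync is active"
# A matching /etc/fstab entry might look like this (placeholder UUID):
#   UUID=<uuid-of-partition>  /mnt/archive  ext4  defaults,sync,nofail  0  2
```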

