I have encountered hard drive disconnections on the ORIN 32 developer kit that I have never seen with the Xavier AGX 32 GB. Here is what happens.
I have a 2TB NVMe on the PCIe mini at the bottom of the kit that contains large data (let’s say 500GB), and I would like to move the data off the NVMe to a USB3-attached IcyBox (or to sshfs-mounted network storage).
I have just upgraded to Linux meqneuropatlp35 5.10.65-tegra #1 SMP PREEMPT Mon May 16 20:58:07 PDT 2022 aarch64 aarch64 aarch64 GNU/Linux
with no effect.
When I start copying, the speed stabilizes at about 200 MBytes/s. It continues like this for a couple of minutes until I reproducibly get this (dmesg output):
[ 4085.368540] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4085.397502] scsi host2: uas_eh_device_reset_handler success
[ 4086.469730] scsi host2: uas_eh_device_reset_handler start
[ 4086.469996] sd 2:0:0:0: [sdb] tag#5 uas_zap_pending 0 uas-tag 2 inflight: CMD
[ 4086.470009] sd 2:0:0:0: [sdb] tag#5 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 0f 28 00 00 00 08 00 00
[ 4086.552979] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4086.582296] scsi host2: uas_eh_device_reset_handler success
[ 4089.135370] sd 2:0:0:0: [sdb] tag#5 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD OUT
[ 4089.135399] sd 2:0:0:0: [sdb] tag#5 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4089.194940] scsi host2: uas_eh_device_reset_handler start
[ 4089.276280] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4089.305668] scsi host2: uas_eh_device_reset_handler success
[ 4090.375083] scsi host2: uas_eh_device_reset_handler start
[ 4090.375668] sd 2:0:0:0: [sdb] tag#20 uas_zap_pending 0 uas-tag 3 inflight: CMD
[ 4090.375718] sd 2:0:0:0: [sdb] tag#20 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4090.461063] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4090.490716] scsi host2: uas_eh_device_reset_handler success
[ 4091.559753] scsi host2: uas_eh_device_reset_handler start
[ 4091.560280] sd 2:0:0:0: [sdb] tag#23 uas_zap_pending 0 uas-tag 2 inflight: CMD
[ 4091.560331] sd 2:0:0:0: [sdb] tag#23 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4091.644592] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4091.673607] scsi host2: uas_eh_device_reset_handler success
[ 4092.750564] scsi host2: uas_eh_device_reset_handler start
[ 4092.751113] sd 2:0:0:0: [sdb] tag#22 uas_zap_pending 0 uas-tag 2 inflight: CMD
[ 4092.751165] sd 2:0:0:0: [sdb] tag#22 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4092.835872] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4092.864980] scsi host2: uas_eh_device_reset_handler success
[ 4093.943950] scsi host2: uas_eh_device_reset_handler start
[ 4093.944699] sd 2:0:0:0: [sdb] tag#22 uas_zap_pending 0 uas-tag 3 inflight: CMD
[ 4093.944774] sd 2:0:0:0: [sdb] tag#22 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4094.028725] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4094.057293] scsi host2: uas_eh_device_reset_handler success
[ 4095.107909] scsi host2: uas_eh_device_reset_handler start
[ 4095.108459] sd 2:0:0:0: [sdb] tag#23 uas_zap_pending 0 uas-tag 2 inflight: CMD
[ 4095.108529] sd 2:0:0:0: [sdb] tag#23 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4095.192596] usb 2-3.4.2.3: reset SuperSpeed Gen 1 USB device number 14 using tegra-xusb
[ 4095.225897] scsi host2: uas_eh_device_reset_handler success
[ 4095.226468] sd 2:0:0:0: [sdb] tag#23 timing out command, waited 6s
[ 4095.226903] sd 2:0:0:0: [sdb] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x08 driverbyte=0x08 cmd_age=7s
[ 4095.226938] sd 2:0:0:0: [sdb] tag#23 Sense Key : 0x2 [current]
[ 4095.226966] sd 2:0:0:0: [sdb] tag#23 ASC=0x4 ASCQ=0x1
[ 4095.227001] sd 2:0:0:0: [sdb] tag#23 CDB: opcode=0x8a 8a 00 00 00 00 03 a3 44 10 00 00 00 00 08 00 00
[ 4095.227042] blk_update_request: I/O error, dev sdb, sector 15624048640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[ 4095.227787] blk_update_request: I/O error, dev sdb, sector 15624048640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[ 4095.228328] Aborting journal on device sdb1-8.
[ 4095.500281] EXT4-fs error (device sdb1): ext4_journal_check_start:83: Detected aborted journal
[ 4095.500331] EXT4-fs error (device sdb1): ext4_journal_check_start:83: Detected aborted journal
[ 4095.500781] EXT4-fs (sdb1): Remounting filesystem read-only
[ 4095.516984] EXT4-fs error (device sdb1): ext4_journal_check_start:83: Detected aborted journal
[ 4095.517100] EXT4-fs error (device sdb1): ext4_journal_check_start:83: Detected aborted journal
[ 4095.517329] EXT4-fs (sdb1): failed to convert unwritten extents to written extents -- potential data loss! (inode 23527540, error -30)
[ 4095.517615] EXT4-fs (sdb1): ext4_writepages: jbd2_start: 26624 pages, ino 23527541; err -30
[ 4095.522547] EXT4-fs (sdb1): failed to convert unwritten extents to written extents -- potential data loss! (inode 23527540, error -30)
[ 4095.530139] EXT4-fs (sdb1): failed to convert unwritten extents to written extents -- potential data loss! (inode 23527540, error -30)
[ 4095.535771] EXT4-fs (sdb1): failed to convert unwritten extents to written extents -- potential data loss! (inode 23527540, error -30)
[ 4095.541035] EXT4-fs (sdb1): failed to convert unwritten extents to written extents -- potential data loss! (inode 23527541, error -30)
The drive is then remounted read-only, and the copy process is interrupted. When I copy the other way round (from the HDD/USB3 to the NVMe), everything works fine. When I use rsync - same thing, it’s fine. When I copy to an sshfs-mounted computer next to the ORIN (both on the same router), I get up to 100 MBytes/s (gigabit Ethernet, i.e., OK), and I also get disconnected from the sshfs mountpoint.
I have swapped the IcyBoxes, also tried them on different Linux PCs, and changed the hard drives; the problem persists. I have also tried copying to a USB3 thumb drive, which is much slower than the destination 22TB HDD: the transfer is much slower (some 20 MB/s) but more stable.
Is this a known problem? Is there anything I could do to avoid this? Would it make sense to connect a USB3 PCIe card to the PCIe connector? I read that the connector does not have power by default. I added such a card, and it does not show up in lsusb.
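Since the slower thumb drive transfer was more stable, one test would be to cap the write rate to the HDD and see whether the resets still occur. The following is only a sketch of such a test, not a confirmed fix. The function name `throttled_copy`, the 20 MB/s default cap, and the fsync interval are my own assumptions.

```python
import os
import time


def throttled_copy(src, dst, max_bytes_per_s=20 * 1024 * 1024,
                   chunk_size=1024 * 1024, sync_every=64):
    """Copy src to dst in chunks while capping the average write rate.

    Hypothetical helper: the cap and the fsync interval are guesses
    meant for experimenting, not tuned values.
    """
    copied = 0
    chunks = 0
    start = time.monotonic()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            chunks += 1
            # Flush to the device regularly so dirty pages do not pile up
            # in the page cache and hit the drive in one large burst.
            if chunks % sync_every == 0:
                fout.flush()
                os.fsync(fout.fileno())
            # Sleep just long enough to keep the average rate under the cap.
            delay = copied / max_bytes_per_s - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)
        fout.flush()
        os.fsync(fout.fileno())
    return copied
```

If a capped copy runs through cleanly while the uncapped one fails, that would point at sustained write load rather than at a particular file or disk region.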
Any suggestions on how to attach a stack of 22TB hard drives to the ORIN 32GB? I urgently need this for automatically offloading data (being generated on the device from a nanopore sequencer).
BTW - I have a more or less identical setup with an x86_64 mainboard and NVIDIA GPUs as the “pc platform”, with the same IcyBoxes, hard drives etc., and it runs without any issues.
Thanks for your help.