Fq2bam stalls in single-end recovery mode

I am using Parabricks fq2bam to align ~100 samples to a reference genome. The command is integrated into a Snakemake pipeline that runs on a SLURM HPCC. This has worked well for all but a few samples, which consistently time out. Looking at the logs, fq2bam appears to stall at a certain point: the reported throughput drops to 0.0 bases/GPU/minute and the progress counters stop changing. There is nothing special about these samples in terms of read depth or file size. Samples with similar coverage take 1-2 hours to align, but the problematic samples time out after 24 hours. The same samples align with bwa-mem within 24 hours just fine.

I would appreciate any insights into why this may be happening, and potential fixes.

Parabricks code:


pbrun fq2bam --ref {input.ref} \
    --in-fq {input.read1} {input.read2} \
    --out-bam {output} \
    --tmp-dir /tmp \
    --bwa-nstreams 2 \
    --num-gpus 2 \
    --memory-limit 154

I have also run it without --bwa-nstreams, --num-gpus, or --memory-limit, and it makes no difference for the problematic samples.

Log file:

[PB Info 2026-Feb-22 14:36:53] ------------------------------------------------------------------------------
[PB Info 2026-Feb-22 14:36:53] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2026-Feb-22 14:36:53] ||                              Version 4.5.0-1                             ||
[PB Info 2026-Feb-22 14:36:53] ||                      GPU-PBBWA mem, Sorting Phase-I                      ||
[PB Info 2026-Feb-22 14:36:53] ------------------------------------------------------------------------------
[PB Info 2026-Feb-22 14:36:53] Mode = pair-ended-gpu
[PB Info 2026-Feb-22 14:36:53] Running with 2 GPU(s), using 4 stream(s) per device with 16 worker threads per GPU
[PB Info 2026-Feb-22 14:37:03] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:37:13] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:37:23] #  0  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:37:33] # 20  0  0  0  0   0 pool:  0 0 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:37:43] # 20  0  3  0  0   0 pool:  1 359750676 bases/GPU/minute: 1079252028.0
[PB Info 2026-Feb-22 14:37:53] # 20  0  4  0  0   0 pool:  3 1185508179 bases/GPU/minute: 2477272509.0
[PB Info 2026-Feb-22 14:37:53] Single-ended recovery mode for batch with 9895936 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:53] Single-ended recovery mode for batch with 9961472 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:53] Single-ended recovery mode for batch with 10027008 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:53] Single-ended recovery mode for batch with 10092544 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:53] Single-ended recovery mode for batch with 10158080 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:53] Single-ended recovery mode for batch with 10223616 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:53] Single-ended recovery mode for batch with 10289152 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:53] Single-ended recovery mode for batch with 10354688 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:53] Single-ended recovery mode for batch with 10485760 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:54] Single-ended recovery mode for batch with 10747904 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:54] Single-ended recovery mode for batch with 10616832 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:54] Single-ended recovery mode for batch with 10944512 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:54] Single-ended recovery mode for batch with 10682368 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:54] Single-ended recovery mode for batch with 10420224 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:54] Single-ended recovery mode for batch with 10813440 (both ends) reads before itself
[PB Info 2026-Feb-22 14:37:54] Single-ended recovery mode for batch with 10551296 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:00] Single-ended recovery mode for batch with 12976128 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:03] # 20 25  2  7 10   0 pool:  1 1254744218 bases/GPU/minute: 207708117.0
[PB Info 2026-Feb-22 14:38:13] # 20 27  6  5  4   0 pool:  3 1306916634 bases/GPU/minute: 156517248.0
[PB Info 2026-Feb-22 14:38:13] Single-ended recovery mode for batch with 13500416 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:13] Single-ended recovery mode for batch with 13697024 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:14] Single-ended recovery mode for batch with 13762560 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:23] # 20 22 13  7  2   0 pool:  4 1350581912 bases/GPU/minute: 130995834.0
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 13959168 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 14024704 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 14221312 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 14090240 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 14155776 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 14286848 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 14352384 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 14417920 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 14483456 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 14614528 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:27] Single-ended recovery mode for batch with 14548992 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:28] Single-ended recovery mode for batch with 14876672 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:28] Single-ended recovery mode for batch with 15007744 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:28] Single-ended recovery mode for batch with 14942208 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:28] Single-ended recovery mode for batch with 15073280 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:28] Single-ended recovery mode for batch with 15138816 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:28] Single-ended recovery mode for batch with 15204352 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:28] Single-ended recovery mode for batch with 15269888 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:29] Single-ended recovery mode for batch with 15335424 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:31] Single-ended recovery mode for batch with 15532032 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:31] Single-ended recovery mode for batch with 15466496 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:31] Single-ended recovery mode for batch with 15597568 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:33] # 20 22  2 22 18   0 pool:  1 1420567853 bases/GPU/minute: 209957823.0
[PB Info 2026-Feb-22 14:38:38] Single-ended recovery mode for batch with 15859712 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:38] Single-ended recovery mode for batch with 15794176 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:42] Single-ended recovery mode for batch with 15925248 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:42] Single-ended recovery mode for batch with 16056320 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:42] Single-ended recovery mode for batch with 16187392 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:42] Single-ended recovery mode for batch with 16121856 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:42] Single-ended recovery mode for batch with 15990784 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:42] Single-ended recovery mode for batch with 16252928 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:42] Single-ended recovery mode for batch with 16384000 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:42] Single-ended recovery mode for batch with 16318464 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:42] Single-ended recovery mode for batch with 16449536 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:43] # 20 23  0 27 20   0 pool:  0 1464450411 bases/GPU/minute: 131647674.0
[PB Info 2026-Feb-22 14:38:44] Single-ended recovery mode for batch with 16515072 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:53] # 20 18  6 22 20   0 pool:  4 1507928409 bases/GPU/minute: 130433994.0
[PB Info 2026-Feb-22 14:38:59] Single-ended recovery mode for batch with 16646144 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:59] Single-ended recovery mode for batch with 16580608 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:59] Single-ended recovery mode for batch with 16777216 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:59] Single-ended recovery mode for batch with 16711680 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:59] Single-ended recovery mode for batch with 16908288 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:59] Single-ended recovery mode for batch with 16842752 (both ends) reads before itself
[PB Info 2026-Feb-22 14:38:59] Single-ended recovery mode for batch with 16973824 (both ends) reads before itself
[PB Info 2026-Feb-22 14:39:03] # 20 12 11 23 22   0 pool:  3 1551427683 bases/GPU/minute: 130497822.0
[PB Info 2026-Feb-22 14:39:13] # 20 13 14 24 17   0 pool:  3 1594740700 bases/GPU/minute: 129939051.0
[PB Info 2026-Feb-22 14:39:23] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 51469554.0
[PB Info 2026-Feb-22 14:39:33] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:39:43] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:39:53] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:40:03] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:40:13] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:40:23] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:40:33] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:40:43] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:40:53] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:41:03] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:41:13] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:41:23] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:41:33] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:41:43] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:41:53] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:42:03] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:42:13] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:42:23] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:42:33] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:42:43] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0
[PB Info 2026-Feb-22 14:42:53] # 20  9 20 22 15   0 pool:  5 1611897218 bases/GPU/minute: 0.0

I was running into a similar issue and was able to solve it consistently by reducing the chunk size being fed to each GPU thread so that it wouldn’t overwhelm the I/O of my CPUs/disk.

--bwa-options "-K 5000000"

Try adding this to your command and see if it helps. I believe the default value, if not specified, is 10,000,000.

I have the same problem. I wonder if you have found a solution to it.

Upgrading from Parabricks 4.5 to 4.6 solved the problem for me. The release notes mention performance improvements to fq2bam, which may be what fixed it.
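
For anyone who cannot upgrade right away, it may help to detect the stall early instead of waiting for the 24-hour SLURM timeout. Below is a minimal sketch of a log check, based only on the progress lines shown in the log above. The function name fq2bam_stalled is made up for illustration, the default of 6 lines (about one minute at the 10-second interval seen in the log) is arbitrary, and treating the number printed just before "bases/GPU/minute:" as a running counter of processed bases is an assumption about the log format, not something taken from the Parabricks documentation.

```shell
# Hypothetical helper: succeed (exit 0) if the last N progress lines of a
# fq2bam log all report zero throughput after some work was already done.
# Usage: fq2bam_stalled LOGFILE [N]
fq2bam_stalled() {
    local log="$1" n="${2:-6}"
    # Keep only the periodic progress lines, then look at the last N of them.
    grep 'bases/GPU/minute:' "$log" | tail -n "$n" | awk -v n="$n" '
        # $NF is the reported throughput; $(NF-2) is the number printed just
        # before "bases/GPU/minute:" (assumed to be a running counter).
        # Requiring a non-zero counter skips the idle lines at startup.
        ($NF + 0) == 0 && ($(NF-2) + 0) > 0 { zero++ }
        END { exit !(NR == n && zero == n) }
    '
}
```

A wrapper could start pbrun in the background with its output redirected to a log file, call this function every minute or so, and kill the job once it succeeds, so the sample fails fast and can be retried with different settings such as the smaller -K value suggested above.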