I’m running Parabricks 4.1.1-1 fq2bam as follows:
pbrun fq2bam \
--logfile output/$1.$LOG_FILE \
--out-bam output/$1.$OUT_FILE \
--in-fq input/unmerged_1.fastq.gz input/unmerged_2.fastq.gz --fix-mate \
--ref input/$GENOME_FASTA \
--bwa-options=-Y \
--tmp-dir tmp \
--filter-flag 256 \
--gpuwrite --gpusort \
--no-markdups
The documentation states:
The user can turn-off marking of duplicates by adding the --no-markdups option. The BQSR step is only performed if the --knownSites input and --out-recal-file output options are provided
However, after alignment and sorting completed successfully, the job above printed:
[PB Info 2023-Jul-13 22:14:04] ------------------------------------------------------------------------------
[PB Info 2023-Jul-13 22:14:04] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2023-Jul-13 22:14:04] || Version 4.1.1-1 ||
[PB Info 2023-Jul-13 22:14:04] || Marking Duplicates, BQSR ||
[PB Info 2023-Jul-13 22:14:04] ------------------------------------------------------------------------------
[PB Info 2023-Jul-13 22:14:39] progressMeter - Percentage
[PB Info 2023-Jul-13 22:14:49] 0.2 4.59 GB
[PB Info 2023-Jul-13 22:14:59] 0.6 7.94 GB
Am I misinterpreting this? It appears to be running Marking Duplicates, BQSR, or both, despite my setting --no-markdups and not setting --knownSites or --out-recal-file.
Is this a bug, or an error or misunderstanding on my part? It’s an immediate problem for me because the Marking Duplicates/BQSR stage was eventually killed for running out of memory (OOM) (a different issue…)
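In case it helps with triage, here is a minimal sketch for pulling the stage banners out of the file written by --logfile, so it is easy to see which stages a run announced. It assumes the banner format shown in the log above (`[PB Info <date> <time>] || <text> ||`); the function name `pb_stage_banners` is my own helper, not part of Parabricks.

```shell
# Print the stage banners found in a Parabricks log file.
# Assumption: banners look like "[PB Info <date> <time>] || <text> ||",
# as in the excerpt above. The product-name and version banners are skipped.
pb_stage_banners() {
  # Keep only the "|| ... ||" banner lines.
  grep -E '^\[PB Info [^]]+\] \|\| .* \|\|[[:space:]]*$' "$1" |
    # Strip the timestamp prefix and the surrounding "||" delimiters.
    sed -E 's/^\[PB Info [^]]+\] \|\|[[:space:]]*//; s/[[:space:]]*\|\|[[:space:]]*$//' |
    # Drop the two banners that do not name a stage.
    grep -Ev '^(Parabricks accelerated Genomics Pipeline|Version )'
}
```

Usage would be `pb_stage_banners <path to the --logfile output>`. This only reports what the log announces; it does not show whether duplicates were actually marked in the output BAM.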