Can pbrun germline take stdin?

Hi,
according to GATK best practices unmapped reads should be stored as .ubam files.
pbrun or bwa do not take .bam as input.
Currently we do sth like this:

picard SamToFastq \
		I= input.ubam \
		FASTQ=/dev/stdout \
		CLIPPING_ATTRIBUTE=XT \
		CLIPPING_ACTION=2 \
		INTERLEAVE=true \
		NON_PF=true \
		TMP_DIR= tmp/ | \
               bwa mem -M -Y -t 16 -p ref/Homo_sapiens_assembly38.fasta /dev/stdin

INTERLEAVE=true generates an interleaved .fastq, which bwa accepts via -Y

I can not do the same in pbrun:

picard SamToFastq \
		I= input.ubam \
		FASTQ=/dev/stdout \
		CLIPPING_ATTRIBUTE=XT \
		CLIPPING_ACTION=2 \
		**INTERLEAVE=true** \
		NON_PF=true \
		TMP_DIR= tmp/ | \
pbrun germline --ref ref/Homo_sapiens_assembly38.fasta \
		                 --bwa-options "-Y"
				 --in-fq /dev/stdin
				 --knownSites ref/Homo_sapiens_assembly38.dbsnp138.vcf \
				 --knownSites ref/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
				 --out-bam out.bam \
				 --out-variants out.vcf \
				 --out-recal-file out.report.txt \
				 --out-duplicate-metrics out.duplicatemetrics.txt \
				 --tmp-dir /tmp \

Output:
[Parabricks Options Error]: Input file /dev/stdin not found. Exiting…

Anyone tried something similar yet?

Best,
Max

Hey @Benutzer1756,

I have not tested this, but my theory for why it doesn’t work is because the pbrun command is a wrapper. The work is done by a docker container. The container uses a different shell instance and therefore probably can’t read in from the original shell’s /dev/stdin

Hi, thanks for your response. This is too bad, as the issue expands to variant calling and recalibration, which are not able to read or write gzipped vcfs. As g.vcfs can get quite big, this costs a lot of time and space.
Do you happen to know a workaround for this issue?

Hey @Benutzer1756, which variant callers are you trying to use that don’t read or write gzipped files?

Hi,
pbrun germline with --gvcf and --out-variants.g.vcf.gz produces no gzipped vcf.gz (which can get pretty large for WGS)
pbrun haplotypecaller on the other hand does gzip the output, if a g.vcf.gz is specified.

pbrun genotypegvcf accepts .g.vcf.gz but does not produce gzipped
pbrun vqsr does not accept gzipped vcf files, neither as input nor as training set.

I think working with ungzipped vcfs from haplotypecaller on is fine, as the size is managable (~1GB per 30xWGS) but the g.vcf produced by pbrun germline should be compressed.

edit: to come back to the original question: it would also be great if pbrun germline and fastq2bam accept unaligned .bam-files as input to concord with GATK best practices

Best,
Max

Hey @Benutzer1756,

Thank you for this feedback. This is not available in the current version of Parabricks, but I’ve sent this as a feature request to the engineering team for future versions.

1 Like