Does the germline pipeline call through the ApplyBQSR process?

Hi there,

https://docs.nvidia.com/clara/parabricks/v3.5/text/germline_pipeline.html


The figure on that page shows the germline pipeline running an ApplyBQSR step.

When I ran the germline pipeline, though, I found no mention of ApplyBQSR in the log.

Here is the log from the sample run.

$ pbrun germline \
>   --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
>   --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
>   --knownSites parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
>   --out-bam output.bam \
>   --out-variants output.vcf \
>   --out-recal-file report.txt \
>   --x3
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation


[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz and
/uploads/workspace/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1

docker run --gpus all -u=1000:1000 --rm -w=/uploads/workspace --net=host -v /opt/parabricks:/INSTALL/ -v /uploads/workspace/WODDX80V:/uploads/workspace/WODDX80V -v /uploads/workspace:/uploads/workspace -v /uploads/workspace/parabricks_sample/Ref:/uploads/workspace/parabricks_sample/Ref -v /uploads/workspace/parabricks_sample/Data:/uploads/workspace/parabricks_sample/Data parabricks/release:v3.5.0 fq2bam --ref /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz /uploads/workspace/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1 --knownSites /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz --out-bam /uploads/workspace/output.bam --out-recal-file /uploads/workspace/report.txt --memory-limit 110 --num-cpu-threads 0 --tmp-dir /uploads/workspace/WODDX80V --num-gpus 2 --x3
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation


[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Read group created for /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz and
/uploads/workspace/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
g 2 b 0 B 2 P 4 s 1 r 0 o 2 m 1 z 4 f 2 v 0 M 2 name /uploads/workspace/output.bam report /uploads/workspace/report.txt K /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz
/usr/local/cuda/.pb/binaries//bin/bwa mem /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz /uploads/workspace/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1 -Z ./pbOpts.txt
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v3.5.0                              ||
||                       GPU-BWA mem, Sorting Phase-I                       ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs

GPU-BWA mem
ProgressMeter	Reads		Base Pairs Aligned


WARNING
The system has 12 threads, however recommended number of threads with 2 GPU is 24.
The run might not finish or might have less than expected performance.
[09:00:49]	5043564		590000000
[09:01:15]	10087128	1160000000
[09:01:41]	15130692	1730000000
[09:02:07]	20174256	2310000000
[09:02:33]	25217820	2900000000
[09:02:59]	30261384	3490000000
[09:03:25]	35304948	4060000000
[09:03:51]	40348512	4640000000
[09:04:18]	45392076	5220000000
[09:04:44]	50435640	5800000000

GPU-BWA Mem time: 287.420745 seconds
GPU-BWA Mem is finished.

GPU Sorting, Marking Dups, BQSR
ProgressMeter	SAM Entries Completed

Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
Processing time: 287.421802 seconds

[main] CMD: PARABRICKS mem -Z ./pbOpts.txt /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz /uploads/workspace/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[main] Real time: 291.557 sec; CPU: 3389.752 sec
------------------------------------------------------------------------------
||        Program:                      GPU-BWA mem, Sorting Phase-I        ||
||        Version:                                            v3.5.0        ||
||        Start Time:                       Thu Jun 10 09:00:09 2021        ||
||        End Time:                         Thu Jun 10 09:05:05 2021        ||
||        Total Time:                           4 minutes 56 seconds        ||
------------------------------------------------------------------------------
/usr/local/cuda/.pb/binaries//bin/sort -sort_unmapped -ft 10 -gb 110
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v3.5.0                              ||
||                             Sorting Phase-II                             ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
progressMeter - Percentage
[09:05:06]	0.0	 0.00 GB
Sorting and Marking: 10.000 seconds
------------------------------------------------------------------------------
||        Program:                                  Sorting Phase-II        ||
||        Version:                                            v3.5.0        ||
||        Start Time:                       Thu Jun 10 09:05:06 2021        ||
||        End Time:                         Thu Jun 10 09:05:16 2021        ||
||        Total Time:                                     10 seconds        ||
------------------------------------------------------------------------------
/usr/local/cuda/.pb/binaries//bin/postsort /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta -o /uploads/workspace/output.bam -sort_unmapped -ft 4 -wt 2 -zt 3 -bq 2 -gb 110 -a /uploads/workspace/report.txt /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v3.5.0                              ||
||                         Marking Duplicates, BQSR                         ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
progressMeter -	Percentage
[09:05:27]	0.0	 19.33 GB
[09:05:37]	0.3	 19.23 GB
[09:05:47]	43.7	 10.67 GB
[09:05:57]	79.4	 3.01 GB
[09:06:07]	100.0	 0.00 GB
BQSR and writing final BAM:  55.401 seconds
------------------------------------------------------------------------------
||        Program:                          Marking Duplicates, BQSR        ||
||        Version:                                            v3.5.0        ||
||        Start Time:                       Thu Jun 10 09:05:16 2021        ||
||        End Time:                         Thu Jun 10 09:06:13 2021        ||
||        Total Time:                                     57 seconds        ||
------------------------------------------------------------------------------
docker run --gpus all -u=1000:1000 --rm -w=/uploads/workspace --net=host -v /opt/parabricks:/INSTALL/ -v /uploads/workspace/WODDX80V:/uploads/workspace/WODDX80V -v /uploads/workspace:/uploads/workspace -v /uploads/workspace/parabricks_sample/Ref:/uploads/workspace/parabricks_sample/Ref -v /uploads/workspace/parabricks_sample/Data:/uploads/workspace/parabricks_sample/Data parabricks/release:v3.5.0 haplotypecaller --ref /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-bam /uploads/workspace/output.bam --out-variants /uploads/workspace/output.vcf --ploidy 2 --num-htvc-threads 5 --in-recal-file /uploads/workspace/report.txt --tmp-dir /uploads/workspace/WODDX80V --num-gpus 2 --x3
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation

/usr/local/cuda/.pb/binaries//bin/htvc /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /uploads/workspace/output.bam 2 -o /uploads/workspace/output.vcf -nt 5 -a /uploads/workspace/report.txt
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v3.5.0                              ||
||                         GPU-GATK4 HaplotypeCaller                        ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
ProgressMeter -	Current-Locus	Elapsed-Minutes	Regions-Processed	Regions/Minute
0 /uploads/workspace/output.bam /uploads/workspace/output.vcf
[09:06:45]	chr1:69736213	0.2	295788	1774728
[09:06:55]	chr1:172127728	0.3	638210	1914630
[09:07:05]	chr2:24575905	0.5	1059308	2118616
[09:07:15]	chr2:118607996	0.7	1438509	2157763
[09:07:25]	chr2:210110165	0.8	1822251	2186701
[09:07:35]	chr3:53063751	1.0	2183068	2183068
[09:07:45]	chr3:143860641	1.2	2555289	2190247
[09:07:55]	chr4:62855995	1.3	3034038	2275528
[09:08:05]	chr4:171853801	1.5	3487445	2324963
[09:08:15]	chr5:84331180	1.7	3892371	2335422
[09:08:25]	chr5:173260116	1.8	4264213	2325934
[09:08:35]	chr6:74481799	2.0	4586215	2293107
[09:08:45]	chr7:11246134	2.2	5032603	2322739
[09:08:55]	chr7:130997412	2.3	5521721	2366451
[09:09:05]	chr8:61243172	2.5	5878954	2351581
[09:09:15]	chr9:20644571	2.7	6315026	2368134
[09:09:25]	chr10:3110359	2.8	6733447	2376510
[09:09:35]	chr10:117102296	3.0	7207833	2402611
[09:09:45]	chr11:73843030	3.2	7574909	2392076
[09:09:55]	chr12:26831401	3.3	7945179	2383553
[09:10:05]	chr13:28406382	3.5	8431610	2409031
[09:10:15]	chr14:34871914	3.7	8861037	2416646
[09:10:25]	chr15:57763138	3.8	9295695	2424963
[09:10:35]	chr16:64607746	4.0	9700290	2425072
[09:10:45]	chr17:68337341	4.2	10069197	2416607
[09:10:55]	chr18:68500686	4.3	10399080	2399787
[09:11:05]	chr20:38097406	4.5	10836101	2408022
[09:11:15]	chr22:44141670	4.7	11223280	2404988
[09:11:25]	chr17_GL000258v2_alt:1521348	4.8	11959301	2474338
Total time taken: 304.593
------------------------------------------------------------------------------
||        Program:                         GPU-GATK4 HaplotypeCaller        ||
||        Version:                                            v3.5.0        ||
||        Start Time:                       Thu Jun 10 09:06:17 2021        ||
||        End Time:                         Thu Jun 10 09:11:36 2021        ||
||        Total Time:                           5 minutes 19 seconds        ||
------------------------------------------------------------------------------

Below is the binary list from the container.

usr/local/cuda-10.1/.pb/binaries/bin$ ls -ls
total 467644
   360 -rwxrwxrwx    367256 Feb 23 applyBQSR
  1192 -rwxrwxrwx   1218248 Feb 23 bamreadcount
   296 -rwxrwxrwx    301024 Feb 23 bcftoolscall
   432 -rwxrwxrwx    440368 Feb 23 bcftoolsmpileup
    96 -rwxrwxrwx     96584 Feb 23 bedcov
   500 -rwxrwxrwx    510696 Feb 23 bqsr
  2424 -rwxrwxrwx   2481896 Feb 23 bwa
   708 -rwxrwxrwx    723584 Feb 23 cnnscorevariants
   220 -rwxrwxrwx    223192 Feb 23 cnvkit
   844 -rwxrwxrwx    863712 Feb 23 collectmultiplemetrics
   336 -rwxrwxrwx    342408 Feb 23 coverage
   272 -rwxrwxrwx    276456 Feb 23 dbsnp
  2252 -rwxrwxrwx   2305216 Feb 23 deepvariant
   652 -rwxrwxrwx    664832 Feb 23 deviceQuery
296520 -rwxrwxrwx 303633183 Feb 23 gatk-package-4.1.0.0-local.jar
    20 -rwxrwxrwx     19197 Feb 23 gatk_cpu
   572 -rwxrwxrwx    584152 Feb 23 genotypegvcf
146552 -rwxrwxrwx 150068464 Feb 23 glnexus
  2132 -rwxrwxrwx   2182792 Feb 23 htvc
   252 -rwxrwxrwx    255960 Feb 23 indexgvcf
   220 -rwxrwxrwx    223168 Feb 23 licenseManagerTool
   324 -rwxrwxrwx    329688 Feb 23 licenseinfo
   220 -rwxrwxrwx    223104 Feb 23 licensereturn
    44 -rwxrwxrwx     42968 Feb 23 markQueryName
    32 -rwxrwxrwx     31848 Feb 23 mergegvcf_humanpar
  1824 -rwxrwxrwx   1867488 Feb 23 mutect
    12 -rwxrwxrwx     10120 Feb 23 pb_driver
  1000 -rwxrwxrwx   1022960 Feb 23 postsort
   316 -rwxrwxrwx    321512 Feb 23 samtoolsmpileup
   340 -rwxrwxrwx    346616 Feb 23 somaticsniper
   432 -rwxrwxrwx    440296 Feb 23 sort
   364 -rwxrwxrwx    370976 Feb 23 splitncigar
  2920 -rwxrwxrwx   2989784 Feb 23 star
   624 -rwxrwxrwx    636896 Feb 23 starfusion
   468 -rwxrwxrwx    477168 Feb 23 trioCombineGVCF
   768 -rwxrwxrwx    784640 Feb 23 variantfiltration
   312 -rwxrwxrwx    317400 Feb 23 varscan
   792 -rwxrwxrwx    809120 Feb 23 vqsr

The germline pipeline seems to just use:

  • /usr/local/cuda/.pb/binaries//bin/bwa mem ...
  • /usr/local/cuda/.pb/binaries//bin/sort ...
  • /usr/local/cuda/.pb/binaries//bin/postsort ...
  • /usr/local/cuda/.pb/binaries//bin/htvc ...
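To double-check that list against the log rather than reading it by eye, something like the sketch below could be used. It assumes the run output was saved to a file (the name `germline.log` is hypothetical) and that every invoked binary appears with its `.pb/binaries/.../bin/` path, as it does in the log above. The function name `list_pb_binaries` is made up for illustration.

```shell
# List the distinct Parabricks binaries whose paths appear in a saved log.
# Assumption: invoked binaries are logged with a ".pb/binaries/[/]bin/<name>" path,
# as in the v3.5.0 log pasted above.
list_pb_binaries() {
    grep -oE '/\.pb/binaries/+bin/[A-Za-z0-9_]+' "$1" \
        | sed 's#.*/##' \
        | LC_ALL=C sort -u
}
```

Called as `list_pb_binaries germline.log`, it prints one binary name per line; if `applyBQSR` is not among them, the log contains no invocation of that binary.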

Does the germline pipeline call through the ApplyBQSR process?

Hey,

BQSR shows up in the first step of the output log:

GPU Sorting, Marking Dups, BQSR
ProgressMeter  SAM Entries Completed
Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
Processing time: 287.421802 seconds

It’s not in the banner (which can be confusing), but it appears in these log lines:

GPU Sorting, Marking Dups, BQSR
...
Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
...
BQSR and writing final BAM:  55.401 seconds
------------------------------------------------------------------------------
||        Program:                          Marking Duplicates, BQSR        ||
||        Version:                                            v3.5.0        ||
||        Start Time:                       Thu Jun 10 09:05:16 2021        ||
||        End Time:                         Thu Jun 10 09:06:13 2021        ||
||        Total Time:                                     57 seconds        ||
------------------------------------------------------------------------------
...

I meant ApplyBQSR (which applies the BQSR report), not BQSR (which generates the BQSR report). Or does the BQSR step include ApplyBQSR?

Parabricks v3.5 - doc:

I tried the following combinations:

  • case 1: pbrun germline
  • case 2: pbrun fq2bam + pbrun haplotypecaller (which does not include applybqsr)
  • case 3: pbrun fq2bam + pbrun applybqsr + pbrun haplotypecaller

I found:

  • the VCF results of case 1 and case 2 are the same.
  • the VCF results of case 1 and case 3 are not the same.
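
For anyone repeating this comparison: VCF header lines (those starting with `#`) may differ between runs even when the calls are identical, so it is safer to compare only the variant records. A minimal sketch, where the function name `same_vcf_records` and the file names are hypothetical and the VCFs are assumed to be uncompressed:

```shell
# Return 0 if two uncompressed VCFs contain identical variant records,
# ignoring every header line (any line that starts with '#').
same_vcf_records() {
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    grep -v '^#' "$1" > "$a"    # records of the first VCF
    grep -v '^#' "$2" > "$b"    # records of the second VCF
    cmp -s "$a" "$b"            # byte-for-byte comparison, no output
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For example, `same_vcf_records case1.vcf case3.vcf && echo same || echo different`. This is a strict byte-level check on the record lines; it does not attempt to normalize variant representation.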