Hi, yes that is correct, I am trying to effectively run the RNAseq short variant discovery (SNPs + Indels) pipeline. So I am first using rna-fq2bam to STAR align RNAseq reads. I’m then using haplotype caller with the --rna flag to try and call the variants from the bam files.
Is there a problem with using the --rna flag? I understood that was the flag to use if working with RNAseq data.
I should mention I couldn’t fugure out how to create the star index using nvidia fq2bam so I generated that using cpu equivalent (STAR v2.7.2a) before starting.
Here is an example command used to generate the bam
docker run --user 1003:1003 --rm --gpus all --volume /mnt/hp_scratch/Users/dyern/Clara_Parabricks/rna_seq_pop_pexa2:/workdir --workdir /workdir nvcr.io/nvidia/clara/clara-parabricks:4.3.0-1
pbrun rna_fq2bam
–in-fq /workdir/resources/reads/Unknown_BY068-001T0022_1.fq.gz /workdir/resources/reads/Unknown_BY068-001T0022_2.fq.gz
–genome-lib-dir /workdir/resources/reference/star_index
–output-dir /workdir
–ref /workdir/resources/reference/VectorBase-68_AgambiaePEST_Genome.fasta
–out-bam /workdir/results/alignments/Kisumu10.star.bam
–read-files-command zcat 2> logs/parabricks/fq2bam-Kisumu10.log
The log file is rather long (1835 lines) so I’ve just put tha tail of it here to aid readability:
[PB Info 2024-Oct-01 21:39:59] ReadQueue:10, Stage0:0, Stage1:11, Stage1InnerVec:2000, Stage1InnerChunk:10, copyQ:0, poolQ:6, Stage2:0, Stage3 input:0
[PB Info 2024-Oct-01 21:39:59] Thread 0 windowPool:600, transPool:400, Thread 1 windowPool:200, transPool:400, Thread 2 windowPool:600, transPool:400, Thread 3 windowPool:400, transPool:400,
[PB Info 2024-Oct-01 21:39:59] PoolQ:18
[PB Info 2024-Oct-01 21:39:59] reader done
[PB Info 2024-Oct-01 21:40:00] Stage 1 gpu thread done
[PB Info 2024-Oct-01 21:40:00] Stage 1 cpu threads done
[PB Info 2024-Oct-01 21:40:00] Stage 0 gpu thread done
[PB Info 2024-Oct-01 21:40:00] Stage 0 cpu threads done
[PB Info 2024-Oct-01 21:40:00] Stage 0 clearAllQueue done
[PB Info 2024-Oct-01 21:40:00] Stage 1 clearAllQueue done
[PB Info 2024-Oct-01 21:40:00] all workers joined
[PB Info 2024-Oct-01 21:40:00] reader done
[PB Info 2024-Oct-01 21:40:00] … finished mapping
[PB Info 2024-Oct-01 21:40:01] All processing done, time 1322.708
[PB Info 2024-Oct-01 21:40:05] … finished successfully
[PB Info 2024-Oct-01 21:40:05] All done, total running time is 1328.960
[PB Info 2024-Oct-01 21:40:05] Memory free
[PB Info 2024-Oct-01 21:40:05] ------------------------------------------------------------------------------
[PB Info 2024-Oct-01 21:40:05] || Program: star ||
[PB Info 2024-Oct-01 21:40:05] || Version: 4.3.0-1 ||
[PB Info 2024-Oct-01 21:40:05] || Start Time: Tue Oct 1 21:17:56 2024 ||
[PB Info 2024-Oct-01 21:40:05] || End Time: Tue Oct 1 21:40:05 2024 ||
[PB Info 2024-Oct-01 21:40:05] || Total Time: 22 minutes 9 seconds ||
[PB Info 2024-Oct-01 21:40:05] ------------------------------------------------------------------------------
[PB Info 2024-Oct-01 21:40:08] ------------------------------------------------------------------------------
[PB Info 2024-Oct-01 21:40:08] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2024-Oct-01 21:40:08] || Version 4.3.0-1 ||
[PB Info 2024-Oct-01 21:40:08] || Sorting Phase-II ||
[PB Info 2024-Oct-01 21:40:08] ------------------------------------------------------------------------------
[PB Info 2024-Oct-01 21:40:08] progressMeter - Percentage
[PB Info 2024-Oct-01 21:40:08] 0.0
[PB Info 2024-Oct-01 21:40:13] 24.9
[PB Info 2024-Oct-01 21:40:18] 62.2
[PB Info 2024-Oct-01 21:40:23] 99.9
[PB Info 2024-Oct-01 21:40:28] 99.9
[PB Info 2024-Oct-01 21:40:33] 99.9
[PB Info 2024-Oct-01 21:40:38] 99.9
[PB Info 2024-Oct-01 21:40:43] 99.9
[PB Info 2024-Oct-01 21:40:48] Sorting and Marking: 40.003 seconds
[PB Info 2024-Oct-01 21:40:48] ------------------------------------------------------------------------------
[PB Info 2024-Oct-01 21:40:48] || Program: Sorting Phase-II ||
[PB Info 2024-Oct-01 21:40:48] || Version: 4.3.0-1 ||
[PB Info 2024-Oct-01 21:40:48] || Start Time: Tue Oct 1 21:40:08 2024 ||
[PB Info 2024-Oct-01 21:40:48] || End Time: Tue Oct 1 21:40:48 2024 ||
[PB Info 2024-Oct-01 21:40:48] || Total Time: 40 seconds ||
[PB Info 2024-Oct-01 21:40:48] ------------------------------------------------------------------------------
[PB Info 2024-Oct-01 21:40:48] ------------------------------------------------------------------------------
[PB Info 2024-Oct-01 21:40:48] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2024-Oct-01 21:40:48] || Version 4.3.0-1 ||
[PB Info 2024-Oct-01 21:40:48] || Marking Duplicates, BQSR ||
[PB Info 2024-Oct-01 21:40:48] ------------------------------------------------------------------------------
[PB Info 2024-Oct-01 21:40:48] Using PBBinBamFile for BAM writing
[PB Info 2024-Oct-01 21:40:48] progressMeter - Percentage
[PB Info 2024-Oct-01 21:40:58] 24.8
[PB Info 2024-Oct-01 21:41:08] 56.3
[PB Info 2024-Oct-01 21:41:18] 100.0
[PB Info 2024-Oct-01 21:41:18] BQSR and writing final BAM: 30.026 seconds
[PB Info 2024-Oct-01 21:41:18] ------------------------------------------------------------------------------
[PB Info 2024-Oct-01 21:41:18] || Program: Marking Duplicates, BQSR ||
[PB Info 2024-Oct-01 21:41:18] || Version: 4.3.0-1 ||
[PB Info 2024-Oct-01 21:41:18] || Start Time: Tue Oct 1 21:40:48 2024 ||
[PB Info 2024-Oct-01 21:41:18] || End Time: Tue Oct 1 21:41:18 2024 ||
[PB Info 2024-Oct-01 21:41:18] || Total Time: 30 seconds ||
[PB Info 2024-Oct-01 21:41:18] ------------------------------------------------------------------------------
This generates the bam file as expected and when checked with samtools it looks good:
samtools flagstat Kisumu10.star.bam give this output
147705766 + 0 in total (QC-passed reads + QC-failed reads)
10165766 + 0 secondary
0 + 0 supplementary
64833336 + 0 duplicates
147705766 + 0 mapped (100.00% : N/A)
137540000 + 0 paired in sequencing
68770000 + 0 read1
68770000 + 0 read2
137540000 + 0 properly paired (100.00% : N/A)
137540000 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Then I proceeded to run haplotypecaller using the following command:
docker run --gpus all --rm --volume $(pwd):/workdir --volume $(pwd):/outputdir nvcr.io/nvidia/clara/clara-parabricks:4.3.0-1 pbrun haplotypecaller --rna --ref /workdir/resources/reference/VectorBase-68_AgambiaePEST_Genome.fasta --in-bam /workdir/results/alignments/Kisumu10.star.bam --out-variants /workdir/outputdir/Kisumu10.basic_testrun.vcf --log /workdir/outputdir/Kisumu10.basic_testrun.vcf.log
Log file looks like this:
[PB Info 2024-Oct-09 10:48:23] ------------------------------------------------------------------------------
[PB Info 2024-Oct-09 10:48:23] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2024-Oct-09 10:48:23] || Version 4.3.0-1 ||
[PB Info 2024-Oct-09 10:48:23] || GPU-GATK4 HaplotypeCaller ||
[PB Info 2024-Oct-09 10:48:23] ------------------------------------------------------------------------------
[PB Info 2024-Oct-09 10:48:25] 0 /workdir/results/alignments/Kisumu10.star.bam/workdir/outputdir/Kisumu10.basic_testrun.vcf
[PB Info 2024-Oct-09 10:48:25] ProgressMeter - Current-Locus Elapsed-Minutes Regions-Processed Regions/Minute
[PB Info 2024-Oct-09 10:48:35] :0 0.2 0 0
[PB Info 2024-Oct-09 10:48:45] :0 0.3 0 0
[PB Info 2024-Oct-09 10:48:55] :0 0.5 0 0
[PB Info 2024-Oct-09 10:49:05] :0 0.7 0 0
[PB Info 2024-Oct-09 10:49:15] :0 0.8 0 0
[PB Info 2024-Oct-09 10:49:25] :0 1.0 0 0
[PB Info 2024-Oct-09 10:49:35] :0 1.2 0 0
[PB Info 2024-Oct-09 10:49:45] :0 1.3 0 0
[PB Info 2024-Oct-09 10:49:55] Total time taken: 90.290000
[PB Info 2024-Oct-09 10:49:55] ------------------------------------------------------------------------------
[PB Info 2024-Oct-09 10:49:55] || Program: GPU-GATK4 HaplotypeCaller ||
[PB Info 2024-Oct-09 10:49:55] || Version: 4.3.0-1 ||
[PB Info 2024-Oct-09 10:49:55] || Start Time: Wed Oct 9 10:48:23 2024 ||
[PB Info 2024-Oct-09 10:49:55] || End Time: Wed Oct 9 10:49:55 2024 ||
[PB Info 2024-Oct-09 10:49:55] || Total Time: 1 minute 32 seconds ||
[PB Info 2024-Oct-09 10:49:55] ------------------------------------------------------------------------------
This produces a vcf file with no variants (header only) so last line of vcf looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
I think what might be happening is that the variants are all being filtered, because the nvidia rna-fq2bam is based on STAR version 2.7.2a. The default MAPQ for unquely mapping reads is 255 (see MAPQ 255 doesn't work with GATK · Issue #1069 · alexdobin/STAR · GitHub).
If I rerun with a flag to remove the filter for MAPQ being available like this:
docker run --gpus all --rm --volume $(pwd):/workdir --volume $(pwd):/outputdir nvcr.io/nvidia/clara/clara-parabricks:4.3.0-1 pbrun haplotypecaller --rna --ref /workdir/resources/reference/VectorBase-68_AgambiaePEST_Genome.fasta --disable-read-filter MappingQualityAvailableReadFilter --in-bam /workdir/results/alignments/Kisumu10.star.bam --out-variants /workdir/outputdir/Kisumu10.noMapQ_filter_testrun.vcf --log /workdir/outputdir/Kisumu10.noMapQ_filter_testrun.vcf.log
Then this does work;
Log file:
[PB Info 2024-Oct-09 10:53:59] ------------------------------------------------------------------------------
[PB Info 2024-Oct-09 10:53:59] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2024-Oct-09 10:53:59] || Version 4.3.0-1 ||
[PB Info 2024-Oct-09 10:53:59] || GPU-GATK4 HaplotypeCaller ||
[PB Info 2024-Oct-09 10:53:59] ------------------------------------------------------------------------------
[PB Info 2024-Oct-09 10:54:02] 0 /workdir/results/alignments/Kisumu10.star.bam/workdir/outputdir/Kisumu10.noMapQ_filter_testrun.vcf
[PB Info 2024-Oct-09 10:54:02] ProgressMeter - Current-Locus Elapsed-Minutes Regions-Processed Regions/Minute
[PB Info 2024-Oct-09 10:54:12] 2L:4766401 0.2 37124 222744
[PB Info 2024-Oct-09 10:54:22] 2L:9883201 0.3 55775 167325
[PB Info 2024-Oct-09 10:54:32] 2L:11400001 0.5 72253 144506
[PB Info 2024-Oct-09 10:54:42] 2L:15187201 0.7 85154 127731
[PB Info 2024-Oct-09 10:54:52] 2L:17318401 0.8 95103 114123
[PB Info 2024-Oct-09 10:55:02] 2L:20496001 1.0 109615 109615
[PB Info 2024-Oct-09 10:55:12] 2L:23937601 1.2 121388 104046
[PB Info 2024-Oct-09 10:55:22] 2L:25521601 1.3 131509 98631
[PB Info 2024-Oct-09 10:55:32] 2L:27820801 1.5 139627 93084
[PB Info 2024-Oct-09 10:55:42] 2L:28238401 1.7 149060 89436
[PB Info 2024-Oct-09 10:55:52] 2L:31785601 1.8 155522 84830
[PB Info 2024-Oct-09 10:56:02] 2L:34108801 2.0 166186 83093
[PB Info 2024-Oct-09 10:56:12] 2L:37876801 2.2 180237 83186
[PB Info 2024-Oct-09 10:56:22] 2L:39705601 2.3 189365 81156
[PB Info 2024-Oct-09 10:56:32] 2L:40300801 2.5 198722 79488
[PB Info 2024-Oct-09 10:56:42] 2L:44059201 2.7 211726 79397
[PB Info 2024-Oct-09 10:56:52] 2L:46339201 2.8 224469 79224
[PB Info 2024-Oct-09 10:57:02] 2L:47481601 3.0 232065 77355
[PB Info 2024-Oct-09 10:57:12] 2L:48604801 3.2 241122 76143
[PB Info 2024-Oct-09 10:57:22] 2R:676801 3.3 248297 74489
[PB Info 2024-Oct-09 10:57:32] 2R:2880001 3.5 257713 73632
[PB Info 2024-Oct-09 10:57:42] 2R:5419201 3.7 268601 73254
[PB Info 2024-Oct-09 10:57:52] 2R:6484801 3.8 281628 73468
[PB Info 2024-Oct-09 10:58:02] 2R:8875201 4.0 291133 72783
[PB Info 2024-Oct-09 10:58:12] 2R:12379201 4.2 301802 72432
[PB Info 2024-Oct-09 10:58:22] 2R:15158401 4.3 319449 73719
[PB Info 2024-Oct-09 10:58:32] 2R:19051201 4.5 336612 74802
[PB Info 2024-Oct-09 10:58:42] 2R:20740801 4.7 342042 73294
[PB Info 2024-Oct-09 10:58:52] 2R:22171201 4.8 353034 73041
[PB Info 2024-Oct-09 10:59:02] 2R:24561601 5.0 360767 72153
[PB Info 2024-Oct-09 10:59:12] 2R:26673601 5.2 374631 72509
[PB Info 2024-Oct-09 10:59:22] 2R:28675201 5.3 381948 71615
[PB Info 2024-Oct-09 10:59:32] 2R:31420801 5.5 395880 71978
[PB Info 2024-Oct-09 10:59:42] 2R:34320001 5.7 409179 72208
[PB Info 2024-Oct-09 10:59:52] 2R:38803201 5.8 431566 73982
[PB Info 2024-Oct-09 11:00:02] 2R:43128001 6.0 447785 74630
[PB Info 2024-Oct-09 11:00:12] 2R:48264001 6.2 469453 76127
[PB Info 2024-Oct-09 11:00:22] 2R:50966401 6.3 484400 76484
[PB Info 2024-Oct-09 11:00:32] 2R:54931201 6.5 499599 76861
[PB Info 2024-Oct-09 11:00:42] 2R:57451201 6.7 512872 76930
[PB Info 2024-Oct-09 11:00:52] 3L:3192001 6.8 558400 81717
[PB Info 2024-Oct-09 11:01:02] 3L:8323201 7.0 570813 81544
[PB Info 2024-Oct-09 11:01:12] 3L:12681601 7.2 588374 82098
[PB Info 2024-Oct-09 11:01:22] 3L:15787201 7.3 603631 82313
[PB Info 2024-Oct-09 11:01:32] 3L:18230401 7.5 614768 81969
[PB Info 2024-Oct-09 11:01:42] 3L:20961601 7.7 624765 81491
[PB Info 2024-Oct-09 11:01:52] 3L:26913601 7.8 648714 82814
[PB Info 2024-Oct-09 11:02:02] 3L:31392001 8.0 663978 82997
[PB Info 2024-Oct-09 11:02:12] 3L:34632001 8.2 675457 82709
[PB Info 2024-Oct-09 11:02:22] 3L:37449601 8.3 690167 82820
[PB Info 2024-Oct-09 11:02:32] 3L:41760001 8.5 708425 83344
[PB Info 2024-Oct-09 11:02:42] 3R:1123201 8.7 716255 82644
[PB Info 2024-Oct-09 11:02:52] 3R:3216001 8.8 723273 81879
[PB Info 2024-Oct-09 11:03:02] 3R:3835201 9.0 731280 81253
[PB Info 2024-Oct-09 11:03:12] 3R:5092801 9.2 744724 81242
[PB Info 2024-Oct-09 11:03:22] 3R:8625601 9.3 757654 81177
[PB Info 2024-Oct-09 11:03:32] 3R:13262401 9.5 777460 81837
[PB Info 2024-Oct-09 11:03:42] 3R:18494401 9.7 794747 82215
[PB Info 2024-Oct-09 11:03:52] 3R:20760001 9.8 811421 82517
[PB Info 2024-Oct-09 11:04:02] 3R:24465601 10.0 825074 82507
[PB Info 2024-Oct-09 11:04:12] 3R:28142401 10.2 838170 82442
[PB Info 2024-Oct-09 11:04:22] 3R:30672001 10.3 848943 82155
[PB Info 2024-Oct-09 11:04:32] 3R:34713601 10.5 858959 81805
[PB Info 2024-Oct-09 11:04:42] 3R:37152001 10.7 874923 82024
[PB Info 2024-Oct-09 11:04:52] 3R:44323201 10.8 890589 82208
[PB Info 2024-Oct-09 11:05:02] 3R:46982401 11.0 909907 82718
[PB Info 2024-Oct-09 11:05:12] 3R:49483201 11.2 918663 82268
[PB Info 2024-Oct-09 11:05:22] UNKN:26428801 11.3 1027564 90667
[PB Info 2024-Oct-09 11:05:32] X:878401 11.5 1075921 93558
[PB Info 2024-Oct-09 11:05:42] X:1780801 11.7 1087566 93219
[PB Info 2024-Oct-09 11:05:52] X:5750401 11.8 1107240 93569
[PB Info 2024-Oct-09 11:06:02] X:8016001 12.0 1122733 93561
[PB Info 2024-Oct-09 11:06:12] X:11654401 12.2 1139458 93654
[PB Info 2024-Oct-09 11:06:22] X:14438401 12.3 1160493 94094
[PB Info 2024-Oct-09 11:06:32] X:19651201 12.5 1199452 95956
[PB Info 2024-Oct-09 11:06:42] AAAB01008965:48001 12.7 1209199 95463
[PB Info 2024-Oct-09 11:06:52] Mt:4801 12.8 1209199 94223
[PB Info 2024-Oct-09 11:07:03] Total time taken: 781.434274
[PB Info 2024-Oct-09 11:07:03] ------------------------------------------------------------------------------
[PB Info 2024-Oct-09 11:07:03] || Program: GPU-GATK4 HaplotypeCaller ||
[PB Info 2024-Oct-09 11:07:03] || Version: 4.3.0-1 ||
[PB Info 2024-Oct-09 11:07:03] || Start Time: Wed Oct 9 10:53:59 2024 ||
[PB Info 2024-Oct-09 11:07:03] || End Time: Wed Oct 9 11:07:03 2024 ||
[PB Info 2024-Oct-09 11:07:03] || Total Time: 13 minutes 4 seconds ||
[PB Info 2024-Oct-09 11:07:03] ------------------------------------------------------------------------------
This produces a vcf file with 872132 variants.