Very low Recall for HG002 germline benchmark

Hello there,

Has anyone benchmarked the parabricks germline workflow with GIAB sample HG002 using hap.py?

Here are the results of my comparison across different performance-related options and hardware configurations (I wanted to check whether any of the performance options affect the accuracy of the results). A sketch of the hap.py comparison used to generate these metrics is shown after the table:

Benchmark                     gres                     cpus  mem    queued_time  elapsed_time  Type   METRIC.Recall  METRIC.Precision  METRIC.F1_Score
1gpu_high_memory              gpu:a100:1,lscratch:500  16    120GB  2:51:42      1:43:06       INDEL  0.942778       0.982765          0.962356
1gpu_high_memory              gpu:a100:1,lscratch:500  16    120GB  2:51:42      1:43:06       SNP    0.941673       0.993356          0.966824
1gpu_low_memory               gpu:a100:1,lscratch:500  16    60GB   1:20         1:51:55       INDEL  0.942778       0.982765          0.962356
1gpu_low_memory               gpu:a100:1,lscratch:500  16    60GB   1:20         1:51:55       SNP    0.941673       0.993356          0.966824
1gpu_normal_memory            gpu:a100:1,lscratch:500  16    60GB   1:23         1:42:47       INDEL  0.942778       0.982765          0.962356
1gpu_normal_memory            gpu:a100:1,lscratch:500  16    60GB   1:23         1:42:47       SNP    0.941673       0.993356          0.966824
1gpu_normal_memory_optimized  gpu:a100:1,lscratch:500  16    60GB   1:16         1:20:27       INDEL  0.942778       0.982765          0.962356
1gpu_normal_memory_optimized  gpu:a100:1,lscratch:500  16    60GB   1:16         1:20:27       SNP    0.941673       0.993356          0.966824
2gpu_low_memory               gpu:a100:2,lscratch:500  32    120GB  2:38:51      41:33         INDEL  0.942778       0.982765          0.962356
2gpu_low_memory               gpu:a100:2,lscratch:500  32    120GB  2:38:51      41:33         SNP    0.941673       0.993356          0.966824
2gpu_normal_memory            gpu:a100:2,lscratch:500  32    120GB  1:57:22      40:44         INDEL  0.942778       0.982765          0.962356
2gpu_normal_memory            gpu:a100:2,lscratch:500  32    120GB  1:57:22      40:44         SNP    0.941673       0.993356          0.966824
2gpu_normal_memory_optimized  gpu:a100:2,lscratch:500  32    120GB  1:22:34      34:32         INDEL  0.942778       0.982765          0.962356
2gpu_normal_memory_optimized  gpu:a100:2,lscratch:500  32    120GB  1:22:34      34:32         SNP    0.941673       0.993356          0.966824
4gpu_normal_memory            gpu:a100:4,lscratch:500  64    240GB  2:45:17      23:51         INDEL  0.942778       0.982765          0.962356
4gpu_normal_memory            gpu:a100:4,lscratch:500  64    240GB  2:45:17      23:51         SNP    0.941673       0.993356          0.966824
4gpu_normal_memory_optimized  gpu:a100:4,lscratch:500  64    240GB  2:47:02      21:00         INDEL  0.942778       0.982765          0.962356
4gpu_normal_memory_optimized  gpu:a100:4,lscratch:500  64    240GB  2:47:02      21:00         SNP    0.941673       0.993356          0.966824
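
For context, each row above comes from a hap.py summary of the same form. Here is a minimal sketch of that comparison, assuming GIAB HG002 truth files and a GRCh38 reference with placeholder filenames (not my exact command):

```bash
# Minimal sketch of the hap.py comparison behind each row above.
# Filenames are illustrative: the truth VCF and high-confidence BED come from
# the GIAB HG002 benchmark release, and the query VCF is the pbrun germline
# output for one hardware/performance configuration.
hap.py \
  HG002_GIAB_truth.vcf.gz \
  HG002_parabricks_germline.vcf.gz \
  -f HG002_GIAB_highconf.bed \
  -r GRCh38.fa \
  -o happy_1gpu_normal_memory \
  --threads 16
# METRIC.Recall, METRIC.Precision, and METRIC.F1_Score above are taken from
# the *.summary.csv file that hap.py writes.
```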

While looking into this, I noticed that the recall reported by hap.py is very low, around 0.94.

Here is a summary of the commands I ran. The only step not shown is that the FASTQ files were pre-processed with fastp; the trimmed FASTQ files were then provided as input to pbrun germline. If you would like to see the exact commands that were run, please click on the "Dry-run with test data" tab in this GitHub Actions workflow.
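
For orientation, a minimal sketch of those two steps, with illustrative filenames and resource values (the exact commands live in the workflow linked above):

```bash
# 1) Adapter/quality trimming with fastp (illustrative filenames).
fastp \
  -i HG002_R1.fastq.gz -I HG002_R2.fastq.gz \
  -o HG002_R1.trimmed.fastq.gz -O HG002_R2.trimmed.fastq.gz \
  --thread 16

# 2) Alignment + germline variant calling with the Parabricks germline pipeline.
pbrun germline \
  --ref GRCh38.fa \
  --in-fq HG002_R1.trimmed.fastq.gz HG002_R2.trimmed.fastq.gz \
  --out-bam HG002.bam \
  --out-variants HG002_parabricks_germline.vcf.gz \
  --num-gpus 1
```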

Is there anyone on NVIDIA's side who can look into this? It seems like a potential issue with the tool that may need to be resolved.

Have you benchmarked the Parabricks germline callers against any of the GIAB samples using hap.py? The recall above is much lower than I would have expected. If I run a vanilla GATK germline workflow (using the Broad's GATK) against the same input, I see higher recall (see the screenshot below). I also benchmarked the Parabricks DeepVariant germline caller, and its recall is likewise well below that of Google's native DeepVariant; I am seeing similar recall results from the Parabricks DeepVariant germline command with the same input sample. A rough sketch of that run follows:
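
This is only an illustration, with assumed filenames and a single-GPU setup (not my exact command):

```bash
# Minimal sketch of the Parabricks DeepVariant germline run (illustrative
# filenames; the same fastp-trimmed FASTQs were used as input).
pbrun deepvariant_germline \
  --ref GRCh38.fa \
  --in-fq HG002_R1.trimmed.fastq.gz HG002_R2.trimmed.fastq.gz \
  --out-bam HG002.deepvariant.bam \
  --out-variants HG002_parabricks_deepvariant.vcf.gz \
  --num-gpus 1
```

The resulting VCF was compared against the same GIAB truth set with hap.py, in the same way as the HaplotypeCaller-based results above.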

I would be happy to provide any additional information you may need. Please let me know what you think. If someone from NVIDIA could look into this issue, that would be amazing!

Best regards,
@skchronicles