Very low Recall for HG002 germline benchmark

Hello there,

Has anyone benchmarked the parabricks germline workflow with GIAB sample HG002 using hap.py?

Here are the results of my comparison across different performance-related options and hardware configurations (I wanted to check whether any of the performance options affect the accuracy of the results). A sketch of the hap.py comparison used to generate these metrics is shown after the table:

Benchmark                     gres                     cpus  mem    queued_time  elapsed_time  Type   METRIC.Recall  METRIC.Precision  METRIC.F1_Score
1gpu_high_memory              gpu:a100:1,lscratch:500  16    120GB  2:51:42      1:43:06       INDEL  0.942778       0.982765          0.962356
1gpu_high_memory              gpu:a100:1,lscratch:500  16    120GB  2:51:42      1:43:06       SNP    0.941673       0.993356          0.966824
1gpu_low_memory               gpu:a100:1,lscratch:500  16    60GB   1:20         1:51:55       INDEL  0.942778       0.982765          0.962356
1gpu_low_memory               gpu:a100:1,lscratch:500  16    60GB   1:20         1:51:55       SNP    0.941673       0.993356          0.966824
1gpu_normal_memory            gpu:a100:1,lscratch:500  16    60GB   1:23         1:42:47       INDEL  0.942778       0.982765          0.962356
1gpu_normal_memory            gpu:a100:1,lscratch:500  16    60GB   1:23         1:42:47       SNP    0.941673       0.993356          0.966824
1gpu_normal_memory_optimized  gpu:a100:1,lscratch:500  16    60GB   1:16         1:20:27       INDEL  0.942778       0.982765          0.962356
1gpu_normal_memory_optimized  gpu:a100:1,lscratch:500  16    60GB   1:16         1:20:27       SNP    0.941673       0.993356          0.966824
2gpu_low_memory               gpu:a100:2,lscratch:500  32    120GB  2:38:51      41:33         INDEL  0.942778       0.982765          0.962356
2gpu_low_memory               gpu:a100:2,lscratch:500  32    120GB  2:38:51      41:33         SNP    0.941673       0.993356          0.966824
2gpu_normal_memory            gpu:a100:2,lscratch:500  32    120GB  1:57:22      40:44         INDEL  0.942778       0.982765          0.962356
2gpu_normal_memory            gpu:a100:2,lscratch:500  32    120GB  1:57:22      40:44         SNP    0.941673       0.993356          0.966824
2gpu_normal_memory_optimized  gpu:a100:2,lscratch:500  32    120GB  1:22:34      34:32         INDEL  0.942778       0.982765          0.962356
2gpu_normal_memory_optimized  gpu:a100:2,lscratch:500  32    120GB  1:22:34      34:32         SNP    0.941673       0.993356          0.966824
4gpu_normal_memory            gpu:a100:4,lscratch:500  64    240GB  2:45:17      23:51         INDEL  0.942778       0.982765          0.962356
4gpu_normal_memory            gpu:a100:4,lscratch:500  64    240GB  2:45:17      23:51         SNP    0.941673       0.993356          0.966824
4gpu_normal_memory_optimized  gpu:a100:4,lscratch:500  64    240GB  2:47:02      21:00         INDEL  0.942778       0.982765          0.962356
4gpu_normal_memory_optimized  gpu:a100:4,lscratch:500  64    240GB  2:47:02      21:00         SNP    0.941673       0.993356          0.966824
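
For context, each row above comes from a hap.py summary of the same form. Here is a minimal sketch of that comparison, assuming GIAB HG002 truth files and a GRCh38 reference with placeholder filenames (not my exact command):

```bash
# Minimal sketch of the hap.py comparison behind each row above.
# Filenames are illustrative: the truth VCF and high-confidence BED come from
# the GIAB HG002 benchmark release, and the query VCF is the pbrun germline
# output for one hardware/performance configuration.
hap.py \
  HG002_GIAB_truth.vcf.gz \
  HG002_parabricks_germline.vcf.gz \
  -f HG002_GIAB_highconf.bed \
  -r GRCh38.fa \
  -o happy_1gpu_normal_memory \
  --threads 16
# METRIC.Recall, METRIC.Precision, and METRIC.F1_Score above are taken from
# the *.summary.csv file that hap.py writes.
```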

While looking into this, I noticed that the recall reported by hap.py is very low, around 0.94.

Here is a summary of the commands I ran. The only step not shown is that the FASTQ files were pre-processed with fastp; the trimmed FASTQ files were then provided as input to pbrun germline. If you would like to see the exact commands that were run, please click on the "Dry-run with test data" tab in this GitHub Actions workflow.
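
For orientation, a minimal sketch of those two steps, with illustrative filenames and resource values (the exact commands live in the workflow linked above):

```bash
# 1) Adapter/quality trimming with fastp (illustrative filenames).
fastp \
  -i HG002_R1.fastq.gz -I HG002_R2.fastq.gz \
  -o HG002_R1.trimmed.fastq.gz -O HG002_R2.trimmed.fastq.gz \
  --thread 16

# 2) Alignment + germline variant calling with the Parabricks germline pipeline.
pbrun germline \
  --ref GRCh38.fa \
  --in-fq HG002_R1.trimmed.fastq.gz HG002_R2.trimmed.fastq.gz \
  --out-bam HG002.bam \
  --out-variants HG002_parabricks_germline.vcf.gz \
  --num-gpus 1
```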

Is there anyone on NVIDIA's side who can look into this? It seems like a potential issue with the tool that may need to be resolved.

Have you benchmarked the Parabricks germline callers against any of the GIAB samples using hap.py? The recall above is much lower than I would have expected. If I run a vanilla GATK germline workflow (using the Broad's GATK) against the same input, I see higher recall (see the screenshot below). I also benchmarked the Parabricks DeepVariant germline caller, and its recall is likewise well below that of Google's native DeepVariant; I am seeing similar recall results from the Parabricks DeepVariant germline command with the same input sample. A rough sketch of that run follows:
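
This is only an illustration, with assumed filenames and a single-GPU setup (not my exact command):

```bash
# Minimal sketch of the Parabricks DeepVariant germline run (illustrative
# filenames; the same fastp-trimmed FASTQs were used as input).
pbrun deepvariant_germline \
  --ref GRCh38.fa \
  --in-fq HG002_R1.trimmed.fastq.gz HG002_R2.trimmed.fastq.gz \
  --out-bam HG002.deepvariant.bam \
  --out-variants HG002_parabricks_deepvariant.vcf.gz \
  --num-gpus 1
```

The resulting VCF was compared against the same GIAB truth set with hap.py, in the same way as the HaplotypeCaller-based results above.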

I would be happy to provide any additional information you may need. Please let me know what you think. If someone from NVIDIA could look into this issue, that would be amazing!

Best regards,
@skchronicles