Hello there,
Has anyone benchmarked the Parabricks germline workflow with GIAB sample HG002 using hap.py?
Here are the results of my comparison across different performance-related options and hardware configurations (I wanted to check whether the options that affect runtime also affect the accuracy of the results in any way):
| Benchmark | gres | cpus | mem | queued_time | elapsed_time | Type | METRIC.Recall | METRIC.Precision | METRIC.F1_Score |
|---|---|---|---|---|---|---|---|---|---|
| 1gpu_high_memory | gpu:a100:1,lscratch:500 | 16 | 120GB | 2:51:42 | 1:43:06 | INDEL | 0.942778 | 0.982765 | 0.962356 |
| 1gpu_high_memory | gpu:a100:1,lscratch:500 | 16 | 120GB | 2:51:42 | 1:43:06 | SNP | 0.941673 | 0.993356 | 0.966824 |
| 1gpu_low_memory | gpu:a100:1,lscratch:500 | 16 | 60GB | 1:20 | 1:51:55 | INDEL | 0.942778 | 0.982765 | 0.962356 |
| 1gpu_low_memory | gpu:a100:1,lscratch:500 | 16 | 60GB | 1:20 | 1:51:55 | SNP | 0.941673 | 0.993356 | 0.966824 |
| 1gpu_normal_memory | gpu:a100:1,lscratch:500 | 16 | 60GB | 1:23 | 1:42:47 | INDEL | 0.942778 | 0.982765 | 0.962356 |
| 1gpu_normal_memory | gpu:a100:1,lscratch:500 | 16 | 60GB | 1:23 | 1:42:47 | SNP | 0.941673 | 0.993356 | 0.966824 |
| 1gpu_normal_memory_optimized | gpu:a100:1,lscratch:500 | 16 | 60GB | 1:16 | 1:20:27 | INDEL | 0.942778 | 0.982765 | 0.962356 |
| 1gpu_normal_memory_optimized | gpu:a100:1,lscratch:500 | 16 | 60GB | 1:16 | 1:20:27 | SNP | 0.941673 | 0.993356 | 0.966824 |
| 2gpu_low_memory | gpu:a100:2,lscratch:500 | 32 | 120GB | 2:38:51 | 41:33 | INDEL | 0.942778 | 0.982765 | 0.962356 |
| 2gpu_low_memory | gpu:a100:2,lscratch:500 | 32 | 120GB | 2:38:51 | 41:33 | SNP | 0.941673 | 0.993356 | 0.966824 |
| 2gpu_normal_memory | gpu:a100:2,lscratch:500 | 32 | 120GB | 1:57:22 | 40:44 | INDEL | 0.942778 | 0.982765 | 0.962356 |
| 2gpu_normal_memory | gpu:a100:2,lscratch:500 | 32 | 120GB | 1:57:22 | 40:44 | SNP | 0.941673 | 0.993356 | 0.966824 |
| 2gpu_normal_memory_optimized | gpu:a100:2,lscratch:500 | 32 | 120GB | 1:22:34 | 34:32 | INDEL | 0.942778 | 0.982765 | 0.962356 |
| 2gpu_normal_memory_optimized | gpu:a100:2,lscratch:500 | 32 | 120GB | 1:22:34 | 34:32 | SNP | 0.941673 | 0.993356 | 0.966824 |
| 4gpu_normal_memory | gpu:a100:4,lscratch:500 | 64 | 240GB | 2:45:17 | 23:51 | INDEL | 0.942778 | 0.982765 | 0.962356 |
| 4gpu_normal_memory | gpu:a100:4,lscratch:500 | 64 | 240GB | 2:45:17 | 23:51 | SNP | 0.941673 | 0.993356 | 0.966824 |
| 4gpu_normal_memory_optimized | gpu:a100:4,lscratch:500 | 64 | 240GB | 2:47:02 | 21:00 | INDEL | 0.942778 | 0.982765 | 0.962356 |
| 4gpu_normal_memory_optimized | gpu:a100:4,lscratch:500 | 64 | 240GB | 2:47:02 | 21:00 | SNP | 0.941673 | 0.993356 | 0.966824 |
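For reference, the comparison against the GIAB truth set follows the standard hap.py pattern sketched below. The file names here are placeholders, not my exact arguments (the exact invocation is in the workflow linked at the end):

```bash
# Illustrative hap.py comparison against the GIAB HG002 truth set.
# Positional args: truth VCF, then query VCF (here, the pbrun germline output).
# -f restricts the comparison to the GIAB high-confidence regions.
hap.py \
    HG002_GRCh38_benchmark.vcf.gz \
    HG002.pbrun_germline.vcf.gz \
    -f HG002_GRCh38_benchmark.bed \
    -r GRCh38.fa \
    -o HG002_vs_truth \
    --threads 16
```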
While taking a look into this, I noticed that the recall reported by hap.py is quite low (~0.94 for both SNPs and INDELs). The accuracy metrics are identical across all configurations, so the performance-related options do not appear to affect the results; it is the low recall itself that I am wondering about.
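To see where the recall is being lost, the summary CSV that hap.py writes (`<prefix>.summary.csv`) breaks the counts down by Type and Filter. Something like the sketch below pulls the false-negative counts out next to the recall values; it selects columns by hap.py's standard header names rather than by position, but verify against the header of your own file:

```bash
# Map header names to column indices on the first row, then print
# Type, Filter, TRUTH.FN, and METRIC.Recall for every row (the header
# row prints its own labels, so the output is self-describing).
awk -F, 'NR==1 { for (i = 1; i <= NF; i++) col[$i] = i }
         { print $col["Type"], $col["Filter"], $col["TRUTH.FN"], $col["METRIC.Recall"] }' \
    HG002_vs_truth.summary.csv
```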
As for the commands: the fastq files were pre-processed with fastp, and the trimmed fastq files were then provided as input to pbrun germline. If you would like to see the exact commands that were run, please click on the "Dry-run with test data" tab in this GitHub Actions workflow.
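In outline, the pre-processing and germline steps look like the sketch below; file names and thread counts are placeholders, not the exact arguments from my runs:

```bash
# Adapter/quality trimming of the paired-end reads with fastp.
fastp \
    -i HG002_R1.fastq.gz -I HG002_R2.fastq.gz \
    -o HG002_R1.trimmed.fastq.gz -O HG002_R2.trimmed.fastq.gz \
    --thread 16

# The trimmed reads then go into the Parabricks germline pipeline,
# which produces the BAM and the VCF that was fed to hap.py above.
pbrun germline \
    --ref GRCh38.fa \
    --in-fq HG002_R1.trimmed.fastq.gz HG002_R2.trimmed.fastq.gz \
    --out-bam HG002.bam \
    --out-variants HG002.pbrun_germline.vcf.gz
```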