Encountering Bugs/Errors with Germline pipeline - Seeking Help!

Hello everyone,

We are encountering some issues with whole-genome sequencing (WGS) analysis using Parabricks Version 4.4.0-1. Has anyone else encountered similar issues? Thank you very much in advance.

Problem Description

When running Parabricks for whole-genome analysis, we observed that some datasets fail with the same error while others complete normally. The error message is as follows:

[PB Info 2025-Jan-11 12:40:48] 2:1416001 0.5 1338291 2676582

terminate called after throwing an instance of 'std::overflow_error'

what(): robin_hood::map overflow

[PB Error 2025-Jan-11 12:40:55][-unknown-:0] Received signal: 6

For technical support visit NVIDIA Clara - NVIDIA Docs, exiting.
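
For anyone triaging a similar log, here is a minimal sketch (not part of the original report) that pulls out the last locus the ProgressMeter reported before the `robin_hood::map overflow` exception. The function name `last_locus_before_crash` is hypothetical, and the progress-line layout is assumed from the excerpt above. The result is only a coarse hint, since the crash may occur well past the last reported locus.

```shell
# Hypothetical helper: print the last locus the ProgressMeter reported
# before the overflow exception appears in a Parabricks log.
# Usage: last_locus_before_crash LOG_FILE
last_locus_before_crash() (
    awk '
        # Progress lines look like: [PB Info <date> <time>] <contig>:<pos> ...
        /^\[PB Info [^]]*\] [^ ]+:[0-9]+ / { locus = $5 }
        /robin_hood::map overflow/ {
            print (locus == "" ? "unknown" : locus)
            found = 1
            exit
        }
        END { if (!found) print "no overflow found" }
    ' "$1"
)
```

Saved to a file, the excerpt above would yield `2:1416001`, which only bounds the search from below; the failing interval still has to be narrowed by hand.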

Troubleshooting Steps

To diagnose the issue further, we first tested the BAM files produced by fq2bam and found that GATK processes them normally. This led us to suspect that the error lies in the HTVC (HaplotypeCaller) step.

We then narrowed the analysis down to specific intervals, which allowed us to reproduce the error. The problematic intervals are located around GL000220.1:143740-143850 and chr2:55884500-55884610. The reference genome sequences in these regions are as follows:

>GL000220.1:143740-143850
CAGTTAGTTTTTGTAATTTTTTTTTTTTTTTTTTTTTTTTGAGACGAGGTTTCACCGTGTTGCCAAGGCTTGGACCGAGGGATCCACCGGCCCTCGGCCTCCCAAAAGTGC

>chr2:55884500-55884610
CAGATTAACAAGAATTTTTTTTTTGTTTTTTCTTTTTTTTTAAGACAGAGTTCTGCTCTTGTTGCCCAGGCTGGCGTGCAATGGTGCAATCTCGGCTCACTGCAACCTCTG
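
The thread does not say which tools were used for the narrowing step. One way to reproduce it, as a sketch under assumptions, is with `samtools`: `view` cuts a region out of an indexed BAM and `faidx` prints the matching reference sequence. The helper names `run` and `subset_region` and all file names are hypothetical.

```shell
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: cut one region out of a coordinate-sorted BAM
# and print the reference sequence for the same region.
# Usage: subset_region FULL_BAM REFERENCE_FASTA REGION OUT_BAM
subset_region() (
    bam=$1 ref=$2 region=$3 out=$4
    # Region queries need a BAM index.
    run samtools index "$bam" &&
    # Keep only alignments overlapping the region, written as BAM.
    run samtools view -b -o "$out" "$bam" "$region" &&
    run samtools index "$out" &&
    # Print the reference bases as a FASTA record named after the region.
    run samtools faidx "$ref" "$region"
)
```

For example, `DRY_RUN=1 subset_region full.bam GRCh37.fa chr2:55884500-55884610 sub.bam` prints the four commands without running them; drop `DRY_RUN=1` to execute.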

Both of these sequences contain long poly-T stretches, but we do not know whether that is related to the error. The BAM file that reproduces the error in chr2:55884500-55884610 is available at https://pan.quark.cn/s/58f79294bc0b.
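
To put a number on the poly-T observation, a small sketch like the following reports the longest run of one base in a sequence. The function name `longest_run` is hypothetical, and the check says nothing about whether homopolymers actually trigger the overflow.

```shell
# Hypothetical helper: length of the longest run of BASE in SEQUENCE.
# Usage: longest_run BASE SEQUENCE
longest_run() (
    printf '%s\n' "$2" |
        # Emit every maximal run of the base on its own line.
        grep -oE "$1+" |
        # Track the longest line; "max + 0" prints 0 when there is no run.
        awk 'length($0) > max { max = length($0) } END { print max + 0 }'
)
```

Calling `longest_run T` with either reference sequence above as the second argument gives the length of its longest T stretch, which could then be compared against regions that process without error.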

Additionally, we have tested the impact of changing the sequence tags and base quality values in the FASTQ files, but these changes did not resolve this error.

Potential Wider Impact

Although we’ve only identified this issue in the GL000220.1:143740-143850 and chr2:55884500-55884610 intervals so far, we suspect that similar problems might exist in other chromosomes or segments.

We are reaching out to see if anyone else has encountered similar issues or has suggestions on how to resolve this. Any insights or advice would be greatly appreciated!

Best regards

Hi @jiangyuzhou

If fq2bam is working, could you share the following information about the haplotypecaller run? Thanks.

  1. The command you use to run haplotypecaller.
  2. The GPU type.
  3. The input file size.
  1. The command is:
pbrun haplotypecaller \
    --ref ${reference_path}/GRCh37.fa \
    --in-bam sub.bam \
    --out-variants sub.vcf
  2. The GPU type is A5000.
  3. We first ran the whole genome, for which the BAM file is 13 GB. We then narrowed the analysis down to specific intervals, and the final BAM file is 30 KB (https://pan.quark.cn/s/58f79294bc0b).
    Thank you very much!

Thank you.

Could you confirm whether the Compatible GATK4 Command works?
ref: haplotypecaller - NVIDIA Docs

If it does, try adding "--htvc-low-memory" to your PB command. If the error persists, please share the complete log. Thanks.
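
As a sketch of that suggestion, the command posted earlier can be rerun with the low-memory flag while stdout and stderr are captured together, so the complete log can be shared. The helper names `run` and `rerun_low_memory` and the log file name are hypothetical; the `pbrun` options are the ones already used in this thread.

```shell
# Set DRY_RUN=1 to print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: rerun haplotypecaller in low-memory mode and keep
# stdout and stderr together in one log file.
# Usage: rerun_low_memory REFERENCE_FASTA IN_BAM OUT_VCF LOG_FILE
rerun_low_memory() (
    run pbrun haplotypecaller \
        --ref "$1" \
        --in-bam "$2" \
        --out-variants "$3" \
        --htvc-low-memory >"$4" 2>&1
)
```

For example: `rerun_low_memory "${reference_path}/GRCh37.fa" sub.bam sub.vcf htvc_low_memory.log`. The function returns the exit status of `pbrun`, so a non-zero status indicates that the run failed again.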

Thanks. GATK does work. We tried "--htvc-low-memory", but the error still occurs. The complete log is below; we have only masked the file paths and changed nothing else.

Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation

/usr/local/parabricks/binaries/bin/htvc ${REFERENCE_PATH}/GRCh37.fa ${BAM_PATH}/sub.bam 2 -o ${OUTPUT_PATH}/sub.vcf -nt 5 --low-memory
[PB Info 2025-Jan-13 02:45:13] ------------------------------------------------------------------------------
[PB Info 2025-Jan-13 02:45:13] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2025-Jan-13 02:45:13] || Version 4.4.0-1 ||
[PB Info 2025-Jan-13 02:45:13] || GPU-GATK4 HaplotypeCaller ||
[PB Info 2025-Jan-13 02:45:13] ------------------------------------------------------------------------------
[PB Info 2025-Jan-13 02:45:26] 0 ${BAM_PATH}/sub.bam${OUTPUT_PATH}/low_sub.vcf
[PB Info 2025-Jan-13 02:45:26] ProgressMeter - Current-Locus Elapsed-Minutes Regions-Processed Regions/Minute
terminate called after throwing an instance of 'std::overflow_error'
  what(): robin_hood::map overflow
[PB Error 2025-Jan-13 02:45:28][-unknown-:0] Received signal: 6
For technical support visit https://docs.nvidia.com/clara/index.html#parabricks, exiting.
Exit with error: 1
For technical support visit https://docs.nvidia.com/clara/index.html#parabricks
Exiting...

Could not run haplotypecaller
Exiting pbrun ...

Please try the latest version: clara-parabricks:4.4.0.

It works on my end.