robin_hood::map overflow in GATK HaplotypeCaller (Parabricks 4.0.0)

Dear Parabricks developers,

I ran haplotypecaller (Parabricks 4.0.0) using the commands below:
pbrun haplotypecaller --gvcf --ref ref/Homo_sapiens_assembly38.fasta \
    --in-bam output/cram/SC501096.cram --out-variants output/gatkhc_pb/SC501096.gatk.g.vcf
bgzip -c output/gatkhc_pb/SC501096.gatk.g.vcf > output/gatkhc_pb/SC501096.gatk.g.vcf.gz
tabix output/gatkhc_pb/SC501096.gatk.g.vcf.gz

And I got a segmentation fault with the error message below:

[PB Info 2023-Nov-03 15:42:46] chrY:11304001 95.5 16680519 174665
[PB Info 2023-Nov-03 15:42:56] chrY:56832001 95.7 16683660 174393
terminate called recursively
terminate called recursively
terminate called after throwing an instance of 'std::overflow_error'
what(): robin_hood::map overflow
[PB Error 2023-Nov-03 15:43:03][-unknown-:0] Received signal: 6
[PB Error 2023-Nov-03 15:43:03][-unknown-:0] [PB Error 2023-Nov-03 15:43:03][-unknown-:0] Received signal: 6
[PB Error 2023-Nov-03 15:43:03][-unknown-:0] Received signal: 11
For technical support visit Help - NVIDIA Docs, exiting.
[PB Error 2023-Nov-03 15:43:03][-unknown-:0] Received signal: 11
For technical support visit Help - NVIDIA Docs, exiting.
Segmentation fault (core dumped)

I repeated the run several times and got similar errors, as below:

[PB Info 2023-Nov-03 17:54:13] chrY:56827201 98.7 16631689 168564
terminate called recursively
terminate called after throwing an instance of 'std::overflow_error'
what(): robin_hood::map overflow
[PB Error 2023-Nov-03 17:54:16][-unknown-:0] Received signal: 6
For technical support visit [PB Error 2023-Nov-03 17:54:16][-unknown-:0] Received signal: 6
For technical support visit Help - NVIDIA Docs, exiting.
[PB Error 2023-Nov-03 17:54:16][-unknown-:0] Received signal: 11
terminate called recursively
[PB Error 2023-Nov-03 17:54:16][-unknown-:0] Received signal: 6
[PB Error 2023-Nov-03 17:54:16][src/likehood_test.cu:654] cudaSafeCall() failed: driver shutting down, exiting.
[PB Warning 2023-Nov-03 17:54:16][src/regions.cpp:2780] Haplotype length 354 < kmerSize 1446944784

[PB Error 2023-Nov-03 17:54:17][src/likehood_test.cu:654] cudaSafeCall() failed: driver shutting down, exiting.
terminate called recursively
[PB Error 2023-Nov-03 17:54:17][-unknown-:0] Received signal: 6
[PB Info 2023-Nov-03 17:54:23] chrY:56827201 98.8 16632898 168292
[PB Error 2023-Nov-03 17:54:28][-unknown-:0] Received signal: 11

For technical support visit Help - NVIDIA Docs, exiting.
[PB Error 2023-Nov-03 17:54:28][-unknown-:0] Received signal: 11
[PB Info 2023-Nov-03 17:54:33] chrY:56827201 99.0 16632898 168009
[PB Info 2023-Nov-03 17:54:43] chrY:56827201 99.2 16632898 167726

[PB Info 2023-Nov-06 14:04:08] chrY:56827201 4248.5 16632898 3915
[PB Info 2023-Nov-06 14:04:18] chrY:56827201 4248.7 16632898 3914
[PB Info 2023-Nov-06 14:04:28] chrY:56827201 4248.8 16632898 3914
[PB Info 2023-Nov-06 14:04:38] chrY:56827201 4249.0 16632898 3914
[PB Info 2023-Nov-06 14:04:48] chrY:56827201 4249.2 16632898 3914

Unlike the previous run, this job kept running forever (without a segmentation fault).

Below is the information about the NVIDIA driver installed on our server:
nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-d0bd9105-731d-58af-2b04-27ca2770e0e2)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-ba02d87e-2d94-d55f-0504-cf980c663070)

nvidia-smi
Mon Nov 6 14:06:40 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                  Off |
| N/A   35C    P0    59W / 300W |  13026MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:B2:00.0 Off |                  Off |
| N/A   35C    P0    56W / 300W |  13026MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3933025      C   ...bricks/binaries//bin/htvc     13019MiB |
|    1   N/A  N/A   3933025      C   ...bricks/binaries//bin/htvc     13019MiB |
+-----------------------------------------------------------------------------+

Your help is greatly appreciated. Thanks,

Wei

Wei Zhu

Hello @zhuw10,

Thank you for posting about your error. I have a few questions / requests:

  1. Do you need version 4.0.0, or would it be possible to use the latest version, 4.2.0? (See the example pull command after this list.)
  2. You can add the --low-memory flag and see if you get the same error.
  3. Would it be possible for you to run on a newer driver? It looks like you're using 470, and the latest driver version is 525.
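
For reference, pulling and running the newer release could look something like the sketch below (assuming the standard NGC path for the Parabricks container; adjust the tag and mounts to your setup):

# Pull the Parabricks 4.2.0 container from NGC (path/tag assumed; verify on the NGC catalog)
docker pull nvcr.io/nvidia/clara/clara-parabricks:4.2.0-1

# Run haplotypecaller from the container, mounting the current working directory
docker run --rm --gpus all -v $(pwd):/workdir -w /workdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.2.0-1 \
    pbrun haplotypecaller --gvcf --ref ref/Homo_sapiens_assembly38.fasta \
    --in-bam output/cram/SC501096.cram \
    --out-variants output/gatkhc_pb/SC501096.gatk.g.vcf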

Thank you.

Thank you for the tips and the prompt reply. I cannot update the driver, as our HPC cluster is shared by many different users and I am just one of them. I may test the newer version and/or the --low-memory flag.

I had not run into such a problem before when using BAM files as input. I wonder whether there is some issue related to CRAM input files when using GATK HaplotypeCaller in Parabricks.
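
In case it helps with diagnosis, below is a minimal sketch of how I could sanity-check the CRAM (standard samtools/GATK commands; the paths are the ones from my command above):

# Basic integrity check (exits non-zero if the CRAM is truncated or corrupt)
samtools quickcheck -v output/cram/SC501096.cram

# Inspect the header: @RG read-group lines and @SQ reference sequences
samtools view -H output/cram/SC501096.cram | grep -E '^@(RG|SQ)' | head

# Deeper validation against the same reference used for calling
gatk ValidateSamFile -I output/cram/SC501096.cram \
    -R ref/Homo_sapiens_assembly38.fasta --MODE SUMMARY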

Thanks,
Wei

By the way, I cannot find any option to use a "--low-memory" flag in pbrun haplotypecaller. Could you give some details?

Thanks,
Wei

Hi @zhuw10,

The tool should be able to handle both CRAM and BAM files, unless there is something wrong with the file itself. Can you tell me more about what’s in this CRAM file? Sequencing coverage, size in GB, etc.?

And apologies, the low-memory option is not available for this tool.

You could also try using the --run-partition flag to break up the processing of the file and see if that helps.
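
For example, re-running with that option could look like the sketch below (the flag spelling here is an assumption; please check pbrun haplotypecaller --help for the exact name and any partition-count argument):

pbrun haplotypecaller --gvcf --run-partition \
    --ref ref/Homo_sapiens_assembly38.fasta \
    --in-bam output/cram/SC501096.cram \
    --out-variants output/gatkhc_pb/SC501096.gatk.g.vcf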

I figured out the cause of the issue: "picard AddOrReplaceReadGroups" had been used to replace the read groups and also to save the output as CRAM files, and the resulting CRAM files were problematic. After regenerating the CRAM files with samtools, everything works fine now.
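
For anyone who hits the same problem, here is a minimal sketch of the kind of samtools-based regeneration that worked for me (file names and read-group fields are illustrative, not my exact pipeline):

# Set/replace the read group with samtools rather than writing CRAM from Picard
samtools addreplacerg -r 'ID:SC501096' -r 'SM:SC501096' -r 'PL:ILLUMINA' \
    -o SC501096.rg.bam SC501096.bam

# Convert to CRAM against the same reference used downstream, then index
samtools view -C -T ref/Homo_sapiens_assembly38.fasta \
    -o output/cram/SC501096.cram SC501096.rg.bam
samtools index output/cram/SC501096.cram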

Thanks for your help anyway, and please close this issue.

Thanks,

Wei
