Dear Parabricks developers,
I ran haplotypecaller (Parabricks 4.0.0) with the following commands:
pbrun haplotypecaller --gvcf --ref ref/Homo_sapiens_assembly38.fasta \
  --in-bam output/cram/SC501096.cram --out-variants output/gatkhc_pb/SC501096.gatk.g.vcf
bgzip -c output/gatkhc_pb/SC501096.gatk.g.vcf > output/gatkhc_pb/SC501096.gatk.g.vcf.gz
tabix output/gatkhc_pb/SC501096.gatk.g.vcf.gz
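For clarity, here is a minimal sketch of how these three steps could be chained so that bgzip and tabix only run when pbrun exits successfully, which avoids compressing and indexing a partial gVCF after a crash. The function name `call_and_index` is hypothetical; the paths and flags are the ones from the commands above.

```shell
# Hypothetical wrapper (not part of Parabricks): run the three steps in order
# and stop at the first one that fails.
call_and_index() {
    local sample="$1"
    local gvcf="output/gatkhc_pb/${sample}.gatk.g.vcf"

    # The && chain skips bgzip and tabix if pbrun returns a non-zero status.
    pbrun haplotypecaller --gvcf \
        --ref ref/Homo_sapiens_assembly38.fasta \
        --in-bam "output/cram/${sample}.cram" \
        --out-variants "$gvcf" &&
    bgzip -c "$gvcf" > "${gvcf}.gz" &&
    tabix "${gvcf}.gz"
}
```

It would be called as `call_and_index SC501096`; the function's exit status is that of the first failing step, or zero if all three succeed.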
The run ended in a segmentation fault with the following error messages:
…
[PB Info 2023-Nov-03 15:42:46] chrY:11304001 95.5 16680519 174665
[PB Info 2023-Nov-03 15:42:56] chrY:56832001 95.7 16683660 174393
terminate called recursively
terminate called recursively
terminate called after throwing an instance of 'std::overflow_error'
what(): robin_hood::map overflow
[PB Error 2023-Nov-03 15:43:03][-unknown-:0] Received signal: 6
[PB Error 2023-Nov-03 15:43:03][-unknown-:0] [PB Error 2023-Nov-03 15:43:03][-unknown-:0] Received signal: 6
[PB Error 2023-Nov-03 15:43:03][-unknown-:0] Received signal: 11
For technical support visit Help - NVIDIA Docs, exiting.
[PB Error 2023-Nov-03 15:43:03][-unknown-:0] Received signal: 11
For technical support visit Help - NVIDIA Docs, exiting.
Segmentation fault (core dumped)
I repeated the run several times and got similar errors, for example:
…
[PB Info 2023-Nov-03 17:54:13] chrY:56827201 98.7 16631689 168564
terminate called recursively
terminate called after throwing an instance of 'std::overflow_error'
what(): robin_hood::map overflow
[PB Error 2023-Nov-03 17:54:16][-unknown-:0] Received signal: 6
For technical support visit [PB Error 2023-Nov-03 17:54:16][-unknown-:0] Received signal: 6
For technical support visit Help - NVIDIA Docs, exiting.
[PB Error 2023-Nov-03 17:54:16][-unknown-:0] Received signal: 11
terminate called recursively
[PB Error 2023-Nov-03 17:54:16][-unknown-:0] Received signal: 6
[PB Error 2023-Nov-03 17:54:16][src/likehood_test.cu:654] cudaSafeCall() failed: driver shutting down, exiting.
[PB Warning 2023-Nov-03 17:54:16][src/regions.cpp:2780] Haplotype length 354 < kmerSize 1446944784
[PB Error 2023-Nov-03 17:54:17][src/likehood_test.cu:654] cudaSafeCall() failed: driver shutting down, exiting.
terminate called recursively
[PB Error 2023-Nov-03 17:54:17][-unknown-:0] Received signal: 6
[PB Info 2023-Nov-03 17:54:23] chrY:56827201 98.8 16632898 168292
[PB Error 2023-Nov-03 17:54:28][-unknown-:0] Received signal: 11
…
For technical support visit Help - NVIDIA Docs, exiting.
[PB Error 2023-Nov-03 17:54:28][-unknown-:0] Received signal: 11
[PB Info 2023-Nov-03 17:54:33] chrY:56827201 99.0 16632898 168009
[PB Info 2023-Nov-03 17:54:43] chrY:56827201 99.2 16632898 167726
…
[PB Info 2023-Nov-06 14:04:08] chrY:56827201 4248.5 16632898 3915
[PB Info 2023-Nov-06 14:04:18] chrY:56827201 4248.7 16632898 3914
[PB Info 2023-Nov-06 14:04:28] chrY:56827201 4248.8 16632898 3914
[PB Info 2023-Nov-06 14:04:38] chrY:56827201 4249.0 16632898 3914
[PB Info 2023-Nov-06 14:04:48] chrY:56827201 4249.2 16632898 3914
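Unlike the previous run, this job did not exit with a segmentation fault. Instead it kept running without making progress: as the log above shows, it was still reporting chrY:56827201 on Nov 6, three days after the errors first appeared.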
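In case it helps anyone reproduce or monitor this, below is a rough sketch of how the hang could be detected from a saved log. It assumes the four values on each progress line are the current locus, elapsed minutes, regions processed, and regions per minute (this is my reading of the columns; it is consistent with 16632898 / 4248.5 ≈ 3915). The function name `check_stall` and the default threshold are hypothetical.

```shell
# Hypothetical helper: report a stall when the most recent progress lines all
# show the same locus and the same regions-processed count.
check_stall() {
    local log="$1" threshold="${2:-5}"
    awk -v t="$threshold" '
        # Match only progress lines such as:
        # [PB Info 2023-Nov-06 14:04:08] chrY:56827201 4248.5 16632898 3915
        $1 == "[PB" && $2 == "Info" && $5 ~ /^chr[^:]*:[0-9]+$/ {
            key = $5 " " $7              # locus plus assumed regions column
            if (key == prev) run++       # same state as the previous line
            else { run = 1; prev = key } # progress was made, restart the count
            locus = $5
        }
        END {
            if (run >= t) print "STALLED", locus, run
            else print "OK"
        }' "$log"
}
```

For example, `check_stall pbrun.log 30` (with `pbrun.log` standing in for wherever the output was saved) would flag a run whose last 30 progress lines, about five minutes at one line every 10 seconds, show no change.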
Below is the GPU and NVIDIA driver information for our server:
nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-d0bd9105-731d-58af-2b04-27ca2770e0e2)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-ba02d87e-2d94-d55f-0504-cf980c663070)
nvidia-smi
Mon Nov 6 14:06:40 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2…    On   | 00000000:8A:00.0 Off |                  Off |
| N/A   35C    P0    59W / 300W |  13026MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2…    On   | 00000000:B2:00.0 Off |                  Off |
| N/A   35C    P0    56W / 300W |  13026MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3933025      C   …bricks/binaries//bin/htvc      13019MiB |
|    1   N/A  N/A   3933025      C   …bricks/binaries//bin/htvc      13019MiB |
+-----------------------------------------------------------------------------+
Your help is greatly appreciated. Thanks,
Wei
Wei Zhu