[v3.8][A100][germline] got stuck in the BWA program

Hi there,

  • stage: production (ready to deliver to our customer)
  • environment: super node
    • GPU: A100(40GB) x4
    • CPU: 126 cores
    • RAM: 480 GiB
    • Parabricks: v3.8.0-ampere
  • datasets
    • the simplest fastq: R1 & R2
    • WGS: R1 & R2 (44.48x)

The germline pipeline got stuck in the BWA step twice.
One case was with the simplest fastq (3 records each for R1 & R2, 150 bases per record).
The other case was with the WGS data (R1 & R2, sequencing depth: 44.48x).
(We’ve run the germline pipeline on this WGS dataset more than 20 times.)

The abnormal case looks like this:

[PB Info 2022-May-27 20:49:53] ------------------------------------------------------------------------------
[PB Info 2022-May-27 20:49:53] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2022-May-27 20:49:53] ||                          Version 3.8.0-1.ampere                          ||
[PB Info 2022-May-27 20:49:53] ||                       GPU-BWA mem, Sorting Phase-I                       ||
[PB Info 2022-May-27 20:49:53] ||                  Contact: Parabricks-Support@nvidia.com                  ||
[PB Info 2022-May-27 20:49:53] ------------------------------------------------------------------------------

(gets stuck here for a long time)

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Info 2022-May-27 20:49:56] GPU-BWA mem
[PB Info 2022-May-27 20:49:56] ProgressMeter    Reads           Base Pairs Aligned
[PB Info 2022-May-27 20:50:26] 5033176          770000000
[PB Info 2022-May-27 20:50:33] 10066352 1540000000
[PB Info 2022-May-27 20:50:40] 15099528 2290000000
[PB Info 2022-May-27 20:50:48] 20132704 3040000000
[PB Info 2022-May-27 20:50:55] 25165880 3790000000

The normal case is:

[PB Info 2022-May-27 20:49:53] ------------------------------------------------------------------------------
[PB Info 2022-May-27 20:49:53] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2022-May-27 20:49:53] ||                          Version 3.8.0-1.ampere                          ||
[PB Info 2022-May-27 20:49:53] ||                       GPU-BWA mem, Sorting Phase-I                       ||
[PB Info 2022-May-27 20:49:53] ||                  Contact: Parabricks-Support@nvidia.com                  ||
[PB Info 2022-May-27 20:49:53] ------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Info 2022-May-27 20:49:56] GPU-BWA mem
[PB Info 2022-May-27 20:49:56] ProgressMeter    Reads           Base Pairs Aligned
[PB Info 2022-May-27 20:50:26] 5033176          770000000
[PB Info 2022-May-27 20:50:33] 10066352 1540000000
[PB Info 2022-May-27 20:50:40] 15099528 2290000000
[PB Info 2022-May-27 20:50:48] 20132704 3040000000
[PB Info 2022-May-27 20:50:55] 25165880 3790000000

Are there any possible causes of this symptom, or clues to it? (We’ve never seen the abnormal case with the non-ampere versions.)
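
In case it helps with triage, here is a minimal Python sketch for locating the stall from the log timestamps. It is not part of Parabricks; it only assumes that timestamped lines start with the `[PB Info YYYY-Mon-DD HH:MM:SS]` prefix shown in the logs above. The function name `largest_gap` is our own, and lines without that prefix (such as the `[M::bwa_idx_load_from_disk]` line) are simply skipped.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the logs above,
# e.g. "[PB Info 2022-May-27 20:49:53]". Other log levels are an assumption.
PB_STAMP = re.compile(r"^\[PB \w+ (\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2})\]")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause
    between consecutive timestamped lines, or None if fewer than two exist."""
    prev_time = None
    prev_line = None
    best = None
    for line in lines:
        match = PB_STAMP.match(line)
        if not match:
            continue  # untimestamped line, e.g. "[M::bwa_idx_load_from_disk] ..."
        # "%b" parses the month abbreviation ("May"); this relies on an
        # English/C locale being active.
        stamp = datetime.strptime(match.group(1), "%Y-%b-%d %H:%M:%S")
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line.rstrip())
        prev_time, prev_line = stamp, line.rstrip()
    return best
```

It can be fed the lines of a captured log, for example `largest_gap(open("germline.log"))`, where `germline.log` is a hypothetical file name. In the abnormal case we would expect the longest pause to be reported between the last banner line and the `GPU-BWA mem` line.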
