Clara Parabricks DeepVariant error

I am getting the following error while running Clara Parabricks DeepVariant from a Singularity container:
GPU specs → Tesla V100-PCIE-32GB
Run command → singularity run --nv clara-parabricks_4.0.0-1.sif pbrun deepvariant --ref /genome.fa --in-bam fq2bam_output.bam --out-variants output/deepvariant_output.vcf


[PB Error 2022-Oct-11 17:24:15][src/fileHandleCommon.cpp:732] Loop count 4294967277 was too big., expected count < LOOPSIZE, exiting.
[PB Error 2022-Oct-11 17:24:15][./inc/common.h:108] NvInfer ERROR: ../rtSafe/cuda/cudaConvolutionRunner.cpp (483) - Cudnn Error in executeConv: 8 (CUDNN_STATUS_EXECUTION_FAILED), exiting.
/usr/local/parabricks/binaries//bin/deepvariant hg38/genome.fa /path/fq2bam_output.bam 1 -o /storage/scratch2/clara_parabricks/output/deepvariant_output.vcf --model /usr/local/parabricks/binaries//model/70/shortread/deepvariant.eng -n 4 --channel_insert_size --pileup_image_width 221 --max_reads_per_partition 1500 --partition_size 1000 --vsc_min_count_snps 2 --vsc_min_count_indels 2 --vsc_min_fraction_snps 0.12 --min_mapping_quality 5 --min_base_quality 10 --alt_aligned_pileup none --variant_caller VERY_SENSITIVE_CALLER
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation

I am also getting a similar error with HaplotypeCaller:

[PB Info 2022-Oct-11 18:15:58] chr5:177456001	1.5	3777815	2518543
[PB Error 2022-Oct-11 18:15:58][src/fileHandleCommon.cpp:732] Loop count 4294967277 was too big., expected count < LOOPSIZE, exiting.
/usr/local/parabricks/binaries//bin/htvc hg38/genome.fa /path/fq2bam_output.bam 1 -o output/haplotype_caller_output.vcf -nt 5
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
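Both tools abort at the same check in src/fileHandleCommon.cpp with the same loop count, 4294967277. That number equals 2^32 - 19, which is what -19 looks like when a signed 32-bit integer is reinterpreted as unsigned. One possible reading, and it is only an assumption, is that a bad value was read from an input file rather than a real 4-billion-iteration loop being requested. The short Python sketch below only verifies the arithmetic. It assumes the count is held in a 32-bit integer and says nothing about how Parabricks actually computes it (as_signed_int32 is a hypothetical helper).

```python
import struct


def as_signed_int32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement signed int32."""
    # Pack the value as 4 unsigned little-endian bytes, then read the same bytes back as signed.
    return struct.unpack("<i", struct.pack("<I", value))[0]


loop_count = 4294967277  # value reported by fileHandleCommon.cpp in both logs
print(loop_count == 2**32 - 19)    # True
print(as_signed_int32(loop_count))  # -19
```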

Hello @devendra.kumar,

Thank you for submitting your question. Do you happen to have the full log file so I can look through it more closely?

Thanks!

Here is the full log file:

Detected 1 CUDA Capable device(s), considering 1 device(s)
  CUDA Driver Version / Runtime Version          11.6 / 11.2
Using model for CUDA Capability Major/Minor version number:    70
[PB Info 2022-Oct-13 14:27:46] ------------------------------------------------------------------------------
[PB Info 2022-Oct-13 14:27:46] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2022-Oct-13 14:27:46] ||                              Version 4.0.0-1                             ||
[PB Info 2022-Oct-13 14:27:46] ||                                deepvariant                               ||
[PB Info 2022-Oct-13 14:27:46] ------------------------------------------------------------------------------
[PB Info 2022-Oct-13 14:27:47] Starting DeepVariant
[PB Info 2022-Oct-13 14:27:47] Running with 1 gpu, each with 4 workers
[PB Info 2022-Oct-13 14:28:01] ProgressMeter -	Current-Locus	Elapsed-Minutes
[PB Info 2022-Oct-13 14:28:07] ProgressMeter -	chr1:10000	0.1
[PB Info 2022-Oct-13 14:28:13] ProgressMeter -	chr1:10000	0.2
[PB Info 2022-Oct-13 14:28:19] ProgressMeter -	chr1:1964000	0.3
[PB Info 2022-Oct-13 14:28:25] ProgressMeter -	chr1:6128000	0.4
[PB Info 2022-Oct-13 14:28:31] ProgressMeter -	chr1:9709000	0.5
[PB Info 2022-Oct-13 14:28:37] ProgressMeter -	chr1:14272000	0.6
[PB Info 2022-Oct-13 14:28:43] ProgressMeter -	chr1:17501000	0.7
[PB Info 2022-Oct-13 14:28:49] ProgressMeter -	chr1:24035000	0.8
[PB Info 2022-Oct-13 14:28:55] ProgressMeter -	chr1:28066000	0.9
[PB Info 2022-Oct-13 14:29:01] ProgressMeter -	chr1:37376000	1.0
[PB Info 2022-Oct-13 14:29:07] ProgressMeter -	chr1:42204000	1.1
[PB Info 2022-Oct-13 14:29:13] ProgressMeter -	chr1:47455000	1.2
[PB Info 2022-Oct-13 14:29:19] ProgressMeter -	chr1:53859000	1.3
[PB Info 2022-Oct-13 14:29:25] ProgressMeter -	chr1:73098000	1.4
[PB Info 2022-Oct-13 14:29:31] ProgressMeter -	chr1:88212000	1.5
[PB Info 2022-Oct-13 14:29:37] ProgressMeter -	chr1:106792000	1.6
[PB Info 2022-Oct-13 14:29:43] ProgressMeter -	chr1:111282000	1.7
[PB Info 2022-Oct-13 14:29:49] ProgressMeter -	chr1:115945000	1.8
[PB Info 2022-Oct-13 14:29:55] ProgressMeter -	chr1:124464000	1.9
[PB Info 2022-Oct-13 14:30:01] ProgressMeter -	chr1:150878000	2.0
[PB Info 2022-Oct-13 14:30:07] ProgressMeter -	chr1:159168000	2.1
[PB Info 2022-Oct-13 14:30:13] ProgressMeter -	chr1:170353000	2.2
[PB Info 2022-Oct-13 14:30:19] ProgressMeter -	chr1:178455000	2.3
[PB Info 2022-Oct-13 14:30:25] ProgressMeter -	chr1:193764000	2.4
[PB Info 2022-Oct-13 14:30:31] ProgressMeter -	chr1:204470000	2.5
[PB Info 2022-Oct-13 14:30:37] ProgressMeter -	chr1:214341000	2.6
[PB Info 2022-Oct-13 14:30:43] ProgressMeter -	chr1:225764000	2.7
[PB Info 2022-Oct-13 14:30:49] ProgressMeter -	chr1:239090000	2.8
[PB Info 2022-Oct-13 14:30:55] ProgressMeter -	chr2:2007000	2.9
[PB Info 2022-Oct-13 14:31:01] ProgressMeter -	chr2:16960000	3.0
[PB Info 2022-Oct-13 14:31:07] ProgressMeter -	chr2:30157000	3.1
[PB Info 2022-Oct-13 14:31:13] ProgressMeter -	chr2:45822000	3.2
[PB Info 2022-Oct-13 14:31:19] ProgressMeter -	chr2:63061000	3.3
[PB Info 2022-Oct-13 14:31:25] ProgressMeter -	chr2:74015000	3.4
[PB Info 2022-Oct-13 14:31:31] ProgressMeter -	chr2:85903000	3.5
[PB Info 2022-Oct-13 14:31:37] ProgressMeter -	chr2:105362000	3.6
[PB Info 2022-Oct-13 14:31:43] ProgressMeter -	chr2:119161000	3.7
[PB Info 2022-Oct-13 14:31:49] ProgressMeter -	chr2:135490000	3.8
[PB Info 2022-Oct-13 14:31:55] ProgressMeter -	chr2:152341000	3.9
[PB Info 2022-Oct-13 14:32:01] ProgressMeter -	chr2:166457000	4.0
[PB Info 2022-Oct-13 14:32:07] ProgressMeter -	chr2:181684000	4.1
[PB Info 2022-Oct-13 14:32:13] ProgressMeter -	chr2:197832000	4.2
[PB Info 2022-Oct-13 14:32:19] ProgressMeter -	chr2:204659000	4.3
[PB Info 2022-Oct-13 14:32:25] ProgressMeter -	chr2:218758000	4.4
[PB Info 2022-Oct-13 14:32:31] ProgressMeter -	chr2:230872000	4.5
[PB Info 2022-Oct-13 14:32:37] ProgressMeter -	chr2:240147000	4.6
[PB Info 2022-Oct-13 14:32:43] ProgressMeter -	chr3:14540000	4.7
[PB Info 2022-Oct-13 14:32:49] ProgressMeter -	chr3:25630000	4.8
[PB Info 2022-Oct-13 14:32:55] ProgressMeter -	chr3:33771000	4.9
[PB Info 2022-Oct-13 14:33:01] ProgressMeter -	chr3:41799000	5.0
[PB Info 2022-Oct-13 14:33:07] ProgressMeter -	chr3:50431000	5.1
[PB Info 2022-Oct-13 14:33:13] ProgressMeter -	chr3:72313000	5.2
[PB Info 2022-Oct-13 14:33:19] ProgressMeter -	chr3:96486000	5.3
[PB Info 2022-Oct-13 14:33:25] ProgressMeter -	chr3:106592000	5.4
[PB Info 2022-Oct-13 14:33:31] ProgressMeter -	chr3:122910000	5.5
[PB Info 2022-Oct-13 14:33:37] ProgressMeter -	chr3:133887000	5.6
[PB Error 2022-Oct-13 14:33:42][src/fileHandleCommon.cpp:732] Loop count 4294967277 was too big., expected count < LOOPSIZE, exiting.
[PB Error 2022-Oct-13 14:33:42][./inc/common.h:108] NvInfer ERROR: ../rtSafe/cuda/cudaConvolutionRunner.cpp (483) - Cudnn Error in executeConv: 8 (CUDNN_STATUS_EXECUTION_FAILED), exiting.
/usr/local/parabricks/binaries//bin/deepvariant hg38/genome.fa /path/fq2bam_output.bam 1 -o output/deepvariant_output.vcf --model /usr/local/parabricks/binaries//model/70/shortread/deepvariant.eng -n 4 --channel_insert_size --pileup_image_width 221 --max_reads_per_partition 1500 --partition_size 1000 --vsc_min_count_snps 2 --vsc_min_count_indels 2 --vsc_min_fraction_snps 0.12 --min_mapping_quality 5 --min_base_quality 10 --alt_aligned_pileup none --variant_caller VERY_SENSITIVE_CALLER --logfile /path/pb_deepvariant_log.txt --append
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation

Thank you @devendra.kumar,

Could you try this run on 2 GPUs so I can rule out some possible causes?

@gburnett
We have only one GPU at our facility!

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Product Name                          : Tesla V100-PCIE-32GB
    Product Brand                         : Tesla
    Product Architecture                  : Volta

fq2bam, however, runs smoothly!

@gburnett, what is the status of this bug?

Hey Devendra,

It looks like your BAM file could be corrupted. Can you run samtools quickcheck on it to verify?
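samtools quickcheck is the right tool for this. As a rough illustration of what it looks at for BAM input, the Python sketch below performs two of its checks: that the decompressed stream starts with the 4-byte BAM magic, and that the file ends with the 28-byte BGZF end-of-file block defined in the SAM/BAM specification. The helpers has_bam_magic and has_bgzf_eof are hypothetical and not part of Parabricks or samtools. The sketch is weaker than quickcheck, which also checks that the header lists at least one reference sequence. Neither one reads through the whole file, so they detect truncation or a broken header, not damage in the middle of the file.

```python
import gzip
import os
import sys

# The 28-byte empty BGZF block that, per the SAM/BAM specification, ends every
# BAM file. If it is missing, the file was most likely truncated.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bam_magic(path: str) -> bool:
    """True if the decompressed stream starts with the 4-byte BAM magic."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Not gzip/BGZF data at all, or cut off inside the first block.
        return False


def has_bgzf_eof(path: str) -> bool:
    """True if the file ends with the 28-byte BGZF EOF block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


if __name__ == "__main__":
    for bam in sys.argv[1:]:
        ok = has_bam_magic(bam) and has_bgzf_eof(bam)
        print(f"{bam}: {'ok' if ok else 'FAILED'}")
```

For example, assuming the script is saved under the hypothetical name bamcheck.py: python bamcheck.py fq2bam_output.bam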