DeepVariant is deepsomatic?

Hello,

I am a bit concerned about how deepvariant is run in Parabricks 4.6.0. Whether I call pangenome_aware_deepvariant or deepvariant, both appear to invoke deepsomatic.

Deepsomatic uses a different model, trained on cancer tumor tissue, and it is not intended for non-model species. Deepvariant, on the other hand, is OK for this purpose.

singularity exec --nv -B $PWD:$PWD parabricks_4.6.0-1.sif pbrun deepvariant --ref ref.fa --in-bam giraffe_out/sample_1.bam --gvcf --out-variants deepVar_out/sample_1_deepvar.g.vcf

[Parabricks Options Mesg]: Setting --num-streams-per-gpu based on available device memory.
Detected 2 CUDA Capable device(s), considering 2 device(s)
CUDA Driver Version / Runtime Version          13.0 / 12.9
Using model for CUDA Capability Major/Minor version number:    89
/usr/local/parabricks/binaries/bin/deepsomatic 2 4 --ref ref.fa --reads sample_1.bam -o sample_1_deepvar.g.vcf -n 6 --model /usr/local/parabricks/binaries/model/80+/shortread/deepvariant.eng -g --channel_insert_size --pileup_image_width 221 --max_reads_per_partition 1500 --partition_size 1000 --vsc_min_count_snps 2 --vsc_min_count_indels 2 --vsc_min_fraction_snps 0.12 --min_mapping_quality 5 --min_base_quality 10 --alt_aligned_pileup none --variant_caller VERY_SENSITIVE_CALLER --dbg_min_base_quality 15 --ws_min_windows_distance 80 --aux_fields_to_keep HP --p_error 0.001 --max_ins_size 1

and pangenome_aware_deepvariant is the same:

singularity exec --nv -B $PWD:$PWD parabricks_4.6.0-1.sif pbrun pangenome_aware_deepvariant --pangenome pangenome.gbz --ref ref.fa --in-bam sample_1.bam --ref-name myPG --out-variants sample_1_deepvar.vcf
Please visit 
 for detailed documentation

[Parabricks Options Mesg]: Setting --num-streams-per-gpu based on available device memory.
Detected 2 CUDA Capable device(s), considering 2 device(s)
CUDA Driver Version / Runtime Version          13.0 / 12.9
Using model for CUDA Capability Major/Minor version number:    89
/usr/local/parabricks/binaries/bin/deepsomatic 2 4 --ref ref.fa --reads sample_1.bam -o deepVar_out/sample_1_deepvar.vcf -n 6 --pangenome pangenome.gbz --model /usr/local/parabricks/binaries/model/80+/shortread/deepvariant_pangenome_aware.eng -long_reads --channel_insert_size --keep_legacy_allele_counter_behavior --keep_only_window_spanning_haplotypes --keep_supplementary_alignments --min_mapping_quality 0 --normalize_reads --sort_by_haplotypes --parse_sam_aux_fields

The pangenome-aware DeepVariant run is not only invoking deepsomatic, it is also passing -long_reads, even though the default parameters describe it as short-read based and the model is loaded from the shortread directory.

In addition, the DeepVariant version is not reported anywhere in the output. I would like to report the DeepVariant version as well as the Parabricks version in a subsequent publication, and right now it is unclear which DeepVariant version is being used.

Looking for clarity on these discrepancies before proceeding with actual datasets. Is there a reason why deepsomatic is being called instead of deepvariant?

Thanks,

Sam

Hi Sam,

Thanks for your question.

You are correct. The binary we use internally to implement all three tools (deepvariant, deepsomatic, and pangenome_aware_deepvariant) is indeed named deepsomatic. Previous releases may have used separate binaries internally, but in the most recent release we unified the code for maintainability, since the three tools overlap quite a bit. This is an implementation detail and may change in the future. Each mode still uses a different model, which you can see in the --model parameter passed to our binary: deepvariant.eng for deepvariant and deepvariant_pangenome_aware.eng for pangenome_aware_deepvariant. Deepsomatic likewise uses its own model.
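
If it helps to check this across many runs, here is a minimal sketch that pulls the binary name and the --model file out of a logged command line like the ones in your output. It is only an illustration: the function name summarize_invocation is hypothetical and not part of Parabricks, and it assumes the log line has the same shape as the ones you pasted.

```python
import shlex


def summarize_invocation(logged_command: str) -> dict:
    """Return the binary name and the model file name from a logged command line.

    Hypothetical helper for illustration; it is not part of Parabricks.
    """
    tokens = shlex.split(logged_command)
    # The first token is the full path to the internal binary.
    binary = tokens[0].rsplit("/", 1)[-1]
    model = None
    # Look for "--model <path>" and keep only the file name.
    for i, token in enumerate(tokens[:-1]):
        if token == "--model":
            model = tokens[i + 1].rsplit("/", 1)[-1]
            break
    return {"binary": binary, "model": model}


# Command line copied (shortened) from the deepvariant log in the question.
logged = (
    "/usr/local/parabricks/binaries/bin/deepsomatic 2 4 --ref ref.fa "
    "--reads sample_1.bam -o sample_1_deepvar.g.vcf -n 6 "
    "--model /usr/local/parabricks/binaries/model/80+/shortread/deepvariant.eng -g"
)
print(summarize_invocation(logged))
# prints {'binary': 'deepsomatic', 'model': 'deepvariant.eng'}
```

The binary name is the same for every tool, so the model file name is the field that tells you which caller actually ran.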

Regarding version numbers, the equivalent baseline tool versions are listed in our release notes: "Output Accuracy and Compatible CPU Software Versions" (NVIDIA Docs). For Parabricks v4.6.0 we match DeepVariant 1.9.0. You can read more in the release blog post: "Improve Variant Calling Accuracy with NVIDIA Parabricks" (NVIDIA Technical Blog).

Hope this helps.